Transformer FLOPs Calculator

Results

Total FLOPs: 349.63B

Per-Layer Breakdown:

| Component | Per Layer | All 12 Layers |
|---|---|---|
| Q, K, V Projections | 3.62B (1.0%) | 43.49B (12.4%) |
| Q×K Attention | 1.61B (0.5%) | 19.33B (5.5%) |
| Attention×V | 1.61B (0.5%) | 19.33B (5.5%) |
| Output Projection | 1.21B (0.3%) | 14.50B (4.1%) |
| Feed-Forward Network | 14.50B (4.1%) | 173.95B (49.8%) |
| Total per Block | 22.55B | 270.58B (77.4%) |
| LM Head | | 79.05B (22.6%) |
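
The figures above follow from counting roughly 2·m·k·n FLOPs for every (m×k)·(k×n) matrix multiply. The sketch below reproduces the table; note that the model configuration (12 layers, d_model = 768, sequence length 1024, vocabulary 50,257) and the gated three-matrix FFN are assumptions on my part, chosen because they reproduce the numbers shown, since the calculator output does not print its inputs.

```python
# Minimal sketch of the FLOPs breakdown. Config is an assumption
# (GPT-2-small-like); the table above does not state it explicitly.
n_layers, d_model, seq_len, vocab = 12, 768, 1024, 50_257
d_ff = 4 * d_model  # assumed FFN expansion factor

def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for (m x k) @ (k x n): one multiply + one add per MAC."""
    return 2 * m * k * n

per_layer = {
    # Q, K, V projections: three d_model x d_model matmuls over the sequence.
    "qkv_proj":  3 * matmul_flops(seq_len, d_model, d_model),
    # Attention scores Q @ K^T: (T x d) @ (d x T).
    "qk_scores": matmul_flops(seq_len, d_model, seq_len),
    # Weighted sum of values: (T x T) @ (T x d).
    "attn_v":    matmul_flops(seq_len, seq_len, d_model),
    # Output projection back to d_model.
    "out_proj":  matmul_flops(seq_len, d_model, d_model),
    # Feed-forward network. The 14.50B per layer matches a gated
    # (SwiGLU-style) FFN with three d_model x d_ff matrices -- an
    # assumption, since the FFN variant is not stated above.
    "ffn":       3 * matmul_flops(seq_len, d_model, d_ff),
}

block      = sum(per_layer.values())              # ~22.55e9 per block
all_blocks = n_layers * block                     # ~270.58e9
lm_head    = matmul_flops(seq_len, d_model, vocab)  # ~79.05e9
total      = all_blocks + lm_head                 # ~349.63e9

for name, flops in per_layer.items():
    print(f"{name:>9}: {flops/1e9:6.2f}B per layer, "
          f"{n_layers*flops/1e9:7.2f}B all layers ({n_layers*flops/total:5.1%})")
print(f"  lm_head: {lm_head/1e9:6.2f}B ({lm_head/total:.1%})")
print(f"    total: {total/1e9:6.2f}B")
```

Running this prints per-component totals that match the table above to within rounding.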

Component Summary:

Attention (MHA): 27.6%
Feed-Forward (FFN): 49.8%
Language Model Head: 22.6%
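
The summary shares are simply the all-layer component totals grouped and divided by the grand total. A quick check using the rounded values from the breakdown table (so the printed shares agree with the summary to within rounding):

```python
# Grouped shares from the all-layer totals in the table (billions of FLOPs).
attention = 43.49 + 19.33 + 19.33 + 14.50  # QKV + QxK + attn x V + out proj
ffn = 173.95
lm_head = 79.05
total = attention + ffn + lm_head          # ~349.6B

for name, flops in [("Attention (MHA)", attention),
                    ("Feed-Forward (FFN)", ffn),
                    ("Language Model Head", lm_head)]:
    print(f"{name}: {flops / total:.1%}")
```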