Example A — LLaMA 3 70B (published per-token KV size)
- KV heads: K = 8
- Head dim: H = 128
- Layers: L = 80
KV per token ≈ 160 kB/token under the framing:
2 × 8 × 128 × 80 = 163,840 values per token; at 1 byte per value (8-bit storage) → ~160 kB/token (double this for fp16/bf16)
At 32k tokens (32,768), the KV cache is roughly 5.3 GB per sequence.
Rule of thumb:
LLaMA 3-70B KV ≈ 0.00016 GB/token ≈ 0.16 MB/token
Notes: the values above are reproduced as presented in the referenced example; treat them as back-of-the-envelope planning figures, not exact deployment numbers.
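The sizing above can be sketched as a small helper. This is a back-of-the-envelope calculator, not a measurement of any particular runtime; the function names are illustrative, and `bytes_per_value=1` is the assumption that reproduces the ~160 kB/token figure (use 2 for fp16/bf16).

```python
def kv_bytes_per_token(kv_heads: int = 8, head_dim: int = 128,
                       layers: int = 80, bytes_per_value: int = 1) -> int:
    # The leading 2 accounts for the separate K and V tensors per layer.
    # Defaults match the LLaMA 3 70B figures quoted above; bytes_per_value=1
    # is the 8-bit-storage assumption behind the ~160 kB/token framing.
    return 2 * kv_heads * head_dim * layers * bytes_per_value

def kv_bytes_per_sequence(seq_len: int, **kwargs) -> int:
    # Total KV-cache footprint for one sequence of seq_len tokens.
    return seq_len * kv_bytes_per_token(**kwargs)

if __name__ == "__main__":
    per_tok = kv_bytes_per_token()            # 163,840 bytes ≈ 160 kB
    per_seq = kv_bytes_per_sequence(32_768)   # ≈ 5.37e9 bytes ≈ 5.3 GB
    print(f"{per_tok / 1024:.0f} kB/token, {per_seq / 1e9:.2f} GB at 32k tokens")
```

Swapping `bytes_per_value=2` doubles both numbers, which is the quickest way to move between 8-bit and fp16 planning figures.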