KV Cache Memory Costs in Long-Context Inference
A back-of-the-envelope model for KV cache memory usage as a function of context length and concurrency, plus a simple “semantic pre-scope” reduction term to estimate VRAM freed by better input structure.
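As a starting point, here is a minimal sketch of that back-of-the-envelope model in Python. The function names (`kv_cache_bytes`, `prescope_savings`) and the default model configuration (a 70B-class transformer with grouped-query attention, fp16 cache) are illustrative assumptions, not values taken from this article; the per-token cost follows the standard accounting of 2 (K and V) × layers × KV heads × head dim × bytes per element.

```python
def kv_cache_bytes(
    context_len: int,      # tokens held in cache per request
    concurrency: int,      # simultaneous requests sharing the GPU
    n_layers: int = 80,    # transformer layers (assumed 70B-class model)
    n_kv_heads: int = 8,   # KV heads (GQA: far fewer than query heads)
    head_dim: int = 128,   # dimension per attention head
    dtype_bytes: int = 2,  # fp16/bf16 cache entries
) -> int:
    """Total KV cache footprint in bytes: 2 (K and V) x layers x KV heads
    x head dim x dtype bytes per token, scaled by context length and
    by the number of concurrent requests."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
    return per_token * context_len * concurrency


def prescope_savings(context_len: int, concurrency: int, reduction: float) -> int:
    """VRAM freed if semantic pre-scoping trims a `reduction` fraction
    (0..1) of the input tokens before they reach the model."""
    return kv_cache_bytes(int(context_len * reduction), concurrency)


if __name__ == "__main__":
    GiB = 1024 ** 3
    base = kv_cache_bytes(context_len=128_000, concurrency=16)
    freed = prescope_savings(context_len=128_000, concurrency=16, reduction=0.4)
    print(f"KV cache: {base / GiB:.1f} GiB; "
          f"freed by a 40% pre-scope: {freed / GiB:.1f} GiB")
```

Under these assumed parameters, the per-token cache cost is about 320 KiB, so 16 concurrent 128K-token requests occupy roughly 625 GiB of KV cache, and a 40% input reduction frees about 250 GiB. The savings term is deliberately linear: it assumes pre-scoping removes tokens outright rather than compressing or quantizing the cache.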