Value-Aware Stochastic KV Cache Eviction for Reasoning Models
Abstract
Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse atte…