Paper Hugging Face Published: 2026-06-07 HF ↑6

End-to-End Context Compression at Scale

Authors: Ang Li, Sean McLeish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, and 10 others

Abstract

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, m…

#agent

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning