Paper · arXiv publication date: 2026-06-04

Benchmark Everything Everywhere All at Once

Authors: Shiyun Xiong, Dongming Wu, Peiwen Sun, Yuang Ai, Bokang Yang, and 3 others

Summary

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly …

#benchmark#agent#llm#multimodal