2026-06-05

14件

← アーカイブ一覧

論文 arXiv 2026-06-04

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless searc...

#agent#llm#coding#benchmark
論文 arXiv 2026-06-04

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse m...

#coding#llm#benchmark
論文 arXiv 2026-06-04

Benchmark Everything Everywhere All at Once

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly ...

#benchmark#agent#llm#multimodal
論文 arXiv 2026-06-04

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in explor...

#agent#llm#benchmark
論文 深掘り arXiv 2026-06-04

Improving Answer Extraction in Context-based Question Answering Systems Using LLMs

Question answering (QA) systems have achieved notable progress with the advent of large language models (LLMs). However, they still face challenges in accurately extracting and generating precise answers from given contexts, particularly when dealing with complex or ambiguous queries. Existing appro...

#llm#fine-tuning#benchmark
論文 arXiv 2026-06-04

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEA...

#llm#benchmark#speech
企業動向 NVIDIA 2026-06-05

Seoul Purpose: How NVIDIA and South Korea Are Building the Future of AI

Home to cutting-edge sovereign AI infrastructure and robotics innovators, as well as one of the world’s most passionate gaming communities, South Korea is one of the world’s centers of AI. NVIDIA founder and CEO Jensen Huang is in Seoul this week to meet the partners and builders behind that work. S...

#robotics
論文 arXiv 2026-06-04

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can ...

#rl#fine-tuning
論文 arXiv 2026-06-04

Self-Augmenting Retrieval for Diffusion Language Models

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens a...

#diffusion#rag#benchmark