2026-06-12

20件

← アーカイブ一覧

論文 深掘り Hugging Face 2026-06-10 HF ↑56

MiniMax Sparse Attention

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untena...

#multimodal#llm#agent#coding#benchmark
論文 深掘り Hugging Face 2026-06-10 HF ↑67

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual nar...

#agent#benchmark#rl#multimodal#robotics
論文 深掘り Hugging Face 2026-06-10 HF ↑92

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing envir...

#agent#benchmark#llm
論文 Hugging Face 2026-06-10 HF ↑23

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri...

#multimodal#llm#vision
論文 Hugging Face 2026-06-10 HF ↑71

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is ...

#agent#multimodal#benchmark
論文 Hugging Face 2026-06-10 HF ↑22

On Subquadratic Architectures: From Applications to Principles

Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM...

#llm#benchmark
論文 Hugging Face 2026-06-10 HF ↑3

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models ...

#agent#benchmark#llm
論文 Hugging Face 2026-06-10 HF ↑10

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: these are diffused, d...

#diffusion#alignment
論文 深掘り arXiv 2026-06-11

Recursive Agent Harnesses

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between the...

#agent#coding#benchmark
論文 深掘り arXiv 2026-06-11

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of a...

#agent#benchmark#coding#llm
論文 深掘り arXiv 2026-06-11

AgentRivet: an automated system for producing Rivet routines from journal publications

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allow new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event gene...

#agent#llm
企業動向 OpenAI 2026-06-11

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows....

#agent