2026-06-12

20 items

Paper deep dive Hugging Face 2026-06-10 HF ↑56

MiniMax Sparse Attention

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untena...

#multimodal #llm #agent #coding #benchmark

Paper deep dive Hugging Face 2026-06-10 HF ↑67

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual nar...

#agent #benchmark #rl #multimodal #robotics

Paper deep dive Hugging Face 2026-06-10 HF ↑92

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing envir...

#agent #benchmark #llm

Paper Hugging Face 2026-06-10 HF ↑23

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri...

#multimodal #llm #vision

Paper Hugging Face 2026-06-10 HF ↑61

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generati...

#rl

Paper Hugging Face 2026-06-10 HF ↑71

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is ...

#agent #multimodal #benchmark

Paper Hugging Face 2026-06-10 HF ↑22

On Subquadratic Architectures: From Applications to Principles

Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM...

#llm #benchmark

Paper Hugging Face 2026-06-10 HF ↑3

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models ...

#agent #benchmark #llm

Paper Hugging Face 2026-06-10 HF ↑10

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: these are diffused, d...

#diffusion #alignment

Paper Hugging Face 2026-06-10 HF ↑15

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities cont...

#agent #llm #benchmark

Company news OpenAI 2026-06-12

New OpenAI Academy courses for the next era of work

OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work....

#agent

Company news OpenAI 2026-06-12

How Preply combines AI and human tutors to personalize learning

Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises....

Company news OpenAI 2026-06-11

BBVA puts AI at the core of banking with OpenAI

Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide....

Paper deep dive arXiv 2026-06-11

Recursive Agent Harnesses

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between the...

#agent #coding #benchmark

Paper deep dive arXiv 2026-06-11

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of a...

#agent #benchmark #coding #llm

Paper deep dive arXiv 2026-06-11

AgentRivet: an automated system for producing Rivet routines from journal publications

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event gene...

#agent #llm

Company news OpenAI 2026-06-11

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows....

#agent

Company news OpenAI 2026-06-11

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

Research into how AI can help users understand skin conditions

A low-carbon computing platform from your retired phones