2026-06-10

16件

論文深掘り Hugging Face 2026-06-09 HF ↑73

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent,...

#agent#llm#alignment#benchmark

論文深掘り Hugging Face 2026-06-08 HF ↑165

Kwai Keye-VL-2.0 Technical Report

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in ...

#multimodal#agent#alignment#benchmark

論文深掘り Hugging Face 2026-06-09 HF ↑16

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle het...

#agent#benchmark#llm

論文深掘り Hugging Face 2026-06-09 HF ↑13

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

External memory effectively grounds large language models (LLMs) and vision-language models (VLMs)-based question answering (QA) in relevant multimodal evidence. However, existing memory paradigms represent each memory item in raw text and image forms, so retrieval-based systems must pass the retrie...

#llm#multimodal#benchmark#rag

論文 Hugging Face 2026-06-08 HF ↑12

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on Needle-In-A-Haystac...

#fine-tuning#benchmark#llm

論文 Hugging Face 2026-06-08 HF ↑26

WorldOlympiad: Can Your World Model Survive a Triathlon?

We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight i...

#alignment#benchmark#llm#robotics

論文 Hugging Face 2026-06-08 HF ↑28

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual ...

#agent#multimodal#coding#benchmark

論文 Hugging Face 2026-06-09 HF ↑27

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization

Diffusion-based lip synchronization models achieve strong visual quality and audio-visual alignment, but full-sequence bidirectional attention and many denoising steps make them impractical for real-time inference. We present Lip Forcing, to our knowledge the first autoregressive diffusion method fo...

#diffusion#alignment

論文 Hugging Face 2026-06-09 HF ↑6

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal...

#llm#rl

論文 Hugging Face 2026-06-08 HF ↑29

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clippin...

#rl#alignment

論文 Hugging Face 2026-06-08 HF ↑13

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Recent years have witnessed the rapid evolution of AI agents toward handling increasingly complex, real-world tasks. However, existing benchmarks rarely evaluate whether agents can operate graphical user interfaces to complete long-horizon, high-value professional workflows across diverse domains. C...

#agent#benchmark

企業動向 OpenAI 2026-06-10

Access OpenAI models and Codex through your Oracle cloud commitment

Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance....

企業動向 OpenAI 2026-06-10

PRC-linked influence operations are targeting AI debates in the US

A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT....

企業動向 OpenAI 2026-06-10

From data to decisions: how LSEG is scaling trusted AI

See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees....

論文 Hugging Face 2026-06-09 HF ↑1

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligned baseline. To expose these hidden temporal dynamics, we pro...

#alignment#benchmark

企業動向 NVIDIA 2026-06-10

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce RTX GPUs, the NVIDIA RTX PRO platform and NVIDIA DGX Spark systems, from local PCs to the cloud. Rath...

#diffusion