2026-06-10

16 items

Paper Deep Dive Hugging Face 2026-06-09 HF ↑73

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent,...

#agent #llm #alignment #benchmark

Paper Deep Dive Hugging Face 2026-06-08 HF ↑165

Kwai Keye-VL-2.0 Technical Report

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in ...

#multimodal #agent #alignment #benchmark

Paper Deep Dive Hugging Face 2026-06-09 HF ↑16

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle het...

#agent #benchmark #llm

Paper Deep Dive Hugging Face 2026-06-09 HF ↑13

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

External memory effectively grounds question answering (QA) based on large language models (LLMs) and vision-language models (VLMs) in relevant multimodal evidence. However, existing memory paradigms represent each memory item in raw text and image forms, so retrieval-based systems must pass the retrie...

#llm #multimodal #benchmark #rag

Paper Hugging Face 2026-06-08 HF ↑26

WorldOlympiad: Can Your World Model Survive a Triathlon?

We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight i...

#alignment #benchmark #llm #robotics

Paper Hugging Face 2026-06-08 HF ↑28

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual ...

#agent #multimodal #coding #benchmark

Paper Hugging Face 2026-06-09 HF ↑27

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization

Diffusion-based lip synchronization models achieve strong visual quality and audio-visual alignment, but full-sequence bidirectional attention and many denoising steps make them impractical for real-time inference. We present Lip Forcing, to our knowledge the first autoregressive diffusion method fo...

#diffusion #alignment

Paper Hugging Face 2026-06-08 HF ↑29

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clippin...

#rl #alignment

Paper Hugging Face 2026-06-09 HF ↑1

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligned baseline. To expose these hidden temporal dynamics, we pro...

#alignment #benchmark

Industry News NVIDIA 2026-06-10

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce RTX GPUs, the NVIDIA RTX PRO platform and NVIDIA DGX Spark systems, from local PCs to the cloud. Rath...

#diffusion