2026-06-03

17 items

Paper Deep Dive Hugging Face 2026-06-02 HF ↑21

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How...

#llm #multimodal #benchmark
Paper Deep Dive Hugging Face 2026-06-01 HF ↑10

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learni...

#llm #rl
Paper Deep Dive Hugging Face 2026-06-01 HF ↑28

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus ...

Paper Hugging Face 2026-06-01 HF ↑4

Benchmarking Visual State Tracking in Multimodal Video Understanding

Understanding a video requires more than recognizing isolated moments, as humans continuously track entities, states, and events over time. This capacity for visual state tracking is fundamental to video understanding, yet remains underexplored in current evaluations of Multimodal Large Language Mod...

#llm #benchmark #agent #multimodal #coding
Company News OpenAI 2026-06-03

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society....

#alignment
Paper Hugging Face 2026-06-02 HF ↑6

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse atte...

Paper Hugging Face 2026-06-01 HF ↑2

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is prohibitively expensive. We propose Bootstrap You...

#benchmark
Paper Hugging Face 2026-06-01 HF ↑1

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high...

#robotics #agent #benchmark
Model OpenAI 2026-06-03

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities....