Paper Hugging Face Published: 2026-06-02 HF ↑40

Audio Interaction Model

Authors: Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, and 6 others

Abstract

Audio is an inherently interactive modality, yet today’s Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-deci…

#speech #benchmark

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning