Paper · Hugging Face · Published: 2026-06-02 · HF ↑40

Audio Interaction Model

Authors: Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, and 6 others

Abstract

Audio is an inherently interactive modality, yet today’s Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-deci…

#speech #benchmark
