2026.06.04 | A Unified Omnimodal Framework; Real-Time Proactive Audio Interaction - HuggingFace Daily AI Paper Digest

【Contents】
The 15 papers in this episode are:

[00:31] 🌌 Cosmos 3: Omnimodal World Models for Physical AI
[01:36] 🎧 Audio Interaction Model
[02:31] 🔍 Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
[03:30] 🔍 Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
[04:25] 🧭 OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
[05:27] ⚡ Qwen-Image-Flash: Beyond Objective Design
[06:18] 🧠 M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
[07:13] 🎥 Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
[08:14] 🧠 ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
[09:08] 🧪 Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
[10:15] ⚡ Streaming Communication in Multi-Agent Reasoning
[11:08] 🎯 Self-Distilled Policy Gradient
[12:13] 🧠 MemTrain: Self-Supervised Context Memory Training
[13:05] 🧩 Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
[14:11] 🤖 MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

【Follow Us】
You can also find us on the following platform for more than what is covered in the podcast:
Xiaohongshu (小红书): AI速递

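【Sponsor】
OpenClaw Bulletin (OpenClaw快报)
Five minutes a day with the OpenClaw Bulletin to catch up on the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com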