2025.05.07 | Multimodal chain-of-thought improves model performance; zero-data self-play strengthens reasoning. - HuggingFace Daily AI Paper Digest

The 14 papers in this episode are:

00:24 🧠 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

01:10 🤖 Absolute Zero: Reinforced Self-play Reasoning with Zero Data

01:52 🤸 FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios

02:33 🚀 RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

03:07 🚀 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

03:45 👁 Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading

04:30 🗜 An Empirical Study of Qwen3 Quantization

05:09 ⚽ Multi-Agent System for Comprehensive Soccer Understanding

05:52 🗣 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

06:36 🗺 Geospatial Mechanistic Interpretability of Large Language Models

07:12 🧑 InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships

07:54 🤖 Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering

08:32 🥽 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation

09:18 🤖 Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu: AI速递