2026.01.22 | LLMs Become Digital Agents; Video Models Are Tested Before They Train - HuggingFace Daily AI Paper Express

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI Weekly Talk recaps the previous week's biggest AI news.

【Contents】

This episode covers the following 15 papers:

00:30 🤖 Agentic Reasoning for Large Language Models

01:05 🤖 Rethinking Video Generation Model for the Embodied World

01:43 🤖 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance

02:34 📊 MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

03:24 🧠 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

04:03 📄 Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

04:51 🛡 FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

05:41 ⚡ Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

06:45 🔍 XR: Cross-Modal Agents for Composed Image Retrieval

07:29 🔊 Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

08:19 🤖 Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

09:15 🤖 RoboBrain 2.5: Depth in Sight, Time in Mind

10:16 🔍 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

10:59 🧠 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

11:43 🕳 The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu: AI速递