2026.01.22 | LLMs Turn into Digital Agents; Video Models Test First, Train Later

13 minutes

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news.

Link 🔗www.xiaoyuzhoufm.com

【Contents】

The 15 papers covered in this episode:

00:30 🤖 Agentic Reasoning for Large Language Models

01:05 🤖 Rethinking Video Generation Model for the Embodied World

01:43 🤖 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance

02:34 📊 MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

03:24 🧠 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

04:03 📄 Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

04:51 🛡 FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

05:41 ⚡ Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

06:45 🔍 XR: Cross-Modal Agents for Composed Image Retrieval

07:29 🔊 Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

08:19 🤖 Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

09:15 🤖 RoboBrain 2.5: Depth in Sight, Time in Mind

10:16 🔍 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

10:59 🧠 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

11:43 🕳 The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

【Follow Us】

You can also find us on the following platforms for more than what the podcast covers:

Xiaohongshu: AI速递