2026.03.18 | Verification-Refined Agents Break Through; an Industrial Code Model Gets It Right the First Time - HuggingFace Daily AI Paper Digest

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, it recaps the previous week's major AI news.

Link 🔗www.xiaoyuzhoufm.com

【Contents】

The 15 papers in this episode are:

00:29 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

01:10 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios

02:08 🧠 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

02:50 🤖 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

03:28 🧠 Demystifying Video Reasoning

04:26 🎮 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

05:26 🧠 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

06:12 🤔 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

07:02 🔄 Online Experiential Learning for Language Models

07:54 📊 FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

08:47 🚀 Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training

09:30 🧭 WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

10:20 🔍 AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

11:03 🎨 SegviGen: Repurposing 3D Generative Model for Part Segmentation

11:59 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递