【Sponsor】
Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI每周谈 recaps the previous week's biggest AI news.
Link 🔗www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode:
00:29 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
01:10 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios
02:08 🧠 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
02:50 🤖 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
03:28 🧠 Demystifying Video Reasoning
04:26 🎮 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
05:26 🧠 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
06:12 🤔 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
07:02 🔄 Online Experiential Learning for Language Models
07:54 📊 FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
08:47 🚀 Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training
09:30 🧭 WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation
10:20 🔍 AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents
11:03 🎨 SegviGen: Repurposing 3D Generative Model for Part Segmentation
11:59 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

【Follow Us】
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu (RED): AI速递
