2026.03.25 | Diffusion OCR with parallel denoising; WildWorld action dataset tests AI consistency - HuggingFace Daily AI Paper Digest

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week it recaps the previous week's major AI news.

【Contents】

This episode covers the following 15 papers:

00:29 🔍 MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

01:18 🎮 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

02:10 ⚡ SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

02:59 🎥 PEARL: Personalized Streaming Video Understanding Model

03:46 🔍 DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

04:30 📊 From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

05:13 🤖 SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

05:52 🧠 UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

06:45 🎬 RealMaster: Lifting Rendered Scenes into Photorealistic Video

07:32 🤖 2Xplat: Two Experts Are Better Than One Generalist

08:15 🔍 Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

09:03 👁 Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

09:57 🎯 VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

10:48 🧠 ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

11:40 🤖 AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

【Follow Us】

You can also find us on the following platforms for more beyond the podcast:

Xiaohongshu (小红书): AI速递