2026.03.25 | Diffusion OCR with parallel denoising; WildWorld action dataset tests AI consistency

13 minutes

[Sponsor]

Listen to AI每周谈 on your commute. Each week, AI每周谈 recaps the previous week's biggest AI news.

Link: 🔗 www.xiaoyuzhoufm.com

[Contents]

The 15 papers covered in this episode:

00:29 🔍 MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

01:18 🎮 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

02:10 ⚡ SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

02:59 🎥 PEARL: Personalized Streaming Video Understanding Model

03:46 🔍 DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

04:30 📊 From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

05:13 🤖 SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

05:52 🧠 UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

06:45 🎬 RealMaster: Lifting Rendered Scenes into Photorealistic Video

07:32 🤖 2Xplat: Two Experts Are Better Than One Generalist

08:15 🔍 Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

09:03 👁 Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

09:57 🎯 VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

10:48 🧠 ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

11:40 🤖 AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

[Follow Us]

You can also find us on the following platform for more than what the podcast covers:

Xiaohongshu: AI速递