【Sponsor】
Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI每周谈 recaps the previous week's biggest AI news.
Link 🔗www.xiaoyuzhoufm.com
【Contents】
The 15 papers covered in this episode:
00:29 🔍 MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
01:18 🎮 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
02:10 ⚡ SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
02:59 🎥 PEARL: Personalized Streaming Video Understanding Model
03:46 🔍 DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models
04:30 📊 From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
05:13 🤖 SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM
05:52 🧠 UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
06:45 🎬 RealMaster: Lifting Rendered Scenes into Photorealistic Video
07:32 🤖 2Xplat: Two Experts Are Better Than One Generalist
08:15 🔍 Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
09:03 👁 Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing
09:57 🎯 VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
10:48 🧠 ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
11:40 🤖 AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

【Follow Us】
You can also find us on the following platform for more than what is in the podcast:
Xiaohongshu (RED): AI速递 (AI Express)
