2025.12.29 | Bird's-eye-view retrieval boosts small models; 4D diffusion inserts realistic objects in one click - HuggingFace Daily AI Paper Digest

The 13 papers in this episode are:

00:27 🧠 Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

01:07 🎬 InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

01:46 🤖 MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

02:22 👁 UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

03:04 🎨 ProEdit: Inversion-based Editing From Prompts Done Right

03:58 ⏱ TimeBill: Time-Budgeted Inference for Large Language Models

04:37 🧠 See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

05:16 🌦 Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

05:48 🧠 SVBench: Evaluation of Video Generation Models on Social Reasoning

06:27 🔍 InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

07:15 🎨 SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

08:11 🤖 SWE-RM: Execution-free Feedback For Software Engineering Agents

08:48 ⚡ A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication

[Follow Us]

You can also find us on the following platforms for more beyond the podcast itself:

Xiaohongshu: AI速递