2025.12.29 | Bird's-eye retrieval boosts small models; 4D diffusion inserts realistic objects in one click

10 minutes

The 13 papers in this episode:

00:27 🧠 Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

01:07 🎬 InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

01:46 🤖 MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

02:22 👁 UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

03:04 🎨 ProEdit: Inversion-based Editing From Prompts Done Right

03:58 ⏱ TimeBill: Time-Budgeted Inference for Large Language Models

04:37 🧠 See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

05:16 🌦 Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

05:48 🧠 SVBench: Evaluation of Video Generation Models on Social Reasoning

06:27 🔍 InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

07:15 🎨 SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

08:11 🤖 SWE-RM: Execution-free Feedback For Software Engineering Agents

08:48 ⚡ A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication

[Follow Us]

You can also find us on the following platforms for more than what the podcast covers:

Xiaohongshu: AI速递