2026.03.20 | Generative Models Unlock 3D Spatial Understanding; SAMA's Zero-Shot Instruction Editing Matches Kling

13 minutes

【Sponsor】

Listen to AI每周谈 on your commute. Each week, AI每周谈 recaps the previous week's major AI news.

Link: 🔗 www.xiaoyuzhoufm.com

【Contents】

The 15 papers covered in this episode:

00:29 🧠 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

01:09 🎬 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

01:45 ⚡ FASTER: Rethinking Real-Time Flow VLAs

02:30 🎬 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

03:31 🤖 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

04:21 🤖 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

05:13 🧩 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

05:47 📊 LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

06:42 🧠 Memento-Skills: Let Agents Design Agents

07:18 🌍 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

08:00 🧠 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

08:54 🧠 Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

09:45 🎬 EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

10:58 🔧 VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

11:39 🗣 MOSS-TTS Technical Report

【Follow Us】

You can also find us on the following platforms for more than what's in the podcast:

Xiaohongshu: AI速递