2026.03.20 | Generative Models Unlock 3D Spatial Understanding; SAMA's Zero-Shot Instruction Editing Matches Kling - HuggingFace Daily AI Paper Digest

【Sponsor】

Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.

【Contents】

The 15 papers in this episode are:

00:29 🧠 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

01:09 🎬 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

01:45 ⚡ FASTER: Rethinking Real-Time Flow VLAs

02:30 🎬 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

03:31 🤖 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

04:21 🤖 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

05:13 🧩 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

05:47 📊 LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

06:42 🧠 Memento-Skills: Let Agents Design Agents

07:18 🌍 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

08:00 🧠 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

08:54 🧠 Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

09:45 🎬 EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

10:58 🔧 VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

11:39 🗣 MOSS-TTS Technical Report

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递