2026.06.25 | Agent memory is not yet mature; subjects cross domains, decoupled from scenes. - HuggingFace Daily AI Paper Digest

【Sponsor】
OpenClaw Briefing
Five minutes a day with the OpenClaw Briefing to catch up on the latest developments and industry discussion
Link: www.xiaoyuzhoufm.com

【Contents】
The 15 papers in this episode are:

[00:33] 🧠 Are We Ready For An Agent-Native Memory System?
[01:25] 🎥 DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation
[02:14] 📸 ShutterMuse: Capture-Time Photography Guidance with MLLMs
[03:13] ⚡ Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
[04:13] 🧠 Improved Large Language Diffusion Models
[05:17] 🧑 Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
[06:21] 🎥 MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
[07:14] 🔍 V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
[08:27] 🎬 UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
[09:34] 🧠 IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
[10:40] 🔧 EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
[11:39] 🎥 Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models
[12:38] 🤖 The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
[13:31] 🤖 Autodata: An agentic data scientist to create high quality synthetic data
[14:22] 🧠 Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

【Follow Us】
You can also find us on the following platforms for more beyond the podcast itself
Xiaohongshu: AI速递