2026.06.25 | Agent memory still needs work; subjects shuttle across domains, decoupled from scenes.

16 minutes

【Sponsor】
OpenClaw快报 (OpenClaw Briefing)
Five minutes a day with OpenClaw快报 keeps you up to date on the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com

【Contents】
The 15 papers covered in this episode:

[00:33] 🧠 Are We Ready For An Agent-Native Memory System?
[01:25] 🎥 DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation
[02:14] 📸 ShutterMuse: Capture-Time Photography Guidance with MLLMs
[03:13] ⚡ Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
[04:13] 🧠 Improved Large Language Diffusion Models
[05:17] 🧑 Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
[06:21] 🎥 MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
[07:14] 🔍 V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
[08:27] 🎬 UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
[09:34] 🧠 IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
[10:40] 🔧 EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
[11:39] 🎥 Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models
[12:38] 🤖 The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
[13:31] 🤖 Autodata: An agentic data scientist to create high quality synthetic data
[14:22] 🧠 Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers:
Xiaohongshu: AI速递