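【Sponsor】
OpenClaw快报 (OpenClaw Briefing)
Five minutes a day: listen to OpenClaw快报 for the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com
【Contents】
This episode covers the following 15 papers: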
🧠 Are We Ready For An Agent-Native Memory System?
🎥 DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation
📸 ShutterMuse: Capture-Time Photography Guidance with MLLMs
⚡ Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
🧠 Improved Large Language Diffusion Models
🧑 Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
🎥 MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
🔍 V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
🎬 UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
🧠 IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
🔧 EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
🎥 Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models
🤖 The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
🤖 Autodata: An agentic data scientist to create high quality synthetic data
🧠 Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
