【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.
Link: 🔗 www.xiaoyuzhoufm.com
【Contents】
This episode covers the following 15 papers:
00:32 🧠 Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
01:17 🤔 Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
02:11 ⚡ IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
02:54 🎬 Video-Based Reward Modeling for Computer-Use Agents
03:55 🎬 DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
04:46 🎯 Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
05:40 🎬 DVD: Deterministic Video Depth Estimation with Generative Priors
06:29 🖼 WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
07:29 🎬 ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
08:24 🧠 GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
09:08 🎬 EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
09:55 ⚡ One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
10:46 🤖 OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
11:29 🧠 EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models
12:37 🧠 XSkill: Continual Learning from Experience and Skills in Multimodal Agents

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
