【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's biggest AI news.
Link: 🔗www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode:
00:32 🤖 BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
01:22 ⚠ The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
02:26 🎥 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
03:14 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
04:02 🧪 LLM-in-Sandbox Elicits General Agentic Intelligence
04:54 🚀 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
05:34 🎭 SAMTok: Representing Any Mask with Two Words
06:30 🚀 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
07:23 🔬 Learning to Discover at Test Time
08:08 🔍 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
09:06 ⚙ Towards Automated Kernel Generation in the Era of LLMs
09:48 🔄 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
10:45 💻 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
11:29 🗣 Qwen3-TTS Technical Report
12:13 🤖 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

【Follow Us】
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
