【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.
Link 🔗www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode are:
00:31 ⚠ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
01:24 🎵 MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
02:28 🧠 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
03:05 🤖 GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
03:56 ⚖ LawThinker: A Deep Research Legal Agent in Dynamic Environments
04:33 🔍 Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
05:16 🎨 Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
06:01 🚀 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
06:55 🧩 Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
07:38 🧠 Thinking with Drafting: Optical Decompression via Logical Reconstruction
08:17 🗳 dVoting: Fast Voting for dLLMs
09:09 🤖 RISE: Self-Improving Robot Policy with Compositional World Model
09:54 🤖 χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
10:48 🤖 EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
11:45 🔍 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
