2026.06.18 | Memory Emerges as a Weak Spot for Multimodal LLMs; Language-Instructed 3D Trajectory Prediction

14 minutes

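【Sponsor】
OpenClaw Briefing (OpenClaw快报)
Five minutes a day with the OpenClaw Briefing keeps you up to date on the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com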

【Contents】
The 15 papers covered in this episode:

[00:32] 🧠 Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
[01:29] 🎯 MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
[02:15] 🌍 Kairos: A Native World Model Stack for Physical AI
[03:05] 🛠 Guava: An Effective and Universal Harness for Embodied Manipulation
[03:58] ⚡ EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
[04:45] 🎯 The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
[05:51] 🔍 SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
[06:36] 🤖 From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
[07:27] 🧠 Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
[08:25] 🎯 Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
[09:25] 👁 Native Active Perception as Reasoning for Omni-Modal Understanding
[10:15] 🐱 MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
[11:08] 🖌 Sumi: Open Uniform Diffusion Language Model from Scratch
[11:51] 🎯 STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
[12:46] 🌍 Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers:
Xiaohongshu: AI速递