2026.06.09 | Coding Agents' Exploration Weaknesses Exposed; the Geometry of On-Policy Distillation Revealed

15 minutes

【Contents】
The 15 papers covered in this episode:

[00:32] 🔍 SWE-Explore: Benchmarking How Coding Agents Explore Repositories
[01:34] 🔍 On the Geometry of On-Policy Distillation
[02:26] 🧠 Latent Spatial Memory for Video World Models
[03:20] 🎬 CoVEBench: Can Video Editing Models Handle Complex Instructions?
[04:20] 🧠 LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
[05:10] ⚡ FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
[06:06] 🌍 SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
[07:10] 🧠 Human Psychometric Questionnaires Mischaracterize LLM Behavior
[08:19] 🧠 Echo-Memory: A Controlled Study of Memory in Action World Models
[09:08] 🎮 OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
[10:03] 🤖 AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing
[11:08] 🎥 SwiftVR: Real-Time One-Step Generative Video Restoration
[12:12] 🧠 Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses
[13:02] 🎬 OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning
[14:14] 🎯 Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

【Follow Us】
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu (小红书): AI速递

【Sponsor】
OpenClaw快报
Five minutes a day with OpenClaw快报 keeps you up to date on the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com