2026.06.03 | Trust Regions Teach Small Models; Humanoid-GPT Tracks Motion - HuggingFace Daily AI Paper Digest

【Contents】
This episode covers the following 15 papers:

[00:31] 🎯 Trust Region On-Policy Distillation
[01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
[02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
[03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
[03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
[05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
[06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
[07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
[08:07] 💬 Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
[09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
[10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
[11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
[12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
[13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
[14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

【Follow Us】
You can also find us on the following platforms for more content beyond the podcast:
Xiaohongshu: AI速递

【Sponsor】
OpenClaw快报 (OpenClaw Briefing)
Five minutes a day: tune in to OpenClaw快报 for the latest news and industry discussion.
Link: www.xiaoyuzhoufm.com