2026.06.03 | Trust Regions Teach Small Models; Humanoid-GPT Tracks Motion

15 min

【Contents】
The 15 papers covered in this episode:

[00:31] 🎯 Trust Region On-Policy Distillation
[01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
[02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
[03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
[03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
[05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
[06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
[07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
[08:07] 💬 Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
[09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
[10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
[11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
[12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
[13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
[14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (RED): AI速递

【Sponsor】
OpenClaw快报 (OpenClaw Bulletin)
Five minutes a day: tune in to the OpenClaw Bulletin for the latest developments and industry discussion.
Link: www.xiaoyuzhoufm.com