2026.05.28 | ProRL Proactively Guides Recommendation; γ-World Achieves Multi-Agent Zero-Shot Generalization - HuggingFace Daily AI Paper Digest

[Contents]
The 15 papers in this episode:
[00:24] 🎯 ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
[01:27] 🌍 Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
[02:28] 🤖 Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
[03:24] 👁 From Pixels to Words -- Towards Native One-Vision Models at Scale
[04:19] 🔍 Self-Improving Language Models with Bidirectional Evolutionary Search
[05:01] 🧮 ResearchMath-14K: Scaling Research-Level Mathematics via Agents
[06:03] 🔍 MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
[06:58] 🛠 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
[07:54] 🤖 GEM: Generative Supervision Helps Embodied Intelligence
[08:41] 🎯 Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
[09:30] 🔗 ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
[10:27] 🔬 AI Research Agents Narrow Scientific Exploration
[11:17] 🧠 Rethinking Memory as Continuously Evolving Connectivity
[12:15] 🎥 OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
[13:08] ⚖ Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

[Follow Us]
You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu: AI速递