2026.05.28 | ProRL Proactively Steers Recommendation; γ-World Achieves Zero-Shot Multi-Agent Generalization

14 minutes

【Contents】
The 15 papers in this episode:
[00:24] 🎯 ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
[01:27] 🌍 Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
[02:28] 🤖 Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
[03:24] 👁 From Pixels to Words -- Towards Native One-Vision Models at Scale
[04:19] 🔍 Self-Improving Language Models with Bidirectional Evolutionary Search
[05:01] 🧮 ResearchMath-14K: Scaling Research-Level Mathematics via Agents
[06:03] 🔍 MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
[06:58] 🛠 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
[07:54] 🤖 GEM: Generative Supervision Helps Embodied Intelligence
[08:41] 🎯 Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
[09:30] 🔗 ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
[10:27] 🔬 AI Research Agents Narrow Scientific Exploration
[11:17] 🧠 Rethinking Memory as Continuously Evolving Connectivity
[12:15] 🎥 OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
[13:08] ⚖ Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

【Follow Us】
You can also find us on the following platforms for more beyond the podcast itself:
Xiaohongshu (小红书): AI速递