The 15 papers in this episode are:
00:24 ⚖ Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
00:57 🧠 rStar2-Agent: Agentic Reasoning Technical Report
01:28 🎨 USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
01:56 🚀 AWorld: Orchestrating the Training Recipe for Agentic AI
02:26 🎯 TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
02:54 🧠 Mixture of Contexts for Long Video Generation
03:17 🧠 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
03:51 🔍 MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
04:23 🎨 OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
04:54 🛡 Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
05:21 🧠 Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
05:56 💃 Dress&Dance: Dress up and Dance as You Like It - Technical Preview
06:18 🎯 OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
06:42 📷 Multi-View 3D Point Tracking
07:10 🎭 FakeParts: a New Family of AI-Generated DeepFakes

【Follow Us】
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu: AI速递
