2025.08.29 | Stable Text-to-Image Generation; Efficient Mathematical Reasoning - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:

00:24 ⚖ Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

00:57 🧠 rStar2-Agent: Agentic Reasoning Technical Report

01:28 🎨 USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

01:56 🚀 AWorld: Orchestrating the Training Recipe for Agentic AI

02:26 🎯 TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning

02:54 🧠 Mixture of Contexts for Long Video Generation

03:17 🧠 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification

03:51 🔍 MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

04:23 🎨 OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

04:54 🛡 Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection

05:21 🧠 Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD

05:56 💃 Dress&Dance: Dress up and Dance as You Like It - Technical Preview

06:18 🎯 OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models

06:42 📷 Multi-View 3D Point Tracking

07:10 🎭 FakeParts: a New Family of AI-Generated DeepFakes

[Follow Us]

You can also find us on the following platform for more than what the podcast covers:

Xiaohongshu (小红书): AI速递