2026.05.11 | Splitting Music-Driven Dance Between Experts; Flow-Matching Distillation Tops Every Subject - HuggingFace Daily AI Paper Digest

【Contents】
The 15 papers in this episode are:
[00:29] 💃 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
[01:07] 🎯 Flow-OPD: On-Policy Distillation for Flow Matching Models
[01:58] 🎯 Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
[02:52] 🔍 HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
[03:37] 🤖 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
[04:20] 🎥 HumanNet: Scaling Human-centric Video Learning to One Million Hours
[05:09] 🧠 Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
[06:06] 🔍 Beyond Retrieval: A Multitask Benchmark and Model for Code Search
[07:06] 🧩 Anisotropic Modality Align
[07:58] 🤖 AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
[08:49] 📜 TextLDM: Language Modeling with Continuous Latent Diffusion
[09:41] 🧠 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
[10:25] 🎬 A²RD: Agentic Autoregressive Diffusion for Long Video Consistency
[11:04] 🛡 DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
[11:52] 🔍 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers:
Xiaohongshu (RedNote): AI速递