2026.05.11 | Music-driven dance split across cascaded experts; flow-matching distillation tops every subject

13 minutes

【Contents】
The 15 papers in this episode are:
[00:29] 💃 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
[01:07] 🎯 Flow-OPD: On-Policy Distillation for Flow Matching Models
[01:58] 🎯 Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
[02:52] 🔍 HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
[03:37] 🤖 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
[04:20] 🎥 HumanNet: Scaling Human-centric Video Learning to One Million Hours
[05:09] 🧠 Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
[06:06] 🔍 Beyond Retrieval: A Multitask Benchmark and Model for Code Search
[07:06] 🧩 Anisotropic Modality Align
[07:58] 🤖 AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
[08:49] 📜 TextLDM: Language Modeling with Continuous Latent Diffusion
[09:41] 🧠 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
[10:25] 🎬 A²RD: Agentic Autoregressive Diffusion for Long Video Consistency
[11:04] 🛡 DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
[11:52] 🔍 MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递