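2026.05.19 | Faster long video generation with lower GPU memory use; a lightweight multimodal model outperforms larger-parameter models - HuggingFace Daily AI Paper Digest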

【Contents】
The 15 papers in this episode are:
[00:23] 🎬 LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
[01:17] 🎨 Lance: Unified Multimodal Modeling by Multi-Task Synergy
[02:24] 🤖 AI for Auto-Research: Roadmap & User Guide
[03:26] 🛠 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
[04:20] 🎬 KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
[05:18] 🏠 Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
[06:15] 🤖 OProver: A Unified Framework for Agentic Formal Theorem Proving
[07:14] ⚡ Post-Trained MoE Can Skip Half Experts via Self-Distillation
[07:57] 🎥 LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs
[08:47] 🛑 Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
[09:42] 🔀 Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
[10:39] 🧠 Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
[11:40] 🛡 StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
[12:43] ⚡ CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
[13:41] 🧪 From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递