2025.02.17 | RAS Accelerates Diffusion Transformers, Video Generation Gets a Quality Boost - HuggingFace Daily AI Paper Digest

The 21 papers in this episode:

00:22 🌐 Region-Adaptive Sampling for Diffusion Transformers

01:05 🎥 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

01:48 🌊 Large Language Diffusion Models

02:31 🧠 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

03:15 🌟 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

03:58 🖼 Precise Parameter Localization for Textual Generation in Diffusion Models

04:40 🧠 Diverse Inference and Verification for Advanced Reasoning

05:22 🧬 DarwinLM: Evolutionary Structured Pruning of Large Language Models

06:02 📈 AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting

06:40 🖼 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

07:23 🤖 We Can't Understand AI Using our Existing Vocabulary

08:03 📊 FoNE: Precise Single-Token Number Embeddings via Fourier Features

08:53 🌍 Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages

09:41 🔓 Jailbreaking to Jailbreak

10:23 🤖 STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning

11:05 📊 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

11:41 ⚡ MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers

12:26 🚗 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

13:06 🎵 CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages

13:49 🧩 Cluster and Predict Latent Patches for Improved Masked Image Modeling

14:31 🧬 Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model

[Follow Us]

You can also find us on the following platform for more than what the podcast covers:

Xiaohongshu (RED): AI速递