2025.02.26 | OmniAlign-V Improves Multimodal Model Alignment, SpargeAttn Accelerates Attention Computation - HuggingFace Daily AI Paper Express

The 14 papers in this episode are:

00:23 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

01:06 ⚡ SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

01:53 🖼 KV-Edit: Training-Free Image Editing for Precise Background Preservation

02:32 🌈 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

03:08 🤖 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

03:51 📊 Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

04:30 🧠 Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

05:11 🔄 K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

05:51 🌐 WebGames: Challenging General-Purpose Web-Browsing AI Agents

06:29 🧠 Introducing Visual Perception Token into Multimodal Large Language Model

07:07 🎰 The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

07:47 🧠 AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

08:26 🔍 LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models

09:07 🧠 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递