2025.09.30 | SLA sparse attention cuts compute; StableToken resists noise without retraining the model - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:

00:22 ⚡ SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

01:05 🗣 StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

01:54 🎮 Multiplayer Nash Preference Optimization

02:57 🔗 RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

03:44 🎨 OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

04:28 🧠 Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

05:05 🧩 Visual Jigsaw Post-Training Improves MLLMs

05:37 🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

06:15 🔬 Democratizing AI scientists using ToolUniverse

06:59 🧠 When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

07:31 📊 GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts

08:04 🖼 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

08:54 🚀 SparseD: Sparse Attention for Diffusion Language Models

09:40 🎛 EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

10:32 🧠 Towards Personalized Deep Research: Benchmarks and Evaluations

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递