本期的 14 篇论文如下:
00:23 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference(OmniAlign-V:迈向多模态大语言模型与人类偏好增强对齐)
01:06 ⚡ SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference(SpargeAttn:准确稀疏注意力加速任意模型推理)
01:53 🖼 KV-Edit: Training-Free Image Editing for Precise Background Preservation(KV-编辑:无需训练的图像编辑方法,实现精确背景保留)
02:32 🌈 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation(匿名区域变换器:可变多层透明图像生成)
03:08 🤖 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution(SWE-RL:通过开源软件演化数据强化学习提升LLM推理能力)
03:51 📊 Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective(揭示大语言模型下游性能扩展:基于聚类的视角)
04:30 🧠 Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models(尺度分布解耦:实现大型语言模型稳定有效训练)
05:11 🔄 K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs(K-LoRA:解锁无需训练的任意主题和风格LoRA融合)
05:51 🌐 WebGames: Challenging General-Purpose Web-Browsing AI Agents(WebGames:挑战通用网页浏览AI代理)
06:29 🧠 Introducing Visual Perception Token into Multimodal Large Language Model(引入视觉感知令牌的多模态大语言模型)
07:07 🎰 The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?(彩票LLM假说:重新思考LLM压缩应保留的能力)
07:47 🧠 AAD-LLM: Neural Attention-Driven Auditory Scene Understanding(AAD-LLM:神经注意力驱动的听觉场景理解)
08:26 🔍 LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models(LaTIM:测量Mamba模型中的潜在Token-to-Token交互)
09:07 🧠 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI(Shakti-VLMs:企业级AI的可扩展视觉语言模型)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
