2024.10.04 Daily AI Papers | Caption type affects model performance; a breakthrough in long video generation. - HuggingFace Daily AI Paper Digest

The 19 papers in this episode are as follows:

00:24 🔄 Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

01:04 🎥 Loong: Generating Minute-level Long Videos with Autoregressive Language Models

01:39 🎥 Video Instruction Tuning With Synthetic Data

02:18 🧐 LLaVA-Critic: Learning to Evaluate Multimodal Models

02:56 🔍 Contrastive Localized Language-Image Pre-Training

03:31 🌱 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

04:07 🌟 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

04:51 🔗 Large Language Models as Markov Chains

05:26 🧠 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

06:03 🔄 Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

06:51 🔄 Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

07:36 ⚡ SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

08:14 🌐 MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis

08:54 📚 L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

09:38 🩺 MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

10:24 🎥 Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

11:01 🗣 Distilling an End-to-End Voice Assistant Without Instruction Training Data

11:46 ♟ Learning the Latent Rules of a Game from Data: A Chess Story

12:29 🎵 Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

[Follow Us]

You can also find us on the following platforms for more content beyond the podcast:

Xiaohongshu: AI速递