The 31 papers in this episode:
00:23 📊 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
01:02 🎥 Movie Gen: A Cast of Media Foundation Models
01:35 📱 MobA: A Two-Level Agent System for Efficient Mobile Task Automation
02:18 🌐 Harnessing Webpage UIs for Text-Rich Visual Understanding
02:59 🔄 Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
03:29 🩺 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
04:04 📊 A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
04:46 🔄 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
05:23 🔍 BenTo: Benchmark Task Reduction with In-Context Transferability
06:03 🎥 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
06:49 🧠 MoH: Multi-Head Attention as Mixture-of-Head Attention
07:28 🎥 VidPanos: Generative Panoramic Videos from Casual Panning Videos
08:03 📉 FlatQuant: Flatness Matters for LLM Quantization
08:44 🔄 Retrospective Learning from Interactions
09:22 🔄 Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
10:06 🖼 Can MLLMs Understand the Deep Implication Behind Chinese Images?
10:43 📱 MedMobile: A Mobile-Sized Language Model with Expert-Level Clinical Capabilities
11:22 🌍 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
12:04 🤖 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
12:48 🔄 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
13:29 🔒 AERO: Softmax-Only LLMs for Efficient Private Inference
14:12 🌐 $γ$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
14:45 🌐 Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats
15:24 🎶 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
16:05 🔒 Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems
16:48 📚 SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation
17:27 🗺 Roadmap towards Superhuman Speech Understanding Using Large Language Models
18:05 🔄 Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
18:47 🤖 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
19:25 🔬 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
20:05 📚 Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast content:
Xiaohongshu (RED): AI速递
