2024.10.17 Daily AI Papers | Visual Reasoning Needs Improvement, Egocentric Video Understanding Needs Work - HuggingFace Daily AI Papers Digest

This episode covers the following 19 papers:

00:28 🧠 HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

01:15 🎥 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

01:50 🧠 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

02:31 🤖 Revealing the Barriers of Language Agents in Planning

03:15 📄 DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

03:56 ⚙ Large Language Model Evaluation via Matrix Nuclear-Norm

04:38 🧬 Exploring Model Kinship for Merging Large Language Models

05:15 📊 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

05:50 ⚡ ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

06:31 📄 Improving Long-Text Alignment for Text-to-Image Diffusion Models

07:11 🔄 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

07:55 🛡 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

08:34 🔍 Tracking Universal Features Through Fine-Tuning and Model Merging

09:08 🔄 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL

09:46 🧠 Neural Metamorphosis

10:25 🌍 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

11:09 🌐 OMCAT: Omni Context Aware Transformer

11:44 ⏳ ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

12:22 📚 DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递