2024.10.17 每日AI论文 | 视觉推理能力待提升,自中心视频理解需改进

2024.10.17 每日AI论文 | 视觉推理能力待提升,自中心视频理解需改进

14分钟 ·
播放数86
·
评论数0

本期的 19 篇论文如下:

00:28 🧠 HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks(HumanEval-V:通过编码任务评估大型多模态模型的视觉理解和推理能力)

01:15 🎥 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI(VidEgoThink:评估具身AI的自中心视频理解能力)

01:50 🧠 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio(多模态的诅咒:评估大型多模态模型在语言、视觉和音频中的幻觉)

02:31 🤖 Revealing the Barriers of Language Agents in Planning(揭示语言代理在规划中的障碍)

03:15 📄 DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception(DocLayout-YOLO:通过多样合成数据和全局到局部自适应感知增强文档布局分析)

03:56 ⚙ Large Language Model Evaluation via Matrix Nuclear-Norm(大型语言模型评估通过矩阵核范数)

04:38 🧬 Exploring Model Kinship for Merging Large Language Models(探索大型语言模型合并中的模型亲缘关系)

05:15 📊 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs(ProSA:评估和理解大型语言模型的提示敏感性)

05:50 ⚡ ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression(ZipVL:动态令牌稀疏化和KV缓存压缩的高效大视觉-语言模型)

06:31 📄 Improving Long-Text Alignment for Text-to-Image Diffusion Models(改进文本到图像扩散模型的长文本对齐)

07:11 🔄 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models(简化、稳定和扩展连续时间一致性模型)

07:55 🛡 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements(可控安全对齐:推理时适应多样安全需求)

08:34 🔍 Tracking Universal Features Through Fine-Tuning and Model Merging(通过微调和模型合并追踪通用特征)

09:08 🔄 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL(逆向洞察:通过逆向强化学习重构LLM训练目标)

09:46 🧠 Neural Metamorphosis(神经变形)

10:25 🌍 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation(世界医学QA-V:多语言、多模态医学考试数据集用于多模态语言模型评估)

11:09 🌐 OMCAT: Omni Context Aware Transformer(全上下文感知变压器)

11:44 ⏳ ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains(ChroKnowledge:揭示语言模型在多领域中的时间知识)

12:22 📚 DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities(DyVo:动态词汇表用于实体学习的稀疏检索)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递