2025.11.19 | Pixel Actors Struggle to Reason; Misleading Visuals Put Models to the Test - HuggingFace Daily AI Paper Digest

The 11 papers in this episode are:

00:23 🧠 Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

01:03 🕵 MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

01:49 🎞 REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

03:02 🧪 ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

03:43 🔍 Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

04:16 🤖 Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

05:02 🤖 Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

05:32 ⚖ Mitigating Label Length Bias in Large Language Models

06:14 🧠 Agent READMEs: An Empirical Study of Context Files for Agentic Coding

06:49 🎧 Proactive Hearing Assistants that Isolate Egocentric Conversations

07:20 🎯 Error-Driven Scene Editing for 3D Grounding in Large Language Models

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递