2025.12.22 | PhysBrain uses first-person video to teach AI hands-on skills; large models are still far from being scientist AIs - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:

00:24 🧠 PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

01:05 🔬 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

01:34 🧠 When Reasoning Meets Its Laws

02:16 🧠 Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

03:02 🧠 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

03:51 🎨 Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

04:30 ⚖ Are We on the Right Way to Assessing LLM-as-a-Judge?

05:05 📡 RadarGen: Automotive Radar Point Cloud Generation from Cameras

05:54 🔬 Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

06:41 🎬 HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

07:26 🔍 GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

08:06 ⚙ SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

08:39 🧠 Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

09:14 ⚡ StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

09:48 🤖 An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

[Follow Us]

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu: AI速递