2025.12.04 | Qwen3-VL: Multimodal Ultra-Long Context; PretrainZero: Reinforcement Active Pretraining - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are as follows:

00:24 🧠 Qwen3-VL Technical Report

00:57 🧠 PretrainZero: Reinforcement Active Pretraining

01:36 🎬 ViDiC: Video Difference Captioning

02:24 🧠 OneThinker: All-in-one Reasoning Model for Image and Video

03:07 🔄 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

03:59 ⚙ Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

04:46 🤖 SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

05:22 🔧 Thinking with Programming Vision: Towards a Unified View for Thinking with Images

06:01 🔄 Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

06:51 🎮 RELIC: Interactive Video World Model with Long-Horizon Memory

07:34 🍳 CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

08:26 🧠 SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

09:01 📊 AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs

09:38 🧠 SkillFactory: Self-Distillation For Learning Cognitive Behaviors

10:20 📱 UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递