The 15 papers in this episode are as follows:
00:24 🧠 Qwen3-VL Technical Report
00:57 🧠 PretrainZero: Reinforcement Active Pretraining
01:36 🎬 ViDiC: Video Difference Captioning
02:24 🧠 OneThinker: All-in-one Reasoning Model for Image and Video
03:07 🔄 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
03:59 ⚙ Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
04:46 🤖 SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
05:22 🔧 Thinking with Programming Vision: Towards a Unified View for Thinking with Images
06:01 🔄 Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
06:51 🎮 RELIC: Interactive Video World Model with Long-Horizon Memory
07:34 🍳 CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
08:26 🧠 SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
09:01 📊 AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs
09:38 🧠 SkillFactory: Self-Distillation For Learning Cognitive Behaviors
10:20 📱 UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

[Follow Us]
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu (小红书): AI速递
