The 15 papers in this episode are:
00:20 🚀 STEP3-VL-10B Technical Report
01:01 🏙 Urban Socio-Semantic Segmentation with Vision-Language Reasoning
01:42 💡 Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
02:33 🤖 Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
03:14 🧬 Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
03:59 📊 DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
04:39 🎨 CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
05:33 🧠 Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
06:12 🤔 Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
06:48 🔧 MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
07:29 🛡 A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
08:09 🛡 ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
08:59 🎬 FlowAct-R1: Towards Interactive Humanoid Video Generation
09:39 🎨 VIBE: Visual Instruction Based Editor
10:09 ⚡ Transition Matching Distillation for Fast Video Generation

【Follow Us】
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu: AI速递
