【Sponsor】
Listen to AI每周谈 on your commute. Each week, AI每周谈 recaps the previous week's major AI news.
Link: 🔗 www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode:
00:29 🧭 Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
01:21 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
02:19 ⚡ Scaling Embeddings Outperforms Scaling Experts in Language Models
02:58 🔍 OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
03:39 🤖 DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
04:33 🧠 MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
05:20 🔺 PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction
06:08 🧠 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
07:01 🧩 AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
07:43 🧠 Exploring Reasoning Reward Model for Agents
08:39 🎤 Qwen3-ASR Technical Report
09:27 🚀 Language-based Trial and Error Falls Behind in the Era of Experience
10:16 🌐 Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models
11:02 ⚡ Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
11:59 🧠 MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
