2026.04.10 | Collective plug-ins level up AI; an attention "insider" makes the numbers count

13 minutes

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, it recaps the previous week's major AI news.

Link 🔗www.xiaoyuzhoufm.com

【Contents】

The 15 papers covered in this episode are:

00:33 🧬 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

01:24 🔢 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

02:22 🎨 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

03:15 🤖 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

04:07 🧠 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

04:52 🤖 ClawBench: Can AI Agents Complete Everyday Online Tasks?

05:31 📱 KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

06:18 🧠 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

07:09 🎭 LPM 1.0: Video-based Character Performance Model

07:58 🧠 OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

08:50 🧠 Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

09:41 ⚡ DMax: Aggressive Parallel Decoding for dLLMs

10:20 🧠 Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

11:02 🧩 OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering

11:41 🧠 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

【Follow Us】

You can also find us on the following platform for more beyond the podcast itself:

Xiaohongshu: AI速递