2026.07.03 | Localized small models beat large models; autonomous policy evolution centers on structural synthesis

14 minutes

[Sponsor]
OpenClaw Briefing
Five minutes a day: listen to OpenClaw Briefing for the latest developments and industry discussion
Link: www.xiaoyuzhoufm.com

[Contents]
The 15 papers in this episode are:

[00:31] 🧩 Program-as-Weights: A Programming Paradigm for Fuzzy Functions
[01:24] 🧠 EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments
[02:24] 🧠 AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents
[03:18] 🔍 Morphing into Hybrid Attention Models
[04:07] 📊 AgenticDataBench: A Comprehensive Benchmark for Data Agents
[05:12] ⚡ Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling
[05:52] 🎬 WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory
[06:49] 🏥 Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning
[07:37] 🎨 Optimizing Visual Generative Models via Distribution-wise Rewards
[08:32] 🎯 SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use
[09:21] 🖐 AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition
[10:16] 🔬 From SRA to Self-Flow: Data Augmentation or Self-Supervision?
[11:10] 🧠 Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
[11:58] 🎯 AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models
[12:46] 🤔 When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

[Follow Us]
You can also find us on the following platforms for more beyond the podcast itself
Xiaohongshu: AI速递