2026.06.24 | Qwen-AgentWorld Surpasses GPT-5.4; NatureBench Reveals AI's Innovation Bottleneck

15 minutes

【Sponsor】
OpenClaw快报 (OpenClaw Briefing)
Five minutes a day with the OpenClaw Briefing to catch up on the latest news and industry discussion
Link: www.xiaoyuzhoufm.com

【Contents】
The 15 papers in this episode:

[00:31] 🌍 Qwen-AgentWorld: Language World Models for General Agents
[01:28] 🧪 NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
[02:24] 🤖 AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction
[03:16] 📱 MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization
[04:04] 🤖 MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management
[04:57] 🧠 LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
[06:01] 🔒 FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs
[06:57] 🧠 OpenThoughts-Agent: Data Recipes for Agentic Models
[08:03] 🤖 Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
[09:02] 🔺 FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation
[09:58] 🦃 Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning
[11:03] 🧪 DiffusionBench: On Holistic Evaluation of Diffusion Transformers
[11:49] 🚗 FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
[12:53] 🔍 DREAM: Dense Retrieval Embeddings via Autoregressive Modeling
[13:49] 🔍 ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers
Xiaohongshu: AI速递