2026.05.27 | Parallel Box Decoding Delivers a 10x Speedup; Spatial Benchmark Exposes Model Weaknesses

15 minutes

【Contents】
The 15 papers in this episode:
[00:24] 🎯 LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
[01:13] 🧩 SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
[02:07] 🎬 EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
[03:06] 📱 MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
[04:05] 🏗 Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
[05:00] 🎬 LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
[05:59] 🛡 $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
[06:54] 🤖 The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
[07:51] 🤝 Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
[08:46] 🎬 Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
[09:46] 👁 LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
[10:37] 🤖 VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
[11:42] 👁 Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
[12:34] 🔮 JLT: Clean-Latent Prediction in Latent Diffusion Transformers
[13:17] 🧠 Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

【Follow Us】
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu: AI速递

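A suggestion: I noticed the narration voice changed about two weeks ago. For the main narration, I personally still prefer the earlier voice, which sounded lighter and younger. When I listen first thing in the morning it catches the ear better and makes the listening more immersive. Also, it would be better to stick to a single voice within the same piece~
拨号上网: Great suggestion, thank you for your support ❤️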