【Contents】
The 15 papers in this episode are:
[] 🎯 LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
[] 🧩 SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
[] 🎬 EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
[] 📱 MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
[] 🏗 Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
[] 🎬 LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
[] 🛡 $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
[] 🤖 The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
[] 🤝 Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
[] 🎬 Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
[] 👁 LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
[] 🤖 VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
[] 👁 Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
[] 🔮 JLT: Clean-Latent Prediction in Latent Diffusion Transformers
[] 🧠 Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
