The 15 papers in this episode:
00:25 🔍 InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
01:07 🎙 MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization
01:46 🔬 SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
02:32 🎬 LTX-2: Efficient Joint Audio-Visual Foundation Model
03:26 🦄 UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
04:06 🎨 DreamStyle: A Unified Framework for Video Stylization
04:38 🧠 CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
05:25 ⚡ MiMo-V2-Flash Technical Report
06:15 🎮 NitroGen: An Open Foundation Model for Generalist Gaming Agents
06:58 🤖 SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
07:43 🛡 OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
08:31 📍 The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
09:14 🔍 X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework
09:57 🧠 Parallel Latent Reasoning for Sequential Recommendation
10:27 🤖 WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
