The 14 papers in this episode are:
00:25 🧠 Self-Rewarding Vision-Language Model via Reasoning Decomposition
00:49 🔍 Beyond Transcription: Mechanistic Interpretability in ASR
01:22 🤖 Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
01:52 🧠 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
02:19 🤖 MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation
02:51 🔮 Predicting the Order of Upcoming Tokens Improves Language Modeling
03:20 💓 Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation
03:52 ⚡ Diffusion Language Models Know the Answer Before Decoding
04:16 👁 Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
04:38 🎧 AudioStory: Generating Long-Form Narrative Audio with Large Language Models
05:01 🧠 StepWiser: Stepwise Generative Judges for Wiser Reasoning
05:25 🔄 Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
05:53 💃 MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment
06:18 📊 DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

[Follow Us]
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu (小红书): AI速递
