2025.08.28 | Reasoning Decomposition Reduces Hallucinations; Interpretability Reveals Encoded Information - HuggingFace Daily AI Paper Digest

The 14 papers in this episode are:

00:25 🧠 Self-Rewarding Vision-Language Model via Reasoning Decomposition

00:49 🔍 Beyond Transcription: Mechanistic Interpretability in ASR

01:22 🤖 Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

01:52 🧠 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

02:19 🤖 MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

02:51 🔮 Predicting the Order of Upcoming Tokens Improves Language Modeling

03:20 💓 Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation

03:52 ⚡ Diffusion Language Models Know the Answer Before Decoding

04:16 👁 Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents

04:38 🎧 AudioStory: Generating Long-Form Narrative Audio with Large Language Models

05:01 🧠 StepWiser: Stepwise Generative Judges for Wiser Reasoning

05:25 🔄 Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference

05:53 💃 MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

06:18 📊 DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递