2024.10.28 Daily AI Papers | Visual-Temporal Prompting Improves Interaction, Continuous Diffusion Models Optimize Speech Synthesis

9 minutes

The 13 papers in this episode:

00:25 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

01:14 🗣 Continuous Speech Synthesis using per-token Latent Diffusion

01:55 ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images

02:39 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

03:23 ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

03:56 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

04:34 🧠 Counting Ability of Large Language Models and Impact of Tokenization

05:08 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

05:46 🤖 Reflection-Bench: probing AI intelligence with reflection

06:23 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

06:57 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

07:35 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

08:15 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu: AI速递