本期的 43 篇论文如下:
00:23 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments(GLEE:基于语言的经济环境统一框架与基准)
01:09 👤 Personalized Visual Instruction Tuning(个性化视觉指令微调)
01:48 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation(迈向世界模拟器:基于物理常识的视频生成基准)
02:35 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation(迭代组合感知反馈学习:从模型库中提升文本到图像生成)
03:17 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate(解码大型视觉语言模型中的跨模态对齐与模态集成率)
03:54 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model(Aria:一个开放的多模态原生混合专家模型)
04:29 🌐 Pixtral 12B(Pixtral 12B)
05:09 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling(金字塔流匹配用于高效视频生成建模)
05:49 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning(揭示视觉表示学习中的骨干-优化器耦合偏差)
06:29 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs(MM-Ego:构建以自我为中心的多模态大型语言模型)
07:07 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation(一种初始化方法统治所有:通过解释方差适应进行微调)
07:51 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization(故事适配器:一种无需训练的迭代框架用于长故事可视化)
08:33 🚀 Self-Boosting Large Language Models with Synthetic Preference Data(利用合成偏好数据自我提升大型语言模型)
09:13 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model(猎鹰曼巴:首个无注意力机制的7B语言模型)
09:53 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation(TweedieMix:改进基于扩散的图像/视频生成中的多概念融合)
10:24 ⏳ Temporal Reasoning Transfer from Text to Video(从文本到视频的时间推理迁移)
10:54 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling(TRACE:通过因果事件建模实现视频时间定位的大型语言模型)
11:30 📊 Data Selection via Optimal Control for Language Models(通过最优控制进行语言模型数据选择)
12:07 🤖 Response Tuning: Aligning Large Language Models without Instruction(响应调优:无需指令对齐大型语言模型)
12:49 🤖 CursorCore: Assist Programming through Aligning Anything(CursorCore:通过对齐任何内容辅助编程)
13:36 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler(ViBiDSampler:利用双向扩散采样器增强视频插值)
14:16 🗣 Mixed-Session Conversation with Egocentric Memory(带有自我中心记忆的混合会话)
14:57 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet(ING-VP:多模态大语言模型在视觉游戏中的表现仍不尽人意)
15:41 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs(AutoDAN-Turbo:一种用于策略自我探索以破解LLMs的终身代理)
16:26 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design(T2V-Turbo-v2:通过数据、奖励和条件引导设计增强视频生成模型后训练)
17:00 📖 Collective Critics for Creative Story Generation(创意故事生成的集体批评框架)
17:36 🎵 Diversity-Rewarded CFG Distillation(多样性奖励的CFG蒸馏)
18:16 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL(检索增强决策变压器:上下文强化学习的外部记忆)
18:57 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching(F5-TTS:基于流匹配生成流畅且忠实语音的童话生成器)
19:32 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance(《致爱丽丝:捕捉并物理合成钢琴演奏手部动作》)
20:20 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning(整体遗忘基准:文本到图像扩散模型遗忘的多方面评估)
21:01 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning(多模态大语言模型用于逆向分子设计与逆合成规划)
21:38 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way(BroadWay:无需训练提升文本到视频生成模型)
22:21 🚨 Multimodal Situational Safety(多模态情境安全)
22:56 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders(幻觉AI劫持攻击:大型语言模型与恶意代码推荐器)
23:38 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach(Seeker:利用基于LLM的多代理方法增强代码中的异常处理)
24:18 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control(联合生成多视角一致的PBR纹理:协作控制方法)
24:55 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection(TinyEmo:通过度量投影缩小情感推理)
25:29 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders(心理竞技场:通过自我对弈训练语言模型用于心理健康障碍的诊断与治疗)
26:08 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video(文本转卡通:从单视频实时生成卡通化头部虚拟形象)
26:49 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA(伟大的思想是否一致?探究CAIMIRA框架下的人机问答互补性)
27:28 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering(MLE-bench:评估机器学习代理在机器学习工程中的表现)
28:03 🧠 Does Spatial Cognition Emerge in Frontier Models?(空间认知在前沿模型中是否出现?)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
