2024.10.10 Daily AI Papers | LLMs perform unevenly in economic games; personalized visual instruction tuning improves AI interaction. - HuggingFace Daily AI Paper Digest

This episode covers the following 43 papers:

00:23 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

01:09 👤 Personalized Visual Instruction Tuning

01:48 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

02:35 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

03:17 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

03:54 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model

04:29 🌐 Pixtral 12B

05:09 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling

05:49 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

06:29 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs

07:07 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

07:51 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

08:33 🚀 Self-Boosting Large Language Models with Synthetic Preference Data

09:13 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model

09:53 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

10:24 ⏳ Temporal Reasoning Transfer from Text to Video

10:54 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling

11:30 📊 Data Selection via Optimal Control for Language Models

12:07 🤖 Response Tuning: Aligning Large Language Models without Instruction

12:49 🤖 CursorCore: Assist Programming through Aligning Anything

13:36 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

14:16 🗣 Mixed-Session Conversation with Egocentric Memory

14:57 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

15:41 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

16:26 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

17:00 📖 Collective Critics for Creative Story Generation

17:36 🎵 Diversity-Rewarded CFG Distillation

18:16 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL

18:57 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

19:32 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

20:20 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

21:01 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

21:38 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

22:21 🚨 Multimodal Situational Safety

22:56 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders

23:38 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

24:18 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

24:55 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection

25:29 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

26:08 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video

26:49 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

27:28 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

28:03 🧠 Does Spatial Cognition Emerge in Frontier Models?

[Follow us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递