2024.08.23 Daily AI Papers | Large Language Models Improve Text Generation Quality, Sapiens Models Optimize Vision Task Performance - Hugging Face Daily AI Paper Digest

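Hello everyone, and welcome to the "Hugging Face Daily AI Paper Digest". Today is August 23, 2024, and we will take you on a quick tour of today's 19 trending AI papers, covering frontier areas including controllable text generation for large language models, multimodal understanding and generation, and high-fidelity text-to-video synthesis. Now, let's dive right into the papers.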

00:27 📚 Controllable Text Generation for Large Language Models: A Survey

01:00 🧠 Sapiens: Foundation for Human Vision Models

01:36 🌐 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

02:12 🎥 xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

02:45 🎥 DreamCinema: Cinematic Transfer with Free Camera and 3D Character

03:19 🖼 Scalable Autoregressive Image Generation with Mamba

03:54 🤖 Hermes 3 Technical Report

04:33 🚀 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

05:10 🎥 Real-Time Video Generation with Pyramid Attention Broadcast

05:50 🌲 Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

06:30 🌉 SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

07:14 💼 Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

07:49 📷 SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

08:26 🇻🇳 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

08:56 🎥 Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

09:24 🎥 Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

10:05 🧐 ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

10:46 🌟 Subsurface Scattering for 3D Gaussian Splatting

11:20 🇷🇺 The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (Little Red Book): AI速递