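Hello everyone, and welcome to "Hugging Face Daily AI Papers Express". Today is August 23, 2024, and we will take you on a quick tour of today's 19 trending AI papers, covering controllable text generation for large language models, multimodal understanding and generation, high-fidelity text-to-video synthesis, and other frontier areas. Now, let's dive straight into the papers.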
00:27 📚 Controllable Text Generation for Large Language Models: A Survey
01:00 🧠 Sapiens: Foundation for Human Vision Models
01:36 🌐 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
02:12 🎥 xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
02:45 🎥 DreamCinema: Cinematic Transfer with Free Camera and 3D Character
03:19 🖼 Scalable Autoregressive Image Generation with Mamba
03:54 🤖 Hermes 3 Technical Report
04:33 🚀 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
05:10 🎥 Real-Time Video Generation with Pyramid Attention Broadcast
05:50 🌲 Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
06:30 🌉 SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
07:14 💼 Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
07:49 📷 SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
08:26 🇻🇳 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
08:56 🎥 Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
09:24 🎥 Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
10:05 🧐 ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
10:46 🌟 Subsurface Scattering for 3D Gaussian Splatting
11:20 🇷🇺 The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

[Follow Us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递