2024.10.14 Daily AI Papers | Multimodal Model Baichuan-Omni Open-Sourced, Meissonic Improves Text-to-Image Efficiency

11 minutes

The 16 papers in this episode are:

00:25 🌐 Baichuan-Omni Technical Report

00:59 🖼 Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

01:41 🔧 From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

02:17 🎨 EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

02:53 🧠 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

03:34 📏 PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

04:11 🌐 Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

04:47 🧠 SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

05:29 🔄 Mechanistic Permutability: Match Features Across Layers

06:07 🤖 Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

06:45 ⚡ KV Prediction for Improved Time to First Token

07:30 🌐 ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

08:13 🚨 MiRAGeNews: Multimodal Realistic AI-Generated News Detection

08:52 🤖 DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

09:30 📈 I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

10:12 🧠 Mentor-KD: Making Small Language Models Better Multi-step Reasoners

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递