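What if an AI could evolve on its own like a martial-arts prodigy, inventing the strongest attack moves, while its most fatal weakness turned out to be a few lines of ancient Classical Chinese? What kind of strange attack-and-defense picture would that make? And when an AI hides a secret library of copyrighted books right under our noses, and one casual operation is enough to make it start "reciting from memory," how should we think about its "memory"? In this episode, we start from several recent papers and look at how research on "self-evolution," "cultural surprise attacks," and "unified creation" once again redraws our sense of where AI's capabilities end.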
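00:00:34 The AI rat race: when your opponent starts evolving on its own
00:06:05 AI's fatal flaw turns out to be Classical Chinese?
00:10:38 Your AI is hiding a secret library
00:15:51 A new approach to AI image generation: when the translator and the novelist are the same person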
Papers covered in this episode:
[LG] Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
[MATS & Imperial College London]
---
[CL] Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
[Nanyang Technological University & Northeast University & Renmin University of China]
---
[CL] Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
[Stony Brook University & CMU & Columbia Law School]
---
[CV] End-to-End Training for Unified Tokenization and Latent Denoising
[MIT & Adobe]
![[AI Frontiers for Everyone] AI's Self-Cultivation, Fatal Blind Spots, and Hidden Memory](https://image.xyzcdn.net/FuDP4HpAp8ezgVZMmEel3mblKCmJ.jpg@small)