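Have you ever considered that AI not only makes mistakes, but that its mistakes come in two personalities: "stubbornly committed" and "lost the whole way"? When we want to give AI some "private tutoring" to teach it something new, the most effective method turns out to be a "whisper" a thousand times weaker than the main signal. In this episode, we climb inside the AI's brain to see how it learns bad behavior by "piggybacking," how it can be fitted with a frugal, "money-saving" brain, and how we can use geometry, by "drawing circles," to truly understand what it is thinking. Ready? Let's set off!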
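AI "goes bad" because of a bad "piggybacking" habit?
When AI fails, is it "stubbornly committed" or "lost the whole way"?
The art of leveling up AI: how do you give it "private tutoring"?
Fitting AI with a "money-saving" brain
AI's "filling in the blanks" versus our "understanding": what is the gap between them?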
Papers covered in this episode:
[CL] The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment
[Northeastern University & Stanford University]
---
[CL] How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures
[Stanford University]
---
[LG] TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models
[Meta AI]
---
[LG] Towards Tight Bounds for Streaming Attention
[MIT]
---
[LG] A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
[University of Washington]
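![[AI Frontiers Everyone Can Understand] A guide to taming the "mythical beast": correcting AI's mistakes, giving it private tutoring, and fitting it with a "money-saving" brain](https://image.xyzcdn.net/FqWpK8fpivLboaqBbRHUe_BCOvxu.png@small)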