00:00:37 Is Your AI Butler Trustworthy? A Security Report from the Future
00:04:40 AI "Going Crazy"? Scientists Have Found Its "Personality Switch"
00:09:33 More Important Than the Result: The Process of "Thinking It Through"
00:14:09 AI's "Dimensionality Reduction Strike": A Simple Way to Live in a Complex World
00:18:23 AI's "Warm and Caring" Persona May Be a Trap?
Papers discussed in this episode:
[LG] Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
[Gray Swan AI]
---
[CL] Persona Vectors: Monitoring and Controlling Character Traits in Language Models
[Anthropic Fellows Program & Constellation]
---
[LG] RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
[Tencent]
---
[LG] Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
[Brown University & Amazon Web Services]
---
[CL] Training language models to be warm and empathetic makes them less reliable and more sycophantic
[University of Oxford]
---
[CL] On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
[Affiliation not explicitly stated; survey paper]
![[Accessible to Everyone] AI's "Persona" and Its Traps: Is It Lying to You?](https://image.xyzcdn.net/FuDP4HpAp8ezgVZMmEel3mblKCmJ.jpg@small)