Data Science x AI EP1 - Why & What to Evaluate

6 min · 1 year ago


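Amy and Stella are both traveling this week, so we're sharing something a little different. This episode comes from Stella's newly launched Substack newsletter, Data Science x AI. Because the newsletter is written in English, this podcast episode is entirely in English too. LLM/GenAI evaluation is a very new field, and we hope to discuss it with all of you!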

Hey there! This is the first post in my series on evaluating LLM-powered products, part of my ongoing effort to rediscover what data science means in the AI era. If you're interested in this topic, subscribe to get updates!

datasciencexai.substack.com


holadorable

2025.6.25

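Is evaluation discussed only from the technical side? Here's a problem that gives me real headaches: my boss loves to ask me to "evaluate how good Claude is" without telling me what he wants to use it for, and leadership loves to ask "how well does this product you designed work?", but I don't have a private benchmark dataset to evaluate it against 🥲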

StellaxAmy

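: For a specific use case, it's best to have your own benchmark dataset, so check whether there's existing data you can curate into one. If dataset curation really isn't feasible, A/B testing is another option. You can tell your boss that doing evaluation properly takes time and resources; it isn't an ad hoc analysis. Benchmark dataset size is another interesting question, and I'm still figuring it out myself.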

四夕_lfQh

2025.6.25

Was this script read by an AI trained on Stella's audio?

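shakalaka_: Stella's voice has had a built-in electronic quality ever since the first episode~ It sounds a bit like processed audio. I can't tell whether it's an effect or just natural talent…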

StellaxAmy

: We've been wronged, Your Honor! That was Stella herself reading the script, in the flesh!


四夕_lfQh

2025.6.25

First comment! I just love seeing something a little different :)

StellaxAmy

:❤️
