What happens when you let AI set its own rules?

5 min

[LG] AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

[CMU]

arxiv.org