[LG] AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning[CMU]arxiv.org