This article explores how Large Language Models (LLMs) are built and improved specifically for reasoning. It defines reasoning models, outlines their strengths and weaknesses, and details four primary methods for developing them: inference-time scaling, pure reinforcement learning, supervised fine-tuning combined with reinforcement learning, and supervised fine-tuning with distillation. The article uses the DeepSeek R1 models as a case study, compares them to OpenAI's o1 model, and examines cost-effective alternatives for building reasoning models with limited resources. Finally, it discusses "journey learning," a novel approach to supervised fine-tuning.
Source: Understanding Reasoning LLMs
