【第220期】SWE-RL：读开源代码学成软件工程师

Seventy3：借助NotebookLM的能力进行论文解读，专注人工智能、大模型、机器人算法方向，让大家跟着AI一起进步。

进群添加小助手微信：seventy3_podcast

备注：小宇宙

今天的主题是：

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Summary

The provided text is a research paper introducing SWE-RL, a novel reinforcement learning approach to enhance large language models for software engineering tasks by training them on open-source software evolution data. This method enables the developed model, Llama3-SWE-RL-70B, to achieve state-of-the-art performance on solving real-world GitHub issues, even rivaling proprietary models. Surprisingly, training solely on software engineering data with SWE-RL also equips the model with improved general reasoning abilities applicable to diverse out-of-domain tasks like mathematics and code generation. The paper details the data curation process, the SWE-RL framework including its reward system and training methodology, and extensive evaluations demonstrating its effectiveness and generalizability.

研究论文介绍了SWE-RL，一种新颖的强化学习方法，通过在开源软件演变数据上训练大型语言模型，增强其在软件工程任务中的能力。该方法使开发出的模型Llama3-SWE-RL-70B在解决现实世界GitHub问题上达到了最先进的性能，甚至可与专有模型媲美。令人惊讶的是，仅在软件工程数据上使用SWE-RL训练的模型，还获得了适用于数学和代码生成等多样化域外任务的改进的通用推理能力。论文详细描述了数据整理过程、SWE-RL框架（包括其奖励系统和训练方法）以及广泛的评估，展示了其有效性和泛化能力。

原文链接：arxiv.org