

[English] Understanding Reasoning LLMs - DeepSeek R1 and Beyond
This article explores how Large Language Models (LLMs) designed specifically for reasoning are built and improved. It defines reasoning models, weighs their strengths and weaknesses, and details four primary methods for developing them: inference-time scaling, pure reinforcement learning, supervised fine-tuning with reinforcement learning, and supervised fine-tuning with distillation. The article uses the DeepSeek R1 models as a case study, compares them to OpenAI's o1 model, and examines cost-effective alternatives for building reasoning models with limited resources. Finally, it discusses "journey learning," a novel approach to supervised fine-tuning. Source: Understanding Reasoning LLMs
[English] A Comprehensive Analysis of DeepSeek R1 and the Nvidia Stock Plunge
Jeffrey Emanuel, a former hedge fund analyst and current AI developer, outlines the bull case for Nvidia, emphasizing its near-monopoly on AI infrastructure. He then highlights emerging threats: innovative hardware from companies like Cerebras and Groq, custom AI chips developed by major tech firms, advances in software frameworks that lessen dependence on Nvidia's CUDA, and a surprising efficiency breakthrough by DeepSeek, a small Chinese firm, that significantly reduces the compute cost of AI model training and inference. These factors, the author argues, cast doubt on Nvidia's ability to sustain its current growth trajectory and justify its high valuation. Source: The Short Case for Nvidia Stock
[English] Anthropic Teaches You How to Build Agents
This Anthropic article discusses building effective large language model (LLM) agents. It contrasts workflows (predefined LLM processes) with agents (LLMs that dynamically control their own actions and tool use). The article details workflow patterns such as prompt chaining, routing, and parallelization, and explains when to employ each. Finally, it emphasizes building simple, transparent agents with well-documented tool interfaces, and advocates starting with basic components before adopting frameworks.
[English] CoALA: A Cognitive Architecture Framework for LLM Agents on the Path to AGI
CoALA builds on the history of cognitive architectures and production systems in artificial intelligence, which are systems that learn and reason using rules and logic. The paper discusses the analogy between these systems and large language models (LLMs), which are trained on massive text datasets and can generate human-like text. By incorporating LLMs into a cognitive architecture, CoALA suggests that language agents can be designed with modular memory components, a structured action space for interacting with both internal memory and external environments, and a generalized decision-making process for choosing actions. The paper uses CoALA to analyze existing language agents and outlines potential directions for future research.
[English] Claude 3.5 Sonnet Achieves a 49% Success Rate on SWE-bench
Anthropic has released an upgraded version of its Claude 3.5 Sonnet language model that achieves a state-of-the-art 49% score on SWE-bench Verified, a challenging benchmark of an AI model's ability to solve real-world software engineering tasks. This article details how Anthropic built an "agent" system around Claude 3.5 Sonnet, consisting of a simple prompt and two general-purpose tools, to reach this score. The authors also discuss the challenges they faced in using SWE-bench Verified, such as the duration and high token costs of complex tasks, the need to resolve system issues in grading, and the difficulty of evaluating models that cannot access files saved on the filesystem. They conclude by expressing confidence that developers building with the new Claude 3.5 Sonnet will find ways to improve SWE-bench scores even further.