Code Intelligence
Blockbuster in the quant world! Ten-billion-yuan private fund's big new-year move: open-source release of an all-new code large model!
Xin Lang Cai Jing (Sina Finance) · 2026-01-02 04:03
According to a reporter at 券商中国 (Brokerage China), the model combines the ability to read, write, and modify code; it can be used for many kinds of tasks, including automated programming, bug fixing, and code explanation, working like a programming expert online 24 hours a day to help developers complete complex software engineering work.

On the first day of 2026, blockbuster news hit the quantitative private fund world again.

On January 1, Ubiquant (九坤投资), a quant private fund managing tens of billions of yuan, announced that the 至知创新研究院 (Zhizhi Innovation Research Institute) team it founded had formally open-sourced the IQuest-Coder-V1 series, a new generation of code large language models that now ranks among the leading open-source code models in performance and technology on key dimensions such as autonomous software engineering and competitive programming.

On the same New Year's Day, DeepSeek released a new paper proposing an architecture called mHC (manifold-constrained hyper-connections). The work aims to resolve the instability of conventional hyper-connections in large-scale model training while preserving their significant performance gains.

This also means that, after DeepSeek ignited the industry's imagination, the exploration of large models by ten-billion-yuan-scale quant private funds is stepping into a more substantive stage of technical competition.

Open-source release of an all-new code large model

Since 2025, the institute's early team has published high-quality work on large language models, code intelligence, vertical-domain medical models, and AI for mathematics. For example, at NeurIPS 2025, the global flagship academic conference in AI that has just concluded, a paper co-authored with teams including Yale University ...
Beihang University leads a 300-page survey of code intelligence: from foundation models to agents, the full Code LLM landscape in one read
量子位 (QbitAI) · 2025-12-05 05:33
Core Insights - The article discusses a comprehensive survey of the code intelligence field, covering the evolution of programming paradigms and the development of foundation models, tasks, training methodologies, and industry applications [1][3]. Group 1: Evolution of Programming Paradigms - The paper traces a clear evolutionary path from manual coding to AI-assisted collaborative development, in which developers increasingly express intent in natural language and leave implementation to models [4][6]. - This paradigm shift runs deeper than any previous tool upgrade, marking a critical transition in how software is written [7][8]. Group 2: Code Foundation Models - The paper lays out an overall blueprint for code foundation models, comparing the training pipelines of general LLMs and code-specific models and identifying the core datasets, such as GitHub code, issue discussions, and API documentation, that supply a model's engineering-world knowledge [10][12]. - The evolution of model architectures, from CodeBERT and CodeT5 to current designs, reflects ongoing adaptation to the demands of code tasks [11]. Group 3: Code Tasks and Benchmarks - The evaluation landscape for code models has been fragmented; the paper organizes tasks by granularity, from function-level to engineering-level, with corresponding benchmarks [14][18]. - HumanEval and MBPP serve as baseline indicators but reflect only foundational capability; more complex tasks are needed to assess real project understanding [15][16]. Group 4: Model Alignment and Enhancement - The paper summarizes methods for model alignment and capability enhancement, focusing on making models genuinely understand engineering rather than merely generate code-like text [19][20]. - Repo-level training, which teaches models module dependencies and project organization, is highlighted as crucial for stable performance in real scenarios [22]. Group 5: Software Engineering Agents - Code intelligence expands in scope when models participate in the software engineering process as agents, moving beyond one-shot code generation to continuous decision-making that exploits real-time feedback [27][28]. - The current bottleneck for these agents is not raw model capability but effective use of environmental signals such as test results and tool feedback [28]. Group 6: Security and Governance - The paper examines security in code models, categorizing risks into data security, model security, and execution security, and surveying governance measures such as data auditing and static/dynamic testing [34][35]. Group 7: Training Methodologies - The latter part of the paper distills practical training experience into a systematic methodology for building code models, a useful reference for teams preparing to train large code models [36][40]. Group 8: Accelerating Applications - The paper closes by noting the accelerating adoption of code models in software engineering, with integration into key workflows such as IDE plugins, collaborative coding, and automated testing [41][42]. - Software engineering is likely to evolve toward intent-driven, human-model collaborative coding, with models playing an ever larger role [43].
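The function-level benchmarks named in Group 3 (HumanEval, MBPP) report pass@k, the probability that at least one of k sampled completions passes all unit tests. As a concrete reference, here is a minimal sketch of the standard unbiased pass@k estimator from the HumanEval paper; the per-problem sample counts are toy data for illustration, not figures from the survey.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), computed stably,
    where n = samples per problem and c = samples passing all tests."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: success guaranteed
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Toy data: (n, c) per problem -- 20 samples each, 5/0/12 of them passing.
results = [(20, 5), (20, 0), (20, 12)]
for k in (1, 5, 10):
    print(f"pass@{k}: {np.mean([pass_at_k(n, c, k) for n, c in results]):.3f}")
```

Note that pass@1 reduces to the mean of c/n across problems; larger k rewards sample diversity.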
Agentic coding performance hits a new high: the all-new KAT model series lands on SWE-Bench
机器之心 (Synced) · 2025-09-26 10:35
Core Insights - The article discusses the launch of two models in the code intelligence field by the Kwaipilot team: the open-source 32B-parameter KAT-Dev-32B and the closed-source flagship KAT-Coder, both showing strong performance on coding tasks [2][26]. Model Performance - KAT-Dev-32B achieved a 62.4% solve rate on SWE-Bench Verified, ranking 5th among open-source models of all sizes [2]. - KAT-Coder reached an impressive 73.4% solve rate on the same benchmark, comparable to top global closed-source models [2][11]. Model Accessibility - KAT-Dev-32B is available on the Hugging Face platform for further research and development [7]. - API keys for KAT-Coder can be requested on Kuaishou's "Wanqing" enterprise-grade model service and development platform, giving users direct access to its coding tools [7]. Training Innovations - The KAT series went through several innovative training phases: Mid-Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and large-scale agentic reinforcement learning (RL) [9][12]. - Mid-Training focused on strengthening "LLM-as-Agent" capabilities: tool use, multi-turn interaction, and instruction following [10][12]. - SFT collected real demand-delivery trajectories annotated by human engineers to strengthen end-to-end delivery ability [13]. - RFT introduced ground truth to guide trajectory exploration, improving the efficiency and stability of the subsequent RL phase [15]. Advanced Techniques - The team applied entropy-based tree pruning to learn efficiently from non-linear trajectory histories while maximizing throughput and minimizing cost; a hedged sketch of one plausible reading follows below [19]. - The SeamlessFlow framework manages trajectory trees and sustains high-throughput training by decoupling RL training from the agent's internal logic [21][22]. Emergent Capabilities - Post-training analysis revealed two notable emergent phenomena: dialogue rounds dropped 32% relative to the SFT model, and the model learned to call multiple tools in parallel [33][35]. - Both the efficiency preference and the parallel-calling behavior are attributed to implicit optimization pressure from the trajectory-tree structure [33]. Future Prospects - The Kwaipilot team aims to keep exploring the frontiers of code intelligence, including deeper tool integration, broader language support, and collaborative coding systems [35].
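The article names "entropy-based tree pruning" over trajectory trees but does not spell out the algorithm, so the following is only one plausible reading with entirely hypothetical names and schema: agent rollouts that share a prefix are stored as a tree, and branches rooted at low-entropy steps (near-deterministic continuations, hence little learning signal) are pruned before RL training to save tokens and raise throughput.

```python
from dataclasses import dataclass, field

@dataclass
class TrajNode:
    """One agent step in a shared-prefix trajectory tree (hypothetical schema)."""
    token_entropies: list[float]  # per-token policy entropy at this step
    children: list["TrajNode"] = field(default_factory=list)

def mean_entropy(node: TrajNode) -> float:
    """Average policy entropy of the tokens emitted at this step."""
    return sum(node.token_entropies) / max(len(node.token_entropies), 1)

def prune_low_entropy(node: TrajNode, threshold: float) -> TrajNode:
    """Drop branches rooted at low-entropy steps: near-deterministic
    continuations cost tokens but carry little learning signal for RL."""
    node.children = [
        prune_low_entropy(child, threshold)
        for child in node.children
        if mean_entropy(child) >= threshold
    ]
    return node

# Toy usage: the near-deterministic first branch is pruned, the second kept.
root = TrajNode([1.2], children=[TrajNode([0.05]), TrajNode([0.9, 1.1])])
root = prune_low_entropy(root, threshold=0.3)
assert len(root.children) == 1
```

This is a sketch of the general idea, not the KAT team's implementation; their actual criterion, threshold, and tree representation are not described in the article.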
From Debugger to Developer: NoCode-bench, a new benchmark for the low-code era, strongly endorsed by the SWE-Bench authors
机器之心 (Synced) · 2025-08-08 07:53
Core Insights - The article discusses the introduction of NoCode-bench, a new benchmark for evaluating large language models (LLMs) on natural-language-driven feature-addition tasks in software development [3][27]. - Current LLMs succeed on fewer than 20% of these tasks, highlighting how far AI remains from handling real-world software development scenarios [3][26]. Group 1: Benchmark Development - NoCode-bench was developed to address the limitations of existing benchmarks such as SWE-bench, which focus primarily on bug fixing rather than feature addition [6][27]. - The benchmark emphasizes understanding software documentation changes in order to implement new features, reflecting a more realistic development environment [6][27]. - Construction followed a rigorous five-phase process, from selecting well-maintained open-source projects to filtering instances against developer-verified release notes [8][10][16]. Group 2: Challenges Identified - NoCode-bench tasks present three main challenges: 1. More complex input: documentation changes are nearly twice as long as bug reports, demanding stronger long-text comprehension [12]. 2. Harder change localization: tasks often span multiple files and code blocks, requiring strong cross-file editing [13]. 3. Larger edits: nearly 20% of tasks require modifying over 200 lines of code, raising the risk of errors [14]. Group 3: Model Performance Evaluation - A comprehensive evaluation of six leading LLMs, including Claude-4-Sonnet and GPT-4o, produced disappointing results: the best model succeeded on only 15.79% of tasks [18][26]. - Failure analysis identified three primary causes: weak cross-file editing, insufficient understanding of codebase structure, and inadequate tool-invocation capability [20][21][22]. Group 4: Future Directions - The results indicate that today's LLMs are not ready for the complexities of document-driven feature development, pointing to the need for further advances [24][27]. - The findings sketch a roadmap for future AI software engineers: better cross-file editing, codebase comprehension, and tool interaction [27].
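To make concrete what the success rates above measure, here is a minimal sketch of a patch-then-test harness in the style of SWE-bench/NoCode-bench evaluation: apply the model's patch to a pre-change checkout, then run the developer-written tests. The Instance fields and commands are illustrative assumptions, not the actual NoCode-bench schema or harness.

```python
import subprocess
from dataclasses import dataclass

@dataclass
class Instance:
    """One feature-addition task (illustrative fields, not the real schema)."""
    repo_dir: str         # project checkout at the pre-change commit
    model_patch: str      # unified diff the model produced from the doc change
    test_cmd: list[str]   # developer-written tests that define success

def run_instance(inst: Instance) -> bool:
    """Apply the model's patch, then run the developer tests."""
    applied = subprocess.run(
        ["git", "apply", "-"], cwd=inst.repo_dir,
        input=inst.model_patch, text=True, capture_output=True,
    )
    if applied.returncode != 0:
        return False  # a malformed or conflicting patch counts as failure
    tests = subprocess.run(inst.test_cmd, cwd=inst.repo_dir, capture_output=True)
    return tests.returncode == 0

def success_rate(instances: list[Instance]) -> float:
    """Fraction of tasks where the patched project passes its tests."""
    return sum(run_instance(i) for i in instances) / max(len(instances), 1)
```

Note how this framing explains the failure modes listed in Group 3: a patch that edits the wrong files or cannot even be applied fails before the tests ever run.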