CodeLlama
AAAI 2026 | AP2O-Coder gives large models an "error notebook" so they can practice efficiently by problem type, as humans do
机器之心· 2026-01-14 05:37
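Author: a Shanghai Jiao Tong University PhD interning at Tencent CodeBuddy, with 10 first-author papers in top conferences and journals (including a best paper) and open-source projects such as PFLlib that have been well received by the community. Research interests include AI reinforcement learning, AI synthetic data, and agent memory.

As AI-assisted coding advances rapidly, large language models (LLMs) have markedly improved software development efficiency, yet code generated by open-source LLMs still contains runtime errors, which raises developers' debugging costs.

Existing improvement methods based on preference optimization mostly build their training data from a binary pass/fail signal. That signal reveals little about where the code went wrong, and these methods also overlook how the model's capabilities change during training.

To close this gap, during an internship at Tencent CodeBuddy we propose Adaptively Progressive Preference Optimization (AP2O) and build the AP2O-Coder framework. Drawing on how people practice efficiently by problem type, the method improves a model's ability to correct code errors through a systematic "exam, analysis, correction, quiz" process. It achieves up to a 3% pass@k improvement on several mainstream open-source models while reducing the amount of training data required.

Paper title: AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing C ...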
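The summary reports gains in pass@k without defining the metric. As a minimal sketch, the function below computes the unbiased pass@k estimator that is widely used for code-generation benchmarks: with n samples per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). Whether the AP2O-Coder paper uses exactly this estimator is an assumption, since the summary does not say; the function name `pass_at_k` and the sample counts are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that pass all tests
    k: sampling budget being evaluated (1 <= k <= n)

    Assumption: this is the standard unbiased estimator,
    1 - C(n - c, k) / C(n, k); the summarized article does not
    confirm which estimator the paper uses.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative numbers only: 10 samples, 3 of them correct.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.4f}")
```

A benchmark-level score is then the mean of this per-problem value over all problems.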
Generative AI Empowers Requirements Engineering: A Transformation in Progress
机器之心· 2025-11-27 12:13
Core Insights
- The article presents a systematic literature review on the application of Generative AI (GenAI) in Requirements Engineering (RE), highlighting its transformative potential and the challenges that need to be addressed for effective industrial adoption [4][51].

Research Growth
- Research on GenAI in the RE field has shown exponential growth, with the number of relevant papers increasing from 4 in 2022 to 23 in 2023, and projected to reach 113 in 2024 [10][8].
- A total of 238 papers were reviewed, indicating a strong academic interest following the release of ChatGPT [8][10].

Research Focus Imbalance
- The focus of research is heavily skewed towards certain phases of RE, with 30% dedicated to requirements analysis, while only 6.8% is focused on requirements management, indicating a lack of attention to complex socio-technical factors [11][9].
- GenAI is currently in a "rapid expansion but immature" phase, with a significant increase in quantity but insufficient depth in research [14].

Technical Landscape
- A significant reliance on the GPT model family is observed, with 67.3% of studies using it, which limits exploration of diverse technological paths [16].
- GPT-4 is primarily used for complex requirement analysis, while open-source alternatives like CodeLlama are underutilized despite their lower hallucination rates [17][16].

Challenges Identified
- The research identifies three core challenges: reproducibility (66.8%), hallucination (63.4%), and interpretability (57.1%), which are interrelated and must be addressed collectively [30][31].
- The lack of reproducibility is particularly problematic due to the random nature of large language models (LLMs) and their opaque APIs [30].

Evaluation Practices
- There is a notable lack of standardized evaluation metrics in the RE field, with only 23.9% of studies releasing tools and 45.8% using non-public datasets [35][37].
- Traditional NLP metrics dominate the evaluation methods, failing to capture the complexity of RE tasks [33].

Industrial Adoption
- The industrial adoption of GenAI in RE is lagging, with 90.3% of studies remaining at the conceptual or prototype stage, and only 1.3% achieving production-level integration [39][41].
- The value of GenAI in industry is seen in accelerating requirement documentation and reducing communication costs, but companies are hesitant due to compliance and risk control concerns [43].

Future Roadmap
- A four-phase strategy is proposed for advancing GenAI in RE: strengthening evaluation infrastructure, governance-aware development, scalable context-aware deployment, and industrial-level standardization [46].
- Key areas for improvement include generalization capabilities, data quality, and evaluation methods [45].

Recommendations for Researchers and Practitioners
- Researchers are encouraged to explore diverse models beyond GPT, develop hybrid architectures specific to RE, and focus on reproducibility [53].
- Practitioners should use GenAI as an auxiliary tool rather than a decision-maker, especially in low-risk tasks [53].