"Replicating" High-Flyer Quant's (幻方量化) Building of DeepSeek: Quant Fund NianKong Achieves a Breakthrough in Foundational Large-Model Technology
Economic Observer (经济观察报) · 2025-06-03 11:17
As AI large models iterate and upgrade, quantitative private funds are increasingly focusing their R&D on foundational large-model technology, with an emphasis on algorithm optimization. In this process, industry-academia-research collaboration is their "shortcut" to breakthroughs in foundational large-model R&D.

Author: Chen Zhi. Cover image: Tuchong Creative.

Since May, the contest among global large-model developers over semantic understanding, multimodality, and other capabilities has quietly escalated.

China's DeepSeek announced that the DeepSeek R1 model has completed a minor version upgrade, significantly improving the model's depth of thought and reasoning ability.

Domestic quantitative private fund NianKong Technology (念空科技), in cooperation with the School of Computer Science at Shanghai Jiao Tong University, has proposed a new large-model training framework (SASR) and submitted a paper to NIPS, a top global AI conference.

In an interview with this paper on June 3, NianKong Technology founder Wang Xiao said that on the GSM8K task the new SASR framework achieved over 80% accuracy using only a 1.5B model, approaching GPT-4o's performance, while on the KK logical-reasoning task its accuracy exceeded GPT-4o's by about 9 percentage points. SASR makes general-purpose large models "smarter."

He told the reporter that current large-model training frameworks revolve mainly around supervised fine-tuning (SFT) and reinforcement learning (RL). SFT means continuously feeding the model data and worked examples for supervised training, akin to "drilling practice problems"; ...
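The "drilling practice problems" analogy for SFT corresponds to ordinary supervised imitation: the model is shown labeled (prompt, answer) pairs and nudged toward reproducing the demonstrated answers. A minimal toy sketch of that idea (the counting "model" and the data here are illustrative assumptions, not the setup from the SASR paper):

```python
# Toy illustration of supervised fine-tuning as "drilling": the model is
# repeatedly shown (prompt, answer) pairs and updated to imitate them.
# The counting "model" and the example data are illustrative assumptions.
from collections import defaultdict

examples = [("2+2", "4"), ("3+3", "6"), ("2+2", "4")]  # labeled "practice problems"

model = defaultdict(lambda: defaultdict(int))  # prompt -> answer -> count

def sft_update(prompt, answer):
    """One supervised step: reinforce the demonstrated answer."""
    model[prompt][answer] += 1

def predict(prompt):
    """Answer with the most frequently drilled response, if any."""
    answers = model[prompt]
    return max(answers, key=answers.get) if answers else None

for prompt, answer in examples:  # the "drilling" loop
    sft_update(prompt, answer)

print(predict("2+2"))  # → 4
```

The limitation the article alludes to is visible even here: the model only ever reproduces what it was drilled on, which is why RL (learning from a reward signal) is used alongside SFT.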
Core Insights
- Competition among global large-model developers has intensified since May, particularly in semantic understanding and multimodal capabilities [2]
- Domestic quantitative private funds are also entering the race, achieving breakthroughs in foundational AI large-model technology [2][5]
- A new training framework (SASR), proposed by NianKong Technology in collaboration with Shanghai Jiao Tong University, has shown promising results, achieving over 80% accuracy on the GSM8K task with a 1.5B model, nearing GPT-4o's performance [2][4]

Group 1: Training Framework and Algorithm Optimization
- Current training frameworks for large models primarily combine Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL); the challenge is optimizing the balance between the two methods [3][8]
- The new framework dynamically adjusts the relationship between SFT and RL during training, allowing the model to become "smarter" without increasing data volume [3][9]
- The framework has been applied in quantitative investment strategy development, achieving approximately 80% market-prediction accuracy compared to traditional models [4][13]

Group 2: Industry Trends and Collaborations
- Many quantitative private funds are establishing AI Labs focused on foundational large-model research, with an emphasis on algorithm optimization [6][11]
- Integrating academic research with private-fund expertise is seen as a shortcut to breakthroughs in foundational large-model technology [5][11]
- The emergence of smarter large models with lower parameter counts but superior overall capabilities is attributed to innovations in training frameworks and algorithm optimization [10]

Group 3: Future Directions and Challenges
- Whether large models become "smarter" in various vertical fields depends on high-quality data and effective training modes [12]
- NianKong Technology aims to help large models excel in more vertical fields, enhancing China's competitiveness in the global AI landscape [14]
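The dynamic SFT/RL balance described above can be pictured as a training loop that, at each step, chooses whichever phase has recently been improving the model more. The sketch below is a hypothetical illustration of that scheduling idea; the actual SASR criterion is not detailed in the article, and the switching rule, step functions, and names here are all assumptions:

```python
# Hypothetical sketch of a hybrid SFT/RL loop that dynamically rebalances
# the two phases, in the spirit of what the article describes. The
# switching criterion (compare recent average gains per phase) is an
# illustrative assumption, not SASR's published algorithm.
import random

def sft_step(params):
    """Supervised fine-tuning step: steady imitation gains ('drilling')."""
    return params + 0.010 * random.uniform(0.5, 1.0)

def rl_step(params):
    """Reinforcement-learning step: noisier, reward-driven gains."""
    return params + 0.012 * random.uniform(0.0, 1.5)

def train(steps=200, window=10, seed=0):
    random.seed(seed)
    params = 0.0
    gains = {"sft": [0.01], "rl": [0.01]}  # running gain history per phase
    for _ in range(steps):
        # Pick the phase whose recent average gain is higher: a crude
        # stand-in for dynamically adjusting the SFT/RL balance.
        phase = max(gains, key=lambda k: sum(gains[k][-window:]) / len(gains[k][-window:]))
        before = params
        params = sft_step(params) if phase == "sft" else rl_step(params)
        gains[phase].append(params - before)
    return params, {k: len(v) - 1 for k, v in gains.items()}

params, counts = train()
print(f"final score proxy: {params:.3f}, steps per phase: {counts}")
```

The point of the sketch is only the control flow: rather than running a fixed SFT stage followed by a fixed RL stage, the scheduler reallocates steps between the two phases as training progresses.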