Large-Scale Reinforcement Learning
With Only 512 H200s, a 106B Model Breaks Through on Distributed RL, Fully Open-Sourced
36Kr · 2025-12-10 06:55
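[Overview] Prime Intellect has released INTELLECT-3, which achieves the strongest results among models of its size on multiple benchmarks, including math and code. The model is intended to open the technology stack for training frontier models to the community and to promote the adoption and development of large-scale RL research. Prime Intellect has open-sourced the complete training pipeline, including the model weights, training framework, datasets, RL environments, and evaluation suite, in the hope of encouraging more open research on large-scale reinforcement learning. The training software and infrastructure used for INTELLECT-3 are identical to the version that will soon be opened to everyone on the Prime Intellect platform, which means that in the future every individual and every company will be able to post-train state-of-the-art models.

SOTA on Multiple Benchmarks

INTELLECT-3 is a 106B-parameter Mixture-of-Experts (MoE) model, built on GLM 4.5 Air with supervised fine-tuning (SFT) and reinforcement learning (RL). Prime Intellect recently made the official release; the model was trained on Prime Intellect's own RL stack. On benchmarks spanning math, code, science, and reasoning, it achieved the strongest results at its scale, ...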
OpenAI Returns to Robotics: Aiming to Push Large Models into the Physical World
36Kr · 2025-09-17 11:12
Core Insights
- OpenAI is refocusing its research and recruitment efforts on "embodied intelligence," particularly in humanoid systems, after a pause of several years [1][4]
- The company is building a robotics research matrix aimed at real-world applications, indicating a shift from purely algorithmic development to hardware integration [1][4]

Recruitment and Talent Acquisition
- OpenAI has been actively recruiting talent with backgrounds in humanoid robotics and physical control algorithms, emphasizing teleoperation and simulation tools like Nvidia Isaac [3][8]
- Job postings highlight the need for experience in designing mechanical systems for high-volume production, suggesting a focus on scalable robotics solutions [3][8]

Strategic Direction
- The appointment of Caitlin Kalinowski, former head of AR hardware at Meta, to lead robotics and consumer hardware initiatives signals a strong commitment to the robotics sector [4]
- OpenAI's previous achievements in robotics, such as the Dactyl robotic hand, demonstrate its capability in sim-to-real applications, which the company is now revisiting [6]

Technical Capabilities
- OpenAI aims to extend its general model's understanding and reasoning to a complete loop of perception and control, requiring capabilities in data collection, model optimization, and hardware design [8]
- The company is focusing on large-scale reinforcement learning and real-time inference to enhance the stability and timing of perception-control systems [8]

Market Context
- The humanoid robotics sector is competitive, with significant investments exceeding $5 billion since 2024, and a projected trillion-dollar market by 2050 [9]
- OpenAI's recent adjustments in computing power, funding, and governance, including a new non-binding memorandum with Microsoft, may influence its robotics development pace and external collaborations [9]
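Computer Industry Commentary Report: Alibaba Cloud's Open-Source QwQ-32B Model Debuts Globally, Leading a Paradigm Revolution in Ultra-Low-Density Intelligence and the On-Device Ecosystem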
Huaxin Securities · 2025-03-11 13:34
Investment Rating
- The report maintains a "Buy" rating for Alibaba (BABA.N), Google (GOOGL.O), and Microsoft (MSFT.O) [11]

Core Insights
- The release of the QwQ-32B model by Alibaba Cloud marks a significant leap in parameter efficiency, achieving a 20-fold compression ratio while maintaining performance comparable to the DeepSeek-R1 model, which has 671 billion parameters [4][5]
- The QwQ-32B model demonstrates exceptional performance in various benchmark tests, surpassing similar-sized models like OpenAI's o1-mini [4]
- The innovative training methodology of QwQ-32B, which combines cold-start pre-training with a results-driven reinforcement learning system, enhances its reasoning capabilities [5][7]
- The open-source nature of QwQ-32B, licensed under Apache 2.0, has led to rapid adoption in the AI community, indicating a strong market interest and potential for commercial value [6][9]

Summary by Sections

Market Performance
- The computer industry has outperformed the CSI 300 index over the past month (4.6% vs. 1.5%), three months (7.5% vs. -1.2%), and twelve months (34.2% vs. 9.8%) [1]

Investment Highlights
- The QwQ-32B model's performance is on par with DeepSeek-R1, showcasing its capability to operate efficiently on consumer-grade graphics cards, thus reducing deployment costs [4]
- The model's architecture supports critical thinking chain generation based on external feedback, enhancing its application in real-world scenarios [5]

Competitive Landscape
- The global AI technology competition is intensifying, with major players like Google and Microsoft launching advanced models, while domestic companies like ByteDance and Baidu are also making significant strides [8]

Investment Recommendations
- The report suggests focusing on Alibaba Cloud's ecosystem partners and edge inference chip companies, highlighting the commercial potential of open-source models [9]