GPTOSS
Reinforcement Learning Tutorial - RLVR with NVIDIA & Unsloth
Matthew Berman· 2025-12-15 13:00
This is the tech that got AI to be the best in the world at chess, Go, League of Legends, and even master autonomous driving. And today, I'm going to show you how to set it up and actually run it on your home computer. And by the way, I'm partnering with Nvidia on this video. They wanted me to put together this tutorial, and I thought it would be awesome to show you how to do RL locally. So, how did this actually happen? How did AI surpass humans at all of these games? The answer is reinforcement learning. An ...
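The title's "RLVR" stands for RL with Verifiable Rewards: instead of a learned reward model, the reward comes from a deterministic check against a known answer. Below is a minimal sketch of that idea for a math-answer task; the function and the `####` answer format are illustrative assumptions, not taken from the video or Unsloth's API.

```python
# Minimal sketch of a verifiable reward for RLVR (RL with Verifiable Rewards).
# All names here are illustrative, not from the video or from Unsloth.
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Score 1.0 if the model's final answer matches the known answer.

    Unlike a learned reward model, this check is deterministic and
    cannot be gamed by plausible-sounding but wrong text.
    """
    match = re.search(r"####\s*(-?\d+(?:\.\d+)?)", completion)
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1) == ground_truth else 0.0

# Example: a GSM8K-style completion ending in "#### 42"
print(verifiable_reward("... so the total is 42.\n#### 42", "42"))  # 1.0
```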
OpenAI Goes OPEN-SOURCE! gpt-oss is HERE!
Matthew Berman· 2025-08-05 22:09
Model Release
- OpenAI has released GPTOSS, state-of-the-art open-source models in 120 billion and 20 billion parameter versions [1]
- These are open-weight language models, meaning the model weights themselves are published [1]

Performance Benchmarks
- With tool use, the 120 billion parameter GPTOSS scores 2622 on Codeforces, very close to the frontier model's 2706 [2]
- With tool use, the 20 billion parameter version scores 2516, equally impressive given its size [2]
- At competitive programming, these models outscore the vast majority of humans on Earth [2]
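Because the weights are published, the checkpoints can be pulled and run like any other open-weight model. A minimal sketch using Hugging Face transformers, assuming the repo ID follows OpenAI's published naming (openai/gpt-oss-20b); adjust it if the hosted name differs.

```python
# Sketch: loading the open weights with Hugging Face transformers.
# The repo ID "openai/gpt-oss-20b" is assumed from OpenAI's naming.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread across available GPUs / CPU
)

inputs = tokenizer(
    "Explain mixture-of-experts in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```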
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
Model Release & Capabilities - Open AAI released GPTOSS, state-of-the-art open-weight language models in 120 billion and 20 billion parameter versions [1] - The models outperform similarly sized open-source models on reasoning tasks and demonstrate strong tool use capabilities [3] - The models are optimized for efficient deployment on consumer hardware, with the 120 billion parameter version running efficiently on a single 80 GB GPU and the 20 billion parameter version on edge devices with 16 GB of memory [4][5] - The models excel in tool use, few-shot learning, function calling, chain of thought reasoning, and health issue diagnosis [8] - The models support context lengths of up to 128,000 tokens [12] Training & Architecture - The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models [3] - The models utilize a transformer architecture with a mixture of experts, reducing the number of active parameters needed to process input [10][11] - The 120 billion parameter version activates only 5 billion parameters per token, while the 20 billion parameter version activates 36 billion parameters [11][12] - The models employ alternating dense and locally banded sparse attention patterns, group multi-query attention, and RoPE for positional encoding [12] Safety & Security - OpenAI did not put any direct supervision on the chain of thought for either OSS model [21] - The models were pre-trained and filtered to remove harmful data related to chemical, biological, radiological, and nuclear data [22] - Even with robust fine-tuning, maliciously fine-tuned models were unable to reach high capability levels according to OpenAI's preparedness framework [23] - OpenAI is hosting a challenge for red teamers with $500,000 in awards to identify safety issues with the models [24]