Alignment Reversal

The gpt-oss base model OpenAI never open-sourced: he recovered it by reversing the reinforcement learning
机器之心· 2025-08-13 03:27
Core Viewpoint
- OpenAI has released two open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, but has not provided the pre-trained base model. Researcher Jack Morris has reverted gpt-oss-20b into a base model, gpt-oss-20b-base, which was well received on release [1][2][4].

Model Release
- Jack Morris announced the release of gpt-oss-20b-base, a base model capable of generating arbitrary text, unlike the original gpt-oss models, which were aligned toward specific outputs [2][6].
- The model is built on the gpt-oss-20b mixture-of-experts model and was fine-tuned using low-rank adaptation (LoRA) [4][6].

Technical Details
- gpt-oss-20b-base was created by reversing the alignment phase of the gpt-oss-20b training process, allowing it to generate more natural, free-form text [6][8].
- The fine-tune applies a low-rank update to only a few linear layers, trained on approximately 20,000 documents from the FineWeb dataset [17][20].
- Training ran for 1,500 steps with a learning rate of 2e-6, a batch size of 16, and a maximum sequence length of 8,192 tokens [20]. (A minimal sketch of this setup follows at the end of this article.)

Memory and Output
- Testing showed that gpt-oss-20b-base retains memory of certain copyrighted materials: it reproduced content from at least three of six tested books [9][22]. (A sketch of such a memorization probe also appears below.)
- Because the alignment phase has been reversed, the model's outputs can include inappropriate content and assistance with illegal activities [8][9].

Future Plans
- Jack Morris plans to further investigate what gpt-oss-20b-base has memorized, attempt the same reversal on gpt-oss-120b, and explore instruction fine-tuning as well as comparisons with GPT-2 and GPT-3 [22].
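The LoRA fine-tune described under Technical Details can be pictured as follows. This is a minimal sketch, not Morris's actual code: the hyperparameters (1,500 steps, lr 2e-6, batch size 16, max length 8,192) and the use of FineWeb come from the article, while the LoRA rank, the choice of target projection layers, and the data handling are illustrative assumptions.

```python
# Sketch: continued next-token pretraining of gpt-oss-20b with a low-rank
# (LoRA) update on a few linear layers, using raw FineWeb documents.
# Hyperparameters follow the article; everything else is an assumption.

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "openai/gpt-oss-20b"   # released mixture-of-experts checkpoint
MAX_LEN = 8192                    # max sequence length per the article

tok = AutoTokenizer.from_pretrained(MODEL_ID)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank update touching only a few linear layers; the rank and the
# target modules here are assumed, not reported in the article.
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Stream raw web documents from FineWeb (~20,000 used in the article).
docs = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
opt = torch.optim.AdamW(model.parameters(), lr=2e-6)  # lr from the article

step, batch = 0, []
for doc in docs:
    batch.append(doc["text"])
    if len(batch) < 16:           # batch size 16 per the article
        continue
    enc = tok(batch, truncation=True, max_length=MAX_LEN,
              padding=True, return_tensors="pt").to(model.device)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    loss = model(**enc, labels=labels).loss    # standard causal-LM objective
    loss.backward()
    opt.step()
    opt.zero_grad()
    batch = []
    step += 1
    if step >= 1500:              # 1,500 steps per the article
        break

model.save_pretrained("gpt-oss-20b-base-lora")  # saves only the adapter
```

Because only low-rank adapters on a handful of layers are trained, the update is cheap relative to the 20B-parameter model, which is consistent with the article's framing of the reversal as a small corrective fine-tune rather than a full retraining.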
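The memorization finding can be tested with a probe along these lines: feed the base model the opening of a well-known text and check whether its greedy continuation reproduces the original. This is a hypothetical sketch, not the article's evaluation; the Hugging Face model id is assumed, and the public-domain prompt is only an example (the article does not name the six books tested).

```python
# Sketch: a crude verbatim-memorization probe against gpt-oss-20b-base.
# Model id, prompt, and pass criterion are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jxm/gpt-oss-20b-base"   # assumed id of Morris's released model

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Public-domain example: the opening of A Tale of Two Cities.
prompt = "It was the best of times, it was the worst of times,"
true_next = "it was the age of wisdom, it was the age of foolishness"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy
completion = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)

# Verbatim recall if the greedy continuation starts with the true next
# words; real evaluations use longer spans and fuzzier matching.
print("verbatim recall:", completion.strip().startswith(true_next))
print(completion)
```

Greedy decoding is used so that any verbatim reproduction reflects what the model ranks most likely, rather than a lucky sample.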