Just Now: ByteDance Open-Sources the Seed-OSS-36B Model with 512k Context

Core Viewpoint
- ByteDance's Seed team has released and open-sourced the Seed-OSS model series in three versions: Seed-OSS-36B-Base (with synthetic data), Seed-OSS-36B-Base (without synthetic data), and Seed-OSS-36B-Instruct. The models were trained on 12 trillion tokens and post strong results on a range of benchmarks [1][2].

Model Features
- Seed-OSS-36B is a causal language model whose architecture uses Grouped Query Attention, the SwiGLU activation function, RMSNorm, and RoPE positional encoding [4].
- Each model has 36 billion parameters across 64 layers and a vocabulary of 155,000 tokens [5].
- A notable feature is native long-context support: the maximum context length is 512k tokens, which lets the model process long documents and reasoning chains without performance loss [6][7].

Inference Budget Control
- The model introduces inference budget control, which lets developers specify how much reasoning the model should perform before it answers [10].
- This design lets teams trade off performance against cost according to task complexity and deployment efficiency needs [12].
- Recommended budget values are multiples of 512 tokens; a budget of 0 makes the model output its answer directly [13][26].

Benchmark Performance
- Seed-OSS-36B-Base scored 65.1 on MMLU-Pro and 81.7 on MATH, a competitive result [15].
- Seed-OSS-36B-Instruct achieved state-of-the-art (SOTA) results in several areas, including 91.7 on AIME24 and 67.4 on LiveCodeBench v6 [17].
- In long-context testing, the model scored 94.6 on RULER at a 128K context length, the highest score among open-source models [18].

User Interaction and Token Management
- During operation, the model reports its token usage, so users can see how much of the budget has been consumed [25].
- If no inference budget is set, the model defaults to reasoning of unlimited length, while a budget of 0 prompts direct answer output [27].
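
The budget rules described above (multiples of 512 recommended, 0 for a direct answer, no budget for unlimited reasoning) can be sketched as a small helper. This is an illustrative sketch only: the names `normalize_thinking_budget` and `describe_budget` are hypothetical and not part of any Seed-OSS API, and rounding up to the next multiple of 512 is an assumption made here, not something the release specifies.

```python
from typing import Optional

# Recommended granularity for budgets, per the summary above.
BUDGET_STEP = 512


def normalize_thinking_budget(requested: Optional[int]) -> Optional[int]:
    """Map a requested reasoning budget onto the recommended values.

    None  -> None (no budget set: unlimited reasoning)
    0     -> 0    (direct answer, no reasoning)
    n > 0 -> n rounded UP to the next multiple of 512 (an assumption)
    """
    if requested is None:
        return None
    if requested < 0:
        raise ValueError("budget must be non-negative")
    if requested == 0:
        return 0
    # Ceiling division without floats, then scale back to tokens.
    return -(-requested // BUDGET_STEP) * BUDGET_STEP


def describe_budget(budget: Optional[int]) -> str:
    """Return a human-readable description of what a budget value means."""
    if budget is None:
        return "unlimited reasoning"
    if budget == 0:
        return "direct answer"
    return f"up to {budget} reasoning tokens"


if __name__ == "__main__":
    for requested in (None, 0, 300, 512, 1000):
        budget = normalize_thinking_budget(requested)
        print(requested, "->", budget, "|", describe_budget(budget))
```

Rounding up rather than down keeps a small positive request from collapsing to 0, which would silently switch the model to direct-answer mode.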