Open-Source Large Models
Just In: ByteDance Open-Sources the Seed-OSS-36B Model with 512K Context
机器之心· 2025-08-21 01:03
Core Viewpoint - ByteDance's Seed team has officially released and open-sourced the Seed-OSS model series in three versions: Seed-OSS-36B-Base (with synthetic data), Seed-OSS-36B-Base (without synthetic data), and Seed-OSS-36B-Instruct. The models were trained on 12 trillion tokens and perform strongly on a range of benchmarks [1][2].
Model Features
- The Seed-OSS-36B architecture is a causal language model that uses Grouped Query Attention, the SwiGLU activation function, RMSNorm, and RoPE positional encoding [4].
- Each model has 36 billion parameters across 64 layers and a vocabulary of 155,000 tokens [5].
- A notable feature is the native long-context capability: a maximum context length of 512k tokens, which allows long documents and reasoning chains to be processed without performance loss [6][7].
Inference Budget Control
- The model introduces inference budget control, which lets developers specify how much reasoning the model should perform before it gives an answer [10].
- This design lets teams trade off performance against task complexity and deployment efficiency [12].
- Recommended budget values are multiples of 512 tokens; a budget of 0 tells the model to output the answer directly [13][26].
Benchmark Performance
- The Seed-OSS-36B-Base model scored 65.1 on MMLU-Pro and 81.7 on MATH, which is competitive [15].
- The Seed-OSS-36B-Instruct version achieved state-of-the-art (SOTA) results in several areas, including 91.7% on AIME24 and 67.4 on LiveCodeBench v6 [17].
- In long-context tests, the model scored 94.6 on RULER (128K context length), the highest score among open-source models [18].
User Interaction and Token Management
- During operation, the model reports its token usage, so users can see how much of the budget has been consumed [25].
- If no inference budget is set, the model defaults to reasoning of unlimited length, while a budget of 0 prompts direct answer output [27].
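The budget rules in the summary above (multiples of 512 recommended, 0 for a direct answer, no budget for unlimited reasoning) can be sketched as a small helper. This is only an illustration: the function name `normalize_thinking_budget` and the choice to round up to the next multiple of 512 are assumptions, not part of the Seed-OSS release.

```python
from typing import Optional

BUDGET_STEP = 512  # recommended granularity quoted in the summary


def normalize_thinking_budget(requested: Optional[int]) -> Optional[int]:
    """Map a requested reasoning budget onto the recommended values.

    None -> None (no budget set: reasoning length is unlimited)
    0    -> 0    (answer directly, without reasoning)
    n>0  -> n rounded up to the next multiple of 512 (an assumed policy)
    """
    if requested is None:
        return None
    if requested < 0:
        raise ValueError("budget must be non-negative")
    if requested == 0:
        return 0
    # Ceiling division, then scale back to a token count.
    return -(-requested // BUDGET_STEP) * BUDGET_STEP


if __name__ == "__main__":
    for value in (None, 0, 300, 512, 1000):
        print(value, "->", normalize_thinking_budget(value))
```

Rounding up rather than down is a judgment call: it never gives the model less reasoning room than the caller asked for.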
Media Industry Weekly Watch (20250811-20250815): Bullish on High-Growth Areas Such as Gaming, IP, AI, and Film and TV
Huachuang Securities· 2025-08-18 05:47
Investment Rating - The report maintains a "Recommendation" rating for the media industry, expecting the industry index to outperform the benchmark index by more than 5% over the next 3-6 months [3][44].
Core Viewpoints
- The report is optimistic about gaming, intellectual property (IP), artificial intelligence (AI), and film, and sees a favorable market outlook for these areas [1][3].
- The media sector is currently experiencing a resurgence, with AI applications gaining traction and content output bolstering cultural confidence [3][6].
- The report emphasizes the potential for significant growth in the AI application industry, particularly in public cloud services and user engagement scenarios [3][6].
Market Performance Overview
- The media sector index rose 1.00% last week while the CSI 300 index rose 2.37%, so the sector underperformed the benchmark by 1.37 percentage points [7][10].
- The media sector's total market capitalization is approximately 178.65 billion yuan, with 140 listed companies [3].
Gaming Sector Insights
- The gaming market shows positive trends, with high-frequency data moving upward and favorable expectations for mid-year reports [3][15].
- Notable games such as "Peacekeeper Elite" and "Honor of Kings" continue to dominate the iOS sales rankings, reflecting strong daily active user (DAU) engagement [15][16].
Film Market Analysis
- As of August 15, 2025, the film box office has reached 33.006 billion yuan, a recovery to approximately 85% of pre-pandemic box office revenue [20][21].
- The average ticket price is 32.6 yuan, with a total of 20.879 million viewers during the week of August 11-15, 2025 [21][26].
AI Sector Developments
- The report notes ongoing advances in AI applications, with a focus on companies such as Kuaishou and Youzan, which are expected to benefit from AI integration [3][29].
- New AI technologies and products from major companies such as Huawei and Apple are expected to drive further growth in the sector [29][30][31].
Key Company Recommendations
- The report suggests focusing on companies such as Tencent, Alibaba, Kuaishou, and Meitu, which are well positioned to benefit from current market dynamics [3][6].
- Giant Network, G-bits, and Perfect World are highlighted as potential investment opportunities within the gaming sector [3][6].
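The relative-performance figure in the Market Performance Overview is a simple difference of two weekly returns, expressed in percentage points. A minimal sketch of that arithmetic, with a hypothetical helper name, might look like this:

```python
def relative_performance_pp(sector_return_pct: float, benchmark_return_pct: float) -> float:
    """Return sector-minus-benchmark performance in percentage points."""
    # Rounded to two decimals to match the precision quoted in the report.
    return round(sector_return_pct - benchmark_return_pct, 2)


MEDIA_WEEKLY_RETURN = 1.00   # media sector index, % (from the summary)
CSI300_WEEKLY_RETURN = 2.37  # CSI 300 index, % (from the summary)

gap = relative_performance_pp(MEDIA_WEEKLY_RETURN, CSI300_WEEKLY_RETURN)
print(f"Media vs. CSI 300: {gap:+.2f} percentage points")
```

A negative result means the sector lagged the benchmark, which matches the underperformance described in the summary.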
Global AI Large Model Iteration Speeds Up! China's Open-Source Ecosystem Booms
Wind万得· 2025-08-12 22:37
Core Viewpoint - The global AI industry is iterating on technology at an accelerating pace, with major companies such as OpenAI, Google DeepMind, and Baidu releasing or updating large model products in a period of intensive innovation [1]
Group 1: Major Company Developments
- OpenAI launched GPT-5 on August 8, featuring enhanced reasoning, multimodal capabilities, and enterprise customization, with significant improvements in programming performance and lower hallucination rates [3]
- Baidu plans to release a new AI inference model by the end of August, aimed at improving complex task processing [3]
- Google DeepMind introduced the "Genie 3" model on August 6, which can generate dynamic 3D worlds, although it still has limitations in practical operability and multi-agent interaction [3]
- Chinese companies are making significant strides in open-source large models: Tencent announced the open-source "Hunyuan 3D World Model 1.0", and Alibaba released four open-source models, one of which ranks third globally on an international evaluation platform [3][4]
Group 2: Open Source Landscape
- As of July 31, nine of the top ten open-source large models globally are from Chinese companies, with Zhipu GLM-4.5 ranked first, showing China's transition from technology catch-up to ecosystem leadership [4]
- The open-source approach adopted by Chinese companies contrasts with the closed-source model favored by U.S. tech firms such as OpenAI, which shifted from open-source to closed-source operations to maintain its technological edge [6]
Group 3: Industry Challenges and Opportunities
- The open-source model accelerates technology dissemination but faces challenges such as "fine-tuning involution" (excessive internal competition), where most updates focus on parameter tuning rather than innovation in the underlying architecture [6]
- Developers run into compatibility issues because of frequent model updates and interface changes, which complicates integration [6]
- The "combinatorial effect" of open-source models may weaken technological barriers and prevent significant capability gaps from opening between companies [6]
Group 4: Market Dynamics and Future Outlook
- Differentiated AI applications are creating incremental opportunities: Kuaishou focuses on video and image generation, Alibaba applies AI to e-commerce, and Tencent is exploring applications in advertising and gaming [7]
- The total number of registered personal users of large models exceeds 3.1 billion, and API call users exceed 159 million [7]
- The next generation of large models is expected to benefit from growing reasoning demand, which drives up computing power requirements [7]
- In 2025, the AI large model industry is expected to show accelerated technological iteration, a rising open-source ecosystem, and diverse commercialization paths, strengthening China's global influence in the AI sector [7]
Surpassing OpenAI in Medical Capability, Baichuan Releases Open-Source Large Model Baichuan-M2
Feng Huang Wang· 2025-08-11 07:32
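Phoenix Tech News, August 11: Baichuan has officially released Baichuan-M2, an open-source medically enhanced large model. According to the company, at the relatively small size of 32B the model not only overtakes OpenAI's latest open-source model gpt-oss120b but also outperforms Qwen3-235B, Deepseek R1, Kimi K2, and every other open-source large model currently available worldwide. In addition, to meet the demand for private deployment driven by user-privacy concerns in the medical field, Baichuan Intelligent has made Baichuan-M2 extremely lightweight: the quantized model's accuracy is nearly lossless, and it can be deployed on a single RTX 4090 card, cutting cost by a factor of 57 compared with the dual-node H20 deployment of DeepSeek-R1. ...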
Overtaking OpenAI, Baichuan Announces Release of Open-Source Medical Large Model
Xin Lang Ke Ji· 2025-08-11 05:25
Group 1
- Baichuan Intelligent has launched the open-source medically enhanced model Baichuan-M2, claiming that it surpasses OpenAI's latest models in deployment cost and medical capability and ranks first among all open-source models globally [1][4]
- Baichuan-M2 scored 60.1 on HealthBench, ahead of OpenAI's latest open-source model gpt-oss120b at 57.6, as well as other models such as Qwen3-235B and Deepseek R1 [1]
- The model has been made extremely lightweight for deployment and can run on a single RTX 4090 card, cutting cost by a factor of 57 compared with the dual-node H20 deployment of DeepSeek-R1 [4]
Group 2
- The Baichuan-M2 MTP version, optimized for faster interaction in emergency and outpatient scenarios, achieved a 74.9% increase in token processing speed in single-user settings [4]
Now We Are Just Waiting for Liang Wenfeng
投资界· 2025-08-10 07:45
Core Insights - The article discusses recent advances in AI technology, focusing on the competition among major players such as OpenAI, Google, and Anthropic and on their latest model releases and innovations [5][10][11].
Group 1: OpenAI Developments
- OpenAI has released its first open-weight large language models, gpt-oss-120b and gpt-oss-20b, with 117 billion and 21 billion parameters respectively, designed for local deployment [13][19].
- The gpt-oss-120b model performs close to OpenAI's o4-mini on core reasoning benchmarks and can run efficiently on a single 80 GB GPU [13][19].
- The release aims to address local deployment needs and market demand, although it includes restrictions on commercial use for entities with annual revenues exceeding $100 million or daily active users over 1 million [19][26].
Group 2: Google Innovations
- Google introduced Genie 3, a model that lets users generate interactive 3D virtual worlds from text prompts at 720p resolution and 24 FPS [27][28].
- The model requires precise physical feedback and interaction, which poses significant technical challenges, but it could transform fields such as robotics and gaming if development succeeds [29][30].
- Despite its impressive capabilities, Genie 3 is still at the demonstration stage and is not available for public testing, so it remains a future prospect [30].
Group 3: Anthropic's Strategy
- Anthropic has updated its top-tier model to Claude Opus 4.1, which reportedly improves AI programming capability by 2% and reflects the current upper limit of AI coding ability [34][38].
- The model's performance metrics show it has the highest market share and reputation in AI coding, positioning Anthropic as a strong competitor to OpenAI and Google [38][39].
- The focus on programming capability allows Anthropic to stay relevant in the competition to commercialize large models [38].
Group 4: Contributions from Chinese Scientists
- The article highlights the significant contributions of Chinese scientists and engineers to these AI models, particularly within OpenAI and Google [40][42].
- Key figures include Ren Hongyu, who worked on language model training optimization at OpenAI, and Emma Wang, who contributed to the design and optimization of Genie 3 at Google [42][46].
Three Post-90s Founders, a 70 Billion Yuan Valuation
投资界· 2025-08-10 07:45
Core Viewpoint - The article highlights the rapid rise of Mistral AI, a startup founded by three people born in the 1990s, which has reached a valuation of approximately $10 billion within two years and illustrates the explosive growth potential of the AI sector [2][6][12].
Group 1: Company Overview
- Mistral AI was founded by three people born in the 1990s who previously worked at top AI firms and returned to France to capitalize on the AI revolution [6][8].
- The company launched its first open-source large model, Mistral 7B, which outperformed competitors in several benchmark tests and quickly gained attention in the developer community [6][7].
- Mistral AI aims to lead the generative AI wave through open-source initiatives, in contrast to closed models from competitors such as OpenAI [6][7].
Group 2: Funding and Valuation
- Mistral AI completed a record seed round of $113 million shortly after its founding, at a valuation of over $260 million [10].
- By the end of 2023, the company had raised $415 million in Series A funding, lifting its valuation to $2 billion, and it later secured $640 million in Series B funding at a valuation of $6 billion [11][12].
- Discussions on the latest funding round could lift Mistral's valuation to around $10 billion, with significant interest from major investors [12][13].
Group 3: Competitive Landscape
- The AI landscape is becoming increasingly competitive with the emergence of other open-source models such as DeepSeek, which has gained significant traction [7][8].
- Mistral AI has launched several products, including a chatbot and a reasoning model, to compete directly with other players in the market [8].
- Despite initial success in France, Mistral's international performance has been mixed, which points to challenges in scaling beyond its home market [8].
Group 4: Industry Trends
- The article notes a trend of young entrepreneurs in the AI sector, with many founders born in the 1990s leading startups that are rapidly gaining valuation and market presence [14][16].
- The rise of AI is compared to the historical impact of electricity, suggesting that AI will significantly influence GDP across nations [13].
China "Tops the Charts" in Global Open-Source Large Models: Hidden Worries and Challenges Behind the Halo
Zheng Quan Shi Bao· 2025-08-06 18:37
Core Viewpoint - The recent surge in open-source AI models in China is reshaping the global AI landscape, with significant implications for technological influence and faster application development, while also raising challenges around model iteration and compatibility costs [1][2][3].
Group 1: Open-source Model Surge
- In the past two weeks, Alibaba's Tongyi Qianwen has released six open-source models, marking a resurgence in China's large model development that recalls the "hundred-model war" of 2023 [1].
- In the recent open-source wave, major Chinese companies including Alibaba and Tencent have rapidly released new models, and Chinese models occupy nine of the top ten spots in the Hugging Face open-source model ranking [2].
- The success of DeepSeek is viewed as a turning point that prompted more Chinese companies to adopt open-source strategies and focus on model optimization and iteration [2].
Group 2: Competitive Landscape
- The latest Chatbot Arena rankings show Alibaba's Tongyi Qianwen 3 surpassing several closed-source models, a sign of the shift toward open-source dominance in China [4].
- The paths of open-source and closed-source models are diverging, with Chinese companies embracing open source while U.S. firms lean toward closed-source strategies [4][5].
- Open-source models are seen as a way for latecomers in AI to break the dominance of established players through rapid optimization and ecosystem development [5].
Group 3: Challenges and Concerns
- The rapid iteration of open-source models has led to "fine-tuning involution" (excessive internal competition over tuning) and homogenization, raising concerns about a lack of disruptive innovation [7][8].
- Developers face frequent updates and compatibility issues, which raise adaptation costs and may stall innovation [8].
- Experts suggest unified API standards and a focus on foundational research to avoid low-level duplicated work and to foster genuine algorithmic breakthroughs [8].
安联锐视 (Raysharp): Front-End IPC or Back-End NVR Can Connect to Open-Source Large Models Such as Tongyi Qianwen and DeepSeek
Mei Ri Jing Ji Xin Wen· 2025-08-06 13:27
Group 1
- The company emphasizes the importance of product intelligence and is integrating open-source large models such as Tongyi Qianwen and DeepSeek into its front-end IPC and back-end NVR systems [2]
- The company is collaborating with Guangzhou Potential Space Technology Co., Ltd. to develop products that interface with the Volcano Vision large model, initially promoting applications like AI store inspections [2]
- The company's subsidiary, Zhejiang Anxing Yulian Robot Co., Ltd., is developing intelligent agents primarily for government departments [2]
Welcoming OpenAI Back to the Open-Source Large Model Race: Some Points I Am Watching
36Kr· 2025-08-06 07:55
Core Viewpoint - OpenAI has released two open-source large models, GPT-OSS 120B and GPT-OSS 20B, marking its return to the open-source arena after a six-year hiatus, driven by competitive pressure and the need to serve enterprise clients who prioritize data security [1][4][5].
Group 1: OpenAI's Shift to Open Source
- OpenAI's name originally signified "openness" and "open source," but the company departed from this path in early 2019, limiting the release of its models because of "safety concerns" [1][2].
- Until this release, OpenAI was one of the few leading AI developers with no new open-source models; Anthropic likewise has not released any open-source models [2][5].
Group 2: Reasons for Open Sourcing
- Open-sourcing allows clients to run models locally, which improves data security by keeping sensitive information off third-party platforms; this is crucial for sectors such as government and finance [3][4].
- Clients can fine-tune open-source models to meet specific industry needs, which makes them more attractive for sectors with complex requirements [3][4].
Group 3: Competitive Landscape
- The release of GPT-OSS is seen as a response to competitors such as Meta's LLaMA series and DeepSeek, which have gained traction in the enterprise market because they are open source [4][5].
- The global landscape now features only two major developers without open-source versions, highlighting a significant industry shift toward open-source models [5].
Group 4: Technical Insights
- GPT-OSS models are comparable in performance to GPT-4o3 and use a mixture-of-experts (MoE) architecture, a common approach among leading models [6][7].
- Training GPT-OSS used significant computational resources: the 120B-parameter version consumed 2.1 million H100 GPU hours, a substantial investment in infrastructure [9][10].
Group 5: Limitations of Open Source
- GPT-OSS is described as an "open weight" model rather than a fully open-source model, since it lacks comprehensive training details and the proprietary tools used in its development [8][9].
- The release of GPT-OSS does not include the latest advances or training methodologies, which limits its impact on the broader AI development landscape [6][10].