Has AI Hit Its Math Ceiling? DeepSeek Quietly Open-Sources a New Model, and Netizens Declare: R2 Is Just Around the Corner!
Hua Er Jie Jian Wen· 2025-04-30 12:52
Just as everyone was awaiting DeepSeek's official announcement of its R2 large model, the company unexpectedly dropped another technical bombshell on the eve of the May Day holiday. On April 30, DeepSeek quietly open-sourced its latest model on the Hugging Face platform: DeepSeek-Prover-V2-671B, a large language model dedicated to mathematical theorem proving and optimized specifically for formal mathematical proof tasks. DeepSeek-Prover-V2-671B is built on the DeepSeek-V3 architecture, with 671 billion parameters, a mixture-of-experts (MoE) design, 61 Transformer layers, and a hidden dimension of 7,168.
Huawei's Guo Zhenxing: After the DeepSeek Wave, AI Will Rapidly Unleash Enormous Manufacturing Productivity Dividends | Frontline
36Ke· 2025-04-30 09:48
Group 1
- Huawei hosted the AI + Manufacturing Industry Summit 2025 in Guangzhou, focusing on accelerating industry intelligence, with over 900 attendees from various manufacturing sectors [1]
- Huawei introduced a "three-layer, five-step, eight-phase" methodology and shared 20 solutions across seven key scenarios in the manufacturing sector [1]
- The company emphasized its full-stack AI infrastructure, which can adapt flexibly to multiple manufacturing scenarios, lowering the threshold for AI adoption [1]

Group 2
- In the automotive sector, Huawei's collaboration with GAC Group has significantly reduced the vehicle development cycle from 36 months to 18 months using AI models and development toolchains [1]
- Huawei's software development cycle has improved from 9-18 months to one month per release by integrating over 13 million high-value documents and 850+ open-source code repositories into its data platform [2]
- By 2025, over 300 enterprises are expected to have plans for large model deployment, indicating surging demand for AI capabilities in manufacturing [2]

Group 3
- Huawei has adapted its DeepSeek solution across various scenarios, including pre-training and reinforcement learning, to help clients complete secondary training quickly [3]
- The company has optimized the performance of various models in the Ascend environment, with over 100 manufacturing partners already using the DeepSeek solution [3]
- Huawei aims to provide end-to-end full-stack infrastructure to support enterprises' digital transformation by focusing on data management and intelligent connectivity [3]
From DeepSeek to Hard Tech: Guozhong Capital's New Investment Horizons | Investor: Quick Answers 2025
Sou Hu Cai Jing· 2025-04-30 06:29
Group 1
- The article discusses the transformative impact of artificial intelligence and hard technology on global industry dynamics, highlighting the emergence of DeepSeek as a milestone event in AI development [2][3][4]
- The investment landscape is shifting toward hard-technology sectors such as new energy vehicles, semiconductors, healthcare, and advanced manufacturing, which align with national strategic emerging industries [7][8]
- The entrepreneur's qualities, referred to as the "five powers," are emphasized as a non-traditional metric for evaluating investment projects: mission-driven motivation, problem-solving ability, innovative vitality, perseverance, and grounded determination [5][6][8]

Group 2
- The investment strategy for 2025 includes a commitment to exploring new opportunities in hard technology, particularly in sectors that align with national priorities and demonstrate innovation and technological advantages [7][8]
- The article suggests that the investment community should not be limited by traditional models and should embrace new paths for investment strategy, as demonstrated by the emergence of DeepSeek [10][11]
- The evolving relationship between government funding and venture capital is highlighted, indicating a shift in collaboration models as both parties adapt to new requirements and environments [12]

Group 3
- The outlook for small and medium enterprises (SMEs) in China is optimistic, supported by favorable government policies and a focus on building core competencies and adaptability in entrepreneurs [13]
- The fundraising environment is evolving, with a focus on diverse fund types, including venture capital, angel funds, and acquisition funds, as the industry adapts to current market conditions [14]
- The article emphasizes the enduring significance of consumer investment, which is expected to recover and grow in 2025, driven by technological advances and changing consumer demands [18][19]

Group 4
- The firm's investment philosophy is rooted in value investing, emphasizing sustainable business models that can generate returns without relying solely on external funding [15]
- The firm expresses a consistently optimistic outlook for the investment landscape, focusing on long-term growth opportunities and the potential for innovation across sectors [20]
Qwen3 Drops a Late-Night Bombshell: Alibaba Releases Eight Large Models at Once, Outperforming DeepSeek R1 to Claim the Open-Source Crown
36Ke· 2025-04-29 09:53
Core Insights
- The release of Qwen3 marks a significant advancement in open-source AI models, featuring eight hybrid reasoning models that rival proprietary models from OpenAI and Google and surpass the open-source DeepSeek R1 model [4][24]
- Qwen3-235B-A22B is the flagship model, with 235 billion parameters, demonstrating superior performance on various benchmarks, particularly in software engineering and mathematics [2][4]
- The Qwen3 series introduces a dual reasoning mode, allowing the model to switch between deep reasoning for complex problems and quick responses for simpler queries [8][21]

Model Performance
- Qwen3-235B-A22B scored 95.6 on the ArenaHard test, outperforming OpenAI's o1 (92.1) and DeepSeek's R1 (93.2) [3]
- Qwen3-30B-A3B, with 30 billion parameters, also performs strongly, scoring 91.0 on ArenaHard, showing that smaller models can still achieve competitive results [6][20]
- The models were trained on approximately 36 trillion tokens, nearly double the data used for the previous Qwen2.5 model, enhancing their capabilities across domains [17][18]

Model Architecture and Features
- Qwen3 employs a mixture-of-experts (MoE) architecture, activating only about 10% of its parameters during inference, which significantly reduces computational cost while maintaining high performance [20][24]
- The series includes six dense models ranging from 0.6 billion to 32 billion parameters, catering to different user needs and computational resources [5][6]
- The models support 119 languages and dialects, broadening their applicability in global contexts [12][25]

User Experience and Accessibility
- Qwen3 is open-sourced under the Apache 2.0 license, making it accessible to developers and researchers [7][24]
- Users can easily switch between reasoning modes via a dedicated button on the Qwen Chat website or through commands in local deployments [10][14]
- The model has received positive user feedback for its quick response times and deep reasoning capabilities, with notable comparisons to models like Llama [25][28]

Future Developments
- The Qwen team plans to focus on training models capable of long-term reasoning and executing real-world tasks, signaling a commitment to advancing AI capabilities [32]
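The summary above mentions switching reasoning modes through commands in local deployments. A minimal sketch of how such a soft switch could be wired into a prompt, assuming the `/think` and `/no_think` tags described for Qwen3 (the helper function and message shape here are illustrative, not an official API):

```python
def with_reasoning_mode(user_message: str, think: bool) -> str:
    """Append a Qwen3-style soft-switch tag to a user message.

    Assumption: Qwen3 toggles between deep-reasoning and quick-response
    modes when the prompt ends with /think or /no_think, as described
    for local deployments. This helper is a hypothetical convenience.
    """
    switch = "/think" if think else "/no_think"
    return f"{user_message} {switch}"

# Build a chat-style message list for a local deployment
messages = [
    {
        "role": "user",
        "content": with_reasoning_mode("Prove that sqrt(2) is irrational.", think=True),
    }
]
print(messages[0]["content"])  # the tagged prompt sent to the model
```

In practice the same toggle is often exposed as a parameter in the serving stack rather than a prompt suffix; the tag-based form is shown because it matches the "commands in local deployments" wording above.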
DeepSeek-R2 Release Imminent: Parameter Count Doubles, Huawei Ascend Chip Utilization Hits 82%!
Sou Hu Cai Jing· 2025-04-29 07:17
Core Insights
- The next-generation AI model DeepSeek-R2 is reportedly set for release, featuring advanced parameters and architecture [1][5]
- DeepSeek-R2 will reportedly use a mixture-of-experts (MoE) model with an intelligent gating network, significantly enhancing performance on high-load inference tasks [5]
- The total parameter count for DeepSeek-R2 is expected to reach 1.2 trillion, doubling the 671 billion parameters of DeepSeek-R1 and making it comparable to GPT-4 Turbo and Google's Gemini 2.0 Pro [5]

Cost Efficiency
- DeepSeek-R2's unit inference cost is projected to be 97.4% lower than GPT-4's, at approximately $0.07 per million tokens versus $0.27 per million tokens for GPT-4 [8]
- The model's cost efficiency is attributed to Huawei's Ascend 910B chip cluster, which achieves 512 PetaFLOPS of compute at an 82% resource utilization rate [7][8]

Hardware and Infrastructure
- DeepSeek-R2's training framework is based on Huawei's Ascend 910B chip cluster, which has been validated to deliver 91% of the performance of NVIDIA's previous-generation A100 training cluster [7]
- Huawei's Ascend 910C chip, now entering mass production, may provide a domestic alternative to NVIDIA's high-end AI chips, enhancing hardware autonomy in China's AI sector [10]
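The MoE-with-gating-network design mentioned above is what lets a trillion-parameter model activate only a small fraction of its weights per token. A toy pure-Python sketch of top-k gating (illustrative only; DeepSeek's actual router is not public):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_logits, top_k=2):
    """Route one token through the top-k experts chosen by a gating network.

    experts: list of callables (toy stand-ins for expert feed-forward nets)
    gate_logits: one gating score per expert for this token
    Only the selected experts run, which is why a large MoE activates
    only a small fraction of its parameters per token.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)  # renormalize over selected experts
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Four toy experts; the gate strongly prefers experts 1 and 3,
# so the output is a weighted blend of 2*x and x*x only.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 1, lambda x: x * x]
out = moe_forward(3.0, experts, gate_logits=[0.1, 2.0, 0.2, 1.5], top_k=2)
```

Real routers operate on vectors with learned gate weights and add load-balancing losses; the scalar version above only shows the select-then-renormalize mechanics.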
Alibaba Releases and Open-Sources Qwen3, at Just One-Third the Cost of DeepSeek-R1
Guan Cha Zhe Wang· 2025-04-29 03:27
Core Insights
- Alibaba has launched the new Qwen3 model, claimed to be the world's strongest open-source model, outperforming leading models such as DeepSeek-R1 and OpenAI-o1 while significantly reducing cost and computational requirements [1][3]

Model Performance
- The flagship Qwen3-235B-A22B achieved high scores on various benchmarks, including 81.5 on the AIME25 assessment, over 70 on the LiveCodeBench evaluation, and 95.6 on the ArenaHard test, surpassing competitors [1][2]
- Qwen3's total parameter count is 235 billion, setting a new standard for open-source model intelligence, and it can be deployed on only four H20 GPUs, using one-third the memory of models with comparable performance [3]

Model Variants and Accessibility
- Qwen3 includes multiple versions, such as 30B and 235B MoE models, along with several dense models ranging from 0.6B to 32B [3]
- The model supports over 119 languages and is available under the permissive Apache 2.0 license, allowing developers and organizations worldwide to download and commercialize it for free [6]

Industry Impact
- Qwen3 is expected to enhance the capabilities of intelligent agents and large-model applications, lowering the barriers to using agent tools [6]
- Alibaba has released over 200 models with more than 300 million downloads globally, establishing Qwen3 as the leading open-source model, surpassing the U.S. Llama [6]
Alibaba's Qwen3 Outperforms DeepSeek-R1; US Media Report Musk Has Far More Than 14 Children; ChatGPT Launches Shopping Feature
Guan Cha Zhe Wang· 2025-04-29 01:10
Group 1: Stock Market Performance
- The three major U.S. stock indices closed mixed: the Dow Jones rose 0.28% and the S&P 500 rose 0.06%, while the Nasdaq fell 0.1% [1]
- Major tech stocks were mixed, with Intel rising over 2% and Nvidia dropping over 2% [1]

Group 2: AI and Technology Developments
- Alibaba's Qwen3 has been released as an open-source model, surpassing competitors with a total of 235 billion parameters [2]
- Apple CEO Tim Cook has reorganized the company's robotics team, signaling dissatisfaction with progress in AI and machine learning [6][5]
- OpenAI is enhancing its ChatGPT tool to facilitate online shopping, allowing users to purchase products directly through the platform [7]

Group 3: Investment and Financial Activities
- Amazon launched the first batch of satellites for its "Project Kuiper" internet initiative, aiming to deploy over 3,200 satellites for global internet coverage [7]
- Alphabet plans to issue approximately $4 billion in high-grade corporate bonds, with the longest maturity potentially yielding 1% to 1.05% above U.S. Treasury rates [7]
- Over 700 billion yuan has been allocated by local government funds to humanoid robotics and related industries [8]

Group 4: Company Listings and IPOs
- Seres has applied for a mainboard listing in Hong Kong, with projected 2024 revenue of 145.1 billion yuan, a 305.5% year-on-year increase [9]
- Stone Technology is considering a Hong Kong IPO to raise up to $500 million, though plans remain at an early stage [10]
Alibaba Releases and Open-Sources Qwen3, Claiming Costs of Only One-Third of DeepSeek-R1
Di Yi Cai Jing· 2025-04-29 00:33
Core Insights
- Alibaba Cloud has launched the new Qwen3 model, the first "hybrid reasoning model" in China, integrating "fast thinking" and "slow thinking" into a single model, significantly reducing deployment costs while enhancing performance over previous models [1][4]

Group 1: Model Performance and Architecture
- Qwen3 has a total parameter count of 235 billion, of which only 22 billion are activated, using a mixture-of-experts (MoE) architecture [2][3]
- The model achieves a performance leverage of more than 10x: its 30B-parameter MoE model activates only 3 billion parameters yet matches the performance of the previous Qwen2.5-32B model [3]
- Qwen3 has outperformed global top models such as DeepSeek-R1 and OpenAI-o1 across various benchmarks, securing its position as the strongest open-source model globally [1][2]

Group 2: Cost Efficiency and Deployment
- The deployment cost of Qwen3 has dropped significantly, requiring only 4 H20 GPUs for full deployment, with memory usage one-third that of DeepSeek-R1 [1][3]
- All Qwen3 models are hybrid reasoning models, allowing users to set a "thinking budget" to balance performance and cost in AI applications [3][4]

Group 3: Future Developments and Goals
- Future enhancements for Qwen3 will focus on expanding data scale, increasing model size, extending context length, and broadening modality range, while leveraging environmental feedback for long-term reasoning [4]
- The Qwen team views this launch as a significant milestone toward artificial general intelligence (AGI) and artificial superintelligence (ASI) [4]
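The "thinking budget" idea mentioned above caps how many tokens the model may spend on hidden reasoning before it must answer. A toy sketch of the control flow, under the assumption that reasoning ends at a closing marker such as `</think>` (the function names and marker here are illustrative, not Qwen3's actual interface):

```python
def generate_with_thinking_budget(generate_step, prompt, budget):
    """Toy sketch of a 'thinking budget'.

    Accumulates reasoning tokens until the model closes its reasoning
    on its own or the budget is exhausted, then hands off to the
    visible-answer phase. generate_step is a hypothetical callable
    that returns the next reasoning token given the prompt and the
    reasoning so far.
    """
    thoughts = []
    for _ in range(budget):
        tok = generate_step(prompt, thoughts)
        if tok == "</think>":  # model finished reasoning early
            break
        thoughts.append(tok)
    # Beyond this point the model would emit the user-visible answer;
    # the budget bounds the cost spent on hidden reasoning.
    return thoughts

# Toy step function that "thinks" indefinitely; the budget cuts it off
steps = generate_with_thinking_budget(lambda p, t: f"step{len(t)}", "2+2?", budget=5)
```

This shows why a budget directly trades answer quality for cost: a larger `budget` allows longer chains of reasoning at proportionally higher token spend.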
Alibaba Open-Sources the Qwen3 Model at Only One-Third the Cost of DeepSeek-R1
21 Shi Ji Jing Ji Bao Dao· 2025-04-29 00:24
Core Insights
- Alibaba has open-sourced over 200 models, with more than 300 million downloads globally and over 100,000 derivative models of Qwen [6]
- The newly released Qwen3 model features a parameter count of 235 billion, significantly reducing costs while outperforming leading models such as DeepSeek-R1 and OpenAI-o1 [1][2]

Performance Enhancements
- Qwen3 shows substantial improvements in reasoning, instruction adherence, tool invocation, and multilingual capabilities, setting new performance records among domestic and global open-source models [2]
- In the AIME25 evaluation, Qwen3 scored 81.5, surpassing previous open-source records, and achieved over 70 points on the LiveCodeBench assessment, outperforming Grok3 [2][3]

Model Architecture
- Qwen3 employs a mixture-of-experts (MoE) architecture, activating only 22 billion of its 235 billion parameters, enabling efficient performance at reduced computational cost [1][2]
- The model comes in multiple versions, including 30B and 235B MoE models as well as dense models ranging from 0.6B to 32B, all achieving state-of-the-art performance for their sizes [4]

Application and Accessibility
- Qwen3 supports the coming surge in intelligent agents and large-model applications, with a BFCL evaluation score of 70.8, surpassing top models such as Gemini2.5-Pro and OpenAI-o1 [5]
- The model is open-sourced under the Apache 2.0 license, supports over 119 languages, and is available for free download on platforms such as HuggingFace and Alibaba Cloud [5][6]
Beyond DeepSeek? The Hidden Tech War the Giants Won't Talk About
36Ke· 2025-04-29 00:15
Group 1: DeepSeek-R1 Model and MLA Technology
- The launch of the DeepSeek-R1 model represents a significant breakthrough in AI technology in China, showing performance competitive with industry leaders like OpenAI while requiring 30% fewer computational resources than comparable products [1][3]
- The team's multi-head latent attention (MLA) mechanism has achieved a 50% reduction in memory usage, but at the cost of greater development complexity, extending the average development cycle by 25% in manual-optimization scenarios [2][3]
- DeepSeek's distinctive distributed training framework and dynamic quantization technology have improved inference efficiency by 40% per unit of computing power, providing a case study in the co-evolution of algorithms and systems engineering [1][3]

Group 2: Challenges and Innovations in AI Infrastructure
- Traditional fixed architectures, especially GPU-based systems, struggle to adapt to the rapidly evolving demands of modern AI and high-performance computing, often requiring significant hardware modifications [6][7]
- The energy consumption of AI data centers is projected to rise dramatically, with future power demands expected to reach 600 kW per cabinet, far beyond the current capabilities of most enterprise data centers [7][8]
- The industry is shifting toward intelligent, software-defined hardware platforms that can seamlessly integrate existing solutions while supporting future technological advances [6][8]

Group 3: Global AI Computing Power Trends
- Global AI computing power spending has surged from 9% in 2016 to 18% in 2022 and is expected to exceed 25% by 2025, marking a shift of computing power from infrastructure support to a core national strategy [9][11]
- Intelligent computing capacity grew 94.4% year on year, from 232 EFLOPS in 2021 to 451 EFLOPS in 2022, surpassing traditional computing capacity for the first time [10][11]
- Competition for computing power is intensifying, with major players such as the US and China investing heavily in infrastructure to secure a competitive edge in AI [12][13]

Group 4: China's AI Computing Landscape
- China's AI computing demand is expected to exceed 280 EFLOPS by the end of 2024, with intelligent computing accounting for over 30%, driven by technological iteration and industrial upgrading [19][21]
- A shift from centralized computing pools to distributed computing networks is essential to meet growing demands for real-time and concurrent processing across applications [20][21]
- The evolution of China's computing industry is not merely about scale; it involves strategic breakthroughs in technology sovereignty, industrial security, and economic resilience [21]