Alibaba releases and open-sources Qwen3, saying costs are only one-third of DeepSeek-R1's
Di Yi Cai Jing· 2025-04-29 00:33
Core Insights
- Alibaba Cloud has launched the new Qwen3 model, the first "hybrid reasoning model" in China, integrating "fast thinking" and "slow thinking" into a single model while significantly reducing deployment costs and improving performance over previous models [1][4]

Group 1: Model Performance and Architecture
- Qwen3 has a total parameter count of 235 billion, of which only 22 billion are activated, and uses a mixture-of-experts (MoE) architecture [2][3]
- The model achieves a performance leverage of over 10 times with its 30B-parameter MoE variant, activating only 3 billion parameters to match the performance of the previous Qwen2.5-32B model [3]
- Qwen3 outperforms top global models such as DeepSeek-R1 and OpenAI-o1 on various benchmarks, making it the strongest open-source model globally [1][2]

Group 2: Cost Efficiency and Deployment
- Deployment costs for Qwen3 have fallen sharply, requiring only 4 NVIDIA H20 GPUs for full deployment, with memory usage one-third that of DeepSeek-R1 [1][3]
- All Qwen3 models are hybrid reasoning models, allowing users to set a "thinking budget" to balance performance and cost in AI applications (a minimal usage sketch follows this summary) [3][4]

Group 3: Future Developments and Goals
- Future enhancements will focus on expanding data scale, increasing model size, extending context length, and broadening modality coverage, while leveraging environmental feedback for long-horizon reasoning [4]
- The Qwen3 team views this launch as a significant milestone toward artificial general intelligence (AGI) and artificial superintelligence (ASI) [4]
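The sketch below illustrates the "fast thinking" vs. "slow thinking" toggle using the Hugging Face transformers chat-template convention. The model name and the `enable_thinking` keyword follow the usage notes published on the Qwen3 model cards; treat the exact argument names and the generation budget as assumptions to be checked against the checkpoint you actually download.

```python
# Hedged sketch: switching Qwen3 between "thinking" and "non-thinking" modes
# via the chat template, following the usage documented on the Qwen3 model
# cards. Model name and the `enable_thinking` keyword are assumptions taken
# from those docs; max_new_tokens stands in crudely for a "thinking budget".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"  # one of the open-sourced MoE variants
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# "Slow thinking": the model emits a reasoning trace before the final answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# "Fast thinking": skip the reasoning trace for cheaper, lower-latency replies.
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
# )

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```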
Alibaba open-sources the Qwen3 model at one-third the cost of DeepSeek-R1
21 Shi Ji Jing Ji Bao Dao· 2025-04-29 00:24
Core Insights
- Alibaba has open-sourced more than 200 models, with global downloads exceeding 300 million and over 100,000 Qwen derivative models [6]
- The newly released Qwen3 model has a total parameter count of 235 billion, significantly reducing costs while outperforming leading models such as DeepSeek-R1 and OpenAI-o1 [1][2]

Performance Enhancements
- Qwen3 shows substantial improvements in reasoning, instruction following, tool invocation, and multilingual capability, setting new performance records among domestic and global open-source models [2]
- In the AIME25 evaluation, Qwen3 scored 81.5, surpassing the previous open-source record, and achieved over 70 points on LiveCodeBench, outperforming Grok3 [2][3]

Model Architecture
- Qwen3 uses a mixture-of-experts (MoE) architecture, activating only 22 billion of its 235 billion parameters, which allows efficient performance at reduced computational cost (see the routing sketch after this summary) [1][2]
- The family offers multiple versions, including 30B and 235B MoE models as well as dense models ranging from 0.6B to 32B, all achieving state-of-the-art performance for their sizes [4]

Application and Accessibility
- Qwen3 is positioned for the coming surge in intelligent agents and large-model applications, with a BFCL evaluation score of 70.8, surpassing top models such as Gemini2.5-Pro and OpenAI-o1 [5]
- The model is open-sourced under the Apache 2.0 license, supports 119 languages, and is available for free download on platforms such as HuggingFace and Alibaba Cloud [5][6]
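The gap between "235 billion total" and "22 billion activated" parameters comes from MoE routing: each token is sent to only a few expert networks, so most weights sit idle on any given forward pass. The toy PyTorch layer below is an illustrative sketch of top-k expert routing, not Qwen3's actual implementation; the expert count and hidden sizes are made up for clarity.

```python
# Illustrative top-k mixture-of-experts layer: only k of E expert FFNs run per
# token, which is why "activated" parameters are far fewer than total ones.
# Sizes are arbitrary; this is not Qwen3's real architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):         # run only the selected experts
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = ToyMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512]) -- 2 of 8 experts used per token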
Surpassing DeepSeek? The technology battles the giants won't talk about
36 Ke· 2025-04-29 00:15
Group 1: DeepSeek-R1 Model and MLA Technology
- The launch of the DeepSeek-R1 model represents a significant breakthrough for AI technology in China, delivering performance comparable to industry leaders such as OpenAI while requiring roughly 30% fewer computational resources than similar products [1][3]
- The multi-head latent attention (MLA) mechanism developed by the team cuts memory usage by 50%, but it also raises development complexity, extending the average development cycle by 25% in manual-optimization scenarios (a back-of-envelope memory comparison follows this summary) [2][3]
- DeepSeek's distributed training framework and dynamic quantization technology improve inference efficiency per unit of computing power by 40%, offering a case study in the co-evolution of algorithms and systems engineering [1][3]

Group 2: Challenges and Innovations in AI Infrastructure
- Traditional fixed architectures, especially GPU-based systems, struggle to adapt to the rapidly evolving demands of modern AI and high-performance computing, often requiring significant hardware modifications [6][7]
- The energy consumption of AI data centers is projected to rise dramatically, with future power demands expected to reach 600kW per cabinet, in sharp contrast to the capabilities of most enterprise data centers today [7][8]
- The industry is shifting toward intelligent, software-defined hardware platforms that can integrate existing solutions while supporting future technological advances [6][8]

Group 3: Global AI Computing Power Trends
- Global AI computing power spending has surged from 9% in 2016 to 18% in 2022 and is expected to exceed 25% by 2025, marking a shift of computing power from infrastructure support to a core national strategy [9][11]
- The scale of intelligent computing power grew 94.4% year on year, from 232 EFlops in 2021 to 451 EFlops in 2022, surpassing general-purpose computing power for the first time [10][11]
- Competition for computing power is intensifying, with major players such as the US and China investing heavily in infrastructure to secure a competitive edge in AI [12][13]

Group 4: China's AI Computing Landscape
- China's AI computing demand is expected to exceed 280 EFLOPS by the end of 2024, with intelligent computing accounting for over 30%, driven by technological iteration and industrial upgrading [19][21]
- A shift from centralized computing pools to distributed computing networks is needed to meet growing demands for real-time and concurrent processing across applications [20][21]
- The evolution of China's computing industry is not merely about scale; it involves strategic breakthroughs in technology sovereignty, industrial security, and economic resilience [21]
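To see why caching a compressed latent instead of full per-head keys and values shrinks inference memory, the back-of-envelope calculation below compares KV-cache sizes for standard multi-head attention and an MLA-style design. All dimensions are illustrative assumptions, not DeepSeek's actual configuration, and the real saving depends on the latent dimension and implementation details.

```python
# Back-of-envelope KV-cache comparison: standard multi-head attention caches
# full K and V for every head, while an MLA-style design caches one compressed
# latent vector per token from which K and V are reconstructed on the fly.
# All sizes here are illustrative, not DeepSeek's.
def kv_cache_bytes(n_layers, n_tokens, per_token_floats, bytes_per_float=2):
    return n_layers * n_tokens * per_token_floats * bytes_per_float

n_layers, n_heads, head_dim = 60, 64, 128
n_tokens = 32_000  # context length

# Standard MHA: K and V for every head at every layer.
mha = kv_cache_bytes(n_layers, n_tokens, 2 * n_heads * head_dim)

# MLA-style: a single shared latent of dimension d_latent per token per layer.
d_latent = 512
mla = kv_cache_bytes(n_layers, n_tokens, d_latent)

print(f"MHA cache : {mha / 1e9:.1f} GB")
print(f"MLA cache : {mla / 1e9:.1f} GB  ({mla / mha:.1%} of MHA)")
```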
Released ahead of DeepSeek R2! Alibaba's Qwen3 adds 8 new models and tops the global open-source rankings
Tai Mei Ti APP· 2025-04-28 23:27
After much anticipation, Qwen3 has arrived and gone straight to the top of the global open-source model rankings.

In the early hours of April 29, Alibaba open-sourced Qwen3, the new generation of its Tongyi Qianwen models. The flagship Qwen3-235B-A22B has only one-third the parameter count of DeepSeek-R1: 235B total parameters, of which only 22B are activated. Costs fall sharply while performance comprehensively surpasses R1, OpenAI-o1, and other top global models, topping the global open-source rankings.

| Benchmark | Qwen3-235B-A22B | Qwen3-32B | OpenAI-o1 | DeepSeek R1 | Grok 3 Beta | Gemini2.5-Pro | OpenAI-o3-mini |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ArenaHard | 95.6 | 93.8 | 92.1 | 93.2 | - | 96.4 | 89.0 |
| AIME'24 | 85.7 | 81.4 | 74.3 | 79.8 | 83.9 | 92.0 | 79.6 |
| AIME'25 | 81.5 | ... | ... | ... | ... | ... | ... |
Alibaba's Qwen3 open-sourced in the small hours: 8 models, MCP integration, performance beyond DeepSeek-R1, and 16.9k GitHub stars within 2 hours
36 Ke· 2025-04-28 23:23
Core Insights
- Alibaba Cloud has officially open-sourced the Qwen3 series, comprising 2 MoE models and 6 dense models, which gained more than 16.9k stars on GitHub within 2 hours of release [2][3]

Model Features
- The Qwen3 series spans 8 parameter sizes from 0.6B to 235B, with flagship models such as Qwen3-235B-A22B and Qwen3-30B-A3B showing significant capabilities in programming, mathematics, and general reasoning [4][12]
- A hybrid thinking mode lets users switch between "thinking" and "non-thinking" modes, giving control over the depth of reasoning [15][16]
- Reasoning capabilities surpass those of previous models in mathematics, code generation, and common-sense logic [4][15]

Performance Metrics
- Qwen3 models demonstrate superior performance on various benchmarks compared with well-known models such as DeepSeek-R1 and OpenAI's models [12][13]
- The Qwen3-30B-A3B model exceeds the performance of QwQ-32B while using only one-tenth of the activated parameters [11][12]
- The Qwen3 pre-training dataset has roughly doubled to about 36 trillion tokens, strengthening its STEM and programming capabilities [20][21]

Deployment and Accessibility
- The Qwen3 models are open-sourced on platforms including Hugging Face, ModelScope, and Kaggle under the Apache 2.0 license [7]
- Developers are encouraged to use frameworks and tools such as SGLang and vLLM for local deployment (a minimal client sketch follows this summary) [9]

Future Directions
- The company plans to keep enhancing model capabilities by optimizing architecture and training methods, expanding data scale, increasing model size, and improving long-horizon reasoning through reinforcement learning [24]
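Both vLLM and SGLang expose OpenAI-compatible HTTP endpoints, so a locally served Qwen3 checkpoint can be queried with the standard openai Python client. The base URL, port, and model name below are placeholders for illustration; match them to whatever your own server advertises (for example, `vllm serve Qwen/Qwen3-30B-A3B` starts such an endpoint on port 8000 by default).

```python
# Hedged sketch: querying a locally hosted Qwen3 model through the
# OpenAI-compatible API that vLLM and SGLang both expose. base_url, port,
# and model name are assumptions -- adjust them to your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Summarize the difference between MoE and dense models."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```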
DeepSeek analysis: Over the next 5 years, is it better to put money in gold, bank deposits, or property?
Sou Hu Cai Jing· 2025-04-28 22:51
Group 1: Gold Investment
- Gold prices have fluctuated sharply, with notable gains during geopolitical tensions, such as a 12% rise during the escalation of the Russia-Ukraine conflict in 2024 [3]
- Selling physical gold can be difficult: banks typically do not buy back gold bars, and gold shops offer buyback prices well below market value [3]
- Investors should be cautious about blindly chasing gold prices, as buying at high entry points can lead to long-term losses [3]

Group 2: Real Estate Investment
- Average housing prices have fallen about 30% from 2021 levels, leading some to believe it is a good time to buy [5]
- Despite the decline, bubbles remain in certain markets such as Shanghai and Shenzhen, where the price-to-income ratio is as high as 40, suggesting room for further falls [5][7]
- Demand for investment properties has dropped sharply, with many investors either selling or holding cash, so now may not be the best time for real estate investment [7]

Group 3: Bank Deposits
- Major banks have cut deposit interest rates significantly since 2024, lowering returns for savers [7]
- The purchasing power of savings is eroding as prices rise, making bank deposits less attractive than other options [7]
- Although bank deposits imply slow erosion of asset value, they carry less risk than gold or real estate [9]

Group 4: Diversified Asset Allocation
- A diversified asset allocation strategy is recommended to mitigate risk and preserve wealth over the next five years [9]
- One example of diversification splits funds into three parts: one-third in low-risk instruments such as government bonds, one-third in low-risk products such as structured deposits, and the final third in medium-risk investments such as mixed funds (a trivial calculation of this split appears below) [9]
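The split described above is simple arithmetic; the sketch below makes it concrete. The amount and bucket names are illustrative only, and none of this is investment advice.

```python
# Illustrative split of a lump sum into the three equal buckets described
# above; the categories and amount are examples, not investment advice.
def three_way_split(total: float) -> dict:
    share = round(total / 3, 2)
    return {
        "government bonds / low-risk": share,
        "structured deposits": share,
        "mixed funds / medium-risk": total - 2 * share,  # absorbs rounding
    }

print(three_way_split(300_000))
```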
DeepSeek's next-generation large model is about to be released, pushing low-code development into the mainstream
Xuan Gu Bao· 2025-04-28 15:09
Group 1
- DeepSeek's new model, DeepSeek R2, is expected to launch in early May and is said to cut costs by 97% compared with GPT-4, using Ascend cards for training [1]
- DeepSeek R2 is reported to use a mixture-of-experts (MoE) architecture with a total parameter count of 1.2 trillion, roughly double the 671 billion parameters of DeepSeek R1 [1]
- The model aims for breakthroughs in programming capability and multilingual reasoning, and higher accuracy at lower cost [1]

Group 2
- Jin Modern is actively expanding its standardized, general-purpose software product business centered on an "AI low-code" development platform, and has developed several standardized platform software products [2]
- Haoyun Technology continues to invest in low-code R&D; its low-code platform "Haoyida" is deeply integrated with AI and IoT to build customized AI agents for enterprises [2]
Mafengwo officially launches its AI travel assistant: DeepSeek large model plus a vertically fine-tuned model aims to break the "hallucination" problem
Cai Jing Wang· 2025-04-28 08:32
Core Insights
- The core focus is the launch of Mafengwo's AI travel assistant "AI Xiao Ma", which aims to improve travel planning by combining advanced AI technology with extensive travel data to provide reliable recommendations [1][7]

Group 1: AI Xiao Ma Features
- "AI Xiao Ma" integrates the DeepSeek large model with Mafengwo's vertically fine-tuned model, drawing on more than a decade of accumulated travel data to reduce inaccuracies in travel recommendations [1]
- The assistant offers real-time Q&A, itinerary planning, online travel guidance, and personalized recommendations, all accessible through the Mafengwo app [1]
- The AI aims to cut the average browsing time for travel planning significantly; users previously spent an average of 62.5 minutes planning trips to Xinjiang and 90.4 minutes planning trips to Australia [7]

Group 2: AI Lu Shu Customization
- The newly launched "AI Lu Shu" product differs from other AI tools by actively asking users questions to better understand their travel needs, producing a more comprehensive and personalized travel plan [3][5]
- After users clarify their requirements, "AI Lu Shu" conducts in-depth research to create a tailored plan covering itinerary, accommodation, transportation, attractions, dining, shopping, budget, and practical tips [5]
- The customization flow lets users adjust preferences and confirm their needs before the AI generates the final plan, improving engagement and satisfaction [3][5]

Group 3: Reliability and User Experience
- The dual-model architecture of "AI Xiao Ma" balances efficiency and reliability by cross-verifying AI-generated recommendations against verified travel content in Mafengwo's database (a purely illustrative cross-check sketch follows this summary) [7]
- The assistant addresses common pain points for independent travelers, such as inefficient route planning and discrepancies between social-media images and actual experiences [7]
- Mafengwo emphasizes that true intelligence lies in improving the travel experience, with ongoing updates based on user interactions to keep improving the service [9]
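To make the cross-verification idea concrete, here is a purely hypothetical sketch of checking model-suggested stops against a verified content database. The function names, the in-memory "database", and the matching rule are all invented for illustration; this is not Mafengwo's implementation.

```python
# Purely hypothetical sketch of the "cross-verify model output against a
# verified content database" idea described above. Names and data are made up.
VERIFIED_POIS = {"Tianchi Lake", "Kanas Lake", "Karakul Lake"}  # stand-in for a real database

def draft_itinerary(prompt: str) -> list[str]:
    # Placeholder for a call to the large model; returns suggested stops.
    return ["Tianchi Lake", "Kanas Lake", "Nonexistent Viewpoint"]

def cross_verify(stops: list[str]) -> list[str]:
    # Keep only stops that appear in the verified database, flagging the rest.
    verified, flagged = [], []
    for stop in stops:
        (verified if stop in VERIFIED_POIS else flagged).append(stop)
    if flagged:
        print(f"Dropped unverified suggestions: {flagged}")
    return verified

print(cross_verify(draft_itinerary("5-day Xinjiang trip")))
```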
DeepSeek-R2 nears its expected release window! The STAR Market AI ETF (588930) rebounds from lows into the green, with real-time turnover topping 36 million yuan
Sou Hu Cai Jing· 2025-04-28 03:19
On the news front, DeepSeek R1 went viral during the Spring Festival three months ago, turning "the East rises as the West falls" into the prevailing narrative and expectation for a time. Three months later, with the May Day holiday approaching, DeepSeek R2 is nearing the "May release" window the market had previously expected, which could be a strong catalyst for the AI sector. The recent top-level collective study session on artificial intelligence sends a strong signal of policy escalation; combined with the approaching iteration of domestic large models such as DeepSeek R2, the technology sector may regain capital attention under the dual drivers of policy and technology.

On April 28, the A-share market traded steadily, with the AI theme staging a V-shaped rebound and market risk appetite rising quickly. Among the constituents of the STAR Market AI Index, 恒玄科技, 当虹科技, and 奥普特 rose more than 3%, while 有方科技, 寒武纪-U, 中科星图, 虹软科技, and 道通科技 rose more than 2%. The STAR Market AI ETF (588930) has drawn strong market interest, recording net capital inflows for two consecutive trading days.

The STAR Market AI Index tracked by the ETF (588930) holds 30 leading STAR Market AI companies, covering upstream computing power, midstream large models and cloud computing, and downstream robotics and other innovative applications across the AI industry chain, focused on five sectors: electronics, computers, machinery, home appliances, and communications. The top five constituents account for a combined weight of 47%, which may give the index relatively high AI-theme purity and greater elasticity.

Guoyuan Securities notes that the state attaches great importance to the development of the AI industry, which has broad long-term growth potential. Although the computer industry ...
BMW China announces DeepSeek integration; Nissan reportedly plans to close its Wuhan plant | Auto Morning Briefing
Mei Ri Jing Ji Xin Wen· 2025-04-27 22:40
Group 1
- BMW China announced the integration of DeepSeek, enhancing its AI ecosystem and improving human-machine interaction in new models starting in Q3 2025 [1]
- The partnership with Alibaba aims to boost product competitiveness and give consumers a stronger intelligent experience, potentially driving innovation in the premium automotive segment [1]

Group 2
- Nissan plans to close its Wuhan factory by March 31, 2026, due to low capacity utilization, with only 10,000 units produced since the plant opened in 2022 [2]
- Despite the closure, Nissan will invest $1.4 billion in China and launch around 10 new electrified models by 2027, signaling a commitment to transformation [2]

Group 3
- Bosch and Remote have upgraded their strategic partnership to focus on comprehensive collaboration in hydrogen and electric technologies, aiming to build a hydrogen-electric ecosystem [3]
- The first batch of 1,000 jointly developed new-energy commercial vehicles will be launched in domestic and international markets in the second half of this year [3]

Group 4
- Ford has suspended exports of SUVs, pickups, and sports cars to China because of a sharp increase in import tariffs, affecting approximately 5,500 vehicles in 2024 [4]
- The decision reflects the challenges American automakers face from retaliatory tariffs, affecting their market share and brand perception in China [4]

Group 5
- Volkswagen will hold vehicle prices steady in the U.S. through the end of May, addressing concerns over rising costs from tariffs and high inflation [5]
- The move aligns with other automakers' strategies to ease consumer purchasing pressure in a high-interest-rate environment [5]