Andrew Ng: The Turing Test Is No Longer Enough, So I'll Design a Version Built for AGI
量子位· 2026-01-10 03:07
New year, new ambitions: AI luminary Andrew Ng has announced a goal for 2026: to build a new Turing test, which he calls the Turing-AGI Test. As the name alone makes clear, this test is designed specifically for AGI.

Last year the AGI tide rose ever higher, and in his year-end review Ng wrote that 2025 may be remembered as the beginning of the industrial era of artificial intelligence: innovation pushed model performance to new heights, AI-driven applications became indispensable, top companies fought fiercely over talent, and infrastructure build-out lifted gross economic output. Academia and industry invoke the concept of AGI constantly, and Silicon Valley companies even set quarterly targets for reaching AGI first. Yet there is still no agreed definition of AGI, and existing benchmarks often mislead the public into overestimating current AI capabilities. Ng has taken note of this trend, and his new Turing test aims to fill the gap. As one commenter put it: to measure intelligence, you first have to define intelligence.

The Turing-AGI Test concept: the traditional Turing test is clearly inadequate for the AGI era. Proposed by Alan Turing in the 1950s, it uses human-machine conversation to gauge a machine's intelligence: a human evaluator must decide whether they are talking to a person or a machine, and if the machine successfully fools the evaluator, it passes. But today's AI is clearly no longer content with simple conversational interaction; it is meant to build economically useful systems, so there is an urgent ...
Sebastian Raschka's 10,000-Word Year-End Review: 2025, the Year of "Reasoning Models"
机器之心· 2026-01-02 09:30
Core Insights
- The AI field continues to evolve rapidly, with significant advancements in reasoning models and algorithms such as RLVR and GRPO, marking 2025 as a pivotal year for large language models (LLMs) [1][4][19]
- DeepSeek R1's introduction has shifted the focus from merely stacking parameters to enhancing reasoning capabilities, demonstrating that high-performance models can be developed at a fraction of previously estimated costs [9][10][12]
- The importance of collaboration between humans and AI is emphasized, reflecting on the boundaries of this partnership and the evolving role of AI in various tasks [1][4][66]

Group 1: Reasoning Models and Algorithms
- The year 2025 has been characterized as a "year of reasoning," with RLVR and GRPO algorithms gaining prominence in the development of LLMs (a minimal sketch of the GRPO idea follows this summary) [5][19]
- DeepSeek R1's release showcased that reasoning behavior can be developed through reinforcement learning, enhancing the accuracy of model outputs [6][19]
- The estimated training cost for the DeepSeek R1 model is significantly lower than previous assumptions, around $5.576 million, indicating a shift in cost expectations for advanced model training [10][12]

Group 2: Focus Areas in LLM Development
- Key focus areas for LLM development have evolved over the years, with 2025 emphasizing RLVR and GRPO, following previous years' focus on RLHF and LoRA techniques [20][22][24]
- The trend of "Benchmaxxing" has emerged, highlighting the overemphasis on benchmark scores rather than the real-world applicability of LLMs [60][63]
- The integration of tools in LLM training has improved performance, allowing models to access external information and reduce hallucination rates [54][56]

Group 3: Architectural Trends
- LLM architectures are converging on mixture-of-experts (MoE) layers and efficient attention mechanisms, indicating a shift toward more scalable and efficient models [43][53]
- Despite these advancements, traditional transformer architectures remain prevalent, with ongoing improvements in efficiency and engineering adjustments [43][53]

Group 4: Future Directions
- Future developments are expected to focus on expanding RLVR applications beyond mathematics and coding, incorporating reasoning evaluation into training signals [25][27]
- Continuous learning is anticipated to gain traction, addressing challenges such as catastrophic forgetting while enhancing model adaptability [31][32]
- The need for domain-specific data is highlighted as a critical factor for LLMs to establish a foothold in various industries, with proprietary data being a significant concern for companies [85][88]
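Group 1's RLVR/GRPO pairing is easy to state in code. Below is a minimal Python sketch of the core idea, not DeepSeek's actual implementation: a verifiable reward (here a hypothetical exact-match checker standing in for a real verifier) scores each response in a sampled group, and GRPO turns those rewards into advantages by normalizing against the group's own mean and standard deviation, dispensing with PPO's learned critic.

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: each sampled response is judged against
    the mean/std of its own group, so no value network (critic) is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def verifiable_reward(model_answer: str, reference: str) -> float:
    """RLVR-style reward: 1.0 if the final answer matches the reference,
    else 0.0. A stand-in for a real verifier (unit tests, a math checker)."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

# Four sampled responses to one math prompt, two of them correct:
rewards = [verifiable_reward(a, "42") for a in ["42", "41", "42", "7"]]
print(grpo_advantages(rewards))  # correct answers receive positive advantage
```

In a full training loop these advantages would weight the policy-gradient update for each response's tokens; the sketch stops at the advantage computation.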
DeepSeek Finally Loses the Open-Source Crown, but Its Successor Still Comes from China
猿大侠· 2025-07-19 03:43
Core Viewpoint
- Kimi K2 has surpassed DeepSeek to become the number one open-source model globally, ranking fifth overall and only narrowly trailing top proprietary models like Musk's Grok 4 [1][18]

Group 1: Rankings and Performance
- Kimi K2 achieved a score of 1420, placing it fifth in the overall rankings, with only a slight gap from the leading proprietary models [2][21]
- The top ten models all scored above 1400, indicating that open-source models are increasingly competitive with proprietary ones [20][22]
- Across categories, Kimi K2 tied for first in multi-turn dialogue and placed second in programming ability, matching models like GPT 4.5 and Grok 4 [3][18]

Group 2: Community Engagement and Adoption
- Kimi K2 has gained significant attention in the open-source community, with 5.6K stars on GitHub and nearly 100,000 downloads on Hugging Face [5][4]
- The CEO of AI search engine startup Perplexity has publicly endorsed Kimi K2, indicating plans for further training based on this model [5][24]

Group 3: Architectural Decisions
- Kimi K2 inherits the DeepSeek V3 architecture but makes several parameter adjustments to optimize performance [8][11]
- Key structural changes include increasing the number of experts, halving the number of attention heads, retaining only the first layer as dense, and implementing flexible routing for expert combinations (see the routing sketch after this summary) [12][14]
- Despite a 1.5x increase in total parameters, prefill and decode efficiency improved, suggesting a cost-effective optimization strategy [13][14]

Group 4: Industry Perspectives
- The perception that open-source models are inferior is being challenged, with industry experts predicting that open source will increasingly outperform proprietary models [18][24]
- Tim Dettmers of the Allen Institute for AI and the CEO of Perplexity have both emphasized the growing importance of open-source models in shaping AI capabilities globally [24][25]
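The structural changes listed in Group 3 all revolve around sparse expert routing. The following PyTorch sketch is a generic top-k routed MoE layer, not Kimi K2's actual code, and every dimension in it is an invented placeholder; it illustrates why adding experts grows total parameters while per-token compute stays pinned to the k experts each token is routed to.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal sparse mixture-of-experts layer: a linear router scores all
    experts per token, and only the top-k experts actually run."""
    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # pick k experts per token
        weights = weights.softmax(dim=-1)              # normalize their mix weights
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # dispatch tokens expert by expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# 8 experts with 2 active per token: ~4x the parameters of a dense FFN,
# but only ~2 experts' worth of FLOPs per token.
layer = TopKMoE(d_model=64, n_experts=8, k=2)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```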
A Timely Rain Finally Arrives for Liang Wenfeng
是说芯语· 2025-07-19 01:26
Core Viewpoint
- The article discusses the competitive landscape of AI models, focusing on DeepSeek's difficulty maintaining user engagement and market position against emerging competitors like Kimi and the rest of the "AI Six Dragons" group [3][4][8]

Group 1: DeepSeek's Performance and Challenges
- DeepSeek's monthly active users declined significantly, dropping from a peak of 169 million in January to 160 million by May, a decrease of 5.1% [3][4]
- The app's download ranking has plummeted, falling out of the top 30 in the Apple App Store, indicating a loss of user interest [4]
- DeepSeek's user engagement rate fell from 7.5% at the beginning of the year to 3% by the end of May, with website traffic also down 29% [4][5]

Group 2: Competition and Market Dynamics
- Competitors like Kimi are rapidly releasing new models, with Kimi K2 highlighted for its performance and open-source nature, achieving state-of-the-art results on various benchmarks [10][11]
- Kimi K2's pricing strategy aligns closely with DeepSeek's, offering competitive API rates that could further erode DeepSeek's market share [11]
- Other players are also emphasizing cost-effectiveness and performance, challenging DeepSeek's previously established reputation for value [10][11]

Group 3: Technological and Strategic Implications
- DeepSeek's R2 model has been delayed by supply-chain issues around the NVIDIA H20 chip, which has constrained its computational capacity [5][7]
- The lack of significant updates to DeepSeek's models has created a perception of stagnation, while competitors rapidly advance in both performance and features [8][10]
- The article suggests DeepSeek must quickly release new models and enhance its capabilities to regain market interest and user engagement [17][19]
DeepSeek Finally Loses the Open-Source Crown, but Its Successor Still Comes from China
量子位· 2025-07-18 08:36
Core Viewpoint
- Kimi K2 has surpassed DeepSeek to become the number one open-source model globally, ranking fifth overall and closely trailing top proprietary models like Musk's Grok 4 [1][19]

Group 1: Ranking and Performance
- Kimi K2 achieved a score of 1420, placing it fifth in the overall ranking, with only a slight gap from the leading proprietary models [2][22]
- The top ten models now all have scores above 1400, indicating that open-source models are increasingly competitive with proprietary ones [20][21]

Group 2: Community Engagement and Adoption
- Kimi K2 has gained significant attention in the open-source community, with 5.6K stars on GitHub and nearly 100,000 downloads on Hugging Face [5][4]
- The CEO of AI search engine startup Perplexity has publicly endorsed Kimi K2, citing strong internal evaluations and plans for further training based on the model [5][27]

Group 3: Model Architecture and Development
- Kimi K2 inherits the DeepSeek V3 architecture but makes several parameter adjustments to optimize performance [9][12]
- Key modifications include increasing the number of experts, halving the number of attention heads, retaining only the first layer as dense, and implementing flexible expert routing (the parameter arithmetic behind this trade-off is sketched below) [13][15]

Group 4: Industry Trends and Future Outlook
- The stereotype that open-source models are inferior is being challenged, with industry experts predicting that open source will increasingly outperform proprietary models [19][24]
- Tim Dettmers of the Allen Institute for AI suggests that open-source models beating proprietary ones will become more common, highlighting their importance in localizing AI experiences [25][27]
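Group 3's trade-off, more experts but unchanged serving cost, comes down to simple parameter arithmetic. The Python sketch below uses invented dimensions for illustration (they are not Kimi K2's real configuration): total FFN parameters scale with the expert count, while the active parameters per token scale only with k, the number of experts each token is routed to.

```python
def moe_ffn_params(d_model: int, d_ff: int, n_experts: int, k: int):
    """Back-of-envelope FFN parameter counts for a sparse MoE layer:
    each expert holds an up-projection and a down-projection matrix."""
    per_expert = 2 * d_model * d_ff
    total = n_experts * per_expert   # parameters stored in the layer
    active = k * per_expert          # parameters touched per token
    return total, active

# Doubling the expert count doubles stored parameters, but with the
# same k the per-token compute budget is unchanged:
for n in (8, 16):
    total, active = moe_ffn_params(d_model=4096, d_ff=14336, n_experts=n, k=2)
    print(f"{n:>2} experts: {total / 1e9:.2f}B total FFN params, "
          f"{active / 1e9:.2f}B active per token")
```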
A Timely Rain Finally Arrives for Liang Wenfeng
36氪· 2025-07-16 10:19
Core Viewpoint
- The article discusses the competitive landscape of AI large models, focusing on DeepSeek's challenges and the emergence of new players like Kimi, which are rapidly gaining market attention and user engagement [3][4][10]

Group 1: DeepSeek's Performance and Challenges
- DeepSeek's monthly active users declined significantly, falling from a peak of 169 million to about 160 million by May, a 5.1% decrease [4]
- User engagement fell from a peak of 7.5% in January to 3% by the end of May, alongside a 29% decrease in website traffic [4][5]
- The company's R2 model launch has been delayed by unexpected export restrictions on the H20 chip, which have limited its computational resources [5][8]

Group 2: Competitive Landscape
- Other AI players, referred to as the "AI Six Dragons," are set to release new foundational models, intensifying competition against DeepSeek [3][4]
- Kimi's K2 model has achieved state-of-the-art performance on various benchmarks, surpassing DeepSeek in coding and mathematical-reasoning tasks [14]
- Kimi K2's pricing aligns closely with DeepSeek's API pricing, making it a direct competitor on cost [15]

Group 3: Market Dynamics and User Preferences
- DeepSeek's reputation for cost-effectiveness is being challenged as competitors like Alibaba, ByteDance, and Baidu offer lower-priced alternatives [13]
- The lack of significant upgrades to DeepSeek's models has shifted perceptions, with users increasingly viewing it as less competitive than newer models [12][13]
- The 64K context window of DeepSeek's models is far smaller than that of competitors like Kimi K2 (128K) and MiniMax-M1 (1 million tokens), which hurts its performance on long inputs (see the memory arithmetic sketched below) [22][23]

Group 4: Future Considerations
- To regain market interest, DeepSeek must expedite the release of new models and enhance its capabilities, particularly multi-modal functionality, which is becoming increasingly important in the AI landscape [28][30]
- The article suggests DeepSeek's open-source development should also align with commercial viability to sustain user engagement and developer activity [24][25]
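The context-window comparison in Group 3 has a concrete serving cost behind it: the KV cache a model must keep in GPU memory grows linearly with context length. A rough Python sketch follows; the layer, head, and dimension counts are invented placeholders, not the configuration of the DeepSeek, Kimi, or MiniMax models.

```python
def kv_cache_gb(context_len: int, n_layers: int = 60, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """fp16 KV-cache size for one sequence: 2 tensors (K and V) per layer,
    each of shape (n_kv_heads, context_len, head_dim)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val / 1e9

for ctx in (64_000, 128_000, 1_000_000):
    print(f"{ctx:>9,}-token context -> ~{kv_cache_gb(ctx):.0f} GB of KV cache")
```

Under these placeholder numbers, a 64K context costs about 16 GB of cache per sequence while a 1M context costs roughly 250 GB, which is why very long windows usually arrive together with architectural work on attention efficiency.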
Think a Recession Is Coming? This AI Stock Can Still Thrive.
The Motley Fool· 2025-05-06 09:15
Core Insights
- The AI industry is facing challenges as new models require significant computational resources, but DeepSeek's recent success with lower resource usage raises questions about future trends [1]
- OpenAI's GPT 4.5 model is costly and offers limited real-world applications, indicating that more computing power may not be the ultimate solution [2]
- Recent AI models are producing incorrect information more frequently, which could undermine their reliability and adoption [3]

IBM's Strategy
- IBM is focusing on developing small, efficient AI models rather than competing in the race for the most powerful models, positioning itself to thrive in a potentially challenging economic environment [4][5]
- The Granite family of AI models is designed for enterprise customers seeking cost-effective solutions that meet safety benchmarks, outperforming competitors at avoiding harmful content [7]
- The Granite 3.3 model requires substantial GPU memory, from 28 GB to 84 GB, making it reliant on expensive data-center GPUs [8]

Granite 4.0 Developments
- The upcoming Granite 4.0 models aim to run on inexpensive consumer-grade hardware, with the Granite 4.0 Tiny model requiring 72% less memory than its predecessor and operating on as little as 12 GB of GPU memory [9]
- This shift to a new hybrid architecture allows for better performance on lower-cost hardware, making it suitable for enterprises looking to cut costs (a rough memory-sizing sketch follows this summary) [10]

Economic Considerations
- In a recession, enterprises are likely to prioritize projects that save money or boost productivity, which aligns with IBM's focus on efficient AI solutions [10]
- The unpredictability of U.S. tariff policy may weigh on the economy, but demand for projects with clear returns on investment is expected to remain strong [11]
- IBM's emphasis on efficiency in its AI models could pay off if economic conditions worsen [12]
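The memory figures in this piece follow from simple sizing arithmetic: weight memory is roughly parameter count times bytes per parameter, which is exactly what quantization and smaller models attack. The Python sketch below is illustrative only; the 30B parameter count and 20% overhead factor are assumptions, not IBM's published sizing for Granite.

```python
def weight_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough GPU memory to serve a model: parameters times bytes per
    parameter, plus ~20% headroom for activations and runtime buffers."""
    return params_billion * (bits / 8) * overhead

# How precision moves a hypothetical 30B-parameter model from
# data-center GPUs toward consumer-grade cards:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(30, bits):.0f} GB")
```

At 16-bit precision the hypothetical model needs ~72 GB, data-center territory comparable to the 28-84 GB Granite 3.3 range above, while 4-bit weights bring it to ~18 GB, within reach of high-end consumer GPUs, the direction Granite 4.0 Tiny's 12 GB figure points.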