Scaling Law(尺度定律)
Can the Transformer Support the Next Generation of Agents?
TMTPost (钛媒体) APP · 2025-12-22 07:39
By KeyPoints (划重点), author Li Yue. On December 18, the 2025 Tencent ConTech Conference and Tencent Technology Hi Tech Day officially aired, bringing together academicians of the Chinese Academy of Engineering, renowned experts and scholars, founders of leading technology companies, and prominent investors to discuss the opportunities and challenges of the intelligent era.

Has the Transformer, the architecture once expected to carry us to AGI, already hit its ceiling?

A top student who can only solve test problems

Before 2017, the mainstream approaches in natural language processing (NLP) were RNNs (recurrent neural networks) and LSTMs (long short-term memory networks). They processed information like a diligent reader who must read word by word in order, which was inefficient and made it hard to capture long-range semantic dependencies.

In 2017, Google's paper "Attention Is All You Need" appeared and changed everything. The Transformer architecture abandoned recurrence and introduced the "self-attention mechanism": instead of reading sequentially, it attends to all the words in a sentence at once and computes the relevance weights between them.

During the roundtable, when the host handed the microphone to Zhang Xiangyu, chief scientist of StepFun (阶跃星辰), and asked about the future of model architectures, this academic heavyweight dropped a bombshell: the existing Transformer architecture cannot support the next generation of Agents.

And just recently, Stanford University professor "A ...
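The self-attention idea described above (every token attends to every other token at once, weighted by relevance) can be sketched in a few lines of NumPy. This is a minimal illustration; the dimensions, random weights, and function name are assumptions for the example, not anything from the article or a real Transformer implementation.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Pairwise relevance of every token to every other token, all at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax turns scores into attention weights summing to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all positions, no sequential reading
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # → (4, 8)
```

Because the score matrix covers all token pairs simultaneously, the model sees long-range relationships in one step, which is exactly what the sequential RNN/LSTM reader struggled with.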
Huawei's "Black Tech" Clears a Key Bottleneck in AI Deployment
Guancha (观察者网) · 2025-08-15 04:06
Core Viewpoint
- The traditional Scaling Law for AI models is facing significant bottlenecks, particularly in China, where infrastructure investment lags behind the US, creating challenges in AI inference performance and commercial viability [1][4][9].

Group 1: AI Inference Challenges
- AI inference has become a critical area, with demand for inference computing power now exceeding that for training, as evidenced by GPT-5's API call volume exceeding 20 billion calls per minute [4][6].
- Chinese enterprises face a dilemma of inference that won't run, runs slowly, or runs expensively, with domestic models outputting fewer than 60 tokens per second compared with over 200 tokens per second for foreign models [7][9].
- The increasing complexity of AI applications, such as long-text processing and multi-turn dialogue, has intensified the demand for better inference performance [1][4][6].

Group 2: Huawei's UCM Technology
- Huawei has introduced the Unified Cache Manager (UCM), a breakthrough technology designed to enhance AI inference performance by optimizing memory management and working around HBM capacity limitations [1][11].
- UCM employs a tiered caching strategy for the efficient storage and retrieval of KV Cache data, significantly reducing inference latency and cost [10][11][18].
- The technology has demonstrated substantial improvements in inference speed, with a reported 125-fold increase in processing speed for specific applications in collaboration with China UnionPay [19][21].

Group 3: Industry Implications and Future Prospects
- The introduction of UCM is seen as a pivotal move for the Chinese AI industry, potentially leading to a positive cycle of user growth, increased investment, and rapid technological iteration [18][24].
- Huawei's open-source approach to UCM aims to foster collaboration within the AI ecosystem, allowing various stakeholders to integrate and enhance their frameworks [28].
- The technology is expected to be applicable across various industries, addressing the challenges posed by the increasing volume of data and the need for efficient inference solutions [23][24].
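The summary above names UCM's tiered KV-cache strategy without going into mechanics. As a rough illustration of the general idea (hot KV blocks kept in a small fast tier standing in for HBM, cold blocks demoted to a larger, slower tier instead of being discarded and recomputed), here is a toy two-tier LRU cache. All class and method names here are assumptions made for the sketch; they are not Huawei's actual UCM interface.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV-cache manager: a small fast tier (stand-in for HBM)
    backed by a large slow tier (stand-in for DRAM/SSD). Illustrative only."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # LRU order: least recently used first
        self.slow = {}
        self.fast_capacity = fast_capacity

    def put(self, key, kv_block):
        if key in self.fast:
            self.fast.move_to_end(key)
        self.fast[key] = kv_block
        while len(self.fast) > self.fast_capacity:
            # Demote the coldest block to the slow tier instead of dropping it
            evicted_key, evicted_block = self.fast.popitem(last=False)
            self.slow[evicted_key] = evicted_block

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote on reuse: move the block back into the fast tier
            block = self.slow.pop(key)
            self.put(key, block)
            return block
        return None  # true miss: the prefill would have to be recomputed

cache = TieredKVCache(fast_capacity=2)
cache.put("session1", "kv1")
cache.put("session2", "kv2")
cache.put("session3", "kv3")      # "session1" is demoted, not lost
print(cache.get("session1"))       # → kv1 (served from the slow tier, then promoted)
```

The payoff in the article's terms: a reused conversation's KV Cache is fetched from a cheaper tier rather than recomputed from scratch, which is where the reported latency and cost reductions come from.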
Understanding DeepSeek and OpenAI in One Article: Why Do Entrepreneurs Need Cognitive Innovation?
Hundun Academy (混沌学园) · 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving landscape of AI [1][2].

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission to counteract the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in the development of groundbreaking language models that exhibit emergent capabilities [17][19].

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, thus lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, in contrast to the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing quality data selection and algorithmic efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advancement in AI technology [45][48].
Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing for open collaboration and resource sharing among researchers [54][56].
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating for a culture that empowers individuals to explore and innovate freely [64][70].

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should focus only on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].