BERT
The Annoying Memory Wall
半导体行业观察· 2026-02-02 01:33
The unprecedented availability of unsupervised training data, together with the scaling laws of neural networks, has driven an unprecedented surge in model size and compute requirements for serving and training large language models (LLMs). However, the dominant performance bottleneck is increasingly shifting to memory bandwidth.

Over the past 20 years, peak server FLOPS has grown roughly 3x every two years, outpacing DRAM bandwidth and interconnect bandwidth, which have grown only about 1.6x and 1.4x every two years respectively (a back-of-the-envelope compounding of these rates follows this excerpt). This gap makes memory, rather than compute, the primary bottleneck for AI applications, especially serving.

This article analyzes encoder and decoder Transformer models and shows how memory bandwidth becomes the dominant bottleneck for decoder models. We propose redesigning model architectures, training, and deployment strategies to overcome this memory limitation.

Introduction

In recent years, the compute required to train large language models (LLMs) has grown at a rate of roughly 750x every two years. This exponential trend has been the main driver of AI accelerator development, which focuses on raising peak hardware compute, often at the cost of simplifying other parts of the hardware, such as the memory hierarchy.

These trends, however, overlook an emerging challenge in training and serving AI models: memory and communication bottlenecks. In fact, the bottleneck in many AI applications is not compute capability, but rather intra-chip/chi ...
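To see how quickly the quoted growth rates diverge, here is a back-of-the-envelope compounding over 20 years (a minimal sketch; the per-two-year growth factors are those cited above, and the 20-year horizon split into ten two-year periods is the only other assumption):

```python
# Compound the cited per-2-year growth factors over 20 years (10 periods).
# Absolute starting values are arbitrary; only the ratio between curves matters.
periods = 20 // 2  # twenty years, growth quoted per two years

flops_growth = 3.0 ** periods         # peak FLOPS: ~3x every two years
dram_bw_growth = 1.6 ** periods       # DRAM bandwidth: ~1.6x every two years
interconnect_growth = 1.4 ** periods  # interconnect bandwidth: ~1.4x every two years

print(f"FLOPS grew            ~{flops_growth:,.0f}x")
print(f"DRAM bandwidth grew   ~{dram_bw_growth:,.0f}x")
print(f"Interconnect grew     ~{interconnect_growth:,.0f}x")
print(f"FLOPS / DRAM-bandwidth gap: ~{flops_growth / dram_bw_growth:,.0f}x")
```

Compute thus outruns DRAM bandwidth by roughly 500x over two decades, which is why memory traffic rather than raw FLOPS increasingly sets serving throughput.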
Is LeCun's prediction coming true? Here is a hardcore roadmap to AGI: from BERT to Genie, building a true world model step by step through the lens of the masking paradigm
量子位· 2026-01-01 02:13
Core Viewpoint
- The article discusses the emergence of World Models in AI, emphasizing the importance of Masking as a foundational principle for building these models, which are seen as essential for achieving Artificial General Intelligence (AGI) [1][3][5].

Group 1: Definition and Components of World Models
- The true World Model is defined as an organic system composed of three core subsystems: a Generative Heart, an Interactive Loop, and a Memory System [6][8] (a toy interface sketch follows this summary).
- The Generative Heart ($G$) predicts future states and simulates world dynamics, while the Interactive Loop ($F,C$) allows for real-time interaction and decision-making [8].
- The Memory System ($M$) ensures continuity over time, preventing the world from becoming a series of fragmented experiences [8][9].

Group 2: Evolution of World Models
- The evolution of World Models is categorized into five stages, with Masking being the central theme throughout these stages [10][12].
- Stage I focuses on Mask-based Models, highlighting Masking as a universal generative principle rather than just a pre-training technique [13][24].
- Stage II aims for Unified Models that process and generate all modalities under a single architecture, with a debate between Language-Prior and Visual-Prior modeling approaches [25][26].

Group 3: Interactive Generative Models
- Stage III introduces Interactive Generative Models, where models respond to user actions, transforming from mere simulators to interactive environments [36][40].
- The Genie series, particularly Genie-3, represents the state-of-the-art in real-time interactive models, achieving 720p resolution and 24fps frame rates [41][42].

Group 4: Memory and Consistency
- Stage IV addresses Memory & Consistency, focusing on the need for persistent memory to prevent catastrophic forgetting and state drift in generated worlds [46][48].
- Solutions proposed include Externalized Memory, architecture-level persistence, and consistency governance to maintain coherence in generated environments [49][50].

Group 5: Ultimate Form of World Models
- Stage V envisions True World Models that exhibit persistence, agency, and emergence, allowing for complex interactions and societal dynamics within the simulated world [51][52].
- The article concludes with the challenges of coherence, compression, and alignment that must be addressed to realize these advanced models [58].
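To make the three-subsystem decomposition above easier to picture, here is a minimal, hypothetical interface sketch in Python. The class name, method names, and the list-based memory are our own illustrative choices mapped onto $G$, $F/C$, and $M$; they do not come from the article or from any Genie implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class ToyWorldModel:
    """Hypothetical composition of the three subsystems described above."""
    memory: List[Any] = field(default_factory=list)  # M: persistence across steps

    def generate(self, state: Any) -> Any:
        # G, the "generative heart": predict the next world state.
        # Placeholder dynamics; a real system would use a learned video/latent predictor.
        return {"prev": state, "t": len(self.memory)}

    def step(self, state: Any, action: Any) -> Any:
        # F/C, the interactive loop: condition the prediction on a user action,
        # then commit the result to memory (M) so the rollout stays coherent over time.
        next_state = self.generate(state)
        next_state["action"] = action
        self.memory.append(next_state)
        return next_state

world = ToyWorldModel()
s = {"t": -1}
for a in ["move_left", "jump", "move_right"]:
    s = world.step(s, a)
print(len(world.memory), s["action"])  # 3 move_right
```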
NUS Professor Yang You takes a deep look at the bottleneck of intelligence growth: perhaps this is how we will achieve AGI?
机器之心· 2025-12-31 04:09
Core Insights
- The essence of intelligent growth is not about architectural changes but how computational power translates into intelligence [6][7]
- The current paradigm (Transformer + massive computational power) faces a bottleneck in fully utilizing the increasing computational resources, leading to diminishing returns on pre-training [6][8] (a generic scaling-law form follows this summary)
- Future directions should focus on breakthroughs in foundational paradigms rather than mere engineering optimizations [8][9]

Group 1: Current State of Intelligence
- There is no clear definition of intelligence, and even top experts struggle to define AGI (Artificial General Intelligence) [15][16]
- The core of intelligence is seen as prediction and creation, with significant advancements needed to approach AGI [17][18]

Group 2: Bottlenecks in Intelligent Development
- The main source of bottlenecks in intelligent growth is the inefficiency in converting computational power into usable intelligence [19][20]
- Pre-training is the most significant contributor to model intelligence, consuming the most computational resources [20][21]
- The current model architectures, particularly Transformers, are unable to fully leverage the continuous growth in computational power [33]

Group 3: Future Directions
- There is a need for higher precision computing and more advanced optimizers to enhance model intelligence [45]
- The exploration of scalable model architectures and loss functions is crucial for better utilization of computational resources [45]
- The industry must find ways to "consume" more energy in a unit of time and effectively convert it into intelligence [42][45]
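The "diminishing returns on pre-training" point is usually framed through power-law scaling. The form below is the standard shape from the scaling-law literature; the symbols and any particular exponent are illustrative here, not taken from Yang You's remarks:

$$
L(C) \;\approx\; L_{\infty} + a\,C^{-\alpha},
\qquad
\frac{\mathrm{d}L}{\mathrm{d}C} \;=\; -\,a\,\alpha\,C^{-(\alpha+1)} \;\to\; 0 \quad \text{as } C \to \infty
$$

Each additional unit of compute $C$ buys a smaller reduction in loss $L$, which is one way to state why converting ever more compute into additional intelligence becomes harder under the current paradigm.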
Can the Transformer support the next generation of Agents?
TMTPost APP· 2025-12-22 07:39
Core Insights
- The current Transformer architecture is deemed insufficient for supporting the next generation of AI agents, as highlighted by experts at the Tencent ConTech conference [1][2][11]
- There is a growing consensus that the AI industry is transitioning from a "scaling era" focused on data and computational power to a "research era" that emphasizes foundational innovation [11][12]

Group 1: Limitations of Current AI Models
- Experts, including prominent figures like Fei-Fei Li and Ilya Sutskever, express concerns that existing Transformer models are reaching their limits, particularly in understanding causality and physical reasoning [2][5][11]
- The marginal returns of scaling laws are diminishing, indicating that simply increasing model size and data may not yield further advancements in AI capabilities [2][10]
- Current models are criticized for their reliance on statistical correlations rather than true understanding, likening them to students who excel in exams through memorization rather than comprehension [4][5]

Group 2: Challenges in Long Context Processing
- The ability of Transformers to handle long contexts is questioned, with evidence suggesting that performance degrades significantly beyond a certain token limit [6][7]
- The architecture's unidirectional information flow restricts its capacity for deep reasoning, which is essential for effective decision-making [6][7]

Group 3: Need for New Architectures
- The industry is urged to explore new architectural breakthroughs that integrate causal logic and physical understanding, moving beyond the limitations of current models [11][12]
- Proposed alternatives include nonlinear RNNs that allow for internal feedback and reasoning, which could enhance AI's ability to learn and adapt [12][13] (a toy cell illustrating the idea follows this summary)

Group 4: Implications for the AI Industry
- A shift away from Transformer-based models could lead to a reevaluation of hardware infrastructure, as current systems are optimized for these architectures [13]
- The value of data types may also change, with physical world sensor data and interactive data becoming increasingly important in the new AI landscape [14]
- Companies in the tech sector face both challenges and opportunities as they navigate this transition towards more advanced AI frameworks [16]
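As a rough illustration of the "nonlinear RNN with internal feedback" direction mentioned above, here is a toy PyTorch cell. It is our own minimal construction for intuition, not an architecture published by any of the experts cited; the inner refinement loop and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeedbackRNNCell(nn.Module):
    """Toy recurrent cell whose hidden state is refined by an inner feedback loop."""

    def __init__(self, input_dim: int, hidden_dim: int, inner_steps: int = 3):
        super().__init__()
        self.inp = nn.Linear(input_dim, hidden_dim)
        self.rec = nn.Linear(hidden_dim, hidden_dim)
        self.inner_steps = inner_steps  # extra nonlinear refinement per token

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.inp(x) + self.rec(h))
        # Internal feedback: re-inject the hidden state into itself several times,
        # giving the cell extra sequential "thinking" depth per input token.
        for _ in range(self.inner_steps):
            h = torch.tanh(self.rec(h) + h)
        return h

cell = FeedbackRNNCell(input_dim=16, hidden_dim=32)
h = torch.zeros(1, 32)
for x in torch.randn(5, 1, 16):   # unroll over a 5-token sequence
    h = cell(x, h)
print(h.shape)  # torch.Size([1, 32])
```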
Google's AI backstory: twenty years in the shadows, and 365 days at full sprint
36Kr· 2025-11-27 12:13
Core Insights
- Google has undergone a significant transformation in the past year, moving from a state of perceived stagnation to a strong resurgence in AI capabilities, highlighted by the success of its Gemini applications and models [2][3][44]
- The company's long-term investment in AI technology, dating back over two decades, has laid a robust foundation for its current advancements, showcasing a strategic evolution rather than a sudden breakthrough [3][6][45]

Group 1: Historical Context and Development
- Google's AI journey began with Larry Page's vision of creating an ultimate search engine capable of understanding the internet and user intent [9][47]
- The establishment of Google Brain in 2011 marked a pivotal moment, focusing on unsupervised learning methods that would later prove essential for AI advancements [12][18]
- The "cat paper" published in 2012 demonstrated the feasibility of unsupervised learning and led to the development of recommendation systems that transformed platforms like YouTube [15][16]

Group 2: Key Acquisitions and Innovations
- The acquisition of DeepMind in 2014 for $500 million solidified Google's dominance in AI, providing access to top-tier talent and innovative research [22][24]
- Google's development of Tensor Processing Units (TPUs) was a strategic response to the limitations of existing hardware, enabling more efficient processing of AI workloads [25][30]

Group 3: Challenges and Strategic Shifts
- The emergence of OpenAI and the success of ChatGPT in late 2022 prompted Google to reassess its AI strategy, leading to a restructuring of its AI teams and a renewed focus on a unified model, Gemini [41][42]
- The rapid development and deployment of Gemini and its variants, such as Gemini 3 and Nano Banana Pro, have positioned Google back at the forefront of the AI landscape [43][44]

Group 4: Future Outlook
- Google's recent advancements in AI reflect a culmination of years of strategic investment and innovation, reaffirming its identity as a company fundamentally rooted in AI rather than merely a search engine [47][48]
Diffusion never dies and BERT lives forever: Karpathy's late-night reflection asks whether the autoregressive era should end
36Kr· 2025-11-05 04:44
Core Insights
- The article discusses Nathan Barry's innovative approach to transforming BERT into a generative model using a diffusion process, suggesting that BERT's masked language modeling can be viewed as a specific case of text diffusion [1][5][26].

Group 1: Model Transformation
- Nathan Barry's research indicates that BERT can be adapted for text generation by modifying its training objectives, specifically through a dynamic masking rate that evolves from 0% to 100% [13][27].
- The concept of using diffusion models, initially successful in image generation, is applied to text by introducing noise and then iteratively denoising it, which aligns with the principles of masked language modeling [8][11] (a minimal unmasking loop is sketched after this summary).

Group 2: Experimental Validation
- Barry conducted a validation experiment using RoBERTa, a refined version of BERT, to demonstrate that it can generate coherent text after being fine-tuned with a diffusion approach [17][21].
- The results showed that even without optimization, the RoBERTa Diffusion model produced surprisingly coherent outputs, indicating the potential for further enhancements [24][25].

Group 3: Industry Implications
- The article highlights the potential for diffusion models to challenge existing generative models like GPT, suggesting a shift in the landscape of language modeling and AI [30][32].
- The discussion emphasizes that the generative capabilities of language models can be significantly improved through innovative training techniques, opening avenues for future research and development in the field [28][30].
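A minimal sketch of the "mask, then iteratively unmask" generation loop described above, using an off-the-shelf `roberta-base` checkpoint from Hugging Face `transformers`. This illustrates the idea only; it is not Barry's fine-tuned RoBERTa Diffusion model, and without his variable-masking-rate fine-tuning the output will be far less coherent.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

prompt = "The memory wall means that"
gen_len = 16  # number of tokens to generate after the prompt

# Start from a fully masked continuation (the "100% noise" end of the schedule).
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids[0, :-1]  # drop trailing </s>
ids = torch.cat([prompt_ids,
                 torch.full((gen_len,), tokenizer.mask_token_id, dtype=torch.long),
                 torch.tensor([tokenizer.sep_token_id])])

# Denoise in a few steps: each pass commits the most confident masked positions,
# mimicking a masking rate that anneals from 100% toward 0%.
for step in range(4):
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    masked = (ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if len(masked) == 0:
        break
    probs = logits[masked].softmax(-1)
    conf, pred = probs.max(-1)
    keep = conf.topk(max(1, len(masked) // 4)).indices  # commit top quarter this step
    ids[masked[keep]] = pred[keep]

print(tokenizer.decode(ids, skip_special_tokens=True))
```

With the reported fine-tuning (a masking rate annealed toward 100% during training), the same loop is what produces the much more coherent text described in the article.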
Former Alibaba and ByteDance large-model lead Yang Hongxia starts up: large-model pre-training is not a compute race for a handful of top players | Exclusive from 智能涌现
Sohu Finance· 2025-10-30 08:35
Core Insights
- Yang Hongxia, a key figure in large model research from Alibaba and ByteDance, has launched a new AI company, InfiX.ai, focusing on decentralized model training and innovation in the AI space [1][15][36]
- InfiX.ai aims to democratize access to large model training, allowing small and medium enterprises, research institutions, and individuals to participate in the process [4][16][19]

Company Overview
- InfiX.ai was founded by Yang Hongxia after her departure from ByteDance, with a focus on model-related technologies [1][15]
- The company has quickly assembled a team of 40 people in Hong Kong, leveraging the region's strong talent pool and funding opportunities [3][15]

Technological Innovations
- InfiX.ai is developing a decentralized approach to large model training, contrasting with the centralized models dominated by major institutions [4][16]
- The company has released the world's first FP8 training framework, which enhances training speed and reduces memory consumption compared to the commonly used FP16/BF16 [7][10] (a storage-size sketch follows this summary)
- InfiX.ai's model fusion technology allows for the integration of different domain-specific models, reducing resource waste and enhancing knowledge sharing [10][16]

Market Positioning
- The company is targeting challenging fields, particularly in healthcare, with a focus on cancer detection, to demonstrate the capabilities of its models [15][41]
- InfiX.ai's approach is gaining traction, with increasing interest from investors and a shift in perception towards decentralized model training in the industry [15][36]

Future Vision
- Yang Hongxia envisions a future where every organization has its own expert model, facilitated by model fusion across different domains and geographical boundaries [16][19]
- The company aims to make model training accessible and affordable, fostering a collaborative environment for AI development [16][19]
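The storage side of the FP8 argument is easy to verify in isolation. The sketch below uses PyTorch's `float8_e4m3fn` dtype (available in recent PyTorch releases) purely to compare tensor sizes; it is not InfiX.ai's FP8 training framework, which additionally has to handle scaling factors, FP8 matmuls, and optimizer state.

```python
import torch

x_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
x_fp8 = x_bf16.to(torch.float8_e4m3fn)  # requires a recent PyTorch (>= 2.1)

bytes_bf16 = x_bf16.numel() * x_bf16.element_size()
bytes_fp8 = x_fp8.numel() * x_fp8.element_size()
print(f"BF16: {bytes_bf16 / 2**20:.1f} MiB, FP8: {bytes_fp8 / 2**20:.1f} MiB")
# FP8 halves storage relative to BF16; actual training speedups also depend on
# hardware FP8 matmul support and per-tensor scaling.
```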
The Embedding black box is history! This new framework has models "explain first, then learn the Embedding"
量子位· 2025-10-21 09:05
Core Insights
- The article introduces GRACE, a new explainable generative embedding framework developed by researchers from multiple universities, aimed at addressing the limitations of traditional text embedding models [1][6].

Group 1: Background and Limitations
- Text embedding models have evolved from BERT to various newer models, mapping text into vector spaces for tasks like semantic retrieval and clustering [3].
- A common flaw in these models is treating large language models as "mute encoders," which output vectors without explaining the similarity between texts [4].
- This black-box representation becomes a bottleneck in tasks requiring high interpretability and robustness, such as question-answer matching and cross-domain retrieval [5].

Group 2: GRACE Framework Overview
- GRACE transforms "contrastive learning" into "reinforcement learning," redefining the meaning of contrastive learning signals [6].
- The framework emphasizes generating explanations (rationales) for text before learning embeddings, allowing the model to produce logical and semantically consistent reasoning [7][25].
- GRACE consists of three key modules:
  1. Rationale-Generating Policy, which generates explanatory reasoning chains for input texts [8].
  2. Representation Extraction, which combines input and rationale to compute final embeddings [9].
  3. Contrastive Rewards, which redefines contrastive learning objectives as a reward function for reinforcement learning updates [11] (an InfoNCE-style reward sketch follows this summary).

Group 3: Training Process
- GRACE can be trained in both supervised and unsupervised manners, utilizing labeled query-document pairs and self-alignment techniques [12][18].
- In the supervised phase, the model learns semantic relationships from a dataset of 1.5 million samples [13].
- The unsupervised phase generates multiple rationales for each text, encouraging consistent representations across different explanations [17].

Group 4: Experimental Results
- GRACE was evaluated across 56 datasets in various tasks, showing significant performance improvements over baseline models in retrieval, pair classification, and clustering [19][20].
- The results indicate that GRACE not only enhances embedding capabilities without sacrificing generative abilities but also provides transparent representations that can be understood by users [25][27].

Group 5: Conclusion
- Overall, GRACE represents a paradigm shift in embedding models, moving towards a framework that can explain its understanding process, thus enhancing both performance and interpretability [28].
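To make the "contrastive signal as an RL reward" idea concrete, here is a hypothetical reward function in the InfoNCE style. The function name, the use of cosine similarity, and the temperature value are our assumptions for illustration; GRACE's published reward formulation may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_reward(query_emb: torch.Tensor,
                       pos_emb: torch.Tensor,
                       neg_embs: torch.Tensor,
                       temperature: float = 0.05) -> torch.Tensor:
    """Reward for one (query + rationale) embedding: log-probability of ranking the
    positive document above in-batch negatives, InfoNCE-style."""
    q = F.normalize(query_emb, dim=-1)
    pos = F.normalize(pos_emb, dim=-1)
    negs = F.normalize(neg_embs, dim=-1)
    pos_sim = (q * pos).sum(-1, keepdim=True) / temperature  # shape (1,)
    neg_sim = negs @ q / temperature                         # shape (N,)
    logits = torch.cat([pos_sim, neg_sim])                   # shape (N+1,)
    return logits.log_softmax(dim=0)[0]  # higher when the rationale helps retrieval

# Toy usage: embeddings would come from the rationale-conditioned encoder.
reward = contrastive_reward(torch.randn(768), torch.randn(768), torch.randn(16, 768))
print(reward.item())
```

The policy that generates rationales can then be updated to maximize this reward, which is what recasting the contrastive objective as reinforcement learning amounts to.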
X @Kraken
Kraken· 2025-09-26 14:49
Listing & Trading
- BERT (Bertcoin) has been listed on Kraken [1]
- Kraken has started trading BERT [1]
X @THE HUNTER ✴️
GEM HUNTER 💎· 2025-09-23 16:57
Cryptocurrency Trends
- The document identifies a list of trending cryptocurrencies, including DOG, TOSHI, ASTER, APEX, MOMO, TRUMP, WLFI, PUMP, SUN, UFD, TROLL, BERT, NMR, BITCOIN, and BLESS [1]
- The document acknowledges that the list of trending cryptocurrencies is incomplete and seeks community input to identify missing cryptocurrencies [1]