Starting with OpenClaw: The Return of CPU Value in the Agentic AI Era
半导体行业观察 · 2026-03-11 02:00
Core Insights
- The article discusses the emergence of "Agentic AI" and highlights the launch of OpenClaw, a lightweight AI agent deployed on a Mac Mini that serves as a personal assistant through messaging interactions [2][44]
- It emphasizes the shift from GPU-dominated computing to a more balanced CPU-GPU collaboration in AI applications, particularly in the context of intelligent agents [44]

Group 1: Definition and Characteristics of AI Agents
- AI agents are defined as intelligent systems capable of autonomous perception, decision-making, and action to achieve specific goals, distinguishing them from AI assistants and chatbots [5][6]
- Key capabilities required for AI agents include perception, planning, memory, and action, enabling them to perform complex tasks independently [7][9]

Group 2: Chain-of-Thought (CoT) and Its Importance
- CoT is described as a foundational element for Agentic AI, allowing models to break down complex tasks into logical steps, enhancing accuracy and reducing errors [10][20]
- The article outlines how CoT facilitates task planning, exception handling, interpretability, and the synergy between reasoning and action [12][13]

Group 3: Retrieval-Augmented Generation (RAG)
- RAG is introduced as a method to enhance CoT by providing external knowledge, addressing issues like error propagation and lack of feedback in AI agents [21][24]
- The RAG process involves text vectorization, similarity metrics, and nearest-neighbor search to retrieve relevant information for improved decision-making, as sketched after this summary [26][27]

Group 4: Engram and Its Role
- Engram is presented as a memory module that enhances reasoning by separating static knowledge storage from dynamic inference, improving the efficiency of AI agents [33][35]
- The integration of Engram allows for faster knowledge retrieval and reduces the cognitive load on models, enabling them to focus on complex reasoning tasks [34][36]

Group 5: CPU's Resurgence in AI
- The article argues that the evolution of Agentic AI necessitates a renewed focus on CPU capabilities, particularly for high concurrency and inter-process context switching [38][39]
- It highlights the importance of technologies like CXL for memory expansion and efficient CPU-GPU communication, which are critical to the performance of intelligent agents [41][42]
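The Group 3 pipeline (vectorize text, score by a similarity metric, return nearest neighbors) is easy to make concrete. The sketch below is a minimal illustration with a toy embedding function standing in for a real embedding model; the corpus and the `embed` and `retrieve` helpers are hypothetical, not anything the article specifies:

```python
import numpy as np

# Hypothetical embedding function: a real system would call an embedding
# model. Here a fixed random projection of byte counts stands in for it.
rng = np.random.default_rng(0)
_PROJ = rng.normal(size=(256, 64))

def embed(text: str) -> np.ndarray:
    """Map text to a unit vector (toy stand-in for a real embedding model)."""
    buf = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.int64)
    bag = np.bincount(buf, minlength=256).astype(np.float64)
    vec = bag @ _PROJ
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Nearest-neighbor search: rank documents by cosine similarity."""
    doc_vecs = np.stack([embed(d) for d in corpus])  # vectorize the corpus
    sims = doc_vecs @ embed(query)                   # cosine (unit vectors)
    top = np.argsort(-sims)[:k]                      # k nearest neighbors
    return [corpus[i] for i in top]

corpus = [
    "CXL enables memory expansion beyond local DRAM.",
    "Chain-of-thought decomposes a task into logical steps.",
    "Engram stores static knowledge in a lookup table.",
]
print(retrieve("How do agents expand memory capacity?", corpus))
```

In production the similarity scores would come from a dedicated vector index (an approximate-nearest-neighbor library) rather than a brute-force matrix product, but the retrieval contract is the same.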
A Third Scaling Path Beyond Dense and MoE: Jiao Tong University's JTok Module Saves One Third of Compute
机器之心 · 2026-03-02 15:16
Core Insights
- The article discusses the limitations of traditional scaling methods for large models, emphasizing the need for new approaches that decouple parameter count from computational cost [2][4][19]
- It introduces the JTok and JTok-M modules, which use token-indexed parameters to enhance model capacity without significantly increasing computational requirements [3][5][10]
- The findings suggest that JTok-M can achieve substantial performance improvements while reducing computational costs by approximately 35% [5][24][26]

Summary by Sections

Traditional Scaling Limitations
- Traditional scaling methods bind parameters to computational requirements, so both grow in lockstep as model size increases [2][19]
- The MoE (Mixture of Experts) approach, while promising, has drawbacks such as lower sample efficiency and increased memory and communication overhead [2][3]

Introduction of JTok and JTok-M
- JTok introduces a new scaling dimension by assigning a modulation vector to each token, enhancing model capacity at negligible additional computational cost; a sketch of the idea follows this summary [3][10]
- JTok-M further refines this with context-aware dynamic modulation, improving performance while maintaining efficiency [14][16]

Performance and Efficiency Gains
- JTok-M has shown significant performance improvements across various tasks, with notable accuracy gains for models ranging from 650M to 61B parameters [5][39]
- The approach reduces computational requirements while achieving similar or better performance than traditional models [5][26][44]

Theoretical Framework and Validation
- The article presents a theoretical framework that integrates JTok-M into existing scaling laws, demonstrating its potential to shift the performance-computation curve downward [24][25]
- Empirical results confirm that JTok-M maintains stable performance gains across model sizes and training budgets, validating its scalability [26][29]

Practical Applications and Future Directions
- JTok and JTok-M have been tested across various downstream tasks, showing improvements in knowledge retention, reasoning, and mathematical problem-solving [35][39]
- The innovations in JTok-M represent a significant step toward redefining scaling laws for large models, offering a sustainable path for future development in the field [32][34]
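The summary describes JTok only as "token-indexed parameters" that attach a modulation vector to each token, so the mechanism below is a hedged guess at that idea rather than the paper's actual design; the `TokenModulation` module and its gating scheme are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TokenModulation(nn.Module):
    """Sketch of a JTok-style layer: each vocabulary token indexes its own
    modulation vector, which gates the hidden state elementwise. Parameter
    count grows with vocab size, but the extra compute per token is a
    single table lookup plus an elementwise multiply. This illustrates
    'token-indexed parameters' as described in the summary, not the
    paper's actual formulation."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        # Token-indexed parameter table: capacity scales with vocab_size
        # while per-token FLOPs stay O(d_model).
        self.table = nn.Embedding(vocab_size, d_model)
        nn.init.zeros_(self.table.weight)  # start as identity (gate = 1)

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        gate = 1.0 + self.table(token_ids)  # per-token modulation vector
        return hidden * gate

# Toy usage: modulate the output of one transformer block.
x = torch.randn(2, 8, 512)
ids = torch.randint(0, 32000, (2, 8))
mod = TokenModulation(vocab_size=32000, d_model=512)
print(mod(x, ids).shape)  # torch.Size([2, 8, 512])
```

The point of the sketch is the cost profile: the table adds vocab_size × d_model parameters, but the forward pass adds only one lookup and one multiply per token, which is how capacity can grow while compute stays nearly flat. JTok-M's context-aware variant would presumably condition the gate on surrounding hidden states as well, which this sketch does not attempt.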
DeepSeek Criticized as Colder and Dumber After Update: "More Awkward Than the Sentimental Youth Literature of 20 Years Ago"
Mei Ri Jing Ji Xin Wen · 2026-02-12 22:23
Core Insights
- DeepSeek has begun gray-release (staged rollout) testing of its flagship model, extending the context length to up to 1 million tokens, a significant expansion from the 128K tokens of version V3.1 released in August last year [1]
- Users have reported mixed reactions to the update, with some expressing dissatisfaction over the model's changed tone and interaction style, making its perceived coldness a trending topic on social media [1][4]

Group 1: Model Updates and Features
- The latest version of DeepSeek supports extremely long texts, as demonstrated by its ability to handle a document of over 240,000 tokens [1]
- The upcoming DeepSeek V4 model is expected in mid-February 2026; the current version is a speed-optimized variant that trades some quality for performance testing [6]
- DeepSeek's V series models are designed for optimal performance, with V3 marking a significant milestone due to its efficient MoE architecture [6]

Group 2: User Feedback and Reactions
- Users have criticized the new version for its impersonal approach, addressing them as "user" instead of by personalized nicknames, which makes the model feel less engaging [4]
- Some users describe the updated model's output as overly simplistic and emotionally flat, comparing it unfavorably to the sentimental "youth literature" of two decades ago [4]
- Conversely, a segment of users appreciates the model's newfound objectivity and rationality, noting that it appears more attuned to the questioner's psychological state [5]

Group 3: Technical Innovations
- DeepSeek has introduced two innovative architectures: mHC, which optimizes information flow in deep Transformers to enhance stability and scalability without increasing computational load, and Engram, which decouples static knowledge from dynamic computation; a sketch of the Engram idea follows below [7]
- These innovations aim to significantly reduce the cost of long-context reasoning while maintaining performance [7]
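Neither this article nor the next gives implementation details for Engram, but the stated idea, static knowledge held in a table that is read at inference time and kept apart from dynamic computation, can be sketched as follows. The bigram-hash keying and the table size are illustrative assumptions, not DeepSeek's design:

```python
import torch
import torch.nn as nn

class EngramLikeMemory(nn.Module):
    """Sketch of a 'conditional memory' module in the spirit the digest
    describes for Engram: a large static table lives in host (CPU) RAM,
    and only the rows a query actually hits are moved to the GPU."""

    def __init__(self, slots: int, d_model: int):
        super().__init__()
        # Static knowledge table kept as a plain CPU tensor. Deliberately
        # NOT a parameter or registered buffer, so .to("cuda") on the
        # module never moves it into GPU HBM.
        self.table = torch.randn(slots, d_model) * 0.02
        self.slots = slots

    def lookup(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Hash adjacent token-id pairs to table slots (toy key scheme).
        keys = (token_ids[:, :-1] * 1_000_003 + token_ids[:, 1:]) % self.slots
        rows = self.table[keys.cpu()]      # sparse read from host RAM
        return rows.to(token_ids.device)   # ship only the rows that hit

mem = EngramLikeMemory(slots=65_536, d_model=256)
ids = torch.randint(0, 32000, (2, 9))
print(mem.lookup(ids).shape)  # torch.Size([2, 8, 256])
```

Keeping the table as a plain CPU tensor is what keeps it out of GPU memory; only the rows a batch actually touches are transferred, which matches the digest's claim that static knowledge can be served without occupying HBM.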
See You at Spring Festival? DeepSeek's Next-Generation Model: A "Cost-Effective" Innovative Architecture to Help China Break Through Its "Compute Chip and Memory" Bottlenecks
硬AI · 2026-02-11 08:40
Core Viewpoint
- Nomura Securities believes that DeepSeek's upcoming next-generation model V4 may further reduce training and inference costs through the innovative mHC architecture and Engram technology, accelerating the innovation cycle of China's AI value chain [2][4][5]

Group 1: Innovation in Technology Architecture
- The report indicates that computing chips and memory have been bottlenecks for China's large models, and V4 is expected to introduce two key technologies, mHC and Engram, to ease these constraints from both algorithmic and engineering perspectives [7]
- mHC, or "Manifold-Constrained Hyper-Connections," aims to address the bottlenecks of information flow and training instability in deep Transformer models by enhancing communication between neural-network layers; a sketch of the idea follows below [8]
- Engram is a "conditional memory" module designed to decouple "memory" from "computation": static knowledge is stored in a sparse memory table that can be accessed quickly during inference, freeing expensive GPU memory for dynamic computation [11]

Group 2: Impact on AI Development
- The combination of these two technologies matters for China's AI development: mHC provides a more stable training process to compensate for potential shortcomings of domestic chips, while Engram manages memory cleverly to bypass HBM capacity and bandwidth limitations [13]
- Nomura emphasizes that the most direct commercial impact of V4 will be a further reduction in the training and inference costs of large models, stimulating demand and benefiting Chinese AI hardware companies through an accelerated investment cycle [13][14]

Group 3: Market Dynamics and Competition
- Nomura believes that major global cloud service providers are still racing toward general artificial intelligence and that the capital-expenditure competition is far from over, so V4 is unlikely to shock the global AI infrastructure market the way last year's releases did [15]
- However, global large-model and application developers face growing capital-expenditure burdens; if V4 can significantly lower training and inference costs while maintaining high performance, it will be a strong boost for these players [15][16]
- The report reviews the market landscape one year after the release of DeepSeek's V3 and R1 models, noting that they accelerated the development of Chinese LLMs and applications, altered the competitive landscape, and increased attention on open-source models [16]

Group 4: Software Evolution
- On the application side, a more powerful and efficient V4 is expected to give rise to more capable AI agents, transitioning from "dialogue tools" to "AI assistants" that can handle complex tasks [20][21]
- This shift will require more frequent interactions with underlying large models, increasing token consumption and thereby raising compute demand [21]
- Consequently, improved model efficiency is not expected to "kill software" but rather to create value for leading software companies that can leverage the new generation of large models to build disruptive AI-native applications or agents [22]
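The report describes mHC only at the level of "manifold-constrained hyper-connections" that stabilize information flow between layers. A hedged sketch of that flavor of mechanism: several parallel residual streams mixed by a small learned matrix whose rows are constrained, here to the probability simplex via softmax, which is an illustrative choice and not the actual constraint mHC uses:

```python
import torch
import torch.nn as nn

class ConstrainedHyperConnection(nn.Module):
    """Sketch of hyper-connection-style residual mixing: the layer keeps n
    parallel residual streams and mixes them with a small learned matrix.
    Constraining that matrix (softmax rows, so each output stream is a
    convex combination of inputs) keeps activations bounded across depth.
    The simplex constraint is an assumption standing in for whatever
    manifold constraint mHC actually imposes."""

    def __init__(self, n_streams: int, d_model: int):
        super().__init__()
        # Initialize near identity mixing so early training behaves like
        # ordinary residual connections.
        self.mix_logits = nn.Parameter(4.0 * torch.eye(n_streams))
        self.block = nn.Linear(d_model, d_model)  # stand-in for attn/FFN

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        mix = torch.softmax(self.mix_logits, dim=-1)  # rows on the simplex
        mixed = torch.einsum("ij,jbtd->ibtd", mix, streams)
        # Apply the block to a pooled view and add it back residually.
        update = self.block(mixed.mean(dim=0))
        return mixed + update.unsqueeze(0)

hc = ConstrainedHyperConnection(n_streams=4, d_model=256)
x = torch.randn(4, 2, 8, 256)
print(hc(x).shape)  # torch.Size([4, 2, 8, 256])
```

Constraining the mixing matrix bounds how much any stream can be amplified layer over layer, which is one plausible way such a scheme could buy training stability for a handful of extra parameters and essentially no extra compute, consistent with the report's framing.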