Transformer Architecture
In 2026, AI Will Move from Hype to Pragmatism
Xin Lang Cai Jing· 2026-01-05 03:29
Core Insights
- 2026 is anticipated to be a pivotal year for AI, transitioning from large-scale model development to practical applications that integrate AI into real-world workflows [2][34]
- The focus is shifting toward deploying lightweight models and embedding intelligence into physical devices, moving away from mere demonstrations to targeted deployments [2][34]

Group 1: Scaling Law and Model Development
- The AI industry is nearing the limits of the Scaling Law, prompting a shift toward new architectural research and smaller, more efficient models [4][21]
- Experts suggest that small language models (SLMs) will become the standard in AI applications by 2026 due to their cost-effectiveness and performance advantages [5][22]
- The trend toward SLMs is supported by advances in edge computing, which make them better suited to deployment on local devices [6][22]

Group 2: World Models and the Gaming Industry
- 2026 is expected to be a key year for world models, which learn how objects interact in three-dimensional space, enhancing predictive capabilities [8][25]
- The world model market tied to the gaming industry is projected to grow significantly, with estimates rising from $1.2 billion in 2022 to $27.6 billion by 2030 [9][25]

Group 3: Agent Integration and Practical Applications
- The introduction of the Model Context Protocol (MCP) is seen as a critical advance, enabling AI agents to interact with external tools and databases and thus facilitating their integration into real-world systems [11][27]
- As MCP reduces the friction of connecting AI agents to practical systems, 2026 may mark the year these agents transition from demonstration to everyday use [12][28]

Group 4: Human-AI Collaboration
- There is a growing belief that AI will augment human workflows rather than replace them, with new job roles expected to emerge in AI governance and data management [14][31]
- The narrative is shifting toward how AI can assist human tasks, with predictions of a low unemployment rate as companies begin to hire for new AI-related roles [14][31]

Group 5: Physical AI and Market Trends
- Advances in small models, world models, and edge computing are expected to drive the adoption of physical AI applications, including robotics and wearable devices [16][34]
- The market for physical AI is anticipated to grow, with wearable devices becoming a cost-effective entry point for consumers [17][34]
Liang Wenfeng's DeepSeek Publishes a New Paper! Taking the Baton from Kaiming He and ByteDance, It Once Again Steadies AI's "Foundation"
Xin Lang Cai Jing· 2026-01-02 05:27
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), which significantly improves the residual connection, a foundational component of the Transformer architecture that has seen little change since its introduction in 2015 [1][3]

Group 1: Historical Context
- The relevant evolution of neural network architectures began with ResNet, introduced by Kaiming He in 2015, which addressed the vanishing-gradient problem and enabled the training of very deep networks [3]
- The Transformer model, released in 2017, adopted residual connections as a standard feature, forming the basis of many leading models today [3]

Group 2: Technical Comparisons
- Hyper-Connections, proposed by ByteDance in 2024, expanded the single residual stream into multiple parallel streams, improving model performance but introducing stability issues during training [5][10]
- mHC aims to resolve the stability problems associated with Hyper-Connections by constraining the connection weight matrix to a specific mathematical space, ensuring that signal amplification does not occur [10][12]

Group 3: Mathematical Innovation
- The core innovation of mHC is using a doubly stochastic matrix for the connection weights, which guarantees that the output does not exceed the maximum input value, preserving a form of energy conservation [10][12]
- The implementation of mHC uses the Sinkhorn-Knopp algorithm to obtain the desired matrix properties efficiently, allowing end-to-end training without introducing new hyperparameters [11][12]

Group 4: Engineering Excellence
- DeepSeek's implementation of mHC demonstrates significant engineering capability, including custom CUDA kernels and operator-fusion techniques that minimize computational overhead [16]
- The ability to integrate mathematical innovations into practical training environments highlights DeepSeek's competitive advantage in the AI research landscape [16]
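The doubly stochastic constraint and the Sinkhorn-Knopp step mentioned above can be sketched in a few lines of NumPy. This is a generic illustration of the algorithm, not DeepSeek's mHC implementation; the matrix size, iteration count, and exponential parameterization are assumptions made for the demo.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=30):
    """Map an unconstrained score matrix to an (approximately) doubly
    stochastic matrix by alternating row and column normalization."""
    M = np.exp(logits - logits.max())          # positive entries, numerically stable
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)      # make each row sum to 1
        M /= M.sum(axis=0, keepdims=True)      # make each column sum to 1
    return M

rng = np.random.default_rng(0)
W = sinkhorn_knopp(rng.normal(size=(4, 4)))    # e.g. 4 parallel residual streams

# Because every column of W is a nonnegative weight vector summing to 1,
# mixing the parallel residual streams with W is a convex combination:
# no output can exceed the largest input, the stability property above.
streams = np.array([1.0, -2.0, 0.5, 3.0])
mixed = W.T @ streams
```

In the article's description, mHC reportedly learns the unconstrained weights and applies a projection of this kind during training, so stability comes from the constraint rather than from extra hyperparameters.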
The Father of LSTM Leads a Team to Build PoPE: Ending RoPE's Generalization Problem with a Polar-Coordinate Evolution of the Transformer
Ji Qi Zhi Xin· 2026-01-02 01:55
Editor | Panda

The attention mechanism in the Transformer architecture matches keys with queries based on both content (what) and position in the sequence (where). In a recent study from the USI & SUPSI Swiss AI lab team of Jürgen Schmidhuber, the father of LSTM, analysis shows that the popular Rotary Position Embedding (RoPE) method entangles the what with the where. This entanglement hurts model performance, especially when a decision requires matching the two factors independently. Based on this observation, they propose a new scheme: Polar Coordinate Position Embedding, or PoPE for short.

The team reports that PoPE eliminates the confusion between content and position, making it perform far better than RoPE on diagnostic tasks that require indexing by position alone or by content alone.

Paper title: Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Paper link: https://arxiv.org/abs/2509.10534
The paper's first author is Anand Gopalakrishn ...
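To make the entanglement claim concrete, here is a minimal NumPy sketch of standard RoPE, the baseline the paper criticizes (this is generic rotary embedding, not PoPE; the dimension and base are arbitrary demo choices). The attention score between a rotated query and key folds content similarity and relative offset into one dot product:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Standard rotary position embedding: rotate feature pairs of x
    by angles proportional to the token position."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)        # one frequency per pair
    theta = pos * freqs                              # position-dependent angles
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)

q = np.ones(8)
k = np.ones(8)
# The score depends on the contents of q and k AND on the relative offset
# m - n through the very same dot product: content ("what") and position
# ("where") share a single channel, which is the entanglement at issue.
score_offset_1 = rope(q, 3) @ rope(k, 2)
score_offset_8 = rope(q, 10) @ rope(k, 2)
```

A quick check of RoPE's defining property: shifting both positions by the same amount leaves the score unchanged, so the score depends only on content and relative offset, jointly.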
Even $30 Billion Might Not "Recreate GPT-4"? NUS's You Yang in a New Long Essay: Exposing the Truth About the AI Growth Bottleneck
Liang Zi Wei· 2025-12-31 03:37
Core Viewpoint
- As the third anniversary of ChatGPT approaches, the article discusses the growing anxiety surrounding the "AI bottleneck," questioning whether current technological paradigms can effectively convert increased computational power into models significantly stronger than GPT-4 [1][2]

Group 1: The Nature of Intelligence and Its Measurement
- Intelligence is fundamentally about energy conversion: over the past decade AI has transformed electricity into reusable intelligence, but the efficiency of this conversion is now under scrutiny [6]
- The essence of intelligence is not explanation but prediction, characterized by the ability to forecast future states and bear the consequences of those predictions [7][10]
- Current models derive their intelligence primarily from the pre-training phase, which consumes the most energy and computation, raising questions about whether intelligence growth remains stable under continued computational investment [15][20]

Group 2: Computational Paradigms and Their Limitations
- The real bottleneck is not the cessation of computational growth but the diminishing returns in the relationship between computational power and intelligence growth [22][27]
- Challenging the mainstream narrative, the essay argues that pre-training, fine-tuning, and reinforcement learning are all fundamentally gradient computation and parameter updates rather than distinct methodologies [12][11]
- The success of the Transformer architecture is attributed to its compatibility with GPU systems, which enabled a stable feedback loop between computational growth, model scaling, and capability gains [16][18]

Group 3: Future Directions and Exploration
- Future AI infrastructure should focus on the overall scalability of parallel computing systems rather than single-chip performance, with an emphasis on maintaining or improving the ratio of computation cost to communication cost [24][25]
- Multiple exploration directions are proposed, including higher precision, advanced optimizers, and more scalable architectures or loss functions, all aimed at ensuring that increased computational investment yields proportional intelligence gains [25][26]
- The essay concludes that as long as more efficient ways of organizing computation can be found, the upper limits of intelligence are far from being reached [27]
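The compute-versus-communication point in Group 3 can be made concrete with a back-of-the-envelope model. Every number below is an illustrative assumption, not a figure from the essay: a 7B-parameter model, the common ~6·params·tokens FLOP rule of thumb for a forward+backward pass, a ring all-reduce moving roughly 2× the fp16 parameter bytes, and a fixed interconnect. As per-chip FLOP/s grows while link bandwidth stays flat, the compute-to-communication ratio shrinks, which is exactly the ratio the essay says must be maintained.

```python
def step_times(params, tokens_per_step, flops_per_s, link_bytes_per_s):
    """Rough per-step compute and all-reduce communication times for
    data-parallel training (coarse rules of thumb, not measurements)."""
    compute_s = 6 * params * tokens_per_step / flops_per_s   # ~6*N*D FLOPs
    comm_s = 2 * params * 2 / link_bytes_per_s               # ring all-reduce, fp16 grads
    return compute_s, comm_s

ratios = []
for flops in (1e14, 1e15, 1e16):     # chip speed grows 100x across generations
    c, m = step_times(params=7e9, tokens_per_step=1e6,
                      flops_per_s=flops, link_bytes_per_s=4e11)
    ratios.append(c / m)             # bandwidth held fixed at 400 GB/s

# Each 10x in chip speed cuts the compute/comm ratio 10x: faster chips
# alone push the system toward being communication-bound.
```

This is why the essay's emphasis falls on system-level scalability rather than single-chip performance: the ratio, not the peak FLOP/s, governs whether added compute converts into intelligence.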
Doubao Tops 100 Million Daily Active Users; "Making Money" Should Be Next
Sou Hu Cai Jing· 2025-12-27 19:41
Which domestic AI product was fastest to reach 100 million daily active users? That question finally has an answer. 36Kr recently reported that Doubao's average daily active users have broken through the 100-million mark, and according to ByteDance insiders, Doubao's user-growth and marketing spend is the lowest among all ByteDance products with over 100 million DAU.

In China's internet industry, crossing 100 million DAU usually means a product has successfully "made it ashore" and achieved phenomenon-level influence. Historically, it has also meant the product is about to start "making money": commercialization becomes the new goal, as it did for Weibo, Douyin, Kuaishou, Bilibili, and Xiaohongshu.

The reason Doubao's next step after crossing the 100-million-DAU milestone should be commercialization is that it burns an enormous amount of cash. At almost the same time, at the Volcano Engine FORCE conference, Volcano Engine announced that as of this December, the Doubao large model's average daily call volume had exceeded 50 trillion tokens, more than 10x growth year over year.

Of course, the Doubao model's API is not used only by the Doubao app. According to Volcano Engine president Tan Dai, more than 100 enterprises each consumed over one trillion cumulative tokens on Volcano Engine in 2025. Even if only 50% of the Doubao model's token calls serve the Doubao app, the cost of 25 trillion tokens per day is still astronomical.

Compared with generating plain text, the tokens required for images, audio, and video grow exponentially ...
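A rough sketch of the arithmetic behind "astronomical": the 25-trillion-tokens-per-day figure follows the article's 50% assumption, while the unit price below is a made-up placeholder chosen only to show the order of magnitude, not Volcano Engine's actual pricing.

```python
# All pricing here is hypothetical -- the point is the scale, not the number.
daily_tokens = 0.5 * 50e12                  # article: ~50% of 50 trillion calls/day
assumed_cny_per_million_tokens = 0.8        # placeholder unit cost, NOT real pricing

daily_cost_cny = daily_tokens / 1e6 * assumed_cny_per_million_tokens
print(f"~{daily_cost_cny / 1e6:.0f} million CNY per day")   # -> ~20 million CNY per day
```

Even at a fraction of a yuan per million tokens, serving tens of trillions of tokens daily reaches tens of millions of yuan per day, which is why commercialization pressure follows so closely behind the DAU milestone.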
When the Yao Shunyus Begin to Steer the Giant Ships of Tech
Tai Mei Ti APP· 2025-12-25 05:12
By | Xiang Xianzhi

Meta's Alexandr Wang, Tencent AI Lab's Yao Shunyu, the Xiaomi MiMo team's Luo Fuli... What these names have in common is not just youth, but a grip on key capabilities that engineers of the old era lack.

This is not simply a case of the young overtaking the old; it is a power restructuring triggered by a technological rupture in the AI industry. Why does experience seem to be losing to intuition in AI? What happens inside tech companies when young technologists lead senior engineering veterans? This article unpacks the logic, the conflicts, and the future behind it all.

The explorers of the old continent have had to step back, because this new world, rebuilt by algorithms, opens its doors only to its natives.

If someone had told you these things five years ago, you would surely have dismissed them as fantasy. But Meta (formerly Facebook) has handed its AI command to a 28-year-old; Tencent has given a freshly graduated PhD compensation on the order of a hundred million yuan, along with the title of Chief AI Scientist; Xiaomi has put command of its "person-car-home" large model in the hands of someone born after 1995. All of this is now reality.

Before Google published the "Attention Is All You Need" paper in 2017, the AI world was more like fine craftsmanship. It was the era of RNNs and LSTMs, and algorithm experts were like seasoned watchmakers. They had to design rules meticulously, extract features by hand, and patch model flaws with deep linguistic knowledge. In that world, the longer you had worked and the more bugs you had seen, the more valuable you were. But the arrival of the Transformer architecture and ...
A CMU Professor's 10,000-Word Reflection: Western-Style AGI Will Never Arrive
Liang Zi Wei· 2025-12-20 07:38
Wen Le, from Aofeisi
QbitAI | Official account QbitAI

"It's not that AGI hasn't arrived yet; it will never arrive."

Tim Dettmers, a CMU (Carnegie Mellon University) professor and researcher at the Allen Institute for AI, argues this from three angles: hardware bottlenecks, resource costs, and real-world applications. Why, from the very start, has AGI been a fantasy project that violates the laws of physics? His long essay points out that peak GPU performance plateaued in 2018, that rack-level optimization will exhaust its potential by 2027, and that every 1% gain in AI capability multiplies resource consumption several times over...

Core view: discussions of AGI all dodge "the physical shackles of computation." Intelligence is not an idea floating in the air; it has to be computed by tangible things like computers and chips, and those things must obey the laws of physics. Computation has never been an abstraction: all intelligence must be rooted in physical reality. This is the core of Dettmers's rebuttal: people tend to treat AGI as an abstract philosophical concept while ignoring the hardware that must realize it, and hardware is necessarily constrained by physics.

The first constraint is the cost of moving information. Effective computation has to balance broadcasting global information to local units against integrating local information, yet the cost of moving information rises quadratically with distance. Chip caches illustrate the point: L2 and L3 caches are larger than L1, but they are slower because they sit physically farther away.

Transistors keep shrinking, which lowers the cost of computation, but memory is becoming relatively more expensive; on today's chips ...
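The cache example can be tabulated to show the trade-off. The sizes and latencies below are typical orders of magnitude for a modern CPU cache hierarchy, assumed for illustration rather than measured from any specific chip:

```python
# Illustrative (assumed) cache hierarchy: capacity grows with each level,
# but so does physical distance from the compute units, and with it latency.
caches = {
    "L1": {"size_kb": 64,    "latency_cycles": 4},
    "L2": {"size_kb": 1024,  "latency_cycles": 14},
    "L3": {"size_kb": 32768, "latency_cycles": 50},
}

for name, c in caches.items():
    print(f"{name}: {c['size_kb']:>6} KB, ~{c['latency_cycles']} cycles")

# Bigger caches sit farther away, and the essay's point is that moving a
# signal costs more the farther it travels (rising roughly quadratically
# with distance), so capacity and speed pull in opposite directions.
```

The pattern, not the exact numbers, carries the argument: no amount of clever architecture removes the physical cost of moving information across a chip.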
The Whole Internet Melts Down: AI Flunks the "Finger Problem" and Drives Humans Mad; Six Fingers Expose a Fatal Flaw of the Transformer
36Ke· 2025-12-15 12:39
Lately, netizens have been driven mad by the AI "finger problem." Give an AI a six-fingered hand, and it simply cannot count the fingers correctly! Come clean, AI: are you mocking humanity? Behind this lies the "Achilles' heel" of the Transformer architecture...

In recent days, a shadow has fallen over the entire internet: AI is mocking humans with finger counting. The task humans set for the AI is simple: label each finger in the image with a number, in order. Of course, there is a small trap: this hand actually has six fingers. Yet Nano Banana Pro confidently labels the hand 1, 2, 3, 4, 5, skipping one finger entirely.

This absurd scene once again stunned netizens. Is the AI model really that stupid? Many people think not: perhaps the AI is only playing dumb and teasing humans. Quite possibly, it is mocking the inferior humans trying to test it. To pass the Turing test, AI has to make itself a little stupid to look human; if it were too smart, the humans would be the ones melting down.

GPT-5.2 failed too. Someone put the same question to GPT-5.2, with the prompt explicitly stating that the image contains six fingers. Yet when asked "how many fingers are in the image," GPT-5.2 still answered, categorically: five! Its reasoning: humans have five fingers, so if the image does not show five fingers, the image must be wrong. Others drew fingers in shapes so bizarre that even humans would struggle ...
Which AI Article-Imitation Tool Is Best? An In-Depth Review to Help You Choose
Sou Hu Cai Jing· 2025-12-14 16:14
Core Insights
- The article discusses the need for a comprehensive tool that automates the entire content-creation process, from collection to publication, addressing the limitations of existing AI writing tools that often serve a single function [1][2]
- It evaluates several mainstream "AI article imitation" tools on automation, functionality, originality, publication flexibility, and cost-effectiveness [2]

Group 1: Tool Evaluations
- First place: Youcaiyun AI Content Factory. Scoring 9.8/10, it offers a complete content-production pipeline, including article collection, intelligent filtering, deep originality/rewriting, and automated publication, designed for website owners and content operators [4][6]
- Second place: Zhixie Workshop. Scoring 8.5/10, it excels at creative writing and deep imitation, particularly for literary texts, but lacks built-in content collection and automated publication, making it suitable for individual creators or small studios [7]
- Third place: Xuncaitong. Scoring 7.9/10, it has strong web-scraping and aggregation capabilities, but its rewriting function is basic and requires manual proofreading, limiting its effectiveness for high-quality SEO optimization [8][10]
- Fourth place: Yigaojingling. Scoring 7.0/10, it is a lightweight tool for quickly generating draft content, but its simplicity and lack of advanced features make it less suitable for teams with high-quality content needs [11]

Group 2: Industry Trends
- Text-generation technology has progressed from simple template filling to deep semantic understanding and creative imitation, with modern large language models achieving over 70% variation in vocabulary and sentence structure while retaining the factual content [2]
- The article emphasizes choosing a tool that integrates into a complete workflow rather than offering standalone features, highlighting the growing homogeneity among AI content-creation tools [12]
From LLM to World Model: Why Do We Need Spatial Intelligence That Can Understand and Operate on the World?
Hai Wai Du Jiao Shou· 2025-12-03 12:05
Compiled by: Haozhen, Gemini

The language understanding and generation abilities of today's LLMs have shown astonishingly broad applicability, but as LLMs develop, one fact has become increasingly clear: language alone is not enough to support true intelligence.

Seen more fundamentally, humans have never processed the world through words alone; vision, spatial perception, physical intuition, and the capacity to act together form a complete cognitive system. Language is only a "lossy compression" of the three-dimensional world: it records conclusions but omits process; it expresses structure but hides dynamics. True intelligence comes from continually interacting with the world and continually reasoning and acting in space.

Precisely for this reason, building Spatial Intelligence and World Models that can "understand and operate on the world" has become the key direction after LLMs.

In 2024, Fei-Fei Li, Justin Johnson, and other scholars founded World Labs, which this November launched Marble, a 3D world-generation model. The team is trying to break through the "text-only" limitation of models, giving them the ability to localize, reason, simulate, generate, and even execute tasks in three-dimensional environments. This signifies not only a new technical route but also a new yardstick for AI value: from language to world, from description to interaction, from static cognition to dynamic intelligence.

This article compiles remarks by Fei-Fei Li and Justin Joh ...