Transformer
Live from the Event: Four Core Announcements at NVIDIA's GTC Conference
Mei Ri Jing Ji Xin Wen · 2026-03-23 02:47
NVIDIA has cultivated the GPU field for many years: since releasing its first GPU in 1999, roughly 27 years have passed. Its chip process node has iterated from 220 nm down to around 4 nm and will advance toward 1.6 nm, which is part of the investment value we look forward to.

The current AI wave began in 2023, when the market's mainstream GPUs were the A100 and H100. Today, the mainstream has shifted to Blackwell-architecture chips. What were the core technical features of the A100 and H100? The H100 in particular delivered strong performance and, after the AI boom erupted in 2023, quickly became the hottest GPU product on the market.

The H100 is fabricated by TSMC in Taiwan on a 4 nm process; a single chip integrates 80 billion transistors and includes a dedicated engine for Transformer models. Why adapt hardware specifically to the Transformer? Virtually all of the well-known large models at home and abroad are built on targeted optimizations of the basic Transformer architecture.

With great foresight, NVIDIA optimized for the Transformer at the hardware level in its Hopper architecture by introducing this dedicated engine. Riding this core advantage, NVIDIA grew in just over two years from a mid-sized company into the world's most valuable technology giant, which speaks to the explosive momentum of the AI industry. The Blackw ... that NVIDIA released around 2023
Tencent Research Institute AI Digest 20260317
Tencent Research Institute · 2026-03-16 16:01
Generative AI

I. AI-powered Chrome officially launched; WebMCP lets AI call web-page functions directly
1. Google's Chrome team has officially released the WebMCP protocol: AI agents can invoke a page's underlying functionality directly via API, without relying on inefficient methods such as screenshot recognition or simulated clicks;
2. WebMCP is co-developed by Google and Microsoft and already open-sourced; front-end developers can integrate on the browser side through either a declarative or an imperative API, with no additional back-end deployment required;
3. Web pages will split into two layers: one facing users with visual interaction, and one facing AI with structured tool interfaces, upgrading the front-end role from "drawing pages" to "defining the interface between AI and the world".

II. Zhipu launches GLM-5-Turbo, deeply optimized for OpenClaw "Lobster" scenarios
1. Zhipu has launched GLM-5-Turbo, optimized specifically for OpenClaw "Lobster" agent scenarios, strengthening core capabilities such as tool calling, long-chain execution, scheduled tasks, and instruction following;
2. It simultaneously released "Lobster" bundles (personal and Team editions) to address the high token consumption of agent scenarios, plus an enterprise-grade Claw security management system supporting permission orchestration, audit logs, and multi-agent collaborative monitoring;
3. In blind tests, 90% of users rated GLM-5-Turbo above other domestic models; internal test teams at several major companies praised its tool-calling stability and long-task execution. ...
A New Linear-Attention Paradigm! Zhang Zheng's Team at HIT (Shenzhen) Proposes Norm-Aware Linear Attention! GPU Memory Cut by 92.3%!
Jiqizhixin · 2026-03-15 03:30
The paper's first author, Meng Weikang, is a PhD student jointly trained by Harbin Institute of Technology (Shenzhen) and Peng Cheng Laboratory; he completed his undergraduate degree at Harbin Institute of Technology, and his main research direction is high-efficiency foundation models. The corresponding author, Professor Zhang Zheng, is a tenured professor and doctoral supervisor at Harbin Institute of Technology (Shenzhen), a Young Changjiang Scholar of the Ministry of Education, a Young Pearl River Scholar under Guangdong's Special Support Program, and a Shenzhen Outstanding Youth. He has long worked on high-efficiency multimodal machine learning, focusing on efficient and trustworthy multimodal large models.

As Transformers swept computer vision, the compute and memory bottlenecks of high-resolution images and ultra-long sequences became increasingly acute: the quadratic complexity of standard Softmax attention makes super-resolution tasks with 70K+ tokens blow up GPU memory outright, and inference latency for high-resolution segmentation and detection stays stubbornly high.

Linear attention achieves linear complexity by reconstructing attention with kernel functions, neatly solving the compute-cost problem, but it has never escaped performance degradation: the accuracy gap with native Softmax attention remains hard to close.

Recently, Zhang Zheng's team at HIT (Shenzhen), together with Peng Cheng Laboratory, the University of Queensland, and other groups, published the paper "Norm×Direction: Restoring the Missing Query Norm in Vision Linear Attention", proposing NaLaFormer (Norm-aware Linear Attention ...
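To see why kernelized attention is linear in sequence length, compare the two forms below. This is the generic linear-attention formulation, not NaLaFormer's specific norm-aware kernel; the feature map `phi` here is a placeholder assumption.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention materializes an (n x n) score matrix -> O(n^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernel trick: phi(Q) @ (phi(K)^T V). The (d x d) summary KV is independent
    # of sequence length n, so cost is O(n * d^2) -- linear in n.
    KV = phi(K).T @ V                       # (d, d) state, built once
    Z = phi(K).sum(axis=0)                  # (d,) normalizer
    return (phi(Q) @ KV) / (phi(Q) @ Z)[:, None]

rng = np.random.default_rng(0)
n, d = 512, 32
Q, K, V = rng.standard_normal((3, n, d))
out = linear_attention(Q, K, V)
assert out.shape == (n, d)
```

The positive feature map keeps the normalizer nonzero; the memory saving comes from never forming the `n × n` attention matrix.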
ICLR 2026 | Shandong University, Li Auto, and the Chinese Academy of Sciences jointly propose a new offline-RL paradigm: teaching Transformers to "discard the dross"
Jiqizhixin · 2026-03-14 02:30
To address this pain point, a joint research team from Shandong University, the Chinese Academy of Sciences, Li Auto, and Tsinghua University has proposed a new framework called PRGS (Peak-Return Greedy Slicing).

PRGS aims, without changing the source of the offline data, to automatically filter out more learnable sub-trajectories from the raw trajectories for training Transformer-based offline RL methods, and, at inference time, to further prevent "bad history" from interfering with the current decision.

The paper has been accepted at ICLR 2026. ICLR (International Conference on Learning Representations) is one of the top international venues in machine learning and representation learning, ranked alongside NeurIPS and ICML among the most influential AI conferences. ICLR 2026 received nearly 19,000 valid submissions, with an acceptance rate of about 28%.

A major difficulty of offline reinforcement learning (Offline RL) is that the training data is fixed and of uneven quality. In the past two years, Transformer-based methods such as Decision Transformer (DT) have drawn attention for framing decision-making as conditional sequence generation, but they typically treat the entire trajectory as the unit of learning: ...
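The slicing idea can be sketched roughly as follows. This is a hypothetical illustration of "keep the sub-trajectory up to the cumulative-return peak, discard the low-value tail"; the actual PRGS criterion in the paper may differ, and `peak_return_slice` is an invented name.

```python
import numpy as np

def peak_return_slice(rewards, min_len=2):
    # Hypothetical sketch: keep the prefix of a trajectory whose cumulative
    # return peaks, cutting off the "bad history" tail after the peak.
    returns = np.cumsum(rewards)
    peak = int(np.argmax(returns)) + 1   # index just past the peak cumulative return
    end = max(peak, min_len)             # never slice below a minimum length
    return list(range(end))              # indices of the retained sub-trajectory

rewards = [1.0, 2.0, -0.5, 3.0, -4.0, -1.0]
kept = peak_return_slice(rewards)
print(kept)  # → [0, 1, 2, 3]: the losing tail (steps 4-5) is discarded
```

The retained sub-trajectories would then feed a sequence model such as Decision Transformer in place of full trajectories.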
ICLR 2026 | Predating DeepSeek's Engram, STEM had already restructured Transformer "memory"
Jiqizhixin · 2026-03-09 02:50
Core Insights
- The article discusses the evolution of parameter organization in large language models, emphasizing the need for more efficient memory representation methods [2]
- It introduces STEM, a new approach that replaces the up-projection in the Feed-Forward Network (FFN) with a token-indexed embedding table, allowing static memory access without runtime routing [4][9]
- The article highlights significant improvements in model capabilities through structural changes rather than merely increasing scale or computational power [29][30]

Summary by Sections
- **Memory Organization**: The traditional method of storing knowledge in dense matrices has limitations in addressability and efficiency, prompting a shift towards more structured parameter organization [2][3]
- **STEM Approach**: STEM directly modifies the FFN structure by using a static embedding table indexed by tokens, which simplifies memory access and enhances model performance [4][9]
- **Key Insights of STEM**:
  - **Editability**: The explicit token-parameter relationship allows direct modification of knowledge vectors without retraining, enabling easier knowledge editing [16][18]
  - **Training Stability**: STEM's static sparse structure avoids common issues found in dynamic routing systems, leading to improved training stability [20]
  - **Memory Efficiency**: The geometric structure of embeddings in STEM reduces interference between parameters, allowing more addressable memory slots at lower computational cost [22][23]
  - **Computational Efficiency**: Removing the up-projection saves significant computational resources, and large embedding tables can be offloaded to CPUs for efficient access [24]
- **Experimental Results**: STEM was tested against dense baselines at model sizes of 350M and 1B, showing an average performance improvement of 3-4%, with some knowledge tasks improving by up to 9-10% [36]
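A minimal sketch of the structural change the summary describes: the first matmul of the FFN is replaced by a static, token-indexed table lookup. This assumes a ReLU activation and ignores any gating or interaction with the hidden state that the real STEM design may use; `ffn_stem`, `E`, and all sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, vocab = 64, 256, 1000

# Baseline FFN: out = ReLU(x @ W_up) @ W_down  -- two matmuls per token.
W_up = rng.standard_normal((d_model, d_hidden)) * 0.02
W_down = rng.standard_normal((d_hidden, d_model)) * 0.02

# STEM-style variant (sketch): the up-projection is replaced by a static
# embedding table indexed by the token id -- a memory read, not a matmul.
E = rng.standard_normal((vocab, d_hidden)) * 0.02

def ffn_dense(x):
    return np.maximum(x @ W_up, 0) @ W_down

def ffn_stem(token_ids):
    hidden = np.maximum(E[token_ids], 0)   # static, token-addressed memory slot
    return hidden @ W_down

x = rng.standard_normal((8, d_model))      # hidden states for 8 tokens
token_ids = rng.integers(0, vocab, size=8)
assert ffn_stem(token_ids).shape == ffn_dense(x).shape == (8, d_model)
```

The lookup makes the token-to-parameter mapping explicit (hence editable) and removes the up-projection matmul; a large `E` could live off-accelerator since it is read statically.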
X @Avi Chawla
Avi Chawla · 2026-03-08 06:33
Transformer and Mixture of Experts, explained visually! Mixture of Experts (MoE) is a popular architecture that uses different experts to improve Transformer models. Transformer and MoE differ in the decoder block:
- Transformer uses a feed-forward network.
- MoE uses experts, which are feed-forward networks smaller than the Transformer's.
During inference, only a subset of experts is selected, which makes MoE inference faster. Also, since the network has multiple decoder layers:
- The text passes through ...
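The "only a subset of experts runs" point above can be sketched as top-k routing over a single token vector. Real MoE layers batch this, typically softmax over all router logits, and add load-balancing losses; none of that is shown, and all names and sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 4, 2

# Each "expert" is a small feed-forward net (reduced here to one weight matrix).
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
W_router = rng.standard_normal((d, n_experts)) * 0.1

def moe_layer(x):
    # The router scores every expert, but only the top-k experts execute:
    # this sparsity is why MoE inference is cheaper than a dense FFN of
    # comparable total capacity.
    logits = x @ W_router
    chosen = np.argsort(logits)[-top_k:]            # indices of the top-k experts
    probs = np.exp(logits[chosen] - logits[chosen].max())
    probs /= probs.sum()                            # renormalize over chosen experts
    return sum(p * (x @ experts[i]) for p, i in zip(probs, chosen))

x = rng.standard_normal(d)
y = moe_layer(x)
assert y.shape == (d,)
```

Here 2 of 4 experts run per token; in a deep decoder stack each layer routes independently, so different tokens exercise different expert combinations.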
X @Cassandra Unchained
Cassandra Unchained · 2026-03-01 01:30
The transformer’s internal “silence” reaches for the next token. Ballard’s silence reaches for the stars. ...
Unstoppable AI: Where Are the Expectation Gaps in the 2026 Model Upgrades?
2026-02-10 03:24
Summary of AI Industry Conference Call

Industry Overview
- The conference focused on the AI industry, particularly the anticipated model upgrades by 2026 and the overall trends in AI development. The speaker emphasized the recent adjustments in the AI sector due to demand-side slowdowns and macroeconomic fluctuations abroad [1][2].

Key Points and Arguments
1. **Model Upgrades and Trends**:
   - The AI industry is expected to see significant model upgrades by 2026, with a focus on integrating models with real-world scenarios. The current model evolution is anticipated to continue upward, enhancing application deployment [1][4].
   - The historical context of model upgrades was discussed, highlighting the introduction of the Transformer architecture in 2018 and the market impact of ChatGPT in 2022. Model improvements are primarily driven by increasing parameter counts, which enhance intelligence levels [2][4].
2. **Pre-training and Post-training**:
   - The transition from pre-training to post-training paradigms is crucial for model evolution. Pre-training is likened to innate intelligence, while post-training represents knowledge acquired through education. This dual approach is expected to enhance model capabilities significantly [2][4].
3. **Multimodal Models**:
   - The emergence of multimodal models is a key development, allowing models to process and integrate various types of data beyond text. This shift is expected to broaden the application boundaries of AI models [3][9].
4. **Commercialization Pathways**:
   - The commercialization of AI applications is becoming clearer, with significant market opportunities anticipated as models mature. The integration of AI into various sectors is expected to drive substantial market growth [4][10].
5. **Challenges and Solutions**:
   - A notable challenge in the AI sector has been the bottleneck in pre-training due to insufficient data. However, new training paradigms like post-training have emerged to revitalize the industry [5][8].
6. **Future Market Opportunities**:
   - The AI industry is poised for a major transformation, particularly in coding and video generation. The development of generative video models is expected to create new market segments and drive commercialization [6][9][13].

Additional Important Insights
- The conference emphasized the importance of digital infrastructure and regulatory frameworks for the successful deployment of AI in business-to-business (B2B) scenarios. High labor costs in certain sectors are also seen as a catalyst for faster AI adoption [12].
- The speaker recommended focusing on companies like Alibaba and Tencent, which are expected to benefit from the AI market's restructuring. Sectors such as healthcare, legal services, and enterprise solutions are highlighted as areas of significant AI application growth [11][12].
- Demand for AI computing power is projected to increase dramatically, with training needs expected to exceed current levels by three to ten times, indicating a robust growth trajectory for the AI computing sector [14].

Conclusion
- The overall sentiment from the conference is one of optimism regarding the future of the AI industry, with a strong belief in the potential for significant advancements and commercialization in the coming years. The speaker urged stakeholders to maintain confidence despite short-term market fluctuations [11][14].
Post-90s Heavyweights Take Charge, en Masse
Touzijie · 2026-02-09 07:19
This article originally appeared on 版面之外 (ID: Out_take), by Huahua. "Beyond the layout lies the truth." A transfer of power.

In the months from late 2025 to early 2026, something intriguing happened in the tech world. There were no grand launch events, no official announcements, but in Tencent's tower in Shenzhen, Alibaba's Xixi campus in Hangzhou, and ByteDance's offices in Beijing, the people commanding the large-model battlefield quietly changed into a set of younger faces.

Look at Tencent first. Although it was seen as a large-model laggard over the past year or two, it has hardly been idle. First, former OpenAI researcher Yao Shunyu was rumored to be joining Tencent at an annual salary of 100 million yuan; after several denials, he formally joined at the end of last year with the title of Chief AI Scientist, reporting directly to Tencent President Liu Chiping (Martin Lau).

Just last week, Pang Tianyu, a computer-science PhD from Tsinghua University and former senior research scientist at Singapore's Sea AI Lab, also joined Tencent to lead multimodal reinforcement learning. In an old empire like Tencent that prizes factions and seniority, these two have essentially ridden a Falcon 9 to the top.

Then there's Alibaba. Lin Junyang joined Alibaba's AI research institute, DAMO ...
Big Tech's AI Power Handover: The Post-90s Generation Takes Charge
36Kr · 2026-02-02 00:22
In the months from late 2025 to early 2026, something intriguing happened in the tech world.

There were no grand launch events, no official announcements, but in Tencent's tower in Shenzhen, Alibaba's Xixi campus in Hangzhou, and ByteDance's offices in Beijing, the people commanding the large-model battlefield quietly changed into a set of younger faces.

Look at Tencent first. Although it was seen as a large-model laggard over the past year or two, it has hardly been idle. First, former OpenAI researcher Yao Shunyu was rumored to be joining Tencent at an annual salary of 100 million yuan; after several denials, he formally joined at the end of last year with the title of Chief AI Scientist, reporting directly to Tencent President Liu Chiping (Martin Lau).

Just last week, Pang Tianyu, a computer-science PhD from Tsinghua University and former senior research scientist at Singapore's Sea AI Lab, also joined Tencent to lead multimodal reinforcement learning. In an old empire like Tencent that prizes factions and seniority, these two have essentially ridden a Falcon 9 to the top.

Then there's Alibaba. Lin Junyang joined Alibaba's AI research institute, DAMO Academy, straight after his master's degree, becoming an algorithm expert in the Intelligent Computing Lab focused on large-model research. Today he is Alibaba's youngest P10 and a core driving force behind the open-source model Tongyi Qianwen (Qwen).

If you line up the key figures at Tencent, Alibaba, and the large-model unicorns, including Kimi's Yang Zhilin and Manus founder Xiao Hong, whose company Meta just spent billions of dollars to acquire, you notice something striking: the people steering AI are all post-90s. This cohort has precisely ...