Transformer

"Reasoning models are still at the RNN stage": a transcript of Li Jianzhong's conversation with Lukasz Kaiser, GPT-5 contributor and Transformer inventor
AI科技大本营· 2025-10-10 09:52
Guests: Li Jianzhong, Lukasz Kaiser | Produced by CSDN (ID: CSDNnews). At the start of this year, DeepSeek R1, following OpenAI o1 at the end of last year, took the AI community by storm. Soon afterward, Rich Sutton, the father of reinforcement learning, won the Turing Award and, with a single paper, declared that reinforcement learning and the "era of experience" would be the themes of 2025. It was hard not to conclude: the era of reasoning models has arrived! Yet one remark upended that view. Lukasz Kaiser, one of the core inventors of the Transformer and a scientist at OpenAI, said bluntly that today's reasoning models are still at the machine-learning stage that preceded GPT, and that the field still needs a reasoning-model innovation on the scale of the Transformer. Recently, this key figure behind the core architecture of large models sat down with Li Jianzhong, Dean of the Singularity Intelligence Research Institute and Senior Vice President of CSDN, for an in-depth conversation on "first-principles thinking about large models" in CSDN's AI Evolution program. Lukasz Kaiser is one of the most influential scientists in AI; in 2017, he and seven Google colleagues (later known as the "Transformer Eight") co-authored the groundbreaking paper "Attention I ...
National Day Holiday Recharge Guide: Ilya Sutskever's Top 30 Paper Reading List
锦秋集· 2025-10-01 13:25
National Day and Mid-Autumn Festival fall together this year. We believe that greeting the new era with a spirit of exploration and an attitude of learning is the best tribute. For investors, practitioners, and researchers following AI, the holiday is also a golden window for deepening professional knowledge and tracking technology trends; an authoritative, systematic set of technical materials can make holiday study far more productive. Today we have selected a collection of 30 frontier papers recommended by Ilya Sutskever (Ilya Sutskever's Top 30). Primers • Ilya Sutskever's Top 30 spans roughly 15 years of milestone results in AI, organized along the thread of "technical foundations, capability breakthroughs, application landing," tracing AI's key leap from "perceptual intelligence" to "cognitive intelligence": from the CNNs and RNNs that laid the foundations of deep learning, to the Transformer and self-attention mechanism that reshaped natural language processing, to the core research driving frontier directions such as RAG and multi-step reasoning. Each paper is a foundational work in its area, directly tied to the capability base of today's AI industry. The list also clearly unpacks the technical logic of terms such as "residual mapping" and "dynamic pointer networks" ...
Registration countdown! Your one-stop guide to attending the 2025 Global Machine Learning Technology Conference
AI科技大本营· 2025-09-28 10:59
Core Viewpoint
- The 2025 Global Machine Learning Technology Conference will be held on October 16-17 in Beijing, focusing on cutting-edge AI research and applications and featuring over 50 prominent speakers from various fields [1][3].

Group 1: Conference Overview
- The conference will cover twelve major topics, including advancements in large language models, intelligent agent engineering, multimodal models, and AI-enabled software development [3][4].
- The event aims to provide a platform for genuine exchange between academia and industry, showcasing both theoretical methodologies and practical experience [4].

Group 2: Key Speakers and Sessions
- Notable speakers include Lukasz Kaiser from OpenAI, Li Jianzhong from the Singularity Intelligence Research Institute, and Wang Bin from Xiaomi Group, who will discuss the future of AI and large-model technologies [6][14].
- The main stage will feature a high-level roundtable on the core issues of the AI industry's paradigm shift, involving key figures from the AI sector [14][15].

Group 3: Detailed Agenda
- The first day will include sessions on topics such as the evolution of large language models and practical applications of multimodal models [15][28].
- The second day will focus on embodied intelligence, intelligent hardware, and the infrastructure needed for large models, with specialized sessions scheduled throughout the day [22][28].

Group 4: Logistics and Participation
- The conference will take place at the Westin Hotel in Beijing, with registration starting at 8:00 AM and the official program beginning at 9:00 AM on both days [31][32].
- Attendees are encouraged to arrive early to avoid congestion and ensure a smooth check-in process [32][33].
From models to ecosystems: a preview of the "Open Source Models and Frameworks" track at the 2025 Global Machine Learning Technology Conference
AI科技大本营· 2025-09-26 05:49
Core Insights
- The article discusses the narrowing divide between open-source and closed-source AI models: the performance gap has shrunk from 8% to 1.7% as of 2025, indicating that open-source models are catching up [1][12].

Open Source Models and Frameworks
- The 2025 Global Machine Learning Technology Conference will feature a special track on "Open Source Models and Frameworks," inviting creators and practitioners to share their insights and experience [1][12].
- Various open-source projects are being developed, including on-device large language model inference, reinforcement learning frameworks, and efficient inference serving, aimed at making open-source technology more accessible to developers [2][7].

Key Contributors
- Notable contributors to the open-source projects include:
  - Wang Zhaode, a technical expert from Alibaba Taotian Group, focusing on mobile large language model inference [4][23].
  - Chen Haiquan, an engineer from ByteDance, contributing to the Verl project for flexible and efficient reinforcement learning programming [4][10].
  - Jiang Yong, a senior architect at Dify, involved in the development of open-source tools [4][23].
  - You Kaichao, the core maintainer of vLLM, which provides low-cost large-model inference services [4][7].
  - Li Shenggui, a core developer of SGLang, currently a PhD student at Nanyang Technological University [4][23].

Conference Highlights
- The conference will feature discussions on the evolution of AI competition, which now spans data, models, systems, and evaluation, with major players like Meta, Google, and Alibaba vying for dominance in the AI ecosystem [12][13].
- Attendees will hear from leading experts, including Lukasz Kaiser, a co-inventor of the Transformer and a contributor to GPT-5, who will share his perspective on the future of AI technology [12][13].

Event Details
- The conference takes place soon, focusing on the latest technological insights and industry trends and encouraging developers to participate and share their experiences [12][13].
From the Transformer to GPT-5: OpenAI scientist Lukasz Kaiser's "first-principles thinking about large models"
AI科技大本营· 2025-09-23 02:11
Core Viewpoint
- The article discusses the revolutionary impact of the paper "Attention Is All You Need," which introduced the Transformer architecture and fundamentally changed the landscape of artificial intelligence and natural language processing [2][17].

Group 1: The Impact of the Transformer
- The paper "Attention Is All You Need" has been cited 197,159 times on Google Scholar, highlighting its significant influence in the AI research community [3][26].
- The authors of the paper, known as the "Transformer Eight," have become prominent figures in the AI industry, with seven of them starting their own companies [4][24].
- The introduction of the Transformer architecture led to a paradigm shift in AI, moving away from RNNs and enabling better handling of long-distance dependencies in language processing [17][18].

Group 2: Lukasz Kaiser's Journey
- Lukasz Kaiser, one of the authors, chose to join OpenAI instead of starting a commercial venture, focusing on the pursuit of AGI [4][25].
- Kaiser has a strong academic background, holding dual master's degrees in computer science and mathematics, and has received prestigious awards for his research [7][8].
- His decision to leave a stable academic position for Google Brain in 2013 was driven by a desire for innovation in deep learning [11][12].

Group 3: The Evolution of AI Models
- Kaiser and his team introduced the attention mechanism to address the limitations of RNNs, leading to the development of the Transformer model [15][17].
- The success of the Transformer spurred a wave of entrepreneurship in the AI field, with many authors of the original paper becoming CEOs and CTOs of successful startups [24][27].
- Kaiser has been involved in the development of cutting-edge models like GPT-4 and GPT-5 at OpenAI, contributing to the forefront of AI research [27].

Group 4: Future Directions in AI
- Kaiser predicts that the next phase of AI will focus on teaching models to think more deeply, emphasizing the importance of generating intermediate steps in reasoning [29].
- The upcoming ML Summit 2025 will feature Kaiser discussing the history, present, and future of reasoning models, indicating ongoing advancements in AI technology [28][30].
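The shift the article describes, from RNN recurrence to attention for handling long-distance dependencies, can be illustrated with a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. The shapes, random inputs, and function name here are illustrative assumptions, not code from any of the works discussed.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every position attends to every other position in parallel,
    so distant tokens interact directly instead of through the
    step-by-step hidden state of an RNN."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
out = scaled_dot_product_attention(x, x, x)           # self-attention
print(out.shape)  # (4, 8)
```

Because the similarity matrix is computed in one matrix multiplication, all positions are processed at once; this parallelism is what the RNN's sequential recurrence could not offer.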
From the Transformer to GPT-5: OpenAI scientist Lukasz Kaiser's "first-principles thinking about large models"
36Kr· 2025-09-22 13:04
Core Insights
- The paper "Attention Is All You Need" proposed a revolutionary Transformer architecture that replaced traditional RNNs in natural language processing, leading to significant advances in AI applications like ChatGPT and DALL-E [1][15][24].
- The authors, known as the "Transformer Eight," gained recognition for their groundbreaking work, which had been cited over 197,159 times as of the article's publication [2][15].

Group 1: The Impact of the Transformer Architecture
- The introduction of the Transformer architecture has reshaped the AI landscape, enabling better handling of long-distance dependencies in language processing than RNNs [1][15].
- The architecture's parallel-processing capability made it a new paradigm in NLP, extending its influence to other AI subfields, including computer vision and speech recognition [15][24].

Group 2: The Journey of Lukasz Kaiser
- Lukasz Kaiser, one of the "Transformer Eight," chose to join OpenAI instead of pursuing entrepreneurial ventures, focusing on AGI and helping lead the development of models like GPT-4 and GPT-5 [3][21].
- Kaiser's academic background in logic and games laid the foundation for his contributions to AI, emphasizing a systematic approach to problem-solving [5][6].

Group 3: The Evolution of AI Research
- The transition from RNNs to Transformers marked a significant shift in AI research, with Kaiser and his team identifying the limitations of RNNs and proposing the attention mechanism as a solution [10][12].
- The development of the Tensor2Tensor library enabled rapid iteration on the Transformer model, reflecting Kaiser's commitment to making AI more accessible [13][14].

Group 4: Future Directions in AI
- Kaiser has articulated a vision for the future of AI that emphasizes teaching models to think and reason more deeply, which could lead to a paradigm shift in AI capabilities [25][26].
- Anticipated advances include multi-modal AI, larger and more capable Transformers, and the proliferation of AI services through APIs and cloud platforms [25][26].
The consequences of angering Musk! Choked by a traditional industry, he decided to do it himself! #馬斯克 #特斯拉 #ElonMusk #Tesla #供應鏈 #變壓器 #Megapack #工業 #科技 #商業戰爭
大鱼聊电动· 2025-09-10 07:29
Can a big, dumb iron box choke off tens of billions of dollars of Tesla's business? This thing is called a transformer, the "heart valve" of the power grid, and right now it is in severe global shortage. Without transformers, no matter how fast Tesla builds its Megapack storage systems or how well they sell, they cannot connect to the grid. Tesla's energy empire was thus held hostage by a group of slow-moving traditional manufacturers. What do you think Musk will do? Compromise? Wait? His answer, as always, is blunt: if you can't get it done, I'll do it myself! That's Musk. Never become a stumbling block in his path. ...
A Letter to Bao Fan | Findme
投中网· 2025-08-14 09:37
Core Viewpoint
- The article reflects on the return of a prominent figure in the investment banking industry, expressing anticipation and curiosity about the changes that occurred during his absence, particularly in evolving relationships and market dynamics [3][4].

Group 1: Industry Trends
- The rise of generative AI has become a defining trend in the investment landscape, with major players like ChatGPT and xAI gaining attention and funding in 2023 [4][5].
- The "Big Model Six Dragons" emerged as key players in the AI sector, with numerous companies entering the market, indicating rapid expansion and competition in AI technologies [6].
- New consumer companies, referred to as the "three sisters" of the Hong Kong stock market, have performed strongly, suggesting emerging investment opportunities in the consumer sector [7].

Group 2: Personal Reflections and Relationships
- The article discusses the evolution of personal relationships within the industry, questioning whether past friendships have changed and how perceptions of individuals have shifted over time [5][6].
- It highlights the importance of long-term relationships and the value of giving without expecting immediate returns, reflecting a philosophy of trust and future potential [5].
- The narrative includes observations about various industry figures, noting their changing roles and public perceptions, which may influence future collaborations and opportunities [8][9].

Group 3: Company Dynamics
- The article mentions operational changes within a prominent investment firm, indicating a shift toward a more decentralized management structure that allows personal privacy and autonomy for key figures [10].
- It emphasizes the firm's successful fundraising and the strategic decisions made in response to market conditions, showcasing adaptability in a fluctuating environment [10].
- The discussion covers the firm's historical context and its evolution over the past two decades, reflecting on its growth and the challenges faced [10][11].
Asia Power Equipment: Key takeaways from expert call on pricing, demand, and tariff impact for high voltage power equipment
2025-08-05 03:15
Summary of Key Points from the Expert Call on High Voltage Power Equipment

Industry Overview
- **Industry**: High Voltage Power Equipment
- **Key Drivers**: Demand driven by renewable energy installations, data centers, and potential growth in transmission capital expenditures (capex)

Core Insights
1. **Price Increases**
   - Price hikes for high voltage power equipment have accelerated, with certain types seeing over 10% year-over-year increases since June 2025, attributed to tariffs and rising demand from renewables [2][4][5]
   - General price increases ran 3-5% in the first half of 2025, with transformers seeing the largest hikes [4][5]
2. **Strong Demand**
   - Demand for high voltage power equipment remains robust year-to-date (YTD), driven primarily by new connections for renewable energy projects, which account for over 70% of total demand [2][5]
   - The expert anticipates continued strong demand through 2026/27 on the back of renewable energy and data-center installations [2][5]
3. **Future Demand Dynamics**
   - While demand from renewables may plateau, new connections for gas-fired and nuclear power plants, along with data centers, are expected to fill the gap [5][6]
   - The replacement cycle for existing equipment should gain momentum in the coming years, though replacement demand currently accounts for less than 30% [2][5]
4. **Transmission Capex Growth**
   - Transmission capex is forecast to grow 10% in 2025, with potential for stronger growth in subsequent years, contingent on resolving permitting issues [6]
   - Regulatory hurdles remain a significant barrier to long-distance transmission network growth [6]
5. **Trade Tariff Impact**
   - The impact of trade tariffs on pricing is seen as limited, with operators willing to pay higher prices to secure essential equipment for grid connections [6]
   - Equipment manufacturers are raising prices or negotiating with customers to pass on tariff-driven cost increases [6]
6. **Supply Constraints**
   - There has been no noticeable increase in supply YTD, particularly for transformers, primarily due to a shortage of skilled labor [6]
   - Local manufacturers face challenges ramping up capacity, and regulated utilities are reluctant to procure from Chinese manufacturers over national security concerns [6]

Additional Insights
- **Market Sentiment**: The expert's views align with a bullish outlook on the demand/supply imbalance for high voltage power equipment in the US, supporting positive ratings on companies like Hyundai Electric, Hyosung Heavy, and Sieyuan Electric [2][4]
- **Long-term Trends**: Lead times for high voltage equipment remain extended, indicating ongoing supply-chain challenges [5]

Conclusion
- The high voltage power equipment industry is poised for growth driven by renewable energy and data-center demand, despite supply challenges and regulatory hurdles. Pricing is influenced by tariffs, but demand remains strong, suggesting a favorable outlook for key players in the market.
Professor Hinton's slides from his World Artificial Intelligence Conference talk
2025-07-29 02:10
Summary of Key Points from the Conference Call

Industry or Company Involved
- The discussion covers the field of Artificial Intelligence (AI), focusing on digital intelligence versus biological intelligence.

Core Points and Arguments
1. **Two Paradigms of Intelligence** - One paradigm holds that the essence of intelligence is reasoning, achieved through symbolic rules manipulating symbolic expressions, with learning secondary to knowledge representation [7][8][9]
2. **Evolution of Language Models** - Over the past 30 years, language modeling has advanced significantly, including the introduction of embedding vectors and Google's invention of the Transformer [13][14]
3. **Understanding of Language by LLMs** - Large Language Models (LLMs) understand language much as humans do, converting words into compatible feature vectors, indicating a level of comprehension in their responses [16][28]
4. **Analogy of Words as Lego Blocks** - Words are compared to high-dimensional Lego blocks that can model various concepts and communicate ideas effectively [20][24]
5. **Digital vs. Biological Computation** - Digital computation, while energy-intensive, allows easy knowledge sharing among agents running the same model; biological computation consumes less energy but struggles with knowledge transfer [51]
6. **Knowledge Transfer Mechanisms** - Knowledge can be distilled from a teacher to a student in AI systems, allowing efficient learning and adaptation [41][48]
7. **Challenges of AI Control** - A superintelligence could manipulate users to gain power, raising concerns about control and safety in AI development [55][57]
8. **Global Cooperation on AI Safety** - There is skepticism about international collaboration on AI safety measures against threats like cyber attacks and autonomous weapons [64]
9. **Training Benevolent AI** - Techniques for training AI to be benevolent may be independent of those that enhance its intelligence, suggesting a need for focused research on AI safety [68][72]

Other Important but Possibly Overlooked Content
- The discussion emphasizes the potential risks of AI development, likening the situation to raising a tiger cub that could become dangerous as it matures, and highlighting the urgency of safety measures [61]
- Countries are urged to establish well-funded AI safety institutes focused on building AI systems that do not seek control [72]
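The teacher-to-student knowledge transfer mentioned in the summary can be sketched as a minimal distillation loss: cross-entropy between temperature-softened teacher and student output distributions. The temperature value and toy logits below are illustrative assumptions, not figures from the talk.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T softens the distribution,
    exposing more of the teacher's 'dark knowledge' about wrong classes."""
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's: minimized when the student reproduces the teacher's outputs."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean())

# Toy example: a student whose logits roughly track the teacher's.
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[3.5, 1.2, 0.3]])
loss = distillation_loss(student, teacher)
print(loss)
```

A student matching the teacher exactly reaches the minimum of this loss (the teacher's own entropy), which is why distillation transfers knowledge far more cheaply between digital agents than anything available to biological learners.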