Transformer
Layoffs are here, and they are serious: everyone should get ready!
猿大侠· 2025-06-04 02:55
Core Viewpoint
- The article emphasizes the urgency for technology professionals to adapt to the rapid growth of AI applications, highlighting the need for skills in AI model development and application to avoid job displacement and to seize high-paying opportunities in the industry [1][2].

Group 1: Industry Trends
- The demand for AI talent is surging, with major companies like Alibaba and ByteDance actively hiring AI model developers while simultaneously laying off traditional tech roles [1].
- There is a growing consensus among large firms on the urgency of accelerating AI application deployment, shifting hiring focus from traditional coding skills to AI model experience [1][2].

Group 2: Learning Opportunities
- The article promotes a free training program aimed at equipping participants with AI model application development skills, emphasizing the importance of understanding AI principles, application technologies, and practical project experience [2][4].
- The training includes live sessions with industry experts, covering typical business scenarios, technical architecture, and core principles of AI model technologies such as RAG, Agent, and Transformer [2][11].

Group 3: Career Development
- The program offers insights into current job-market trends for AI model roles, including salary expectations and career-progression strategies from the perspective of hiring managers [6].
- Participants will have access to internal referral opportunities, enhancing their chances of securing high-paying job offers directly from major companies [6][8].

Group 4: Practical Application
- The training includes hands-on experience with popular AI applications, allowing participants to build a portfolio of practical projects that can be showcased in job applications [8][11].
- The course aims to bridge the gap between technical knowledge and real-world application, helping participants implement AI solutions effectively in various business contexts [4][11].
Tracing DeepSeek's Technical Origins and Exploring the Frontier: A Report
Zhejiang University· 2025-05-22 01:20
Investment Rating
- The report does not provide a specific investment rating for the industry.

Core Insights
- The report discusses the evolution of large language models (LLMs) and highlights the significance of DeepSeek technology in bridging the gap between open-source and closed-source AI models, reducing the development lag from 6-12 months to 1-3 months [69].

Summary by Sections

Language Models
- Language models aim to calculate the probability of a sequence of words, enabling machines to understand human language [6].
- The report outlines the basic tasks of language models, including encoding and word embedding, which help represent words in a way that captures their meanings [13][17].

Transformer
- The Transformer architecture, introduced in 2017, revolutionized deep learning with its self-attention mechanism, allowing parallel computation and a better grasp of global context [32] (see the attention sketch after this summary).
- The report emphasizes the Transformer model as a foundational technology for large models, highlighting its ability to capture complex semantic relationships through multi-head attention [33].

DeepSeek
- DeepSeek technology is positioned as a significant advancement in AI, with an architecture that enables efficient model training and inference, addressing the computational demands of large models [70].
- The report details the stages of DeepSeek's development, including supervised fine-tuning and reinforcement learning, which enhance its reasoning capabilities [117][119].

New Generation Agents
- The report discusses the transition from generative models to reasoning models, indicating a shift in focus toward enhancing logical reasoning capabilities in AI systems [107].
- It highlights the integration of LLMs with agent-based systems, where LLMs serve as the brain of agents, enabling them to perform complex tasks through planning and tool usage [133].
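As a minimal illustration of the self-attention mechanism the report's Transformer section refers to (my sketch, not code from the report), the snippet below computes scaled dot-product self-attention in NumPy; the sequence length, dimensions, and random inputs are assumptions chosen only for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    Returns (seq_len, d_head) representations in which every token
    is a weighted mix of all tokens in the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mix of value vectors

# Toy example with assumed sizes (seq_len=5, d_model=16, d_head=8).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```

Multi-head attention, which the report credits with capturing complex semantic relationships, runs several such projections in parallel and concatenates their outputs.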
Google Chief Scientist's marathon talk reviews a decade of AI: which key technologies shaped today's large-model landscape?
机器人圈· 2025-04-30 09:10
Google Chief Scientist Jeff Dean spoke at ETH Zurich this April on major trends in artificial intelligence. The talk reviewed the key technical milestones that laid the foundation of modern AI, including neural networks and backpropagation, early large-scale training, hardware acceleration, the open-source ecosystem, architectural revolutions, training paradigms, model efficiency, and inference optimization, and stressed the decisive role that compute, data volume, model scaling, and innovation in algorithms and model architectures have played in advancing AI capability. The transcript below was compiled and translated by the 数字开物 team.

01 AI is transforming the computing paradigm through unprecedented scale and algorithmic progress

Jeff Dean: Today I want to discuss the important trends in AI with you. We will look back at how this field reached today's level of model capability, consider what we can do at the current state of the art, and ask how we should shape the future direction of AI.

This work was done together with many colleagues inside and outside Google, so it is by no means all my own; much of it is collaborative research. Some of it was not even led by me, but I believe it is all important and worth sharing and discussing here.

Let's start with a few observations, most of which are probably obvious to this audience. First, and I think most important, machine learning has completely changed what we believe computers are capable of and what we expect from them. Think back ten years: computer vision was still in its infancy, and computers could barely be said to ...
Cartesia: $91 million raised in 3 months, reshaping voice AI from Transformer to Mamba
海外独角兽· 2025-04-03 12:04
Author: linlin  Editor: haina

On March 11, 2025, the voice-generation startup Cartesia announced a $64 million Series A, less than three months after its $27 million seed round. The round was led by Kleiner Perkins, with Lightspeed, Index, A*, Greycroft, Dell Technologies Capital, and Samsung Ventures participating. Cartesia also released its flagship product Sonic 2.0, cutting system latency from 90 ms to 45 ms and giving efficient, real-time, low-cost multimodal interaction in voice AI new momentum.

Cartesia's core team all come from the Stanford AI Lab: alumni Karan Goel, Albert Gu, Arjun Desai, and Brandon Yang, together with their shared advisor Chris Ré. The team's common research thread is the SSM (state space model). The SSM line of work from S4 to Mamba, with its linear time complexity, offers a potential way around the inherent context-length limitations of the Transformer, the mainstream LLM architecture, which means faster generation speed, ...
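To make the linear-time claim concrete, here is a rough sketch of a discretized linear state-space recurrence of the kind the S4/Mamba line builds on. It is an illustration under assumed parameter shapes and values, not Cartesia's or Mamba's actual parameterization: each step updates a fixed-size state, so cost grows linearly with sequence length rather than with all previous tokens as in attention.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Run a discretized diagonal state-space model over a sequence.

    x_t = A * x_{t-1} + B * u_t   (elementwise, diagonal transition)
    y_t = C . x_t

    u: (seq_len,) scalar input sequence
    A, B, C: (state_dim,) per-dimension parameters
    Total cost is O(seq_len * state_dim).
    """
    state = np.zeros_like(A)
    ys = []
    for u_t in u:                      # one pass over the sequence
        state = A * state + B * u_t    # fixed-size state carries the history
        ys.append(float(C @ state))
    return np.array(ys)

# Toy example with assumed sizes (seq_len=1000, state_dim=16).
rng = np.random.default_rng(1)
A = np.exp(-rng.uniform(0.01, 0.5, size=16))  # stable decay per state dimension
B = rng.normal(size=16)
C = rng.normal(size=16)
y = ssm_scan(rng.normal(size=1000), A, B, C)
print(y.shape)  # (1000,)
```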
3,700 pretraining runs in search of the "linear attention" non-consensus: a MiniMax-01 developer recounts a four-year exploration
晚点LatePost· 2025-03-09 12:00
"我们跑的是下半场,赌的就是未来的长文本需求。" MiniMax 在今年 1 月发布了参数为 4560 亿的开源大模型 MiniMax-01,该模型就用到了他们开发的线 性注意力机制 "Lightning Attention"。 我们邀请了这个项目的负责人,MiniMax 高级研究总监钟怡然,来与我们一起聊线性注意力的研发过 程。钟怡然在 MiniMax 负责大模型网络架构设计,目前正开发多模态深度推理模型。 钟怡然曾担任上海人工智能实验室青年科学家,是新架构探索组的 PI(项目负责人);他在澳洲国立大 学获得博士学位,师从李宏东教授和 Richard Hartley 院士。他和他的团队已在一些国际顶级学术会议和 期刊上发表了 20 余篇关于模型新架构的论文,覆盖了当前多类非 Transformer 架构,如线性注意力机制 (线性注意力)、长卷积(Long Convolution)和线性循环网络(Linear RNN)。 在 2021 年,线性注意力还是一个 "看起来很美好的泡泡",怡然和团队就开始探索线性架构的实现。 嘉宾 丨 钟怡然 整理 丨 刘倩 程曼祺 上期播客中, 我们与清华的两位博士生,肖朝军和傅 ...
[GF Securities Financial Engineering] Neural Ordinary Differential Equations and Liquid Neural Networks
广发金融工程研究· 2025-03-06 00:16
GF Securities chief financial engineering analyst An Ningning anningning@gf.com.cn  GF Securities senior financial engineering analyst Chen Yuanwen chenyuanwen@gf.com.cn  Contact: GF Securities financial engineering researcher Lin Tao gflintao@gf.com.cn  GF Securities financial engineering team (An Ningning, Chen Yuanwen)

Abstract

Neural ordinary differential equations: At NeurIPS 2018, a top machine-learning conference, the paper "Neural Ordinary Differential Equations" by Chen et al. won the best paper award. In brief, a typical ResNet is built from residual blocks of the form h_{t+1} = f(h_t, θ_t) + h_t. In conventional training, the network parameters that best fit the training data are computed for each residual block separately. The paper proposes that, if the residual blocks of a ResNet are stacked toward the infinite-depth limit, the parameters of every block can instead be obtained by solving a single ordinary differential equation.

Liquid neural networks: Building on this work, Ramin Hasani and colleagues at MIT innovatively described the evolution of a recurrent neural network's hidden state in the form of an ordinary differential equation, proposing a class of models known as liquid neural networks; these results were published in top international journals such as Nature Machine Intelligence. Such models ...
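To make the connection in the abstract concrete: a residual block h_{t+1} = h_t + f(h_t, θ_t) is one Euler step of the ODE dh/dt = f(h, θ). The sketch below is my illustration, not the GF team's code; the dynamics function, sizes, and step counts are assumptions chosen for the example.

```python
import numpy as np

def f(h, theta):
    # Toy dynamics: a single tanh layer standing in for the learned f(h, theta).
    W, b = theta
    return np.tanh(h @ W + b)

def residual_net(h, theta, depth):
    # Stacked residual blocks: h_{t+1} = h_t + f(h_t, theta).
    for _ in range(depth):
        h = h + f(h, theta)
    return h

def neural_ode_euler(h, theta, t1, steps):
    # Euler integration of dh/dt = f(h, theta) from t = 0 to t = t1.
    dt = t1 / steps
    for _ in range(steps):
        h = h + dt * f(h, theta)
    return h

rng = np.random.default_rng(3)
theta = (rng.normal(scale=0.1, size=(4, 4)), np.zeros(4))
h0 = rng.normal(size=(1, 4))

# A depth-10 residual stack corresponds to Euler with dt = 1 for 10 steps;
# the ODE view lets an adaptive solver take finer steps with the same parameters.
print(residual_net(h0, theta, depth=10))
print(neural_ode_euler(h0, theta, t1=10.0, steps=100))
```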
The Double-Edged Sword of AI Chips
半导体行业观察· 2025-02-28 03:08
Core Viewpoint
- The article discusses the transformative shift from traditional software programming to AI software modeling, highlighting the implications for processing hardware and the development of dedicated AI accelerators.

Group 1: Traditional Software Programming
- Traditional software programming is based on writing explicit instructions to complete specific tasks, making it suitable for predictable and reliable scenarios [2].
- As tasks become more complex, the size and complexity of codebases increase, requiring manual updates by programmers, which limits dynamic adaptability [2].

Group 2: AI Software Modeling
- AI software modeling represents a fundamental shift in problem-solving approaches, allowing systems to learn patterns from data through iterative training [3].
- AI utilizes probabilistic reasoning to make predictions and decisions, enabling it to handle uncertainty and adapt to change [3].
- The complexity of AI systems lies in the architecture and scale of the models rather than the amount of code written, with advanced models containing hundreds of billions to trillions of parameters [3].

Group 3: Impact on Processing Hardware
- The primary architecture for executing software programs has been the CPU, which processes instructions sequentially, limiting its ability to deliver the parallelism required by AI models [4].
- Modern CPUs have adopted multi-core and multi-threaded architectures to improve performance, but they still lack the massive parallelism needed for AI workloads [4][5].

Group 4: AI Accelerators
- GPUs have become the backbone of AI workloads due to their unparalleled parallel computing capabilities, offering performance in the petaflops range [6].
- However, GPUs face efficiency bottlenecks during inference, particularly with large language models (LLMs), where theoretical peak performance may not be achieved [6][7].
- The energy demands of AI data centers pose sustainability challenges, prompting the industry to seek more efficient alternatives, such as dedicated AI accelerators [7].

Group 5: Key Attributes of AI Accelerators
- AI processors require unique attributes not found in traditional CPUs, with batch size and token throughput being critical to performance [8].
- Larger batch sizes can improve throughput but may increase latency, posing challenges for real-time applications [12] (see the back-of-the-envelope sketch after this summary).

Group 6: Overcoming Hardware Challenges
- The main bottleneck for AI accelerators is memory bandwidth, often referred to as the memory wall, which limits performance when processing large batches [19].
- Innovations in memory architecture, such as high-bandwidth memory (HBM), can reduce memory-access delays and improve overall efficiency [21].
- Dedicated hardware accelerators designed for LLM workloads can significantly enhance performance by optimizing data flow and minimizing unnecessary data movement [22].

Group 7: Software Optimization
- Software optimization plays a crucial role in exploiting hardware capabilities, with highly optimized kernels for LLM operations improving performance [23].
- Techniques like gradient checkpointing and pipeline parallelism can reduce memory usage and enhance throughput [23][24].
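The batch-size and memory-wall points above come down to one quantity: arithmetic intensity (FLOPs per byte moved) compared with an accelerator's compute-to-bandwidth ratio. The snippet below is a back-of-the-envelope sketch under assumed, hypothetical hardware and model figures, not a benchmark of any real chip; it only illustrates why small-batch decode tends to be memory-bound and why larger batches raise throughput.

```python
def decode_phase_bound(params_billions, batch_size,
                       peak_tflops, hbm_tb_per_s, bytes_per_param=2):
    """Rough roofline check for the decode (token-by-token) phase of LLM inference.

    Simplifying assumptions: per decode step, every weight is read once
    (bytes_per_param bytes) and contributes ~2 FLOPs (one multiply-add)
    per sequence in the batch; activations and KV-cache traffic are ignored.
    """
    flops = 2 * params_billions * 1e9 * batch_size           # FLOPs per decode step
    bytes_moved = params_billions * 1e9 * bytes_per_param     # weights read once
    intensity = flops / bytes_moved                           # FLOPs per byte
    ridge = (peak_tflops * 1e12) / (hbm_tb_per_s * 1e12)      # hardware ratio
    bound = "memory-bound" if intensity < ridge else "compute-bound"
    return intensity, ridge, bound

# Hypothetical 70B-parameter model on an accelerator with 1000 TFLOPS of compute
# and 3 TB/s of HBM bandwidth (assumed figures for illustration).
for batch in (1, 8, 64, 512):
    intensity, ridge, bound = decode_phase_bound(70, batch, 1000, 3)
    print(f"batch={batch:>3}: {intensity:6.1f} FLOPs/byte vs ridge {ridge:.0f} -> {bound}")
```

Under these assumptions, intensity grows linearly with batch size, which mirrors the summary's point that larger batches improve throughput while small batches leave the chip waiting on memory.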