Transformer
X @Avi Chawla
Avi Chawla· 2025-06-14 20:03
Model Architecture
- Explains Transformer vs Mixture of Experts (MoE) in LLMs with visuals [1]
- Focuses on clearly explaining Mixture of Experts in LLMs [1]
X @Avi Chawla
Avi Chawla· 2025-06-14 06:30
LLM Techniques
- Comparative analysis of Transformer vs Mixture of Experts (MoE) in LLMs [1]
- The industry follows tutorials and insights on DS (data science), ML (machine learning), LLMs (large language models), and RAGs (retrieval-augmented generation) [1]
Social Media Engagement
- Encourages users to share the information [1]
- Industry expert Avi Chawla shares related content on social media [1]
X @Avi Chawla
Avi Chawla· 2025-06-14 06:30
LLM Architectures
- The post compares Transformer and Mixture of Experts (MoE) architectures in Large Language Models (LLMs) [1]
- It provides clear explanations and visuals to illustrate the differences between the two architectures [1]
Focus
- The post focuses on explaining Transformer and MoE architectures in LLMs [1]
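The three posts above all turn on the same contrast: a dense Transformer runs every token through one shared feed-forward network, while an MoE layer routes each token to only a few of many expert networks. Below is a minimal sketch of top-k expert routing; all names, shapes, and the top-2 choice are illustrative assumptions, not details taken from the posts.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts.

    x:       (tokens, d_model) token activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of (w1, w2) feed-forward weight pairs, one per expert
    """
    scores = softmax(x @ gate_w)                   # (tokens, n_experts)
    top = np.argsort(-scores, axis=-1)[:, :top_k]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Unlike a dense FFN, only top_k of n_experts run for this token.
        norm = scores[t, top[t]].sum()
        for e in top[t]:
            w1, w2 = experts[e]
            h = np.maximum(x[t] @ w1, 0)           # ReLU feed-forward
            out[t] += (scores[t, e] / norm) * (h @ w2)
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 16, 4, 3
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_exp))
experts = [(rng.standard_normal((d, 4 * d)), rng.standard_normal((4 * d, d)))
           for _ in range(n_exp)]
print(moe_layer(x, gate_w, experts).shape)  # (3, 16)
```

The design point the posts highlight follows directly from the loop: parameter count grows with the number of experts, but per-token compute only grows with top_k.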
The Next Decade: The Big Direction for AI
Hu Xiu· 2025-06-12 01:16
Core Insights
- The article reflects on the evolution of artificial intelligence (AI) over the past decade, highlighting the rise and decline of major players in the industry, particularly the "AI Four Dragons" [3][4]
- It suggests that the next decade (2025-2035) may shift focus from visual recognition to visual generation technologies [4][5]
- The article discusses the emergence of various AI models in China, including those from major companies like Baidu, Alibaba, and Tencent, indicating a competitive landscape [4][6]
Industry Developments
- The AI landscape has seen significant advancements in large models, with a variety of applications emerging, such as text generation, audio generation, image generation, and video generation [4][5][6]
- These advancements are being monetized, with many companies beginning to charge for their services, code generation in China being the exception [6]
Historical Milestones
- Key milestones in AI development include the introduction of the Transformer model in 2017, which revolutionized the field by consolidating various specialized models into a more unified approach [7]
- The launch of ChatGPT in late 2022 marked a significant turning point, prompting major companies like Google to accelerate their AI initiatives [8]
- The article also references the release of OpenAI's Sora visual model in 2024, which highlighted the industry's challenges and led to renewed focus on text and context generation [8]
Philosophical Considerations
- The article asks whether the next decade will be dominated by Artificial General Intelligence (AGI) or AI-Generated Content (AIGC) [11]
- It draws parallels with the skepticism that once surrounded reusable rocket technology, suggesting that innovation often faces initial resistance before its value is recognized [13][14][15]
Has Apple, After a Year of Quiet Work, Finally Surpassed Qwen 2.5 at the Same Parameter Count? Three Lines of Code to Access Apple Intelligence, and Apple's Own Account of How It Does Inference
AI前线· 2025-06-10 10:05
Compiled by | Hua Wei (华卫), Hezi Kele (核子可乐)

At this year's WWDC, Apple introduced a new generation of foundation language models developed to power Apple Intelligence features. The newly optimized foundation models run efficiently on Apple silicon and comprise a compact model of roughly 3B parameters and a server-based mixture-of-experts model, the latter a brand-new architecture tailored for Apple's private cloud.

Both foundation models belong to the family of generative models Apple has built to support its users. They improve tool use and reasoning, understand image and text inputs, run faster and more efficiently, and support 15 languages as well as the various intelligence features integrated across Apple's platforms.

Apple says it improved the efficiency of both models by developing new model architectures. The on-device model is split into two blocks at a 5:3 depth ratio. All key-value (KV) caches in block 2 are shared directly with the caches generated by the last layer of block 1, cutting KV-cache memory usage by 38.5% while significantly improving time-to-first-token. Apple also introduced a Parallel-Track Mixture-of-Experts (PT-MoE) design, a new architecture for the server-side model. This model is composed of multiple smaller Transformers ("tracks") that process each token independently, interacting only at each track block's ...
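A minimal sketch of the cross-block KV-cache sharing idea described above: layers in block 2 attend using the K/V produced by block 1's final layer instead of projecting and caching their own. The single-head attention, the 8-layer depth, and all shapes are illustrative assumptions, not Apple's implementation.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v

def layer(x, wq, wk, wv, kv=None):
    """One toy attention layer. If `kv` is given, reuse it instead of
    projecting (and caching) fresh K/V; this is the sharing trick."""
    q = x @ wq
    if kv is None:
        kv = (x @ wk, x @ wv)  # normally every layer caches its own K/V
    k, v = kv
    return x + attention(q, k, v), kv

rng = np.random.default_rng(0)
d, n_tokens = 8, 5
x = rng.standard_normal((n_tokens, d))
weights = [tuple(rng.standard_normal((d, d)) for _ in range(3)) for _ in range(8)]

# Block 1 (say, 5 of 8 layers, the "5" in the 5:3 depth split):
# each layer projects and caches its own K/V.
for wq, wk, wv in weights[:5]:
    x, kv = layer(x, wq, wk, wv)

# Block 2 (the remaining 3 layers): every layer reuses block 1's
# last-layer K/V, so these layers cache no new K/V tensors at all.
shared_kv = kv
for wq, wk, wv in weights[5:]:
    x, _ = layer(x, wq, wk, wv, kv=shared_kv)

print(x.shape)  # (5, 8)
```

In this toy setting, 3 of the 8 layers stop caching K/V, a 37.5% reduction in cached tensors, which is the same order as the 38.5% memory saving the article reports.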
Layoffs Are Here, and It's Serious. Everyone, Get Ready!
猿大侠· 2025-06-04 02:55
Core Viewpoint
- The article emphasizes the urgency for technology professionals to adapt to the rapid growth of AI applications, highlighting the need for skills in AI model development and application to avoid job displacement and to seize high-paying opportunities in the industry [1][2]
Group 1: Industry Trends
- The demand for AI talent is surging, with major companies like Alibaba and ByteDance actively hiring AI model developers while simultaneously laying off traditional tech roles [1]
- There is a growing consensus among large firms regarding the urgency of accelerating AI application deployment, shifting focus from traditional coding skills to AI model experience [1][2]
Group 2: Learning Opportunities
- The article promotes a free training program aimed at equipping participants with AI model application development skills, emphasizing the importance of understanding AI principles, application technologies, and practical project experience [2][4]
- The training includes live sessions with industry experts, covering typical business scenarios, technical architecture, and core principles of AI model technologies such as RAG, Agent, and Transformer [2][11]
Group 3: Career Development
- The program offers insights into current job market trends for AI model roles, including salary expectations and career progression strategies from the perspective of hiring managers [6]
- Participants will have access to internal referral opportunities, enhancing their chances of securing high-paying job offers directly from major companies [6][8]
Group 4: Practical Application
- The training includes hands-on experience with popular AI applications, allowing participants to build a portfolio of practical projects that can be showcased in job applications [8][11]
- The course aims to bridge the gap between technical knowledge and real-world application, helping participants effectively implement AI solutions in various business contexts [4][11]
Report: Tracing DeepSeek's Technical Origins and Exploring the Frontier
Zhejiang University· 2025-05-22 01:20
Zhejiang University DS Series: Tracing DeepSeek's Technical Origins and Exploring the Frontier
Speaker: Zhu Qiang, College of Computer Science and Technology, Zhejiang University; State-Province Collaborative Innovation Center for Artificial Intelligence (Zhejiang University). https://person.zju.edu.cn/zhuq

Outline
1. Language Models
2. Transformer
3. ChatGPT
4. DeepSeek
5. Next-Generation Agents

Language models, the ultimate goal: language modeling means computing, for an arbitrary word sequence, the probability that the sequence forms a sentence. We interact with language models every day, for example when ranking continuations such as "I saw a cat", "I saw a cat on the chair", "I saw a cat running after a dog", "I saw a car", "I saw a cat in my dream".

Language models, the basic task: encoding, that is, making human language understandable to computers. One-hot Encoding: each word of "She is my mom" becomes a vector with a single 1 and all other entries 0, so the four words together form a 4x4 identity matrix. What are the drawbacks of one-hot encoding? Word Embedding: encoding that lets computers understand human language. "A bottle of tez ..." [slide truncated]
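A minimal sketch of the one-hot encoding step from the slides, and the drawback the slide's question hints at: every pair of distinct one-hot vectors is equally far apart, so the encoding carries no notion of word similarity. The sentence and four-word vocabulary come from the slide; the code itself is an illustrative assumption.

```python
import numpy as np

sentence = "She is my mom".split()
vocab = {word: i for i, word in enumerate(sentence)}  # 4-word toy vocabulary

# One-hot: each word becomes one row of the 4x4 identity matrix.
one_hot = np.eye(len(vocab))[[vocab[w] for w in sentence]]
print(one_hot)

# Drawback: every pair of distinct words has dot product 0 and equal
# distance, so "mom" is no closer to "She" than to "is"; there is no
# similarity structure. Word embeddings fix this with dense vectors.
print(one_hot[vocab["She"]] @ one_hot[vocab["mom"]])  # 0.0 for every pair
```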
Google's Chief Scientist Reviews a Decade of AI in a Marathon Talk: Which Key Technologies Shaped Today's Large-Model Landscape?
机器人圈· 2025-04-30 09:10
Google Chief Scientist Jeff Dean gave a talk on major trends in artificial intelligence at ETH Zurich this April. The talk reviewed the key technical milestones underpinning modern AI, including neural networks and backpropagation, early large-scale training, hardware acceleration, the open-source ecosystem, architectural revolutions, training paradigms, model efficiency, and inference optimization, and argued that compute, data volume, model scaling, and innovations in algorithms and model architectures have been decisive in advancing AI capability. The transcript below was translated and edited by the 数字开物 team.

01 AI is transforming the computing paradigm with unprecedented scale and algorithmic progress

Jeff Dean: Today I will discuss the major trends in AI. We will look at how the field reached today's level of model capability, what we can do at the current state of the art, and how we can shape the future direction of AI.

This work was done together with many colleagues inside and outside Google, so it is not all my own; much of it is collaborative research. Some of it was not even led by me, but I consider it important and worth sharing and discussing here.

Let's start with some observations, most of which may be obvious to this audience. First, and I think most important: machine learning has completely changed our understanding of, and expectations for, what computers can do. Think back ten years, when computer vision was still in its infancy and computers could barely ...
Cartesia: $91 Million Raised in 3 Months, Reshaping Voice AI from Transformer to Mamba
海外独角兽· 2025-04-03 12:04
Author: linlin. Editor: haina.

On March 11, 2025, voice-generation startup Cartesia announced a $64 million Series A, less than three months after its $27 million seed round. The round was led by Kleiner Perkins, with participation from Lightspeed, Index, A*, Greycroft, Dell Technologies Capital, and Samsung Ventures. Cartesia also launched its flagship product Sonic 2.0, cutting system latency from 90 ms to 45 ms and giving voice AI new momentum toward efficient, real-time, low-cost multimodal interaction.

Cartesia's core team all come from the Stanford AI Lab: the four alumni Karan Goel, Albert Gu, Arjun Desai, and Brandon Yang, together with their shared advisor Chris Ré. The team's common research thread is the SSM (state space model). The SSM line of research, from S4 to Mamba, offers a potential solution to the inherent context-length limitations of the Transformer, the mainstream LLM architecture, with linear time complexity, which means faster generation, ...
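A minimal sketch of why an SSM scans a sequence in linear time: the state is a fixed-size recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t, so total cost grows as O(sequence length) rather than the O(length squared) of full attention. All shapes and values are illustrative assumptions; real S4/Mamba models use structured A matrices and (in Mamba) input-dependent, selective parameterizations that this toy omits.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (seq_len, d_in); A: (d_state, d_state); B: (d_state, d_in);
    C: (d_out, d_state). One fixed-size state update per step gives
    O(seq_len) total work, unlike attention, which compares every pair
    of positions for O(seq_len ** 2) work.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # a single pass over the sequence
        h = A @ h + B @ x_t      # constant work per token
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
seq_len, d_in, d_state, d_out = 1000, 4, 8, 4
x = rng.standard_normal((seq_len, d_in))
A = 0.9 * np.eye(d_state)        # stable toy dynamics
B = rng.standard_normal((d_state, d_in)) * 0.1
C = rng.standard_normal((d_out, d_state))
print(ssm_scan(x, A, B, C).shape)  # (1000, 4)
```

Because generation only needs the current fixed-size state h rather than a cache over all previous tokens, this recurrence is what makes SSMs attractive for the low-latency, long-context voice workloads Cartesia targets.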
3,700 Pretraining Runs in Search of the "Linear Attention" Non-Consensus: A MiniMax-01 Developer on 4 Years of Exploration
晚点LatePost· 2025-03-09 12:00
"我们跑的是下半场,赌的就是未来的长文本需求。" MiniMax 在今年 1 月发布了参数为 4560 亿的开源大模型 MiniMax-01,该模型就用到了他们开发的线 性注意力机制 "Lightning Attention"。 我们邀请了这个项目的负责人,MiniMax 高级研究总监钟怡然,来与我们一起聊线性注意力的研发过 程。钟怡然在 MiniMax 负责大模型网络架构设计,目前正开发多模态深度推理模型。 钟怡然曾担任上海人工智能实验室青年科学家,是新架构探索组的 PI(项目负责人);他在澳洲国立大 学获得博士学位,师从李宏东教授和 Richard Hartley 院士。他和他的团队已在一些国际顶级学术会议和 期刊上发表了 20 余篇关于模型新架构的论文,覆盖了当前多类非 Transformer 架构,如线性注意力机制 (线性注意力)、长卷积(Long Convolution)和线性循环网络(Linear RNN)。 在 2021 年,线性注意力还是一个 "看起来很美好的泡泡",怡然和团队就开始探索线性架构的实现。 嘉宾 丨 钟怡然 整理 丨 刘倩 程曼祺 上期播客中, 我们与清华的两位博士生,肖朝军和傅 ...