宜信好望角: How Deep AI Empowerment Will Reshape the Startup Landscape
Jin Tou Wang· 2025-10-10 01:34
Group 1
- The AI startup landscape in 2025 is characterized by divergent paths: focusing on either B-end or C-end applications, and on domestic versus global markets [1]
- B-end applications are seen as having a mature business model with clear payment logic, particularly in the "cost reduction and efficiency enhancement" sector, making it a preferred area for investment [1][2]
- C-end markets, despite challenges such as payment difficulties, hold potential opportunities through continuous observation and rapid iteration, leveraging domestic talent and evolving model technologies [1]

Group 2
- The technical characteristics of AI determine the deployment logic in different scenarios, with a focus on customized development for complex enterprise environments [2]
- Globalization is viewed as a crucial strategy for breaking competitive deadlocks, with faster growth opportunities concentrated overseas, supported by the global capabilities of Chinese product managers [2]
- Chinese companies possess unique advantages in going global, combining strong AI technology capabilities with a complete supply chain system to create cost-effective smart devices [2]

Group 3
- The emergence of institutional incubation models empowers startups, with organizations like Innovation Works significantly reducing risk by investing in scarce directions 1.5-2 years ahead of the market [3]
- The dual drivers of technological iteration and market evolution are clarifying the AI entrepreneurial landscape, emphasizing the importance of precise demand insight and flexible strategy adjustment [3]
Just In: DeepSeek Open-Sources V3.2-Exp and Unveils DSA, a New Sparse Attention Mechanism
机器之心· 2025-09-29 10:29
Core Viewpoint
- DeepSeek has released the experimental version DeepSeek-V3.2-Exp, which introduces a new sparse attention mechanism aimed at optimizing training and inference efficiency in long-context scenarios [3][5][10]

Summary by Sections

Model Release
- DeepSeek-V3.2-Exp has been open-sourced with a parameter count of 685 billion [3]
- The release includes a paper detailing the new sparse attention mechanism [5]

Sparse Attention Mechanism
- DeepSeek Sparse Attention (DSA) is the only architectural improvement in version 3.2, focusing on enhancing computational efficiency when processing extended text sequences [5][6][10]
- DSA achieves fine-grained sparse attention while maintaining nearly the same output quality as its predecessor, DeepSeek-V3.1-Terminus [9]

Performance Comparison
- Benchmark results show that DeepSeek-V3.2-Exp performs comparably to DeepSeek-V3.1-Terminus across various tasks [11]
- Specific benchmark results include:
  - MMLU-Pro: 85.0 (V3.1) vs. 85.0 (V3.2)
  - AIME 2025: 88.4 (V3.1) vs. 89.3 (V3.2)
  - Codeforces: 2046 (V3.1) vs. 2121 (V3.2) [11]

Future Developments
- The upcoming release of Z.ai's GLM-4.6 model is noted, with GLM-4.5 being the previous flagship model [12]
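The summary describes DSA only at a high level. As a rough illustration of what "fine-grained sparse attention" means, the NumPy sketch below lets each query attend only to its top-k scoring keys; this is a generic toy mechanism, not DeepSeek's actual DSA selection rule, which is specified in their paper.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Toy fine-grained sparse attention: each query row keeps only its
    k highest-scoring keys and masks the rest before the softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_q, n_k)
    if k < scores.shape[-1]:
        # per-row k-th largest score; everything below it is masked out
        thresh = np.sort(scores, axis=-1)[:, -k][:, None]
        scores = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # (n_q, d)
```

With k equal to the number of keys the mask is a no-op and the function reduces to ordinary dense attention, which is the sense in which a sparse variant can stay close to its dense predecessor in quality while doing far less work per query on long sequences.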
The AI Industry's "14th Five-Year Plan" Review and "15th Five-Year Plan" Outlook: The AI Factorization Leap Under the "Two Transformations"
Sou Hu Cai Jing· 2025-09-26 17:47
Today's share: "The AI Industry's '14th Five-Year Plan' Review and '15th Five-Year Plan' Outlook: The AI Factorization Leap Under the 'Two Transformations'" by China Galaxy Securities. Report length: 49 pages.

The report focuses on the AI industry's achievements during the "14th Five-Year Plan" period and trends for the "15th Five-Year Plan", analyzing technology evolution, the industry ecosystem, policy support, and application expansion. On the technology side, large models have become the core breakthrough direction, with parameter counts growing rapidly, from 1.5 billion in GPT-2 (2018) to a reported 1.76 trillion in GPT-4 (2024); 2025 shows a parallel divergence of "high parameter count + lightweight" models, with overseas players OpenAI, Meta, and Google and domestic firms such as Baidu and Alibaba continuously releasing iterative models. In compute hardware, GPUs remain dominant (NVIDIA holds roughly 70% share), while heterogeneous chips such as ASICs and FPGAs are developing rapidly: accelerator cards such as Cambricon's MLU370R-X8 integrate training and inference, Hygon and others are advancing coordination between x86 and deep-computing processors, and efficient cooling solutions such as liquid cooling are becoming widespread in data centers. In the industry ecosystem, AI factorization is accelerating: data is moving through resource, asset, and capital stages, and systems for data rights confirmation, pricing, and trading are gradually maturing. On the policy side, the 2024 digital economy work priorities emphasized unlocking the potential of data elements, and 2025 continues to push standards development and trusted-society construction; the agent ecosystem is rising ...
Interview with 中昊芯英 CTO 郑瀚寻: Domestic AI Chips Will Also Be Compatible with Different Platforms
Core Insights
- The demand for AI computing is driving attention toward AI chips beyond GPUs, with companies like Google and Groq leading the way in alternative technologies [1][3]
- In the domestic market, ASIC custom chip manufacturers are developing rapidly: as the cost of specialized chips falls, more firms can explore personalized AI capabilities [2][4]

AI Chip Market Trends
- The trend of seeking development opportunities outside of GPU chips is becoming more pronounced, with companies recognizing that innovation is necessary to compete with NVIDIA [3][4]
- The success of GPUs is largely attributed to NVIDIA's established engineering teams, which are not easily replicable by newcomers [3]

Technological Advancements
- The introduction of Tensor Cores in NVIDIA's Tesla V100 series highlighted the efficiency of tensor compute units in handling large data volumes, offering significant computational advantages [4][5]
- Scaling laws in AI models continue to demand higher performance from underlying AI computing clusters, presenting challenges for domestic XPU chips [5]

Interconnectivity and Infrastructure
- Companies are focusing on enhancing interconnectivity between chips, cabinets, and data centers to meet the demands of high-speed data transmission [5][6]
- 中昊芯英 is exploring advanced interconnect technologies, such as OCS all-optical interconnects, to improve its capabilities [6]

Competitive Landscape
- NVIDIA's InfiniBand protocol is seen as a competitive advantage for large-scale data center deployments, while domestic firms are leaning toward Ethernet protocols for their flexibility and improving performance [6]
- The development of software ecosystems is crucial for domestic AI chip platforms, which need to build their own software stacks to compete with NVIDIA's established CUDA ecosystem [6][7]

Future Directions
- The evolution of AI models, particularly those based on the Transformer architecture, continues to shape the landscape, with ongoing optimizations and adaptations [7]
- Compatibility and smooth operation across platforms will be essential for the success of domestic AI chips, similar to the early days of the Android ecosystem [7]
21st Century Business Herald reporter 骆轶琪, reporting from Taiyuan. Driven by booming demand for AI computing, more and more AI chips outside the GPU route are attracting market attention.

In the US stock market, Broadcom's surging orders and rising share price owe much to cloud service providers seeking technology routes outside NVIDIA's GPU ecosystem; Google's TPU (tensor processing unit) chips and Groq's LPU chips are typical examples.

The same is true in the domestic market, where many vendors built on ASIC custom chips are developing rapidly.

On the broad rise of XPUs in today's market, 中昊芯英 co-founder and CTO 郑瀚寻 told 21st Century Business Herald in an exclusive interview: "As computing technology iterates, the industry keeps pursuing paths with better cost-effectiveness, and it is a foreseeable trend that development may gradually converge toward one direction."

He added that the industry used to believe ASIC chips required high costs from tape-out to final deployment, but as specialized chips mature and those costs fall, more and more vendors are willing to build on self-developed specialized chip architectures to pursue personalized AI capabilities. That, he said, is why ASIC chips are drawing so much attention: "It is like architectures in general: long united, they must divide; long divided, they must unite."

TPU on the Rise

Looking for development opportunities beyond GPU chips has long been a new trend. 郑瀚寻 told the reporter that in recent years, Silicon Valley ...
AI Solves Math Problems Using Only the Last Token
量子位· 2025-09-14 05:05
Core Insights
- The research indicates that in mental arithmetic tasks, the majority of computation is concentrated on the last token rather than distributed across all tokens, suggesting that global information access is not necessary for specific tasks like mental arithmetic [1][11]

Group 1: Research Methodology
- Researchers employed Context-Aware Mean Ablation (CAMA) and attention-based peeking (ABP) techniques to conduct a series of ablation experiments on models like Llama-3-8B [2][22]
- The experiments aimed to identify the "minimum computation" required for models to perform well by systematically removing or altering parts of the model [3]
- A sparse subgraph termed "All-for-One" (AF1) was identified, which allows efficient computation with minimal layers and limited information transfer [4][5]

Group 2: Model Structure and Functionality
- In the AF1 structure, the initial layers (L_wait) do not compute anything specific to their own token values but instead perform general preparatory work [7]
- Information is then transferred to the last token through intermediate layers (L_transfer), and the last token independently performs the final calculation [8][9]
- This separation of general computation from input-specific computation underlies the model's efficiency on arithmetic tasks [10]

Group 3: Experimental Findings
- The experiments revealed that Llama-3-8B requires only the first 14 layers for general computation, followed by 2 layers for information transfer, with the remaining layers dedicated to the last token's own computation [24][26]
- AF1_llama demonstrated high fidelity across eight tasks, maintaining performance close to that of the original model [28][29]
- The importance of specific attention heads in arithmetic was confirmed: the model retained approximately 95% accuracy even after removing nearly 60 heads, indicating redundancy among attention heads [30]

Group 4: Generalization and Limitations
- AF1_llama was tested for generalization to other arithmetic forms, showing high accuracy on direct arithmetic tasks but failing on tasks requiring semantic understanding, such as word problems and Python code [32][34]
- Similar AF1-like subgraphs were found in Pythia and GPT-J models, although these models exhibited shorter waiting periods and less clear performance boundaries than Llama [35][36]

Group 5: Contributions and Innovations
- The research advances understanding of arithmetic reasoning and cross-token computation mechanisms in large language models [37]
- The methodologies introduced, CAMA and ABP, offer approaches that could extend beyond arithmetic tasks to broader applications [37]
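The paper's CAMA technique conditions its ablation on context; the toy below sketches only the plainer mean-ablation idea it builds on. One "head's" per-example output is replaced by its batch mean, destroying whatever example-specific information that head carried while preserving its average contribution. The three-head additive "model" here is a hypothetical stand-in, not the paper's setup.

```python
import numpy as np

def mean_ablate_head(head_outputs, h):
    """Mean ablation: overwrite head h's per-example output with its
    mean over the batch. If downstream behavior survives, head h's
    example-specific signal was not needed for the task.

    head_outputs: array of shape (n_heads, n_examples, d).
    Returns the toy model output: the per-example sum of all heads."""
    ablated = head_outputs.copy()
    ablated[h] = head_outputs[h].mean(axis=0, keepdims=True)
    return ablated.sum(axis=0)
```

Comparing the ablated output with the full output (the unmodified sum over heads) then quantifies how example-specific head h was, which is the basic logic behind locating the "minimum computation" a model needs.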
After My Advisor Told Me to Look into Multimodal Perception Research......
自动驾驶之心· 2025-09-07 23:34
Traditional fusion approaches fall into three main categories. Early fusion concatenates raw data directly at the input, but its computational cost is enormous. Mid-level fusion fuses the feature vectors of different modalities after each sensor's data has gone through initial feature extraction; this is the current mainstream approach, for example unifying all sensor features in a BEV (bird's-eye view) representation, which solves the spatial alignment problem across sensors and connects seamlessly to downstream tasks. Late fusion lets each sensor complete perception independently and merges results at the decision level; it is highly interpretable but struggles to resolve information conflicts.

Building on these, Transformer-based end-to-end fusion is the current frontier. This architecture draws on successes in natural language processing and computer vision: through cross-modal attention, it can learn deep relationships between different modalities and achieve more efficient, more robust feature interaction. End-to-end training reduces the error accumulation of intermediate modules and can output perception results, such as 3D bounding boxes, directly from raw sensor data, better capturing dynamic information and improving overall performance.

We understand that many master's and PhD students are focusing on multimodal perception fusion. We previously launched 1-on-6 small-group courses on the end-to-end and VLA directions, and many students have been asking about the multi-sensor fusion direction, urgently needing expert guidance......

Research background: to overcome the limitations of single sensors, multimodal fusion combines LiDAR, millimeter-wave ra ...
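The cross-modal attention at the heart of Transformer-based fusion can be sketched minimally: camera tokens act as queries over LiDAR tokens, so each image feature takes a weighted mix of the geometric features it needs. All shapes and projection weights below are hypothetical illustration, not any specific BEV fusion model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_attention(cam_tokens, lidar_tokens, Wq, Wk, Wv):
    """One camera-to-LiDAR cross-attention step: camera features form
    the queries, LiDAR features form the keys and values, and each
    camera token returns an attention-weighted mix of LiDAR values."""
    Q = cam_tokens @ Wq            # (n_cam, d) queries
    K = lidar_tokens @ Wk          # (n_lidar, d) keys
    V = lidar_tokens @ Wv          # (n_lidar, d) values
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (n_cam, n_lidar)
    return attn @ V                # fused features, one per camera token
```

In a full model this block is stacked with feed-forward layers and trained end to end, which is what lets the network learn which LiDAR regions each camera feature should draw on instead of relying on hand-crafted alignment.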
LatePost Exclusive | Li Auto's Self-Developed Driving Chip Enters On-Vehicle Road Testing, with Some Compute Performance Surpassing NVIDIA's Thor-U
晚点LatePost· 2025-08-28 06:09
Core Viewpoint
- Li Auto's self-developed autonomous driving chip, the M100, has passed key pre-mass-production stages and is expected to enter mass production next year, aiming to improve the efficiency and cost-effectiveness of the company's autonomous driving algorithms [4][6]

Summary by Sections

Chip Development
- The M100 has completed functional and performance testing, demonstrating significant computational capability: it matches the effective computing power of 2 NVIDIA Thor-U chips on large language model tasks and of 3 Thor-U chips on traditional visual tasks [4][6]
- Li Auto has budgeted several billion dollars for its self-developed chip project, reflecting the high cost of chip development [6]

Strategic Approach
- Li Auto is adopting a dual strategy: relying on external partners such as NVIDIA and Horizon Robotics for current market competitiveness while developing its own chip for future core advantages [7][8]
- CTO Xie Yan is leading a strategy that combines hardware and software development to maximize chip performance and efficiency [6]

Market Positioning
- In its current EV lineup, Li Auto uses NVIDIA's high-performance chips in flagship models, while its range-extended models use either NVIDIA Thor-U or Horizon Journey 6M chips depending on the autonomous driving version [8]
- The core reason for developing its own chip is to optimize performance specifically for Li Auto's algorithms, improving cost-effectiveness and efficiency [8]
Exclusive | Li Auto's Self-Developed Driving Chip Enters On-Vehicle Road Testing, with Some Compute Performance Surpassing NVIDIA's Thor-U
晚点Auto· 2025-08-28 03:51
Core Viewpoint
- Li Auto's self-developed autonomous driving chip, the M100, has passed key pre-mass-production stages and is expected to enter mass production next year, enhancing the company's competitiveness in the autonomous driving market [3][5]

Group 1: Chip Development and Performance
- The M100 provides effective computing power comparable to 2 NVIDIA Thor-U chips on large language model tasks and to 3 Thor-U chips on traditional visual tasks such as image recognition [3][5]
- Li Auto has budgeted several billion dollars for its self-developed chip project, indicating the significant investment such technology requires [5]

Group 2: Strategic Partnerships and Current Solutions
- Until the M100 reaches mass production, Li Auto will continue to rely on existing partnerships with NVIDIA and Horizon Robotics for its chip solutions [5][7]
- The company employs a mixed strategy for its range-extended models, using either NVIDIA Thor-U or Horizon Journey 6M chips depending on the specific version of its AD Max and AD Pro autonomous driving systems [7]

Group 3: R&D Strategy and Challenges
- CTO Xie Yan is driving a strategy that combines hardware and software development to maximize chip performance and efficiency, aiming to outperform competitors [5][6]
- Integrating hardware and software in chip development is complex, requiring deep technical expertise and effective cross-department collaboration [6]
What Meta Didn't Do, NVIDIA Did: A New Architecture with 6x Throughput, Trained on 20 Trillion Tokens
36Kr· 2025-08-19 02:33
Core Insights
- NVIDIA has launched a new 9B model, the NVIDIA Nemotron Nano 2, using a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than the industry benchmark Qwen3-8B, while matching or exceeding its performance on complex reasoning tasks [1][23]

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-2 architecture, which replaces most of the self-attention layers of a traditional Transformer, yielding significant speed improvements on complex reasoning tasks [10][15]
- The model demonstrates competitive accuracy across benchmarks in mathematics, code generation, and general reasoning, performing on par with or better than similar open-source models such as Qwen3-8B and Gemma3-12B [23][24]
- In specific benchmarks, the model scored 97.8% on MATH500 and 72.1% on AIME25, showcasing its mathematical reasoning and general knowledge [24]

Group 2: Training and Data Utilization
- Training involved a massive dataset of 20 trillion tokens and advanced FP8 training techniques, producing a 12-billion-parameter foundational model that was later distilled to 9 billion parameters [17][22]
- The training data included high-quality sources across mathematics, code, and multilingual question answering, ensuring a robust pre-training dataset [18][25]
- NVIDIA has also released a comprehensive pre-training dataset, Nemotron-Pre-Training-Dataset-v1, comprising 6.6 trillion tokens from diverse domains [25][27]

Group 3: Open Source Commitment
- NVIDIA has committed to open-sourcing the Nemotron models on the HuggingFace platform, providing access to the 9B model, its base version, and the larger 12B model, along with the associated datasets [25][30]
- This move reflects NVIDIA's ongoing contribution to the open-source community, contrasting with other companies that are shifting toward more closed-source strategies [27]
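The throughput advantage of a Mamba-style layer over self-attention comes down to complexity: attention compares every pair of tokens (O(n²) in sequence length), while a state-space layer carries a fixed-size state through a single O(n) pass. The toy recurrence below illustrates only that property; real Mamba-2 layers use input-dependent (selective) parameters and a parallel scan, both of which this sketch omits.

```python
import numpy as np

def ssm_recurrence(x, a=0.9, b=1.0):
    """Toy per-channel state-space recurrence h_t = a*h_{t-1} + b*x_t.
    One sequential O(n) pass with constant-size state, in contrast to
    self-attention's O(n^2) pairwise score matrix.

    x: array of shape (seq_len, d); returns the state at every step."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t     # decay old state, mix in current input
        out[t] = h
    return out
```

Because the state has fixed size, inference cost per generated token is constant regardless of context length, which is why hybrids that keep only a few attention layers can report large throughput gains on long sequences.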