Hardware-Software Co-Design
Li Auto CTO Xie Yan Shares Li Auto's Autonomous Driving Chip Design Thinking at the Yunqi Conference
理想TOP2· 2025-09-27 08:58
- Li Auto's VLA includes the L (language) for two reasons. The technical reason is language's long-horizon reasoning ability; needing language tokens as input and output is secondary. The non-technical reason is that language makes value alignment easier.
- Xie Yan believes the last 5%-10% of corner cases are very hard to stumble upon through data or a world model alone; handling them requires human-like reasoning ability.
- Like the rest of the industry, Li Auto is asking whether the GPGPU is the ultimate answer for the AI era. From CPU to GPU to GPGPU, all are essentially von Neumann architectures, whose core trait is that programs are organized mainly around computation rather than data: computation is the first-class citizen and data the second-class one.
- In the AI era there are not that many distinct compute operators, so the question raised is whether programs can be made to focus more on the data than on the computation (a toy sketch of this dataflow idea follows below).
- Li Auto's self-developed in-vehicle compute architecture is primarily an NPU, not an SoC. An SoC is essentially a CPU cluster for pre- and post-processing, plus some external I/O and a memory access controller. Inside the NPU is a many-core architecture plus a CCB (Central Control Computing Block) that handles pre- and post-processing and computation not suited to tensor units; the cores are homogeneous, connected by a mesh bus, with a ring bus also provided for broadcast. In his words: "This is an AI inference architecture entirely of our own creation; no one in China is doing it this way yet."
- The bigger challenge is the compiler (it involves many programming models and ...
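To make the dataflow idea above concrete, here is a minimal Python sketch of dataflow-style scheduling, where an operator fires as soon as all of its operands have arrived rather than waiting its turn in a sequential instruction stream. This illustrates the general concept only, not Li Auto's NPU: the `Node`/`run_dataflow` names and the toy graph are invented for the example.

```python
# Illustrative dataflow executor (toy; not Li Auto's NPU design).
from collections import deque

class Node:
    """One operator in the graph: fires once all operands arrive."""
    def __init__(self, name, fn, inputs):
        self.name = name        # operator name, e.g. "matmul0"
        self.fn = fn            # the compute kernel
        self.inputs = inputs    # names of upstream producers
        self.ready = {}         # operands received so far

def run_dataflow(nodes, sources):
    """Token-driven execution: data availability, not an instruction
    pointer, decides what runs next."""
    consumers = {}              # producer name -> consuming nodes
    for n in nodes:
        for src in n.inputs:
            consumers.setdefault(src, []).append(n)
    results = dict(sources)
    queue = deque(sources.items())      # (name, value) tokens in flight
    while queue:
        src, value = queue.popleft()
        for node in consumers.get(src, []):
            node.ready[src] = value
            if len(node.ready) == len(node.inputs):  # all operands present
                out = node.fn(*(node.ready[i] for i in node.inputs))
                results[node.name] = out
                queue.append((node.name, out))       # output token flows on
    return results

# Two independent scalings feed an add; on real dataflow hardware the two
# scalings could fire in parallel since neither depends on the other.
graph = [
    Node("a2", lambda a: a * 2, ["a"]),
    Node("b3", lambda b: b * 3, ["b"]),
    Node("sum", lambda x, y: x + y, ["a2", "b3"]),
]
print(run_dataflow(graph, {"a": 1, "b": 2})["sum"])  # 8
```

The point of the toy is the inversion Xie Yan describes: control-flow hardware asks "what is the next instruction?", while dataflow hardware asks "which operands just became available?".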
The Core of Li Auto's Autonomous Driving Chip Is Its Dataflow Architecture and Hardware-Software Co-Design
理想TOP2· 2025-09-05 04:56
Core Viewpoint
- The article discusses the advancements in Li Auto's self-developed chip architecture, focusing on the VLA architecture and its implications for autonomous driving capabilities [1][2]

Group 1: Chip Development and Architecture
- Li Auto's self-developed chip uses a dataflow architecture with an emphasis on hardware-software co-design, making it well suited to running large neural networks efficiently [5][9]
- The chip is expected to deliver 2x the performance of leading chips when running large language models like GPT, and 3x for vision models like CNNs [5][8]
- The development timeline from project initiation to vehicle deployment is approximately three years, a rapid pace compared with similar projects [5][8]

Group 2: Challenges and Innovations
- Achieving real-time inference on the vehicle's chip is a significant challenge, with efforts focused on optimizing performance through various engineering techniques [3][4]
- Li Auto is implementing innovative parallel decoding methods to enhance the efficiency of action-token inference, which is crucial for autonomous driving (see the sketch after this summary) [4]
- The integration of CPU, GPU, and NPU in the Thor chip aims to improve versatility and performance when processing the large volumes of data that autonomous driving requires [3][6]

Group 3: Future Outlook
- The company expresses strong confidence in its innovative architecture and full-stack development capabilities, which it expects to become key differentiators [7][10]
- The relationship between increased computing power and improved ADAS performance is highlighted, suggesting predictable capability gains as compute scales [6][9]
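The parallel-decoding bullet above describes the goal (fewer sequential inference steps per action sequence) but the article does not specify the algorithm. As a hedged illustration of one common family of techniques, the sketch below emits K action tokens per forward pass instead of one; the model stub, vocabulary size, and K are assumptions for the example, not details from the article.

```python
# Hedged sketch of multi-token ("parallel") decoding for action tokens.
# K tokens per forward pass cuts the number of sequential model calls
# roughly K-fold versus one-token-at-a-time autoregressive decoding.
import numpy as np

VOCAB, K = 32, 4   # assumed action-token vocabulary size; tokens per step

def model_forward(prefix):
    """Stand-in for a trajectory model: returns K logit vectors, one per
    future action slot (a real model would also condition on sensor data)."""
    seed = hash(tuple(prefix)) % (2**32)
    return np.random.default_rng(seed).normal(size=(K, VOCAB))

def decode_actions(n_tokens):
    """Greedily decode n_tokens action tokens, K at a time."""
    tokens, calls = [], 0
    while len(tokens) < n_tokens:
        logits = model_forward(tokens)  # one sequential (latency-bound) call
        calls += 1
        step = logits.argmax(axis=-1)[: n_tokens - len(tokens)]
        tokens.extend(int(t) for t in step)
    return tokens, calls

actions, calls = decode_actions(16)
print(f"{len(actions)} action tokens in {calls} forward passes")  # 16 in 4
```

Production systems often pair multi-token emission with a verification pass (speculative decoding) so output quality matches serial decoding; whether Li Auto does so is not stated in the article.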
After a Quiet Month, openPangu's Performance Jumps 8%! Huawei's 1B Open-Source Model Arrives
机器之心· 2025-09-05 04:31
Core Viewpoint
- Huawei's openPangu Embedded-1B model represents a significant advance in edge AI, enabling powerful AI capabilities on resource-constrained devices and paving the way for intelligent upgrades across industries [1][5]

Group 1: Model Performance and Efficiency
- The openPangu Embedded-1B model, with 1 billion parameters, sets a new state-of-the-art (SOTA) for performance and efficiency at its size, demonstrating that smaller models can deliver substantial capability [2][3]
- The model's overall average score reached 63.90, surpassing similar-sized models and matching larger ones such as Qwen3-1.7B, showcasing its parameter efficiency [3][4]
- In mathematical reasoning, the model scored 82.76% on the GSM8K benchmark and 81.83% on the MATH dataset, significantly outperforming its peers [3][4]

Group 2: Technical Innovations
- The model employs hardware-software co-design, with its architecture optimized to match the characteristics of Ascend hardware for efficient resource utilization [9][10]
- A two-stage curriculum learning approach is used to enhance the model's reasoning capabilities, simulating a human-like progression from easier to harder material [15][16]
- Offline on-policy knowledge distillation allows a more flexible and effective training process, improving the model's accuracy and generalization (a sketch follows this summary) [18][19]

Group 3: Reinforcement Learning and Future Directions
- The model incorporates a multi-source reward reinforcement-learning mechanism, enhancing performance through targeted feedback based on task complexity [22][25]
- Future developments aim to integrate fast and slow thinking within a single model, allowing adaptive responses based on problem difficulty and improving both speed and accuracy [29][30]
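Group 2 names offline on-policy knowledge distillation without giving its objective. Below is a minimal sketch of the standard temperature-scaled KL distillation loss applied the way the summary describes: the student learns from teacher distributions recorded over student-generated (on-policy) sequences. The tensor shapes, temperature, and function names are assumptions for illustration, not openPangu's actual recipe.

```python
# Minimal distillation-loss sketch (illustrative; not openPangu's code).
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL(teacher || student) over token positions.
    Shapes: (batch, seq_len, vocab). In the *offline on-policy* setting,
    teacher_logits are precomputed on sequences the student itself sampled,
    so no live teacher is needed during student training."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # t^2 rescaling keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(2, 8, 100, requires_grad=True)
teacher_logits = torch.randn(2, 8, 100)
loss = distill_loss(student_logits, teacher_logits)
loss.backward()           # gradients flow to the student only
print(float(loss))
```

The appeal of the offline variant is decoupling: the expensive teacher runs once over the student's samples, and the student can then train on the cached distributions as many times as needed.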
CoDesign 2025国际研讨会在大阪召开 共探高性能计算与AI融合新路径
Cai Jing Wang· 2025-07-18 04:22
Group 1
- The CoDesign 2025 International Symposium was held in Osaka, Japan, focusing on the challenges of large-scale computing and big data and emphasizing the importance of hardware-software co-design to the development of high-performance computing (HPC) [1]
- The conference highlighted four core areas: algorithms, application systems, system software and middleware, and hardware-software co-design architecture, covering the key fields of high-performance and scalable computing [2]
- Keynote speeches and technical presentations showcased cutting-edge research and developments, including the challenge of system fragmentation and the need for collaborative design between hardware and software [3]

Group 2
- Roundtable discussions addressed the integration of HPC and AI, with experts sharing differing views on the future direction of computing architectures and the role of AI in scientific programming [4]
- The pursuit of zettascale computing was discussed, with experts identifying system reliability and power consumption as the core obstacles to scaling [4]
- The symposium gave global experts a platform to share insights and build consensus, which should significantly advance the integration of HPC and AI and help address future challenges and opportunities in computing [4]