Hardware-Software Co-Design
Artificial Intelligence Begins to Revolutionize This Class of Chips
半导体行业观察· 2026-03-01 03:13
Core Insights
- The article discusses the increasing role of artificial intelligence (AI) in the design and management of programmable logic, particularly in simplifying and accelerating parts of the design process [2]
- Although FPGAs and DSPs are less efficient than fixed-architecture chips, they remain valuable in rapidly changing markets such as life sciences, AI processing, automotive electronics, and 5G/6G chips [2]
- The programmability of FPGAs provides a future-proof path for new protocols, standards, and architectural changes; it is likened to a blank canvas onto which any workload can be loaded [2]

Group 1: AI and FPGA Design
- AI is expected to accelerate FPGA design, although it may not fully carry users through FPGA programming [5]
- Current AI capabilities for generating RTL code from high-level code or natural language are still limited, but there is room for innovation in this area [5][6]
- High-level synthesis (HLS) technologies have made FPGA programming simpler, allowing engineering teams to convert algorithms or C code into RTL [6][8]

Group 2: Challenges in FPGA Programming
- The complexity and time-consuming nature of FPGA design remain significant challenges, requiring specialized knowledge of RTL design [2][6]
- Users moving to AI-enhanced FPGA design face challenges, particularly in integrating hardware design with software algorithms [6][8]
- Experienced hardware designers remain critical as the integration of algorithms into FPGAs becomes more prevalent [6]

Group 3: Software and Compiler Development
- Demand is growing for intelligent compilers that can optimize the RTL code generated from high-level languages, but such tools are still scarce [6][12]
- The industry is shifting toward software-driven design, with a focus on flexible and scalable embedded memory solutions to support unique AI algorithms [18]
- The evolution of AI models requires balancing programmability, efficiency, and flexibility in FPGA and AI system design [11][19]

Group 4: Future of Programmable Logic
- The future of FPGA applications will be determined by technical architects who decide which parts of a system are best suited to FPGA implementation versus other chip types [19]
- Key advantages of FPGAs include I/O flexibility, deterministic low latency, and the ability to absorb varied, unpredictable workloads [19]
- Total cost of ownership and the ability to adapt to market demands will be crucial in determining the success of FPGA implementations [19]
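The HLS workflow described above starts from an algorithm written at a high level of abstraction. As a minimal illustration (plain Python standing in for the C source an HLS tool would actually consume), here is the kind of loop kernel that such a tool lowers to RTL; the FIR filter is a generic example, not taken from the article:

```python
# A plain-Python stand-in for the kind of C loop kernel that a high-level
# synthesis (HLS) tool lowers to RTL: a fixed-tap FIR filter. In a real HLS
# flow this loop nest would carry pipeline/unroll directives; here it only
# illustrates the "algorithm first, hardware later" starting point.

def fir_filter(samples, taps):
    """Convolve an input stream with fixed coefficients (direct-form FIR)."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):  # HLS would typically unroll this loop
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out

# 3-tap filter with unit coefficients (a running sum over the last 3 samples)
print(fir_filter([3, 6, 9], [1, 1, 1]))  # -> [3, 9, 18]
```

The fixed trip counts and simple accumulator are exactly what makes such kernels amenable to pipelined hardware, which is why HLS targets this style of code.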
DeepSeek-V4 Large Model Release Imminent; Nomura Research Bullish: It Will Effectively Break Through the "Chip Wall" and "Memory Wall"
Zhi Tong Cai Jing· 2026-02-12 14:00
Core Insights
- The article highlights the emergence of various applications from leading domestic AI companies, showcasing the maturity of Chinese large models and the upcoming release of DeepSeek's flagship language model V4, which is expected to accelerate innovation in the Chinese AI industry and narrow the gap with global counterparts [1][8]

Group 1: Technical Innovations
- DeepSeek's DS-V4 integrates two core technologies, mHC and Engram, which address key bottlenecks in large model development by enhancing inter-layer information flow and optimizing memory efficiency, marking a shift from scale competition to architecture and system optimization [2][7]
- The mHC mechanism restructures inter-layer information flow by introducing strict mathematical constraints to avoid signal amplification and training failures, significantly improving training efficiency and stability [3][4]
- Engram decouples memory from computation to alleviate the "memory wall" in large models, enhancing memory efficiency during training and inference, which is crucial given the hardware constraints facing the Chinese AI industry [5][6]

Group 2: Industry Impact
- DS-V4 is expected to play a pivotal role in driving the commercialization of large models globally, while also serving as a key enabler for the Chinese AI industry to overcome hardware bottlenecks and accelerate upgrades across the industry chain [8][10]
- The model's efficiency improvements will ease capital expenditure pressure for enterprises investing in AI infrastructure, facilitating faster technology deployment and integration into various applications [9][10]
- In the Chinese market, DS-V4's innovations will support local hardware development and enhance the capabilities of AI applications, moving AI agents from simple tools toward intelligent assistants [10][12]
Group 3: Trends in the AI Ecosystem
- The evolution from V3/R1 to V4 reflects a significant trend in the global large model industry: performance gains are shifting from parameter accumulation to architectural design and system optimization, creating opportunities for China to close the gap with global leaders [13][14]
- The open-source large model market in China is expected to thrive, with DeepSeek's innovations setting benchmarks for local enterprises and allowing them to move from following to competing, and potentially leading, in the field [13][14]
- The launch of DS-V4 is anticipated to accelerate the commercialization cycle of AI applications in China, benefiting software companies that leverage large model technologies for product upgrades [12][14]
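The summary says mHC constrains inter-layer information flow so that signals cannot be amplified, but gives no formulas. As a toy sketch of that general idea (not DeepSeek's actual mechanism), one hypothetical constraint is to force the weights that mix residual streams between layers to be non-negative and sum to one, which makes every mixed value a convex combination of its inputs and therefore bounded by them:

```python
# Toy illustration (NOT DeepSeek's actual mHC) of constraining inter-layer
# mixing so signals cannot be amplified. The assumed constraint: each output's
# mixing weights are made non-negative and normalized to sum to 1, so every
# mixed value stays within the min/max range of the input streams.

def constrained_mix(streams, weights):
    """Mix parallel residual streams; each weight row is renormalized."""
    mixed = []
    for row in weights:
        total = sum(abs(w) for w in row)
        norm = [abs(w) / total for w in row]   # enforce the constraint
        mixed.append(sum(w * s for w, s in zip(norm, streams)))
    return mixed

streams = [1.0, 3.0, -2.0]           # current values of three residual streams
weights = [[2, 1, 1], [0, 5, 5]]     # unconstrained raw mixing weights
out = constrained_mix(streams, weights)
# Convex combinations cannot exceed the input range, so no amplification:
assert all(min(streams) <= v <= max(streams) for v in out)
print(out)  # -> [0.75, 0.5]
```

Whatever the real constraint is, the point of the summary survives in the sketch: bounding the mixing operator keeps activations from blowing up across layers, which is what stabilizes training.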
Li Auto CTO Xie Yan Shares the Design Philosophy of Li Auto's Autonomous Driving Chip at the Apsara Conference
理想TOP2· 2025-09-27 08:58
Core Viewpoint
- The article discusses the evolution of intelligent driving algorithms and the importance of data flow architecture in autonomous driving, emphasizing the need for advanced computational architectures to handle growing demands for processing power and reasoning capability

Group 1: Evolution of Intelligent Driving Algorithms
- The evolution of autonomous driving algorithms can be divided into three phases: an initial phase of rule-based algorithms, a second phase of end-to-end (E2E) learning, and a current phase focused on integrating vision-language models (VLM) with reinforcement learning (RL) to enhance decision-making [4][5][6]

Group 2: Importance of Language Models
- Language models are deemed essential for long-horizon reasoning in autonomous driving, as they let the system generalize and handle corner cases that cannot be covered by data collection or world models alone [7][8]
- The psychological aspect of having a driving model that aligns with human values and reasoning is also highlighted, suggesting that language models can help instill a human-like worldview in autonomous systems [8][9]

Group 3: Computational Architecture
- The article critiques the traditional von Neumann architecture, which prioritizes computation over data, and proposes a shift toward data-driven computation to better handle the complexities of AI processing [12][13]
- The company has developed a custom NPU architecture centered on data flow rather than traditional SoC designs, aiming to improve efficiency and performance in AI inference tasks [17][18]

Group 4: Performance Metrics
- The company's NPU architecture is reported to deliver up to 4.4x the performance of existing solutions on CNN tasks and 2-3x on Llama 2 7B tasks, while maintaining similar transistor counts [2][18]
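The data-driven model the article contrasts with the von Neumann architecture can be sketched concretely: instead of a program counter walking an instruction sequence, each operation fires as soon as its input operands arrive. The graph and node names below are purely illustrative, not Li Auto's design:

```python
# A minimal dataflow-execution sketch: operations fire when their operands are
# ready, rather than when a program counter reaches them. Illustrative only.

def run_dataflow(graph, inputs):
    """graph: name -> (fn, [operand names]); returns all computed values."""
    values = dict(inputs)
    pending = dict(graph)
    while pending:
        ready = [n for n, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise RuntimeError("deadlock: unsatisfiable dependencies")
        for name in ready:                        # fire every ready node
            fn, deps = pending.pop(name)
            values[name] = fn(*(values[d] for d in deps))
    return values

# (a + b) * (a - b), expressed as a dependency graph rather than a sequence;
# "sum" and "diff" have no mutual dependency, so they can fire in parallel.
graph = {
    "sum":  (lambda x, y: x + y, ["a", "b"]),
    "diff": (lambda x, y: x - y, ["a", "b"]),
    "prod": (lambda x, y: x * y, ["sum", "diff"]),
}
print(run_dataflow(graph, {"a": 5, "b": 3})["prod"])  # -> 16
```

The independence of "sum" and "diff" is the point: a dataflow machine extracts that parallelism from the graph itself, with no instruction-ordering bottleneck.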
Data Flow Architecture and Hardware-Software Co-Design Are at the Core of Li Auto's Autonomous Driving Chip
理想TOP2· 2025-09-05 04:56
Core Viewpoint
- The article discusses the advancements in Li Auto's self-developed chip architecture, particularly the VLA architecture and its implications for autonomous driving capabilities [1][2]

Group 1: Chip Development and Architecture
- Li Auto's self-developed chip uses a data flow architecture built around hardware-software co-design, making it well suited to running large neural networks efficiently [5][9]
- The chip is expected to achieve 2x the performance of leading chips when running large language models such as GPT, and 3x for vision models such as CNNs [5][8]
- The development timeline from project initiation to vehicle deployment is approximately three years, a rapid pace compared with similar projects [5][8]

Group 2: Challenges and Innovations
- Achieving real-time inference on the vehicle's chip is a significant challenge, with efforts focused on optimizing performance through various engineering techniques [3][4]
- Li Auto is implementing innovative parallel decoding methods to improve the efficiency of action-token inference, which is crucial for autonomous driving [4]
- The Thor chip's integration of CPU, GPU, and NPU aims to improve versatility and performance in processing large amounts of data, which is essential for autonomous driving applications [3][6]

Group 3: Future Outlook
- The company expresses strong confidence in its innovative architecture and full-stack development capabilities, which it expects to become key differentiators [7][10]
- The relationship between increased computing power and improved performance in advanced driver-assistance systems (ADAS) is highlighted, suggesting predictable capability gains as the technology evolves [6][9]
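The article mentions "parallel decoding" for action tokens without details, so the sketch below shows only the generic draft-then-verify pattern that parallel/speculative decoding schemes commonly use: propose several tokens at once with a cheap draft model, then keep the longest prefix the target model agrees with. All models and the integer-token setup here are hypothetical:

```python
# Generic draft-then-verify parallel decoding sketch (NOT Li Auto's specific
# method): a cheap draft proposes k tokens per round; the target model keeps
# the agreeing prefix and corrects the first mismatch, so one verification
# round can accept several tokens at once.

def decode_parallel(draft_fn, target_fn, prefix, steps, k):
    """Extend `prefix` by `steps` rounds of k-token draft + verify."""
    seq = list(prefix)
    for _ in range(steps):
        drafted = draft_fn(seq, k)                  # k tokens proposed at once
        accepted = []
        for tok in drafted:
            if target_fn(seq + accepted) == tok:    # target agrees -> keep
                accepted.append(tok)
            else:                                   # correct the mismatch, stop
                accepted.append(target_fn(seq + accepted))
                break
        seq.extend(accepted)
    return seq

# Toy models over integer "tokens": the target always emits last token + 1;
# the draft guesses the same rule but stumbles on multiples of 4.
target = lambda s: s[-1] + 1
draft = lambda s, k: [s[-1] + i + (2 if (s[-1] + i) % 4 == 0 else 0)
                      for i in range(1, k + 1)]
print(decode_parallel(draft, target, [0], steps=3, k=3))  # -> [0, 1, 2, ..., 7]
```

When the draft is usually right, most rounds accept all k tokens, which is how such schemes cut the number of sequential decoding steps on latency-critical workloads like action-token inference.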
After a Month of Silence, openPangu's Performance Jumps 8%! Huawei's 1B Open-Source Model Arrives
机器之心· 2025-09-05 04:31
Core Viewpoint
- Huawei's openPangu Embedded-1B model represents a significant advance in edge AI, enabling powerful AI capabilities on resource-constrained devices and paving the way for intelligent upgrades across industries [1][5]

Group 1: Model Performance and Efficiency
- The openPangu Embedded-1B model, with 1 billion parameters, sets a new state-of-the-art (SOTA) in performance and efficiency, demonstrating that smaller models can deliver substantial capability [2][3]
- The model's overall average score reached 63.90, surpassing models of similar size and matching larger models such as Qwen3-1.7B, showcasing its parameter efficiency [3][4]
- In mathematical reasoning, the model scored 82.76% on the GSM8K benchmark and 81.83% on the MATH dataset, significantly outperforming its peers [3][4]

Group 2: Technical Innovations
- The model employs hardware-software co-design, optimizing its architecture around the characteristics of Ascend hardware to ensure efficient resource utilization [9][10]
- A two-stage curriculum-learning approach is used to enhance the model's reasoning capabilities, simulating a human-like learning progression [15][16]
- Offline on-policy knowledge distillation allows a more flexible and effective training process, improving the model's accuracy and generalization [18][19]

Group 3: Reinforcement Learning and Future Directions
- The model incorporates a multi-source reward reinforcement learning mechanism, improving performance through feedback targeted to task complexity [22][25]
- Future development aims to integrate fast and slow thinking within a single model, allowing adaptive responses based on problem difficulty to improve both speed and accuracy [29][30]
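The summary names "offline on-policy knowledge distillation" without specifics, but the loss at the heart of any distillation setup is standard: the student is trained to match the teacher's temperature-softened output distribution via KL divergence. A minimal pure-Python sketch of that loss (the logits and temperature are illustrative, not from the paper):

```python
import math

# Minimal sketch of the standard distillation loss: KL(teacher || student)
# over temperature-softened distributions. Huawei's "offline on-policy"
# variant changes how training data is gathered, not this basic objective;
# all numbers below are illustrative.

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's to the teacher's softened distribution."""
    p = softmax(teacher_logits, temperature)   # teacher (target) distribution
    q = softmax(student_logits, temperature)   # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
perfect_student = [2.0, 1.0, 0.1]   # identical logits -> zero loss
weak_student = [0.1, 1.0, 2.0]      # reversed preferences -> positive loss
assert distillation_loss(teacher, perfect_student) < 1e-12
assert distillation_loss(teacher, weak_student) > 0.1
```

A temperature above 1 flattens the teacher's distribution, exposing the relative ranking of wrong answers ("dark knowledge"), which is what lets a 1B student absorb more signal per example than hard labels would provide.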
CoDesign 2025 International Symposium Held in Osaka: Exploring New Paths for Converging High-Performance Computing and AI
Cai Jing Wang· 2025-07-18 04:22
Group 1
- The CoDesign 2025 International Symposium was successfully held in Osaka, Japan, focusing on the challenges of large-scale computing and big data and emphasizing the importance of hardware-software co-design for the development of high-performance computing (HPC) [1]
- The conference highlighted four core areas: algorithms, application systems, system software and middleware, and hardware-software co-design architecture, covering key fields of high-performance and scalable computing [2]
- Keynote speeches and technical presentations showcased cutting-edge research and developments, including the challenges of system fragmentation and the need for collaborative design between hardware and software [3]

Group 2
- Roundtable discussions addressed the integration of HPC and AI, with experts sharing differing views on the future direction of computing architectures and the role of AI in scientific programming [4]
- The pursuit of zetta-scale computing was discussed, with experts identifying system reliability and power consumption as the core obstacles to scaling [4]
- The symposium gave global experts a platform to share insights and build consensus, which should significantly advance the integration of HPC and AI and help address future challenges and opportunities in computing [4]