Software-Hardware Co-Design
How does DeepSeek V3.1's UE8M0 FP8 differ from Nvidia's FP8 formats?
傅里叶的猫· 2025-08-24 12:31
Core Viewpoint - DeepSeek's introduction of UE8M0 FP8 for upcoming domestic chips is a strategic move to improve compatibility and efficiency in the Chinese AI ecosystem by addressing the specific requirements of domestic hardware [5][10][12].
Group 1: UE8M0 and FP8 Concept
- FP8 is an 8-bit floating-point format that reduces memory usage by 75% compared to 32-bit formats, improving computational speed and efficiency for large-model training and inference [7][13].
- UE8M0 is a specific encoding format for FP8 tensor data, designed to optimize compatibility with domestic chips. It differs from Nvidia's E4M3 and E5M2 formats, which trade off precision and dynamic range [9][10].
- The Open Compute Project (OCP) introduced UE8M0 as part of its MXFP8 formats, aiming to standardize FP8 usage across hardware platforms [8].
Group 2: Strategic Importance of UE8M0
- UE8M0 is intended to let domestic chips use FP8 effectively without relying on foreign standards, reducing dependence on Nvidia's technology [12].
- DeepSeek has integrated UE8M0 into its model development process so that models run stably on upcoming domestic chips, smoothing the transition from development to deployment [11][12].
- The goal of UE8M0 is not to outperform foreign FP8 standards but to provide a viable way for domestic chips to benefit from FP8 efficiency [14].
Group 3: Performance and Limitations
- UE8M0 can save approximately 75% in memory usage compared to FP32, allowing for larger models or more concurrent requests during inference [13].
- Inference throughput with UE8M0 can be about twice that of BF16, which is particularly beneficial for large-scale AI applications [13].
- However, UE8M0 is not a one-size-fits-all solution; certain calculations still require higher precision formats like BF16 or FP16, and effective calibration is necessary to avoid errors in extreme value scenarios [15].
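
The memory arithmetic and the encoding described above can be sketched in a few lines. This is a minimal illustration, not DeepSeek's or any vendor's implementation. It assumes that UE8M0 corresponds to the E8M0 scale type of the OCP Microscaling (MX) specification: an unsigned byte with 8 exponent bits, no mantissa, and a bias of 127, shared by a block of 32 FP8 elements. Under that assumption the byte encodes a per-block power-of-two scale rather than the elements themselves. All function names here are hypothetical.

```python
import math

E8M0_BIAS = 127      # assumed exponent bias of the E8M0 scale type (OCP MX spec)
E8M0_NAN = 255       # the all-ones code is reserved for NaN
E4M3_EMAX = 8        # exponent of the largest E4M3 normal value (448 = 1.75 * 2**8)
BLOCK_SIZE = 32      # assumed number of elements sharing one scale


def encode_ue8m0(exponent: int) -> int:
    """Encode the power-of-two scale 2**exponent as one unsigned byte."""
    return max(0, min(E8M0_NAN - 1, exponent + E8M0_BIAS))


def decode_ue8m0(code: int) -> float:
    """Decode a UE8M0 byte back to its power-of-two scale."""
    if code == E8M0_NAN:
        return math.nan
    return 2.0 ** (code - E8M0_BIAS)


def block_scale_code(block) -> int:
    """Pick a shared scale that maps the largest magnitude into [256, 512).

    Scaled values above 448 would still need clamping when cast to E4M3.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return E8M0_BIAS  # scale of 1.0 for an all-zero block
    return encode_ue8m0(math.floor(math.log2(amax)) - E4M3_EMAX)


def bytes_per_value(block_size: int = BLOCK_SIZE) -> float:
    """One byte per FP8 element plus one shared scale byte per block."""
    return 1 + 1 / block_size


block = [0.5, -3.0, 100.0, 7.25]
code = block_scale_code(block)
scale = decode_ue8m0(code)
scaled = [v / scale for v in block]          # values that would then be cast to E4M3
saving_vs_fp32 = 1 - bytes_per_value() / 4   # fraction of FP32 memory saved
```

Under these assumptions the saving relative to FP32 comes out at about 74.2%, slightly below the headline 75%, because each block of 32 elements carries one extra scale byte. A power-of-two scale can also be applied by adjusting exponents alone, with no mantissa multiplication.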
Douyin's new recommendation model RankMixer: 70x more parameters with no increase in inference cost
量子位· 2025-08-01 09:05
Core Viewpoint - The article discusses RankMixer, a recommendation model architecture developed by ByteDance that improves the efficiency and effectiveness of video recommendations on platforms like Douyin while keeping inference costs low [2][40].
Group 1: RankMixer Model Overview
- RankMixer is a new recommendation model architecture that scales the parameter count from 16 million (16M) to 1 billion (1B), improving model performance without increasing inference latency [4][26].
- The model design is aligned with GPU hardware characteristics: computation is expressed as large matrix multiplications, which avoids memory bottlenecks [9][41].
- RankMixer introduces TokenMixing and Per-Token SparseMoE, which improve the model's ability to capture diverse feature interactions and increase parameter efficiency [12][24].
Group 2: Performance Metrics and Improvements
- In the Douyin recommendation scenario, the RankMixer-1B model has delivered a cumulative increase of over 0.3% in user active days and more than 1% in average daily usage time, indicating improved user engagement [4][35].
- The parameter count grows 70-fold while the inference cost stays stable, which is achieved through several optimization techniques [26][30].
- Offline metrics show that RankMixer-1B outperforms traditional DNN models, with an AUC increase of over 0.9% and a UAUC improvement exceeding 1% [32].
Group 3: Technical Innovations
- RankMixer uses Automatic Feature Tokenization to align input features into a uniform token sequence, enabling parallel processing and maximizing hardware utilization [15][16].
- The TokenMixing module exchanges information between tokens efficiently, letting the model use global information for better recommendations [19][20].
- The Per-Token SparseMoE architecture enables differentiated modeling of semantic subspaces, significantly increasing parameter capacity while reducing computational overhead [21][24].
Group 4: Future Implications
- The successful implementation of RankMixer across various ByteDance applications demonstrates its potential as a universal ranking model architecture [39].
- The exploration of RankMixer validates the importance of co-designing algorithms with infrastructure to optimize machine learning performance and resource utilization [43][44].
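
The token-level information exchange described under Technical Innovations can be illustrated with a parameter-free shuffle. The sketch below is an assumption about how such mixing could work, not ByteDance's code: each token is split into equal heads, and the h-th heads of all tokens are concatenated into a new token, so every output token carries a slice of every input token. The function name `token_mixing` and the list-of-lists representation are hypothetical.

```python
def token_mixing(tokens, num_heads):
    """Regroup T tokens of width D into num_heads tokens of width T * D / num_heads.

    Output token h is the concatenation of the h-th slice of every input token.
    """
    dim = len(tokens[0])
    if any(len(tok) != dim for tok in tokens):
        raise ValueError("all tokens must have the same width")
    if dim % num_heads != 0:
        raise ValueError("token width must be divisible by num_heads")
    head_dim = dim // num_heads
    mixed = []
    for h in range(num_heads):
        new_token = []
        for tok in tokens:
            # take the h-th head of this token and append it to output token h
            new_token.extend(tok[h * head_dim:(h + 1) * head_dim])
        mixed.append(new_token)
    return mixed


# Two tokens of width 4, split into two heads of width 2.
features = [[1, 2, 3, 4], [5, 6, 7, 8]]
mixed_features = token_mixing(features, num_heads=2)
```

When the number of heads equals the number of tokens, the output has the same shape as the input. The operation is then a pure reshape and transpose, which adds no parameters and suits the large dense matrix multiplications that the summary says the design targets.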
Cook "goes all out": iPhone 16 Pro series prices cut sharply, now within everyone's reach
Sou Hu Cai Jing· 2025-07-17 05:41
Core Viewpoint - The competitive landscape in the smartphone market is shifting: domestic brands like Huawei, Xiaomi, OPPO, and vivo are gaining traction, while Apple faces declining sales due to high prices and a perceived lack of innovation. Apple has responded with significant price reductions for the iPhone 16 Pro series, making the phones more accessible to consumers [1][11].
Group 1: Price Adjustments
- The iPhone 16 Pro Max saw a price drop of over 2300 yuan and the iPhone 16 Pro a drop of 2100 yuan, making these previously high-priced models more appealing to consumers [3][11].
- The price reduction is viewed as a strategic adjustment by Apple in response to competitive pressure from domestic brands, indicating a shift in its pricing strategy [11].
Group 2: Product Features
- The iPhone 16 Pro Max features a 6.9-inch display, Apple's first increase in screen size in recent years, for a more immersive experience [4].
- The device has a 4676mAh battery, providing up to one and a half days of use [4].
- The A18 Pro processor offers improved AI processing, energy efficiency, and image processing, with stable performance over time [6].
- The camera system has been upgraded: the ultra-wide lens goes from 12 MP to 48 MP, improving image quality and low-light performance [6].
Group 3: User Experience Enhancements
- The iPhone 16 Pro has a slightly larger 6.3-inch screen while keeping a compact design, but its smaller 3650mAh battery may require heavy users to carry a power bank [7].
- A new dedicated camera control button has been added to the iPhone 16 Pro series, improving the photography experience for both casual and professional users [9].
- The side frame material has been upgraded to matte titanium, improving durability and aesthetics while reducing fingerprints [9].
Third Software Innovation and Development Conference to open soon, featuring the first Yujiashan Forum on Industrial Software
Di Yi Cai Jing· 2025-06-05 05:37
Group 1
- The core objective of the conference is to achieve an overall software industry scale exceeding 800 billion yuan in Wuhan by 2030 [1][2]
- The conference will focus on key software innovations, AI empowerment, open-source ecosystem cultivation, and the integration of software and hardware [1]
- A new industrial software forum will be established to showcase the innovation capabilities of Huazhong University of Science and Technology and to create a new hub for industrial software transformation [2]
Group 2
- The conference aims to enhance collaboration among six provincial capitals in Central China, focusing on high-end sectors such as aerospace, shipbuilding, and automotive [1]
- Wuhan plans to introduce industry-specific open-source operating systems for sectors like healthcare, education, and automotive [2]
- The conference will also promote the integration of artificial intelligence, blockchain, and the metaverse with the real economy [2]