Core Viewpoint
- Tencent Hunyuan has launched HY-1.8B-2Bit, an ultra-small model designed for consumer-grade hardware; it is significantly smaller than many common mobile applications, making it well suited to edge deployment [2][13].

Group 1: Model Specifications and Performance
- HY-1.8B-2Bit has an effective size of only 0.3 billion parameters and a memory footprint of just 600MB, making it ideal for deployment on edge devices [1][13].
- The model uses a 2-bit quantization scheme that shrinks it roughly six-fold compared to the original model while retaining the original's capabilities [2][6].
- On real edge devices, HY-1.8B-2Bit generates text 2-3 times faster than the original-precision model, significantly improving the user experience [2][6][13].

Group 2: Quantization Techniques
- The model employs Quantization Aware Training (QAT) to mitigate the precision loss typically associated with 2-bit quantization, allowing it to approach the performance of full-precision models [6][11].
- An "Elastic Stretch Quantization" (SEQ) strategy addresses the challenges of low precision by improving the model's ability to capture high-dimensional feature distributions [9][11].
- Data optimization strategies increase the proportion of scientific data and incorporate long-text data to strengthen the model's overall capabilities [8][7].

Group 3: Training and Deployment
- Training HY-1.8B-2Bit required only 10% of the tokens used to train the Bitnet-2B model, demonstrating an efficient path to low-bit model performance [12][11].
- The model is compatible with Arm computing platforms and has been tested on devices such as the MacBook M4 and Dimensity 9500, showing significant gains in both latency and generation speed over the original model [13][14].
- Future developments will focus on reinforcement learning and model distillation to further enhance the capabilities of low-bit quantized models, aiming to bridge the performance gap with full-precision models [15].
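The reported six-fold size reduction and ~600MB footprint can be sanity-checked with simple arithmetic. A minimal sketch, assuming group-wise quantization with one fp16 scale per 64 weights (the group size and overhead layout are illustrative assumptions, not disclosed details of HY-1.8B-2Bit):

```python
def weight_storage_mb(n_params, bits, group_size=64, scale_bytes=2):
    """Packed low-bit weights plus one higher-precision scale per group.
    Group size and scale precision are illustrative assumptions."""
    packed = n_params * bits / 8                   # bytes for quantized weights
    scales = n_params / group_size * scale_bytes   # per-group scale overhead
    return (packed + scales) / 1e6                 # megabytes (decimal)

fp16_mb = 1.8e9 * 2 / 1e6               # full-precision baseline: 3600 MB
q2_mb = weight_storage_mb(1.8e9, 2)     # ~506 MB of weight storage
print(f"fp16: {fp16_mb:.0f} MB, 2-bit: {q2_mb:.0f} MB, "
      f"ratio: {fp16_mb / q2_mb:.1f}x")
```

The gap between this ~506 MB weight estimate and the reported 600 MB plausibly covers parts kept at higher precision (such as embeddings) plus runtime buffers; with those overheads the effective reduction lands near the reported six-fold figure.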
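Quantization Aware Training works by simulating low-bit rounding in the forward pass while gradients flow through as if the rounding were absent (the straight-through estimator). A minimal, framework-free sketch of the fake-quantize step for symmetric 2-bit weights (the symmetric scheme and clipping range illustrate the general technique, not Hunyuan's exact recipe):

```python
def fake_quant_2bit(weights):
    """Simulate 2-bit symmetric quantization: scale onto the integer grid
    {-2, -1, 0, 1}, round, clip, then map back to float. During QAT the
    forward pass uses these values while the backward pass treats the
    rounding as identity (straight-through estimator)."""
    qmin, qmax = -2, 1                    # signed 2-bit integer range
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / abs(qmin)           # largest weight maps to the grid edge
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return [v * scale for v in q]

print(fake_quant_2bit([0.9, -1.0, 0.1]))  # → [0.5, -1.0, 0.0]
```

Because the model sees its own quantization error throughout training, the weights adapt to the coarse 4-level grid instead of being rounded onto it after the fact, which is why QAT recovers most of the accuracy that naive post-training 2-bit quantization loses.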
0.3B parameters, 600MB of memory! Tencent Hunyuan delivers industrial-grade 2-bit quantization, with an on-device model as small as a mobile app