Open-sourced 3B reasoning model runs on phones: faster than Qwen 3-4B, with no slowdown at ultra-long context
36Kr · 2025-10-09 10:48
Core Insights
- AI21 Labs, an Israeli AI startup, has open-sourced its lightweight reasoning model, Jamba Reasoning 3B, which outperforms leading models such as Google's Gemma 3-4B and Qwen 3-4B [1][2]

Performance Metrics
- Jamba Reasoning 3B has 3 billion parameters and can run on a range of devices, delivering 2-5x the efficiency of competing models [1][3]
- In benchmark tests, Jamba Reasoning 3B scored 61% on MMLU-Pro, 6% on Humanity's Last Exam, and 52% on IFBench, surpassing Qwen 3-4B and other models [2][6]

Technical Advantages
- The model uses a hybrid SSM-Transformer architecture, allowing it to handle context lengths of up to 1 million tokens without significant performance degradation [3][6]
- Jamba Reasoning 3B keeps memory usage low with a key-value cache 8x smaller than that of the original Transformer architecture, generating 40 tokens per second on an M3 MacBook Pro [8][11]

Applications and Use Cases
- The model is designed for secure on-device applications, allowing users to customize it with their own files and run it offline [8][12]
- It supports multiple languages, including English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew [11]

Industry Implications
- Lightweight models like Jamba Reasoning 3B address the economic inefficiencies of cloud-based large language models; studies suggest that 40%-70% of AI tasks can be handled by smaller models [12]
- This shift toward decentralized AI could enhance real-time applications in manufacturing and healthcare, offering low-latency processing and improved data privacy [12]
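The 8x KV-cache saving can be illustrated with back-of-the-envelope arithmetic. The sketch below is a hypothetical estimate, not AI21's published configuration: the layer counts, head counts, and head dimension are made-up illustrative values, and it assumes the SSM layers keep only a small fixed-size state rather than a per-token cache. The point it shows is that if only a fraction of the layers in a hybrid SSM-Transformer stack are attention layers, the per-token KV cache shrinks roughly in proportion to that fraction.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV-cache size: keys + values for every attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 32-layer model, 128K-token context, fp16 (2-byte) cache entries.
SEQ, KV_HEADS, HEAD_DIM = 128_000, 8, 128

# Pure Transformer: all 32 layers are attention layers and cache K/V per token.
full = kv_cache_bytes(32, KV_HEADS, HEAD_DIM, SEQ)

# Hybrid SSM-Transformer: assume only 4 of 32 layers use attention; the SSM
# layers carry a fixed-size recurrent state, negligible next to the KV cache.
hybrid = kv_cache_bytes(4, KV_HEADS, HEAD_DIM, SEQ)

print(f"full:   {full / 2**30:.1f} GiB")
print(f"hybrid: {hybrid / 2**30:.1f} GiB ({full // hybrid}x smaller)")
```

Because cache size grows linearly with sequence length, this gap widens at the long contexts the model targets, which is also why generation speed holds up as the context grows.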