Small Language Models
NeurIPS 2025 | NVIDIA Releases Nemotron-Flash: Rebuilding Small-Model Architecture Around GPU Latency
机器之心· 2025-12-01 00:40
Core Insights
- The article examines why small language models (SLMs) often fail to deliver on speed: smaller models do not necessarily yield lower latency or higher throughput when deployed on GPUs [2][9][10]
- NVIDIA's Nemotron-Flash addresses this by making measured GPU latency a first-class design objective, achieving state-of-the-art accuracy while maintaining low latency and high throughput [2][21]

Group 1: Why Small Models Run Slowly
- Small models are often deep and narrow; the many small layers force frequent kernel launches on the GPU, increasing latency and contradicting the expectation that smaller models are faster [9]
- The attention mechanism remains a major throughput bottleneck, and there has been no systematic method for deciding which layers should use full attention versus linear attention [10]
- Training of small models often stagnates prematurely: weight norms grow while effective gradient updates shrink, capping the model's capacity to keep improving [10][11]

Group 2: Core Methodology of Nemotron-Flash
- The model optimizes the depth-width ratio, balancing the depth needed for expressiveness against the width that reduces latency, and identifies a "golden point" for the structure [14]
- It employs a hybrid operator structure that assigns clear roles to different operators, so they complement one another rather than one simply replacing the other [16]
- Weight normalization is applied during training to prevent structured outliers from forming in the weight matrices, sustaining learning and improving convergence quality [20] (see the sketch after this summary)

Group 3: Performance of Nemotron-Flash
- Nemotron-Flash-1B improves accuracy over Qwen3-0.6B by 5.5%, with 1.9× lower inference latency and up to 45.6× higher maximum throughput [24]
- Nemotron-Flash-3B improves accuracy by 2% to 5.5% over Qwen2.5-3B and Qwen3-1.7B, with 1.3× to 1.7× lower latency and 6.4× to 18.7× higher throughput [24]
- The design supports scalable deployment across applications, providing reliable, low-latency experiences in demanding scenarios such as online services and edge devices [25]

Conclusion
- The future of small models lies not in being smaller but in being faster, more stable, and stronger; Nemotron-Flash offers a new foundational logic for small-model design [27]
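The summary above gives no implementation details for the weight-normalization step. As a minimal illustrative sketch (not Nemotron-Flash's actual code), the PyTorch snippet below applies the standard weight-norm reparameterization W = g · V/‖V‖ to a small residual block; decoupling magnitude from direction is one common way to discourage structured outliers in weight matrices, though the exact scheme used in the paper may differ.

```python
# Minimal sketch, not the paper's code: weight normalization on a tiny MLP
# block, reparameterizing each weight matrix as W = g * V / ||V|| so that
# gradient updates act on direction (V) and magnitude (g) separately.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm  # PyTorch >= 2.1

class TinyBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        # weight_norm() registers the W = g * V/||V|| parametrization;
        # the optimizer then updates g and V instead of W directly.
        self.up = weight_norm(nn.Linear(dim, hidden))
        self.down = weight_norm(nn.Linear(hidden, dim))
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.down(self.act(self.up(x)))

block = TinyBlock(dim=256, hidden=1024)
x = torch.randn(8, 256)
print(block(x).shape)  # torch.Size([8, 256]) -- forward uses normalized weights
```

Because the magnitude of each row is carried by a single scalar g, no individual weight can silently dominate the matrix, which matches the summary's claim that normalization keeps learning effective late in training.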
"Small but Beautiful" Language Models Are on the Rise
Huanqiu Wang Zixun · 2025-09-11 02:10
Core Insights
- Faith in ever-larger language models (LLMs) is waning as the industry shifts toward smaller, more tailored models that meet specific business needs [1][2]
- The latest release of ChatGPT-5 generated less excitement than the iPhone 17, suggesting a potential stagnation in LLM advances [1]
- Companies increasingly favor small language models (SLMs) for their cost-effectiveness and efficiency in specific applications, such as human resource management [1][2]

Group 1
- Comparing LLMs to early smartphones highlights that while the first releases were revolutionary, current iterations resemble minor upgrades [1]
- SLMs are gaining traction in enterprises because they are easier to deploy and less costly, making them more appealing for specific tasks [1][2]
- The rise of SLMs is driven by the need for models that run efficiently within existing IT systems and on energy-sensitive devices [1][2]

Group 2
- There is no settled definition of a "small language model," but SLMs typically have far fewer training parameters than LLMs, with some as small as 100 million parameters [2]
- Demand for SLMs is expected to grow at twice the rate of LLMs this year, driven by user fatigue with LLM issues such as "AI hallucinations" [2]
- SLMs can handle standardized tasks without the resource demands of LLMs, making them the more economical choice for businesses [2]

Group 3
- SLMs are positioned to become central to "agent-based AI," enabling cost-effective task completion and modular combinations of specialized models [3]
- While LLMs will continue to dominate consumer applications, SLMs are likely to prevail in enterprise and device-level AI solutions [3]
- OpenAI also routes work internally across models of varying sizes, allocating resources according to task complexity [3]
NVIDIA's Latest Research: Small Models Are the Future of AI Agents
36Ke · 2025-08-05 09:45
Core Viewpoint
- Small language models (SLMs) are considered the future of AI agents, being more efficient and cost-effective than large language models (LLMs) [1][3]

Group 1: Advantages of SLMs
- SLMs are powerful enough to handle most repetitive, specialized tasks within AI agents [3]
- They are inherently better suited to the architecture of agent systems, being flexible and easy to integrate [3]
- Economically, SLMs significantly reduce operational costs, making them the more efficient choice for AI applications [3]

Group 2: Market Potential
- The AI agent market is projected to grow from $5.2 billion in 2024 to $200 billion by 2034, with over half of enterprises already using AI agents [5]
- Current agent tasks are often repetitive, such as "checking emails" and "generating reports," which makes running them on LLMs inefficient [5]

Group 3: SLM Characteristics
- SLMs can be deployed on standard consumer devices such as smartphones and laptops, and offer fast inference [9]
- Models with fewer than 1 billion parameters are classified as SLMs, while larger models typically require cloud support [9]
- SLMs are likened to a "portable brain" that balances efficiency and ease of iteration, whereas LLMs resemble "universe-level supercomputers" with high latency and cost [9]

Group 4: Performance Comparison
- Cutting-edge small models such as Phi-3 and Hymba can match 30B-to-70B large models on comparable tasks while cutting compute by 10-30× [11]
- Real-world audits found that 60% of tasks in MetaGPT, 40% in Open Operator, and 70% in Cradle could be handled by SLMs [11] (a routing sketch follows this summary)

Group 5: Barriers to Adoption
- The main obstacle is path dependency, with as much as $57 billion invested in centralized large-model infrastructure [12]
- A strong industry bias toward "bigger is better" has discouraged exploration of small models [12]
- SLMs lack the marketing hype surrounding large models such as GPT-4, so cheaper options go largely untried [13]
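The proposal implied by the summary is a heterogeneous agent system in which an SLM absorbs routine work and an LLM handles only the hard residue. Below is a minimal, hypothetical routing sketch: `classify_difficulty`, the model names, and the `generate` interface are placeholders invented for illustration, not APIs from the NVIDIA paper.

```python
# Minimal sketch (not from the paper): route routine tool calls to a local
# SLM and escalate only hard requests to a remote LLM. All names and the
# difficulty heuristic are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    generate: Callable[[str], str]  # prompt -> completion

def classify_difficulty(task: str) -> str:
    """Toy heuristic: long or open-ended requests count as 'hard'."""
    open_ended = any(w in task.lower() for w in ("design", "plan", "why"))
    return "hard" if open_ended or len(task) > 500 else "routine"

def route(task: str, slm: Model, llm: Model) -> str:
    """Send routine work to the cheap local SLM, hard work to the LLM."""
    model = slm if classify_difficulty(task) == "routine" else llm
    return model.generate(task)

# Stub backends so the sketch runs without any model weights.
slm = Model("local-slm-1b", lambda p: f"[slm] short reply to: {p[:40]}")
llm = Model("cloud-llm-70b", lambda p: f"[llm] detailed answer to: {p[:40]}")

print(route("Check new emails and draft short replies.", slm, llm))
print(route("Design a migration plan for our data warehouse.", slm, llm))
```

If 60-70% of agent traffic really is routine, as the cited audits suggest, even this crude gate shifts most token volume onto the cheap model.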
AI Continues to Make Significant Progress and Breakthroughs on Multiple Fronts in 2025
Sohu Caijing · 2025-06-23 07:19
Group 1
- In 2025, multimodal AI is a key trend, processing and integrating text, images, audio, and video, as exemplified by OpenAI's GPT-4 and Google's Gemini models [1]
- AI agents are evolving from simple chatbots into context-aware intelligent assistants, transforming customer service and user interaction across platforms [3]
- The rapid development and adoption of small language models (SLMs) in 2025 offers clear advantages over large language models (LLMs), including lower development costs and better user experience [3]

Group 2
- AI for Science (AI4S) is becoming a crucial force in transforming scientific research paradigms, with multimodal large models helping analyze complex, multidimensional data [4]
- Rapid AI progress brings new risks around security, governance, copyright, and ethics, prompting global efforts to strengthen AI governance through policy and technical standards [4]
- 2025 is anticipated to be the "year of embodied intelligence," with major industry and technology developments, including potential mass production of humanoid robots such as Tesla's Optimus [4]
NVIDIA Reveals the Magic of RL Scaling: Doubling Training Steps Brings a Qualitative Leap in Reasoning as Small Models Break Through Their Limits
机器之心· 2025-06-04 04:41
Core Insights
- The article examines Prolonged Reinforcement Learning (ProRL) as a way to enhance reasoning in language models, arguing that it yields genuinely new capabilities rather than merely optimizing retrieval of existing knowledge [1][15]

Group 1: ProRL Framework
- ProRL extends RL training from hundreds of steps to over 2,000, unlocking hidden potential in smaller models [3]
- The framework draws verifiable rewards from diverse domains, providing reliable supervision signals for RL training [5]
- Combining the GRPO and DAPO algorithms improves training efficiency by avoiding unbalanced policy updates and filtering out uninformative samples [7] (a GRPO advantage sketch follows this summary)

Group 2: Performance Improvements
- The Nemotron-Research-Reasoning-Qwen-1.5B model performs remarkably across tasks, outperforming larger models in specific areas [9][10]
- ProRL yields a 14.7% improvement on mathematical tasks, surpassing 7B models, and a 6.5% lead over DeepCoder-1.5B on code generation [12]
- On logical reasoning, accuracy improves by 54.8%, showcasing the model's enhanced capabilities [12][13]

Group 3: Creativity and Reasoning Expansion
- ProRL enables models to solve problems the base model could not, reaching pass@k of 100% on previously unsolvable tasks [13]
- Prolonged training fosters creativity, letting models discover new solution paths rather than relying on memorized answers [6][14]
- The longer the training, the further the model can depart from its pre-training distribution, producing richer and more creative reasoning strategies [14]

Group 4: Future Implications
- The research suggests ProRL may be the key to small language models with strong reasoning, low deployment cost, and high generalization [16][17]
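GRPO's defining trick, referenced in the summary, is computing advantages relative to a group of sampled completions instead of training a separate value network. The sketch below shows just that group-relative normalization under the summary's verifiable-reward setup (1.0 if the answer verifies, else 0.0); GRPO's clipped policy-gradient loss and DAPO's sample filtering are omitted.

```python
# Minimal sketch, not NVIDIA's training code: the group-relative advantage at
# the heart of GRPO. Sample several completions per prompt, score each with a
# verifiable reward, and normalize within the group.
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each sampled completion relative to its own group."""
    mu = statistics.fmean(rewards)      # group mean reward
    sigma = statistics.pstdev(rewards)  # group standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one math prompt, scored 1.0 if the final
# answer verifies and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # positive for correct, negative for wrong
```

Note that if every completion in a group gets the same reward (all right or all wrong), the advantages collapse toward zero; filtering such uninformative groups is exactly the kind of sample filtering the DAPO component contributes.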
AI Agents Lead the Next Wave of AI as MediaTek Deploys on Three Fronts
Core Insights
- The AI industry is growing rapidly, with a new wave of agentic AI experiences emerging, particularly in mobile chip manufacturing [1]
- MediaTek is focusing on three fronts, chip development, development tools, and ecosystem building, to capture the opportunities of agentic AI [1]

Chip Development
- MediaTek launched the Dimensity 9400+ flagship 5G mobile chip, featuring a second-generation all-big-core architecture and enhanced AI capabilities [1]
- The Dimensity 9400+ integrates MediaTek's eighth-generation AI processor, the NPU 890, supporting the DeepSeek-R1 inference model and enhanced speculative decoding (SpD+), improving inference speed for agentic AI tasks by 20% [1][2]

Development Tools
- The Dimensity AI Developer Suite 2.0 supports four key technologies, Mixture of Experts (MoE), Multi-Token Prediction (MTP), Multi-Head Latent Attention (MLA), and FP8 inference, doubling token generation speed and halving memory bandwidth usage [2] (a MoE routing sketch follows this summary)

Ecosystem Collaboration
- MediaTek launched the "Dimensity Intelligent Experience Leadership Program" with partners including Alibaba Cloud, Motorola, OPPO, and Xiaomi to strengthen the AI ecosystem [2]

Financial Performance
- MediaTek's 2024 revenue is projected to reach NT$530.586 billion, up 22.4% year on year, with a consolidated gross margin of 49.6% [2]
- Revenue from the Dimensity flagship chip business exceeded expectations at $2 billion, and the ASIC business is expected to surpass $1 billion in revenue by 2026 on AI demand [2]

Industry Trends
- The focus of AI development is shifting from parameter scale to efficiency, with smaller language models drawing attention for handling complex tasks without massive compute [3]
- Mobile chips are evolving toward heterogeneous computing, energy-efficiency optimization, and multi-task integration, with AI models trained and run on-device to meet local-compute, data-privacy, and energy-efficiency requirements [5]
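Of the four technologies listed for the Dimensity AI Developer Suite 2.0, Mixture of Experts is the most self-contained to sketch. The snippet below is a generic top-k MoE routing layer, not MediaTek's implementation, and all sizes are arbitrary: a gate scores experts per token and only the k best run, which is how MoE grows total parameters without growing per-token compute proportionally.

```python
# Minimal sketch of top-k expert routing in a Mixture of Experts layer.
# Generic illustration only; not MediaTek's or any vendor's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Keep only the k highest-scoring experts per token.
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # both (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):  # run only the selected experts
            for e in idx[:, slot].unique().tolist():
                rows = idx[:, slot] == e
                out[rows] += weights[rows, slot, None] * self.experts[e](x[rows])
        return out

moe = TopKMoE(dim=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

With k = 2 of 8 experts active, each token touches roughly a quarter of the layer's parameters per forward pass, which is the efficiency property that makes MoE attractive on bandwidth-constrained mobile NPUs.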