Small Language Models
A CPU-CENTRIC PERSPECTIVE ON AGENTIC AI
2026-01-22 02:43
Summary of Key Points

Industry and Company Overview
- The discussion revolves around **Agentic AI** frameworks, which enhance traditional Large Language Models (LLMs) by integrating decision-making orchestrators and external tools, transforming them into autonomous problem solvers [2][4].

Core Insights and Arguments
- **Agentic AI Workloads**: The paper profiles five representative agentic AI workloads: **Haystack RAG**, **Toolformer**, **ChemCrow**, **LangChain**, and **SWE-Agent**. These workloads are analyzed for latency, throughput, and energy, highlighting the significant role of CPUs in these metrics relative to GPUs [3][10][20].
- **Latency Contributions**: Tool processing on CPUs can account for up to **90.6%** of total latency in agentic workloads, indicating a need for joint CPU-GPU optimization rather than a focus solely on GPU improvements [10][34].
- **Throughput Bottlenecks**: Throughput is limited by both CPU factors (coherence, synchronization, core over-subscription) and GPU factors (memory capacity and bandwidth); this dual limitation constrains the performance of agentic AI systems [10][45].
- **Energy Consumption**: At large batch sizes, CPU dynamic energy consumption can reach up to **44%** of total dynamic energy, underscoring the inefficiency of CPU parallelism relative to the GPU [10][49].

Important but Overlooked Content
- **Optimizations Proposed**: The paper introduces two key optimizations:
  1. **CPU and GPU-Aware Micro-batching (CGAM)**: improves performance by capping batch sizes and using micro-batching to reduce latency [11][50].
  2. **Mixed Agentic Workload Scheduling (MAWS)**: adapts scheduling strategies to heterogeneous workloads, balancing CPU-heavy and LLM-heavy tasks to improve overall efficiency [11][58].
- **Profiling Insights**: Profiling of the agentic AI workloads reveals that tool processing, rather than LLM inference, is the primary contributor to latency, a critical insight for future optimizations [32][34].
- **Diverse Computational Patterns**: The selected workloads span a variety of applications and computational strategies, showcasing the breadth of agentic AI systems and their real-world relevance [21][22].

Conclusion
- The findings underscore the importance of a CPU-centric perspective in optimizing agentic AI frameworks, and the need for strategies that address both CPU and GPU limitations to improve performance, efficiency, and scalability in AI applications [3][10][11].
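The capped micro-batching idea behind CGAM can be illustrated with a minimal sketch. This is a hypothetical Python illustration, not the paper's implementation: the function names (`make_micro_batches`, `run_pipeline`), the `batch_cap` parameter, and the strictly sequential GPU-then-CPU stages are all assumptions chosen for clarity; a real system would overlap the two stages asynchronously.

```python
# Hypothetical sketch of CPU/GPU-aware micro-batching (CGAM-style):
# cap the effective batch size, then split a request batch into
# micro-batches so that the GPU-bound inference stage and the
# CPU-bound tool-processing stage each operate on small chunks.

def make_micro_batches(requests, batch_cap):
    """Split `requests` into micro-batches of at most `batch_cap` items."""
    return [requests[i:i + batch_cap]
            for i in range(0, len(requests), batch_cap)]

def run_pipeline(requests, batch_cap, gpu_infer, cpu_tool):
    """Run each micro-batch through inference, then tool processing.

    A production scheduler would pipeline these stages so the CPU tool
    step of one micro-batch overlaps the GPU step of the next; this
    sketch only shows the capped micro-batch structure itself.
    """
    results = []
    for micro in make_micro_batches(requests, batch_cap):
        llm_out = [gpu_infer(r) for r in micro]       # GPU-bound stage
        results.extend(cpu_tool(o) for o in llm_out)  # CPU-bound tool stage
    return results

if __name__ == "__main__":
    # Stand-in callables for LLM inference and tool execution.
    reqs = list(range(10))
    out = run_pipeline(reqs, batch_cap=4,
                       gpu_infer=lambda r: r * 2,
                       cpu_tool=lambda o: o + 1)
    print(out)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Capping the batch keeps any single micro-batch from saturating GPU memory while leaving the CPU tool stage short enough to interleave, which matches the latency motivation the summary describes.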
KPMG and Uniphore form AI agent collaboration for regulated industries
Yahoo Finance· 2026-01-20 09:25
KPMG has entered into a strategic relationship with software company Uniphore to deploy AI agents powered by industry-specific small language models (SLMs), with a focus on regulated sectors including banking, insurance, energy and healthcare. Under the agreement, KPMG will use Uniphore’s Business AI Cloud as the platform for building and operationalising agentic AI and fine-tuned SLMs across both internal and client-facing workflows. The platform is built on a sovereign, composable and secure architect ...
Straker Limited (ASX: STG) Announces Extension and Expansion of IBM Partnership
Prnewswire· 2025-10-30 07:29
Core Insights
- Straker Limited has renewed and expanded its strategic partnership with IBM for an additional three years, effective January 1, 2026, with a contract value of approximately NZ$28 million (US$16.1 million) over the initial term [2][3].

Agreement Details
- The renewed agreement allows IBM to extend the contract for an additional year beyond the initial three years and is based on customer usage, which may lead to revenue fluctuations [2][3].
- The agreement maintains core terms from the previous contract but emphasizes deploying Straker's AI-driven solutions across IBM's global operations, where 10,000 users already use Straker's AI-driven Slack translation application [4].

Expanded Strategic Partnership
- Straker is now recognized as part of the IBM Ecosystem Partner network, with the collaboration managed primarily through IBM Japan, deepening Straker's integration within IBM's technology ecosystem [5].
- A significant focus of the partnership is the joint development of customized small language models using IBM's watsonx AI technology and Straker's Tiri platform, which has shown promising early results [6][7].

CEO Commentary
- The CEO of Straker highlighted that the renewal and expansion of the partnership with IBM validate the company's strategy and provide a strong foundation for future growth, emphasizing the transformation of translation services and broader enterprise AI opportunities [8].
X @Solana
Solana· 2025-10-14 19:04
RT Sam Hogan 🇺🇸 (@0xSamHogan): I'm excited to announce @inference_net's $11.8M Series Seed funding round, led by @multicoincap & @a16zcrypto CSX, with participation from @topology_vc, @fdotinc, and an incredible group of angels. The next wave of AI adoption will be driven by companies building AI natively into their products at scale. As scaling laws continue to demand larger models and more compute, margins become thin, and operating at scale becomes untenable. We're taking a different approach -- training task ...
X @The Economist
The Economist· 2025-09-14 14:40
Market Trends
- Corporate demand for small language models is projected to grow twice as fast as demand for large models [1]
- That growth in small language models starts from a much lower base [1]