Core Insights
- GSI Technology announced preliminary benchmark results for its Gemini-II Compute-in-Memory processor: a time-to-first-token (TTFT) of 3 seconds for a multimodal large language model at the edge, at a power consumption of approximately 30 W [1][2].

Performance Metrics
- Gemini-II demonstrated a TTFT of 3 seconds, which, according to the company, is the lowest reported for a multimodal 12B model on an embedded edge processor [2].
- For comparison, reported TTFTs are approximately 12 seconds on Qualcomm Snapdragon X Elite at 30 W and 3 seconds on NVIDIA Jetson Thor at over 100 W. Gemini-II therefore reaches the first token about four times sooner than the Snapdragon X Elite at the same power, and matches the Jetson Thor's TTFT at less than a third of the power [3].

Market Implications
- The performance profile of Gemini-II is well suited to "physical AI" markets, including drones and smart city applications, where power and thermal constraints are critical [4].
- The shift from cloud-assisted models to local inference in edge physical AI is expected to reduce latency and improve reliability and operational efficiency [5].

Development and Collaboration
- GSI's engineering team is focused on optimizing the responsiveness of the Gemini-II processor while collaborating with partners such as G2 Tech on system integration and proof-of-concept activities [6].
GSI Technology Reports 3-Second Time-to-First-Token for Edge Multimodal LLM Inference on Gemini-II
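
One way to compare the reported figures on a single axis is the energy spent before the first token appears, estimated as power multiplied by TTFT. The sketch below is illustrative only and is not part of GSI's announcement. It assumes the quoted power is the average draw during the whole time-to-first-token interval, and it treats the Jetson Thor's "over 100 W" as exactly 100 W, so that result is a lower bound. The `Platform` class and `energy_to_first_token_j` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    ttft_s: float   # reported time-to-first-token, in seconds
    power_w: float  # reported power, in watts (assumed constant during TTFT)


def energy_to_first_token_j(platform: Platform) -> float:
    """Estimate energy in joules used before the first token: power x time."""
    return platform.power_w * platform.ttft_s


# Figures as quoted in the summary; 100.0 W stands in for "over 100 W",
# so the Jetson Thor estimate is a lower bound rather than a measurement.
PLATFORMS = [
    Platform("GSI Gemini-II", ttft_s=3.0, power_w=30.0),
    Platform("Qualcomm Snapdragon X Elite", ttft_s=12.0, power_w=30.0),
    Platform("NVIDIA Jetson Thor", ttft_s=3.0, power_w=100.0),
]

for p in PLATFORMS:
    print(f"{p.name}: {energy_to_first_token_j(p):.0f} J")
# GSI Gemini-II: 90 J
# Qualcomm Snapdragon X Elite: 360 J
# NVIDIA Jetson Thor: 300 J
```

Under these assumptions, Gemini-II would use roughly a quarter of the Snapdragon X Elite's energy to first token and less than a third of the Jetson Thor's. The estimate covers only the interval before the first token and says nothing about sustained generation throughput.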