Core Insights
- Google has launched EmbeddingGemma, a new open-source embedding model with 308 million parameters, designed for edge AI: it can run on devices such as laptops and smartphones to power retrieval-augmented generation (RAG) and semantic search [2][3]

Group 1: Model Features
- EmbeddingGemma ranks highest on the MTEB benchmark among open multilingual text embedding models under 500 million parameters; it was trained on over 100 languages and is optimized to run in less than 200MB of memory [3][5]
- The model is designed for flexible offline use, offering customizable output dimensions and a 2K-token context window, making it suitable for everyday devices [5][13]
- It integrates with popular tools such as sentence-transformers, MLX, and LangChain, easing adoption [5][12]

Group 2: Performance and Quality
- EmbeddingGemma generates high-quality embedding vectors, which are crucial for accurate RAG: better embeddings improve the retrieval of relevant context and, in turn, the generation of contextually appropriate answers [6][9]
- Its performance on retrieval, classification, and clustering tasks surpasses that of similarly sized models and approaches larger models such as Qwen-Embedding-0.6B [10][11]
- It uses Matryoshka representation learning (MRL) to offer multiple embedding sizes from a single model, letting developers balance quality against speed and storage [12]

Group 3: Privacy and Efficiency
- EmbeddingGemma operates fully offline, preserving user privacy by generating document embeddings directly on device hardware [13]
- Inference on EdgeTPU takes under 15ms for 256 input tokens, enabling real-time responses and smooth interactions [12][13]
- It enables new functionality such as offline search across personal files and personalized chatbots [13][15]

Group 4: Conclusion
- The introduction of EmbeddingGemma marks a step forward in miniaturization, multilingual capability, and edge AI, potentially becoming a cornerstone for the proliferation of intelligent applications on personal devices [15]
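The MRL mechanism mentioned above can be sketched briefly: an MRL-trained model concentrates the most useful information in the leading dimensions, so a full-size embedding can be truncated to a shorter prefix and re-normalized, trading some quality for speed and storage. The 768 and 128 dimensions below are illustrative assumptions, and the random vector stands in for real model output:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions of an MRL embedding and re-normalize to unit length."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a full-size embedding (e.g. 768-d); real embedding models
# typically return unit-length vectors.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 128)
print(small.shape)  # (128,)
```

Because the prefix dimensions carry the most information, the truncated vector can be used directly in the same cosine-similarity pipelines as the full one, just with a smaller memory and compute footprint.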
0.3B: Google open-sources a new model that runs on a phone even without a network connection, and 0.2GB of memory is enough
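The offline semantic search described above reduces to a nearest-neighbor lookup over embedding vectors generated on device. A minimal sketch follows, with toy 4-dimensional vectors standing in for EmbeddingGemma outputs (the documents, query vector, and dimensions are illustrative, not model output):

```python
import numpy as np

def top_k(query: np.ndarray, doc_embs: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the k documents most similar to the query by cosine similarity."""
    q = query / np.linalg.norm(query)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(scores)[::-1][:k]  # indices sorted by descending similarity

docs = ["tax receipt 2023", "birthday party photos", "rust compiler notes"]
# Toy embeddings; a real on-device pipeline would compute these with the model.
doc_embs = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.1],
])
query = np.array([0.8, 0.2, 0.1, 0.0])  # stand-in embedding of "find my tax documents"

best = int(top_k(query, doc_embs, k=1)[0])
print(docs[best])  # -> tax receipt 2023
```

Because both embedding and lookup run locally, no document text or query ever leaves the device, which is the privacy property the model targets.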