Gemma3

Run the full Gemma 3n in 2 GB of RAM! The world's first sub-10B model storms LMArena with a record-smashing score of 1300
AI前线· 2025-06-27 04:58
Compiled by | 褚杏娟

On June 26 local time, Google officially released the full version of Gemma 3n, which debuted as a preview at last month's Google I/O and can run directly on local hardware. "Can't wait to see how these perform on Android!" one developer said after the release.

The Gemma series is a family of open models from Google. Unlike Gemini, Google's closed, proprietary model focused on performance and commercialization, Gemma is aimed at developers and can be downloaded and modified.

The newly released Gemma 3n now accepts image, audio, and video input, produces text output, runs on devices with as little as 2 GB of memory, and is reported to perform better on tasks such as coding and reasoning. Specifically, the main highlights of the update include:

- Natively multimodal by design: native support for image, audio, video, and text input, with text output.
- Optimized for on-device use: Gemma 3n focuses on runtime efficiency and comes in two sizes based on "effective parameters": E2B and ...

As for benchmarks, Gemma 3n's E4B model is the first model with fewer than 10B parameters to break a score of 1300 on LMArena, outperforming Llama 4 Maverick 17B, GPT-4.1 nano, and Phi-4.

How well does it work?
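For readers who want to test the on-device claim, below is a minimal sketch of loading an instruction-tuned Gemma 3n checkpoint through the Hugging Face transformers multimodal pipeline. The checkpoint id "google/gemma-3n-E4B-it", the "image-text-to-text" task name, and the chat message layout are assumptions about how recent transformers releases expose the model, not details confirmed by the article; adjust them to match your installed version.

```python
# Sketch only: run Gemma 3n locally with the Hugging Face transformers pipeline.
# Assumed: a transformers release with Gemma 3n support and the checkpoint id
# "google/gemma-3n-E4B-it" (E2B is the smaller "effective parameter" variant).
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",            # Gemma 3n takes interleaved image/text input
    model="google/gemma-3n-E4B-it",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
# With chat-style input the pipeline returns the full conversation;
# the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```

Swapping in the smaller E2B checkpoint would target the tighter memory budgets the article mentions.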
First-of-its-kind pixel-space reasoning: a 7B model beats GPT-4o and lets VLMs use "eyes and brain together" like humans
量子位· 2025-06-09 09:27
Core Viewpoint - The article discusses the transition of Visual Language Models (VLM) from "perception" to "cognition," highlighting the introduction of "Pixel-Space Reasoning," which allows models to interact with visual information directly at the pixel level, enhancing their understanding and reasoning capabilities [1][2][3].

Group 1: Key Developments in VLM
- The current mainstream VLMs are limited by their reliance on text tokens, which can lead to loss of critical information in high-resolution images and dynamic video scenes [2][4].
- "Pixel-Space Reasoning" enables models to perform visual operations directly, allowing for a more human-like interaction with visual data [3][6].
- This new reasoning paradigm shifts the focus from text-mediated understanding to native visual operations, enhancing the model's ability to capture spatial relationships and dynamic details [6][7].

Group 2: Overcoming Learning Challenges
- The research team identified a "cognitive inertia" challenge where the model's established text reasoning capabilities hinder the development of new pixel operation skills, creating a "learning trap" [8][9].
- To address this, a reinforcement learning framework was designed that combines intrinsic curiosity incentives with extrinsic correctness rewards, encouraging the model to explore visual operations (a toy sketch of this reward design follows the list) [9][12].
- The framework includes constraints to ensure a minimum rate of pixel-space reasoning and to balance exploration with computational efficiency [10][11].

Group 3: Performance Validation
- The Pixel-Reasoner, based on the Qwen2.5-VL-7B model, achieved impressive results across four visual reasoning benchmarks, outperforming models like GPT-4o and Gemini-2.5-Pro [13][19].
- Specifically, it achieved an accuracy of 84.3% on the V* Bench, significantly higher than its competitors [13].
- The model demonstrated a 73.8% accuracy on TallyQA-Complex, showcasing its ability to differentiate between similar objects in images [19][20].

Group 4: Future Implications
- The research indicates that pixel-space reasoning is not a replacement for text reasoning but rather a complementary pathway for VLMs, enabling a dual-track understanding of the world [21].
- As multi-modal reasoning capabilities evolve, the industry is moving towards a future where machines can "see more clearly and think more deeply" [21].
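The summary above only names the ingredients of the reward design: an intrinsic curiosity bonus, an extrinsic correctness reward, and a floor on how often pixel operations are used. The toy sketch below shows one way such a shaped reward could be composed; the function names, coefficients, and the form of the rate penalty are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of the reward shaping described above: extrinsic correctness reward
# plus an intrinsic curiosity bonus for using pixel-space operations, with a
# penalty when a rollout's pixel-operation rate falls below a floor.
# All names and coefficients are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class RewardConfig:
    curiosity_coef: float = 0.1   # weight of the intrinsic exploration bonus
    min_pixel_rate: float = 0.3   # required fraction of steps that use pixel ops
    rate_penalty: float = 0.5     # penalty applied when the floor is violated
    max_pixel_ops: int = 4        # cap so exploration stays computationally cheap

def shaped_reward(is_correct: bool,
                  pixel_ops_used: int,
                  total_steps: int,
                  cfg: RewardConfig = RewardConfig()) -> float:
    """Combine extrinsic correctness with an intrinsic pixel-operation incentive."""
    extrinsic = 1.0 if is_correct else 0.0

    # Intrinsic curiosity bonus: reward attempting pixel operations,
    # but stop paying once the cap is reached to balance exploration cost.
    intrinsic = cfg.curiosity_coef * min(pixel_ops_used, cfg.max_pixel_ops)

    # Constraint: keep the rate of pixel-space reasoning above a floor.
    pixel_rate = pixel_ops_used / max(total_steps, 1)
    penalty = cfg.rate_penalty if pixel_rate < cfg.min_pixel_rate else 0.0

    return extrinsic + intrinsic - penalty

# Example: a correct 6-step rollout that used 2 pixel operations scores 1.2.
print(shaped_reward(is_correct=True, pixel_ops_used=2, total_steps=6))
```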
Samsung's chip business goes all in on AI
半导体芯闻· 2025-05-09 11:08
According to Korean media, Samsung Electronics' DS (semiconductor) division has brought in models from Google and Microsoft (MS), following Meta, as internal AI assistants for day-to-day work. The move appears intended to break with the closed operating policy built around "DS Assistant", the in-house AI developed with Samsung's own technology, and to extend AI into semiconductor design and development to improve productivity.

According to industry sources on the 9th, the DS division recently announced through an internal notice that "we will operate our internal AI as an open multi-model environment." This makes it possible to introduce external open-source A ... into the DS business, where operational security is paramount.

DS Assistant, however, operates in a relatively closed way and is limited in how it can draw on external data. Voices inside Samsung Electronics have argued that "we need to embrace external AI and use the best tools," since DS Assistant cannot be used to strengthen competitiveness in areas such as semiconductor design.

Samsung Electronics runs the external AI models on-premises. Because the AI runs inside the company simply by installing data servers on-site, there is no concern about internal information leaking outside. A Samsung Electronics official said, "We plan to review and support, by job type, open-source AI models that help improve work efficiency."

Source: compiled from chosun.