Claude Sonnet 4.5
The State Steps In
小熊跑的快· 2025-12-23 00:57
Group 1
- The U.S. Department of Energy has launched a national AI "Genesis Project" in collaboration with major companies like OpenAI, Google, Microsoft, and NVIDIA, marking a strategic shift towards collective efforts in technology development [1]
- The AI models and computing platforms will be applied to significant scientific research areas such as controlled nuclear fusion, energy material discovery, climate simulation, and quantum computing algorithms [1]
- This initiative signifies a transition from individual efforts to a systematic approach in tackling major scientific challenges in the U.S. technology sector [1]

Group 2
- The U.S. Department of Energy has previously been a major client for companies like AMD and NVIDIA, indicating strong ties between government projects and these tech firms [2]
- NVIDIA has seen a rebound in its stock performance, while Tesla's robotaxi profitability logic is gaining recognition among overseas investment banks [3]
- The total AI model performance metrics indicate a significant weekly pace of +819 billion, with the total reaching 5.16 trillion [5]
Claude 4.5 Goes on a Tear, Writing More Than 10,000 Lines of Code in One Go… | 极客时间
AI前线· 2025-12-22 05:01
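A while ago, Anthropic officially released Claude Sonnet 4.5, positioning it as "the best coding model in the world" and "the strongest model for building complex agents." What backs up that claim? In customer testing, Anthropic observed that Claude 4.5 can stay focused on a task for more than 30 hours straight, compared with 7 hours for the previous generation. It used to replace 1 programmer; now it can replace 4.

Even more striking, it can write about 11,000 lines of code in a single run and quickly turn out a chat application. A month of my hard work is no match for one easy day of the AI's. Fine, fine: if I had to be born, why did AI have to be born too……

| | Claude Sonnet 4.5 | Claude Opus 4.1 | Claude Sonnet 4 | GPT-5 | Gemini 2.5 Pro |
| --- | --- | --- | --- | --- | --- |
| Agentic coding, SWE-bench Verified | 77.2% | 74.5% | 72.7% | 72.8% | 67.2% |
| with pa ... | 82.0% | 79.4% | 80.2% | | |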
Google Unleashes a "Price Butcher": Gemini 3 Flash Beats Pro at Just 1/4 of the Cost, with "Lightning" Speed
36Kr · 2025-12-18 03:09
Core Insights
- Google has launched Gemini 3 Flash, a cost-effective AI model that aims to provide cutting-edge intelligence at a lower price than its predecessors, with performance that meets or exceeds flagship models such as Claude Sonnet 4.5 and GPT-5.2 [1][5]

Pricing and Performance
- Gemini 3 Flash costs $0.50 per million input tokens. Its output price is only 20% of Claude Sonnet 4.5's and 21% of GPT-5.2's, while it outperforms these models on some benchmarks [1][9]
- Flash is priced at 25% of Gemini 3 Pro's cost, yet it surpasses Pro on key benchmarks such as MMMU-Pro and SWE-bench Verified [1][5]

Features and Capabilities
- Designed for iterative development, Gemini 3 Flash combines the reasoning capabilities of Gemini 3 Pro with lower latency and cost, making it suitable for real-time applications [5][6]
- The model excels in multimodal reasoning, allowing it to analyze images and videos quickly, providing actionable insights and enhancing user interaction [6][7]
- It can convert unstructured ideas into functional applications using voice input, demonstrating its versatility in application development [7]

Benchmark Performance
- In benchmark tests, Gemini 3 Flash scored 90.4% on GPQA Diamond and 81.2% on MMMU-Pro, outperforming Gemini 2.5 Pro and matching Gemini 3 Pro [8][9]
- The model also shows significant improvements in coding tasks, scoring 78% on SWE-bench Verified, higher than both the Gemini 2.5 series and Gemini 3 Pro [8][9]

Efficiency and Cost-Effectiveness
- Gemini 3 Flash is designed to optimize efficiency, using 30% fewer tokens on average than Gemini 2.5 Pro while maintaining high performance [11]
- The model's development aims to push the boundaries of quality, cost, and speed, making it a competitive option in the AI landscape [11]

Conclusion
- The introduction of Gemini 3 Flash fills a gap in the Gemini 3 family by offering a lightweight, cost-effective solution that meets the demands of developers in real-world applications [12]
- Its enhanced performance and affordability are expected to facilitate broader integration of AI into everyday applications and business systems [12]
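The 20% and 21% figures quoted for Gemini 3 Flash line up with the per-million-token output prices reported elsewhere on this page ($3 for Gemini 3 Flash, $15 for Claude Sonnet 4.5, $14 for GPT-5.2). A minimal sketch of that arithmetic follows; the function name `price_ratio_percent` is made up for illustration, and the prices are simply the ones quoted in these articles, not values fetched from any vendor.

```python
# Per-million-token OUTPUT prices in USD, as quoted in the articles on this page.
OUTPUT_PRICE_USD = {
    "Gemini 3 Flash": 3.00,
    "Claude Sonnet 4.5": 15.00,
    "GPT-5.2": 14.00,
}


def price_ratio_percent(model: str, baseline: str) -> float:
    """Return `model`'s output price as a percentage of `baseline`'s output price."""
    return 100.0 * OUTPUT_PRICE_USD[model] / OUTPUT_PRICE_USD[baseline]


if __name__ == "__main__":
    for baseline in ("Claude Sonnet 4.5", "GPT-5.2"):
        pct = price_ratio_percent("Gemini 3 Flash", baseline)
        print(f"Gemini 3 Flash output price is {pct:.0f}% of {baseline}'s")
```

Running the same ratio on input prices gives different percentages, so it matters which price a "20% of the cost" claim refers to.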
Taking Aim at OpenAI! Google Fires "Several Shots" Within a Month
Xin Lang Ke Ji· 2025-12-18 01:39
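Lightweight models are no longer "weak."

"Frontier intelligence built for speed." In the early hours of December 18, Google announced in a blog post yet another blockbuster, Gemini 3 Flash, the fastest and most cost-effective model in the Gemini 3 series. What caught the industry's attention this time is that the Flash model, while fast and cheap, can even outperform flagship models on some measures.

Notably, this is also Google's fourth update in the large-model field within a month.

Google CEO Sundar Pichai said in a post that Gemini 3 Flash pushes past the Pareto frontier on both performance and efficiency: it outperforms the previous-generation flagship 2.5 Pro while running 3 times faster at a much lower price.

"Gemini 3 Flash demonstrates that speed and scale don't have to come at the cost of intelligence," the official blog declared. The evaluation data bears this out.

On SWE-bench, the benchmark used to evaluate coding ability, ...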
Taking Aim at OpenAI! Google Fires "Several Shots" Within a Month
第一财经· 2025-12-18 00:58
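2025.12.18

Author | 第一财经 刘晓洁

"Frontier intelligence built for speed." In the early hours of December 18, Google announced in a blog post yet another blockbuster, Gemini 3 Flash, the fastest and most cost-effective model in the Gemini 3 series. What caught the industry's attention this time is that the Flash model, while fast and cheap, can even outperform flagship models on some measures.

Notably, this is also Google's fourth update in the large-model field within a month.

According to data from the large-model arena lmarena.ai, Gemini 3 Flash currently ranks in the top 5 for text, image, and coding, and 2nd in the math and creative-writing categories. It is the most cost-effective frontier model: input costs just $0.50 per million tokens and output $3 per million tokens.

By comparison, Claude Sonnet 4.5's output costs $15 per million tokens and GPT-5.2's output costs $14 per million tokens, nearly 5 times Gemini 3 Flash's price.

Google says that when processing at its highest thinking level, Gemini 3 Flash can flexibly adjust how long it thinks. For more complex use cases it may need longer to think, but according to tests on typical traffic, the number of tokens it uses on average ...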
Gemini 3 Flash Turns the Hierarchy Upside Down: It Actually Beats Pro on Key Performance
36Kr · 2025-12-18 00:54
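On December 17, Google officially released Gemini 3 Flash. A "lightweight model" priced at just 1/5 of Claude and 1/4 of GPT, it beats Claude Sonnet 4.5 on coding, crushes it across the board on reasoning and multimodal tasks, and trades wins with GPT-5.2.

| | Gemini 3 Flash | Claude Sonnet 4.5 | GPT-5.2 |
| --- | --- | --- | --- |
| Input price | $0.5 | $3 | $1.75 |
| Output price | $3 | $15 | $14 |
| SWE-bench (coding) | 78% | 77.2% | 80% |
| GPQA (scientific reasoning) | 90.4% | 83.4% | 92.4% |
| MMMU-Pro (multimodal) | 81.2% | 68.0% | 79.5% |

MMMU-Pro, multimodal evaluation results:

Even more striking, it beats Google's own flagship: on SWE-bench, Gemini 3 Flash scores 78% against 76.2% for Gemini 3 Pro, the first time since the Flash series was created that a Flash model has beaten the Pro model of the same generation.

The numbers may still feel a bit abstract, ...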
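The input and output price rows in the table above turn into a concrete bill once token counts are fixed. The sketch below assumes a hypothetical workload of 2 million input tokens and 500,000 output tokens, and takes Gemini 3 Flash's output price as $3 per million tokens, as reported by the other articles on this page. `Price` and `workload_cost` are illustrative names, not part of any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per one million tokens."""
    input_usd_per_m: float
    output_usd_per_m: float


# Prices as quoted in the comparison table (USD per million tokens).
PRICES = {
    "Gemini 3 Flash": Price(0.50, 3.00),
    "Claude Sonnet 4.5": Price(3.00, 15.00),
    "GPT-5.2": Price(1.75, 14.00),
}


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a workload, billing input and output tokens separately."""
    return (
        input_tokens * price.input_usd_per_m
        + output_tokens * price.output_usd_per_m
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    input_tokens, output_tokens = 2_000_000, 500_000
    for name, price in PRICES.items():
        cost = workload_cost(price, input_tokens, output_tokens)
        print(f"{name}: ${cost:.2f}")
```

Because output tokens are priced several times higher than input tokens for all three models, the ratio between the bills shifts with the input/output mix of the workload.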
Taking Aim at OpenAI! Google Fires "Several Shots" Within a Month
Di Yi Cai Jing· 2025-12-18 00:29
Core Insights
- Google has launched the Gemini 3 Flash model, the fastest and most cost-effective model in the Gemini 3 series, which outperforms flagship models on certain performance metrics while being cheaper [1][3].

Performance and Efficiency
- Gemini 3 Flash surpasses the previous flagship model 2.5 Pro in performance and is three times faster, with significantly lower pricing [3].
- In benchmark tests, Gemini 3 Flash scored 78% on SWE-bench Verified, outperforming Gemini 3 Pro and Claude Sonnet 4.5, and achieved 81.2% on MMMU-Pro, exceeding GPT-5.2 by several percentage points [3][4].

Cost Structure
- The input cost for Gemini 3 Flash is $0.50 per million tokens and the output cost is $3.00 per million tokens, making it the most cost-effective model compared with the output prices of Claude Sonnet 4.5 ($15.00) and GPT-5.2 ($14.00) [5][6].

User Adoption and Market Position
- The model is expected to reduce costs for developers by 50%-70% compared to previous models like GPT-4o or Gemini 3 Pro, making it accessible even to free users [8].
- Since its launch, Gemini 3 has been widely adopted, processing over 1 trillion tokens daily, and is used for various applications including code simulation and interactive game design [8][9].

Competitive Landscape
- With the introduction of Gemini 3 Flash, Google aims to solidify its position as a leader in the large model space, having recently surpassed OpenAI in market recognition [8][9].
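罗福莉 (Luo Fuli) Makes Her Debut at the Helm of Xiaomi's Large Models! She Sets the Tone for the Next Generation as the All-New, Open-Source MiMo-V2 Sweeps Into the Top Tier of Agent Models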
AI前线· 2025-12-17 08:00
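Author | 木子

MiMo-V2-Flash is the new-generation MiMo model that Xiaomi released in the early hours of this morning, and it has been open-sourced as well.

This morning, at the 2025 Xiaomi Human x Car x Home full-ecosystem partner conference, 罗福莉 (Luo Fuli) made her first public appearance, with the title of head of Xiaomi MiMo large models. She also gave a talk explaining Xiaomi's all-new large model MiMo-V2-Flash and the story of the team behind it.

A quick recap of what MiMo is: it is Xiaomi's in-house family of large language models (LLMs). MiMo-V2-Flash not only matches DeepSeek-V3.2 on general benchmarks, it is also highly cost-effective and friendly to Agent scenarios.

"This is only the second step on our AGI roadmap."

MiMo-V2-Flash uses the MoE (mixture-of-experts) architecture, which is popular right now but demanding to engineer. Its total parameter count is 309 billion, yet only about 15 billion parameters are actually "lit up" for each inference. It also features multi-token prediction (MTP) and is designed for high-speed inference and Agent workflows. Unlike many models that chase "the more parameters the better," MiMo-V2-Flash's design goal can be summed up as: "run fast, run for a long time, and stay affordable even when called at high frequency."

However, ...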
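The gap between 309 billion total parameters and roughly 15 billion active ones comes from sparse expert routing: a router scores the experts for each token and only the top few actually run. The sketch below is a generic top-k routing toy in plain Python, not MiMo-V2-Flash's actual router. The expert count (8), `k=2`, the toy scalar "experts", and all function names are assumptions made for illustration; only the two parameter counts come from the article.

```python
import math

TOTAL_PARAMS_B = 309   # total parameters in billions, from the article
ACTIVE_PARAMS_B = 15   # approximate active parameters per inference, from the article


def active_fraction(active: float, total: float) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active / total


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their gate weights to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per inference: {share:.1%}")

    # Eight toy experts; expert i simply scales its input by (i + 1).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]  # hypothetical router scores
    print("selected experts and gates:", top_k_route(logits, k=2))
    print("mixed output:", moe_forward(1.0, experts, logits, k=2))
```

With the article's figures, about 5% of the parameters take part in each forward pass, which is why a model of this size can still be cheap to call at high frequency.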
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-17 00:27
🚨 BREAKING: xAI’s Grok Code Fast 1 DOMINATES the OpenRouter token usage leaderboard! 🚀 Crushing the competition with 548 BILLION tokens processed: that’s 38% market share and way ahead of Gemini 2.5 Flash (449B) and Claude Sonnet 4.5 (420B). Real-world adoption doesn’t lie: developers are choosing speed, efficiency, and power. ...
GPT-5.2 "Release Imminent": Microsoft CEO Announces That a "Next-Generation" Agentic AI Model Will Be Unveiled on Friday
Hua Er Jie Jian Wen· 2025-12-11 06:07
| Benchmark | Description | GPT-5.2 | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 |
| --- | --- | --- | --- | --- | --- |
| Humanity's Last Exam | Academic reasoning | 67.4% | 37.5% | 21.6% | 13.7% |
| ARC-AGI-2 | Visual reasoning puzzles | 62.2% | 31.1% | 4.9% | 13.6% |
| GPQA Diamond | Scientific knowledge | 95.8% | 91.9% | 86.4% | 83.4% |
| AIME 2025 (No tools) | Mathematics | 100% | 95.0% | 88.0% | 87.0% |
| AIME 2025 (With code) | | 100% | 100% | - | 100% |
| MathArena Apex | Challenging Math Con ...