Claude 4 Sonnet

Grok 4 takes the internet by storm and passes the bouncing-ball coding test; Epic founder: "This is AGI"
猿大侠· 2025-07-12 01:45
By 克雷西, from Aofeisi. 量子位 | WeChat official account QbitAI. Less than a day after its release, Musk's Grok 4 is already being pushed to its limits by users. One user, for example, reported that Grok 4 has passed the well-known hexagon bouncing-ball vibe-coding test. As the hexagon keeps rotating, the balls drop out through the opening one after another. Users hunting for bugs with a magnifying glass noticed that the balls pass through the walls when they return to the center, but the author said this is intentional. Plutus (@PlutusCosmos) asked: "The balls penetrate the walls when the go back to the center. Is it intended?" Flavio Adamo (@flavioAd) replied: "yes". SoyTeslike (@soyteslike): "damn, already screenshotted but it wa ...
Grok 4 takes the internet by storm and passes the bouncing-ball coding test; Epic founder: "This is AGI"
量子位· 2025-07-11 07:20
By 克雷西, from Aofeisi. 量子位 | WeChat official account QbitAI. Less than a day after its release, Musk's Grok 4 is already being pushed to its limits by users. One user, for example, reported that Grok 4 has passed the well-known hexagon bouncing-ball vibe-coding test. As the hexagon keeps rotating, the balls drop out through the opening one after another. Users hunting for bugs with a magnifying glass noticed that the balls pass through the walls when they return to the center, but the author said this is intentional. Plutus (@PlutusCosmos) asked: "The balls penetrate the walls when the go back to the center. Is it intended?" Flavio Adamo (@flavioAd) replied: "yes". SoyTeslike (@soyteslike): "damn, already screenshotted but it ...
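Musk releases Grok 4, the "world's most powerful AI model", and says it is the first time AI has been able to solve complex real-world engineering problems that are hard to solve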
Sou Hu Cai Jing· 2025-07-10 11:42
Core Insights
- Musk announced the release of Grok 4, claiming it is the first AI capable of solving complex engineering problems whose answers cannot be found on the internet or in books [4]

Group 1: Product Features
- Grok 4 is a reasoning model that supports text and image inputs, function calls, and structured outputs [2]
- Its context window is 256K tokens, smaller than Gemini 2.5 Pro's 1M tokens but larger than Claude 4 Sonnet and Opus (200K tokens) and R1 0528 (128K tokens) [2]
- Pricing is similar to Grok 3: $3 per million input tokens and $15 per million output tokens, with cached input tokens priced at $0.75 per million [2]

Group 2: Performance Metrics
- Grok 4 outputs 75 tokens per second, which is slower than o3 (188 tokens/s), Gemini 2.5 Pro (142 tokens/s), and Claude 4 Sonnet Thinking (85 tokens/s), but faster than Claude 4 Opus Thinking (66 tokens/s) [3]
- It ranks first on benchmarks including Humanity's Last Exam, MMLU-Pro, AIME 2024, AIME 25, and GPQA, outperforming OpenAI's o3 and Google's Gemini 2.5 Pro [3]

Group 3: Future Developments
- xAI announced upcoming products: an AI programming model set to launch in August, a multimodal agent in September, and a video generation model in October [5]
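To make the quoted pricing and speed figures concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the billing model (cached tokens counted as a subset of input tokens and charged at the cached rate) is an assumption for illustration, not something the article states.

```python
# Per-million-token prices (USD) and output speed as quoted in the summary.
PRICE_INPUT = 3.00
PRICE_CACHED_INPUT = 0.75
PRICE_OUTPUT = 15.00
OUTPUT_TOKENS_PER_SECOND = 75


def request_cost_usd(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens billed at the
    cached rate; the remainder is billed at the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * PRICE_INPUT
             + cached_tokens * PRICE_CACHED_INPUT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Rough time to stream the output at the quoted speed (ignores latency)."""
    return output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    # 100K input tokens (40K of them cached) and 2K output tokens.
    print(f"cost: ${request_cost_usd(100_000, 40_000, 2_000):.2f}")
    print(f"time for 1,500 output tokens: {generation_seconds(1_500):.0f} s")
```

Under these assumptions, output tokens dominate the bill quickly: 2K output tokens cost as much as 40K cached input tokens.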
1.93-bit DeepSeek-R1 beats Claude 4 Sonnet at coding and runs without a GPU
量子位· 2025-06-10 04:05
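By 克雷西, from Aofeisi. 量子位 | WeChat official account QbitAI. Can DeepSeek-R1 (0528) still beat Claude 4 Sonnet at coding after being quantized to 1.93 bits? The latest optimized R1 scored 60% on the aider coding leaderboard, ahead of both Claude 4 Sonnet (56.4) and the full-size January R1. (In the chart, "R1" refers to the full-size 0120 version from January.) Moreover, aider is a leaderboard built around tasks close to real-world software engineering, not one that can be won by drilling exam-style problems. In terms of size, the 1.93-bit version is more than 70% smaller on disk than the original 8-bit version. Even the author was shocked that such a lightweight version performs this well. The full-size R1-0528, for its part, scored 71.4 on aider, ahead of Claude 4 Opus with thinking disabled.

The quantized R1 runs without a GPU

This quantized version comes from the Unsloth team, which produced 9 quantized versions in total, ranging from 1.66 to 5.5 bits.

| MoE Bits | Type + Link | Disk Size | Details |
| --- | --- | --- | --- |
| 1.66bit | TQ1_0 | 162GB | 1. ...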
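The "more than 70% smaller" figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes file size scales linearly with the average number of bits per weight and ignores metadata and any layers kept at higher precision; `size_reduction` is a hypothetical helper, not part of Unsloth's tooling.

```python
def size_reduction(quant_bits: float, base_bits: float) -> float:
    """Fraction by which the file shrinks if size is proportional to bits per weight."""
    if quant_bits <= 0 or base_bits <= 0:
        raise ValueError("bit widths must be positive")
    return 1 - quant_bits / base_bits


if __name__ == "__main__":
    # 1.93-bit quantization compared with the original 8-bit weights.
    print(f"estimated reduction: {size_reduction(1.93, 8):.1%}")
```

Under that linear assumption the estimate is about 76%, which is consistent with the reported reduction of over 70%.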
DeepSeek-R1 evolves again, and this update is seriously strong...
3 6 Ke· 2025-06-04 03:32
Core Viewpoint
- DeepSeek has released an upgraded version of its R1 model, named DeepSeek-R1-0528, which shows significant improvements in reasoning and programming and fewer hallucinations than its predecessor [1][3][22].

Model Improvements
- The new version keeps the base model from December 2024 but was given more compute in post-training, allowing deeper reasoning and more detailed problem-solving [4][6].
- Average token usage per question on the AIME 2025 test rose from 12K to 23K tokens, and accuracy improved from 70% to 87.5% [4][5].

Benchmark Performance
- DeepSeek-R1-0528 posted notable benchmark scores, such as 87.5% on the AIME 2025 math competition, outperforming its predecessor and showing competitive results against models like OpenAI's and Gemini 2.5 [5][15].
- The model's performance on coding tasks has reached levels comparable to OpenAI's models, with successful outputs in complex coding challenges [10][14].

Reduction of Hallucinations
- The hallucination rate of the new model has decreased by 45% to 50%, leading to more reliable outputs in tasks such as summarization and reading comprehension [18].

Creative Writing Capabilities
- DeepSeek-R1-0528 has improved at creative writing, producing coherent and logical narratives without the earlier problem of "getting stuck" [19][21].

User Reception
- While some users are skeptical about the update's impact, many remain optimistic about DeepSeek's potential as a representative of domestic AI technology [22][23].
Roundup: Daily tech news briefing (May 23)
news flash· 2025-05-23 00:02
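Compiled by 金十数据: Daily tech news briefing (May 23)

Integrated circuits (chips):
1. TSMC and other manufacturers have submitted comments to the US Department of Commerce proposing exemptions from semiconductor-related tariffs.
2. Xiaomi: the 玄戒O1 (XRING O1) flagship processor uses a 16-core GPU based on the latest Immortalis-G925, and its lab benchmark score exceeds 3 million.
3. Intel: the new Xeon 6 series processors are now on sale, and one of the three models is already used as the host CPU in NVIDIA's DGX B300.

New energy vehicles:
1. 高合汽车 (HiPhi) is revived: a Lebanese businessman becomes its owner with an investment of US$100 million.
2. Research firm: BYD's battery-electric vehicle sales in Europe exceeded Tesla's for the first time in April.
3. Changan Automobile: will launch 35 new "digital-intelligent" vehicles over the next 3 years and complete in-vehicle validation of solid-state batteries in 2026.
4. Xiaomi Auto: cumulative deliveries reached 258,000 units in 2025, with more than 28,000 delivered in April.
5. Xiaomi unveils its first SUV: the Xiaomi YU7 accelerates from 0 to 100 km/h in 3.23 s and has a top speed of 253 km/h.
6. The US Senate voted to pass a measure ending California's ban on sales of gasoline-powered cars and sent it to Trump for signature.

Artificial intelligence:
1. The 众擎机器人 robot fighting competition will be held in Shenzhen in December.
2. The 彩虹-YH1000 unmanned cargo aircraft completed its maiden flight.
3. Anthropic released the Claude 4 Opus and Claude 4 Sonnet AI models. ...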
AI company Anthropic releases Claude 4 Opus and Claude 4 Sonnet AI models
news flash· 2025-05-22 16:40
AI company Anthropic has released the Claude 4 Opus and Claude 4 Sonnet AI models. ...