OpenAI
Are you kidding? In the first LLM tournament, DeepSeek and Kimi are knocked out in round one
机器之心· 2025-08-06 04:31
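Reported by the 机器之心 editorial team. As things stand, Grok 4 is the favorite to win the title. Which model is actually the best at playing games? To answer that question, Google recently launched the first LLM chess tournament. The tournament runs for three days, and the participants include: We have just received the first-round results: Gemini 2.5 Pro, o4-mini, Grok 4 and o3 each won 4-0, beating Claude 4 Opus, DeepSeek R1, Gemini 2.5 Flash and Kimi k2 respectively, and advance to the semifinals. The bracket is shown below. The tournament is held on a platform called "Kaggle Game Arena", a new Kaggle project that aims to step outside the usual benchmark framework and explore how LLMs such as Gemini and DeepSeek perform in dynamic, competitive environments. Yesterday's report described the rules in detail, for example that models are not allowed to call chess engines such as Stockfish. (For details, see "Google issues the challenge, DeepSeek and Kimi are both in: the first LLM tournament starts tomorrow".) Match details follow: Kimi k2 vs. o3: 0-4. The games between Kimi k2 and o3 ended early, with all four finished within eight moves. ...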
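Did a minor Claude upgrade just beat OpenAI's "open-source masterpiece" nine years in the making? It stalls on high-effort reasoning, hallucinates up to 50% of the time, and even gets trounced by Kimi 2 at writing?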
AI前线· 2025-08-06 04:25
Core Viewpoint
- OpenAI has released gpt-oss, its first open-weight language model series since GPT-2. The series includes gpt-oss-120b and gpt-oss-20b, both of which are fully customizable and support structured output [2][3].

Model Specifications
- gpt-oss-120b requires 80GB of memory to run, while gpt-oss-20b needs only 16GB [2].
- Both models use a mixture-of-experts (MoE) architecture. gpt-oss-120b activates 5.1 billion of its 117 billion total parameters per token, and gpt-oss-20b activates 3.6 billion of its 21 billion [9].
- Both models support a context length of up to 128k tokens and are designed for efficient deployment on consumer-grade hardware [10].

Training and Performance
- Training combines reinforcement learning with techniques from OpenAI's advanced internal models, with a focus on reasoning capability and efficiency [8].
- The gpt-oss models perform strongly on reasoning tasks, with gpt-oss-120b performing comparably to OpenAI's proprietary models on core reasoning benchmarks [10].

Comparison with Competitors
- Claude Opus 4.1 scored 74.5% on the SWE-bench Verified programming evaluation, outperforming previous Claude versions [5].
- Independent benchmark tests indicate that gpt-oss-120b scores below DeepSeek R1 and Qwen3 235B on intelligence, although its smaller parameter count gives it an efficiency advantage [13].

User Feedback and Limitations
- Users report mixed experiences with the gpt-oss models, noting that gpt-oss-120b is particularly unstable on coding tasks, while gpt-oss-20b performs better [6][17].
- Both models hallucinate significantly more often than OpenAI's previous models, with rates of 49% for gpt-oss-120b and 53% for gpt-oss-20b [16].

Open Source and Accessibility
- The gpt-oss models are released under the permissive Apache 2.0 license, making them usable in a wide range of applications, including agent workflows and tool use [11][10].
- The models are available for free download on Hugging Face, which encourages wider adoption and experimentation in the developer community [2][3].
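To make the mixture-of-experts figures above concrete, the following is a minimal sketch that computes what share of each model's parameters is active per token. It uses only the parameter counts quoted in the summary; the dictionary layout and the `active_fraction` helper are illustrative names, not part of any OpenAI tooling.

```python
# Parameter counts in billions, as quoted in the summary above.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters an MoE model activates per token."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these numbers the larger model activates a much smaller share of its weights per token than the smaller one, which is consistent with the efficiency advantage the summary mentions.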
Up to 250%! Trump says he will announce tariffs on chip and drug imports "within the next week or so"
贝塔投资智库· 2025-08-06 04:01
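Beyond semiconductors, Trump also warned that the pharmaceutical industry will face even more damaging tariffs, with the goal of forcing drug manufacturing back to the United States. He has recently pressured major drug suppliers to cut prices sharply or face further, undisclosed punitive measures. Global pharmaceutical giants such as Merck (MRK.US) and Eli Lilly (LLY.US) currently operate a large number of manufacturing plants around the world. According to the Biotechnology Innovation Organization, for nearly 90% of US biotech companies, at least half of their approved products rely on imported raw materials or key components. The tariff measures on key sectors such as pharmaceuticals and metals all stem from national security investigations under Section 232 of the Trade Expansion Act. Such investigations typically take nine months and rest on a stronger legal basis than Trump's earlier use of emergency powers to impose tariffs on specific countries; the latter is currently being challenged in court. US President Trump said on Tuesday that the US government will announce tariffs on imported semiconductor and pharmaceutical products "within the next week or so", with the aim of reshaping global trade and forcing key industries to move manufacturing back to the United States. In an interview, Trump said: "We will start with a small initial tariff on pharmaceuticals, but within a year to a year and a half it will rise to 150% and eventually reach 250%, because we want drugs to be made in the United States." He also said the US government will "separately announce" tariff measures on semiconductors and chips. The US Department of Commerce has, since April this year, ...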
The big three are at it: OpenAI goes open source, Google releases an interactive world model, and Claude 4.1 becomes the new coding flagship
Founder Park· 2025-08-06 03:43
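On the same day, Silicon Valley's three model giants each released new models (it is anyone's guess who was upstaging whom). OpenAI finally released new open-source models, gpt-oss-120b and gpt-oss-20b; its last open release, GPT-2, was six years ago. Judging by current evaluation results, both models are close to o4-mini in capability. Their coding ability is slightly weaker, but with this SOTA-level performance, the open-source ecosystem that follows is something to look forward to. DeepMind also made a big move with Genie 3, a world model that appears to be essentially usable: a single sentence generates an interactive 3D world with characters and props. It is not yet publicly available, but the demo videos are striking. Anthropic released Claude Opus 4.1, a minor version upgrade to its flagship Opus model; coding ability remains excellent, and this release strengthens agent capabilities. Next, it is time to look forward to DeepSeek R2. This article is compiled from 机器之心 and several official blog posts. 01 OpenAI open-sources two reasoning models at the o4-mini level ...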
X @Cointelegraph
Cointelegraph· 2025-08-06 03:30
Regulatory & Legal Developments
- Binance co-founder CZ files a request to be dismissed from the $18 billion FTX trust lawsuit, arguing that the US bankruptcy court lacks jurisdiction [1]
- The US SEC signals that crypto liquid staking setups and receipt tokens are not considered securities, a key signal for staking ETF issuers [1]
- Brazil will hold a public hearing on August 20 on a proposed national Bitcoin reserve [1]
- President Trump says the Fed governor decision will be made before the end of the week and that he is considering four people for Fed chair [1]
- Trump claims banks such as JPMorgan and BofA previously discriminated against him and conservative groups [2]

Company Initiatives & Investments
- OpenAI releases its first open-source AI model, GPT-OSS [1]
- Coinbase plans to raise $2 billion through convertible notes to offset dilution and support tech, operations, and potential acquisitions [1]
- Michigan's state pension fund buys 200,000 more shares of ARK's Bitcoin ETF [3]
- Coinbase now lets Canadian users buy and sell crypto using PayPal [3]

Cryptocurrency & Digital Asset Adoption
- Switzerland's AMINA Bank becomes the first bank in the world to support trading and custody for $SUI [3]
X @Cointelegraph
Cointelegraph· 2025-08-06 03:00
🚨 JUST IN: OpenAI is considering a secondary share sale for current and former employees at a $500B valuation, up from the previous $300B, per Bloomberg. https://t.co/1BDFA5scF4 ...
A conversation with Qiming Venture Partners' Zhou Zhifeng: tech investing should "ride the waves" but also "learn from history"
IPO早知道· 2025-08-06 02:42
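Speaking specifically about two hot technology tracks, Zhou Zhifeng said that the first tier of foundation models is now fairly clear, and that at this stage no model holds its lead on benchmark leaderboards for more than three months, in China and the United States alike. These players will keep chasing and leapfrogging one another without a clear ranking emerging. Over a longer horizon, a large-model company will be judged several years from now not by the characteristics of its model or its benchmark rankings, but by application deployment and revenue scale. Qiming Venture Partners is the most active and influential investment firm in artificial intelligence in China and across Asia. This article is an IPO早知道 original. Author: Stone Jin. WeChat official account: ipozaozhidao. According to IPO早知道, Qiming Venture Partners recently hosted the "Qiming Venture Partners Entrepreneurship and Investment Forum: Venture Investment Opens a Resonance Cycle Between AI Technology and Applications" at WAIC 2025. As the investment firm with the earliest and broadest AI portfolio in China, Qiming Venture Partners has hosted the WAIC entrepreneurship and investment forum for three consecutive years and has released its Top 10 AI Outlook for three consecutive years. (See: "Qiming Venture Partners again releases its Top 10 AI Outlook at WAIC 2025".) It is worth adding that from AI 1.0 to AI 2.0, Qiming Venture Partners has invested in more than 100 AI projects, with portfolio companies spanning the entire AI industry chain, helping ...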
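The big four are "burning cash fiercely", non-US and second-tier cloud providers are underestimated, and GB200 yields are improving! Morgan Stanley is very optimistic about AI servers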
美股IPO· 2025-08-06 02:25
Core Viewpoint
- The AI-driven global race in cloud infrastructure is accelerating. Major cloud service providers are expected to raise capital expenditure significantly, pointing to robust growth in the AI server market, particularly from non-U.S. regions and Tier 2 cloud providers [1][3][10].

Group 1: Capital Expenditure Projections
- Morgan Stanley has significantly raised its capital expenditure forecasts for the four major U.S. cloud service providers (Amazon, Google, Meta, and Microsoft), projecting combined capital expenditure of $359 billion in 2025, a 57% year-over-year increase, and $454 billion in 2026, a 26% increase [3][5].
- Capital expenditure for these four companies is expected to reach $100 billion in Q4 2025, a 39% year-over-year increase [5].
- Across the top 11 global cloud service providers, total capital expenditure is projected to reach $445 billion in 2025, significantly higher than the previous estimate of $400 billion [5].

Group 2: Emerging Demand from Non-U.S. and Tier 2 Providers
- The report argues that the market may be underestimating demand for PCIe/HGX servers from non-U.S. countries, with a strong recovery in demand for B200 servers and anticipated growth for B300 servers [8].
- Tier 2 cloud service providers are catching up: their AI server reserves could surpass those of leading cloud providers, and they are expected to increase capital expenditure significantly in the second half of 2026 [8].

Group 3: Supply Chain Improvements
- Supply chain issues are easing, with improvements in the assembly yield of NVIDIA's next-generation GB200 chips, which is crucial for meeting rising demand for AI servers [9].
- GB300 sample testing is set to begin in Q3, with no significant issues reported, indicating a positive outlook for supply chain capabilities [9].
- Large-scale projects such as "Stargate," involving OpenAI, SoftBank, and Oracle, are moving beyond the planning stage and are engaging with Asian supply chains for server cabinet procurement, indicating a shift from "order-based" to "project-based" demand [9].

Group 4: Overall Industry Outlook
- Morgan Stanley maintains a positive outlook on the cloud semiconductor industry, citing strong global demand, underappreciated growth areas, and improving supply chains as solid foundations for sustained industry growth in the coming years [10].
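The capital expenditure projections above can be cross-checked with simple arithmetic. The sketch below derives the 2024 base implied by the 2025 forecast and recomputes the 2026 growth rate from the two dollar figures. The helper names are illustrative, and the implied 2024 value is a back-calculation from the quoted percentages, not a figure stated in the report.

```python
def implied_prior(current: float, yoy_growth: float) -> float:
    """Back out the prior-period value from a current value and its YoY growth."""
    return current / (1.0 + yoy_growth)


def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth rate as a fraction."""
    return current / prior - 1.0


# Combined capex forecasts for the four U.S. providers, in billions of USD.
capex_2025 = 359.0
capex_2026 = 454.0

# 2024 base implied by "a 57% year-over-year increase" (a derived estimate).
print(f"Implied 2024 capex: ${implied_prior(capex_2025, 0.57):.1f}B")

# Growth recomputed from the two forecasts; the summary rounds this to 26%.
print(f"2026 growth: {yoy_growth(capex_2026, capex_2025):.1%}")

# Size of the upward revision for the top 11 providers ($400B to $445B).
print(f"Top-11 revision: {yoy_growth(445.0, 400.0):.2%}")
```

The recomputed 2026 growth rate agrees with the quoted 26% to within rounding, so the two dollar figures and the percentage are mutually consistent.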
X @The Economist
The Economist· 2025-08-06 02:20
Two books tell a similar tale about OpenAI. It is worrying https://t.co/E30HrdUOBj ...
Guoyuan Securities Morning Meeting Minutes, 20250806
Guoyuan Securities2· 2025-08-06 01:58
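Wednesday, August 6, 2025. Securities research report.

[Breaking news]
- The US trade deficit in June was $60.2 billion, the smallest since September 2023
- The US ISM non-manufacturing index for July fell to 50.1 from 50.8, below expectations
- The US Treasury plans a record $100 billion issuance of four-week bills this week
- General Office of the State Council: care and education fees will be waived for one year of preschool at public kindergartens
- China's S&P Global services PMI rose to 52.6 in July
- Unitree Robotics released a new quadruped robot dog
- Taobao will soon launch a unified membership system
- OpenAI released two open-weight AI models capable of mimicking human reasoning
- AMD's second-quarter net profit fell 31% year over year
- The iPhone 17 series is reported to be in line for a significant price increase

[US bond market]
- The 2-year Treasury yield rose 4.9 basis points to 3.720%
- The 5-year Treasury yield rose 3.96 basis points to 3.78%
- The 10-year Treasury yield rose 1.17 basis points to 4.208%

Source: Bloomberg, AASTOCKS, Wind, Gelonghui; compiled by Guoyuan Securities Brokerage (Hong Kong)

[Economic data]

| Major index | Close | Change (%) | Major overseas index | Close | Change (%) |
| --- | --- | --- | --- ...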