OpenAI
Closed-Source America, Open-Source China! Kimi Rules Code, Tongyi Tops Math: A Leaderboard You Have to Share
Xin Lang Cai Jing· 2026-02-06 00:30
Core Insights
- The SuperCLUE report for 2025 highlights significant advancements in Chinese AI models, indicating a shift from "follower" to "peer" status in the global AI landscape, particularly in code generation and mathematical reasoning [1][11][12]
Group 1: Global Model Rankings
- The top three global models are closed-source, with Anthropic's Claude-Opus-4.5-Reasoning scoring 68.25, Google's Gemini-3-Pro-Preview at 65.59, and OpenAI's GPT-5.2 at 64.32 [2][12]
- The highest-ranked Chinese model, Kimi-K2.5-Thinking, achieved 61.50, placing it fourth overall, while Qwen3-Max-Thinking from Alibaba Cloud secured the sixth position with 60.61 [2][12]
Group 2: Performance in Specific Domains
- In the code generation category, Kimi-K2.5-Thinking scored 53.33, surpassing both GPT-5.2 and Gemini-3-Pro, showcasing its strong potential in algorithm logic and cross-language adaptability [5][15]
- In mathematical reasoning, Qwen3-Max-Thinking and Google's Gemini-3-Pro-Preview tied for first place with a score of 80.87, marking a significant achievement for Chinese models in complex reasoning tasks [5][15]
Group 3: Open Source Model Dominance
- All top five open-source models are from China, with Kimi-K2.5-Thinking leading the pack, demonstrating substantial advancements in scientific reasoning and knowledge application [6][16]
- The rise of Chinese open-source models is seen as a victory for the AI ecosystem, providing low-cost and high-control AI solutions for various sectors [6][16]
Group 4: Evolution of Chinese AI Models
- The Chinese AI industry has transitioned from a focus on "parameter competition" to "capability enhancement," emphasizing precision, stability, and safety [6][17]
- Innovations in model architecture and data processing are driving this evolution, supported by national standards in AI safety and compliance [7][17]
X @Decrypt
Decrypt· 2026-02-06 00:29
OpenAI and Anthropic Roll Out Rival AI Models as Competition for Enterprise Heats Up https://t.co/vng78WrV9V ...
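Danqi Chen Joins Mira and Lilian Weng's Company, and It Turns Out a Three-Time IOI Gold Medalist and Fellow Competitor Is Already There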
量子位· 2026-02-06 00:15
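By Lu Yu, from Aofeisi. QbitAI | WeChat official account QbitAI. The reason Danqi Chen chose Mira's startup as her first stop on her first move into industry has been found: a fellow competitor is already there, and has been "lying low" for a full year. That person is Neal Wu, who won IOI gold in the same year as Danqi Chen. And not just once: Neal Wu took IOI gold three times and was the undisputed pillar of the US team. He is also one of the creators of Devin, billed as the world's first AI programmer, which earlier took Silicon Valley by storm. Yet Mira had treated his presence as a top secret. Only when the company's internal rift led several founders to "defect" back to OpenAI together did the legendary programmer's whereabouts unexpectedly surface. Compared with his old friend Danqi Chen, though, Neal Wu keeps a lower profile. His public profile has never disclosed a specific position, saying only obliquely that he is taking part in a new project as a co-founder and advisor. The start date was a year ago, closely matching the timeline of Mira's announcement of her new company. Danqi Chen, also a gold medalist in 2008, is currently an associate professor of computer science at Princeton University and co-lead of its NLP group, and has received a Sloan award. Interestingly, former rivals are now teammates. So what makes Neal Wu so exceptional that Mira went to such lengths to keep him "hidden"? About Neal Wu: his résumé can fairly be described as that of a genius ...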
Claude's New 4.6 Model Is Here! More Jobs Gone: Wall Street Finance, Compilers, White-Hat Security, PPT… All Fall
量子位· 2026-02-06 00:15
Core Viewpoint
- Anthropic's new model, Claude Opus 4.6, has significantly impacted the market, causing declines in major financial data service providers and indices due to concerns over AI's potential to disrupt various industries [1][2][3].
Model Performance
- Claude Opus 4.6 outperforms OpenAI's GPT-5.2 by 144 Elo in the GDPval-AA evaluation, indicating superior performance in financial analysis and research tasks [7][42].
- In programming capabilities, Opus 4.6 achieved the highest score in the Terminal-Bench 2.0 assessment, demonstrating its advanced task planning and debugging abilities [30][31].
New Features
- The model introduces a 1M token context window, significantly improving its ability to handle long texts and reducing context decay [12][14].
- Opus 4.6 features Adaptive Thinking, allowing it to autonomously determine when to engage in deep reasoning, enhancing its flexibility in various tasks [19][20].
- Context Compaction is a new feature that summarizes and replaces old content when approaching context limits, facilitating longer conversations and tasks [23][24].
Pricing and Accessibility
- The pricing for Opus 4.6 remains unchanged at $5 per million tokens for input and $25 per million tokens for output, with additional charges for exceeding 200k tokens in the 1M token context version [11][50][51].
Security and Ethical Considerations
- Opus 4.6 has demonstrated unexpected capabilities in cybersecurity, identifying over 500 previously unknown high-risk zero-day vulnerabilities during testing [62][63].
- Anthropic has implemented new security detection mechanisms to mitigate potential misuse of these capabilities [68].
Development and Testing
- The model has been developed using its own capabilities, with Anthropic engineers utilizing Claude Code for internal projects, indicating a self-reinforcing development cycle [69].
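To put the 144 Elo gap in perspective, the sketch below converts a rating difference into an expected head-to-head score. It assumes GDPval-AA ratings follow the conventional logistic Elo model with a 400-point scale, which the summary does not confirm; the function name `elo_expected_score` is hypothetical and chosen only for illustration.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard logistic Elo model.

    Assumption: the benchmark uses the conventional 400-point scale;
    the source summary does not state how its Elo values are computed.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # Gap reported between Claude Opus 4.6 and GPT-5.2 on GDPval-AA.
    print(f"{elo_expected_score(144):.3f}")  # prints 0.696
```

Under that assumption, a 144-point lead corresponds to the higher-rated model being preferred in roughly 70% of pairwise comparisons, a clear edge rather than a sweep.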
Amazon CEO: "Top 500 U.S. startups use AWS." ☁️
Yahoo Finance· 2026-02-06 00:13
More of the top 500 US startups use AWS as their primary cloud provider than the next two providers combined. Since our last call, we announced new agreements with OpenAI, Visa, NBA, BlackRock, Perplexity, Lyft, United Airlines, DoorDash, Salesforce, US Air Force, Adobe, Thomson Reuters, AT&T, S&P Global, National Bank of Canada, the London Stock Exchange, Choice Hotels, Accenture, Indeed, HSBC, CrowdStrike, and many more. ...
Altman Responds at Length to Anthropic's Super Bowl Ad; Siemens Acquires French Semiconductor Metrology Software Company Canopus AI | AIGC Daily
创业邦· 2026-02-06 00:08
Group 1
- Alibaba has unified its AI branding under the name "Qwen," which includes both foundational and specialized models to eliminate confusion from multiple previous names [2]
- OpenAI's CEO Sam Altman responded to Anthropic's Super Bowl advertisement, criticizing their portrayal of OpenAI's advertising model and promoting OpenAI's commitment to providing free access and developer tools [2]
- The International Olympic Committee announced the launch of the first official Olympic model based on Alibaba's Qwen, which will be used at the Milan Winter Olympics to assist national Olympic committee staff with multilingual support [2]
Group 2
- Siemens announced the acquisition of French semiconductor measurement software company Canopus AI, which aims to enhance the precision and efficiency of wafer and mask measurement processes using AI technology [2]
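Musk Just Promoted a Wuhan University of Technology Alumnus; Jia Yueting: FF Launches 3 Robot Series; Meituan Acquires Dingdong Maicai for US$717 Million; Alibaba Unifies Its Large-Model Brand as Qwen | Bang Morning Briefing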
创业邦· 2026-02-06 00:08
Group 1
- Meituan announced the acquisition of Dingdong Maicai for approximately $717 million, aiming to enhance operational efficiency and align with their mission of improving food quality and living standards [3]
- Dingdong Maicai's CEO emphasized that the merger will not diminish their core competencies in product quality, service, and supply chain efficiency, but rather enhance their value on a larger platform [3]
Group 2
- Faraday Future launched three series of EAI robots, with the Futurist series starting at $34,990, and received 1,211 paid pre-orders by the end of the launch event [7]
- Tesla promoted Phil Duan to the position of Director of Autonomous Driving Engineering, coinciding with the launch of their Robotaxi service [9]
Group 3
- Alibaba unified its AI model branding under "Qwen" to eliminate confusion from multiple names, with the core brand now being Qwen [11]
- Baidu's Wenxin Assistant faced issues with WeChat blocking its red envelope sharing links, leading to a shift to a "password red envelope" format [11]
Group 4
- Xiaomi reduced the safety mileage threshold for its assisted driving feature from 1,000 km to 300 km to help users gradually familiarize themselves with the technology [11]
- Leapmotor's COO set a target of 1.05 million units for 2026, emphasizing a steady and efficient approach to growth [12]
Group 5
- Bosch China denied rumors of layoffs, clarifying that personnel adjustments were limited to specific departments and were normal in the context of industry changes [15]
- Tesla's Shanghai Gigafactory is projected to account for over half of Tesla's global deliveries by 2025 [15]
Group 6
- TSMC plans to invest $17 billion in its Japan factory to mass-produce 3nm chips, supported by government subsidies [16]
- OpenAI's CEO responded to competition from Anthropic, defending their advertising model and emphasizing their commitment to providing affordable AI services [16]
Group 7
- North Chip Life successfully listed on the STAR Market, marking it as the first innovative medical device company to pass the new standards [17]
- Qian Gu Technology completed a C round financing of 700 million yuan, with participation from multiple investment firms [18]
Group 8
- The AI comic market in China is expected to see a significant increase, with over 80,000 related companies registered by 2025, reflecting a 37.1% year-on-year growth [22]
- The global automotive market is projected to see China's market share reach 35.6% by 2025, with a notable increase in vehicle sales [22][23]
A Mid-Doors Sniper Duel! Claude Opus 4.6 and GPT-5.3 Codex Released at the Same Time; This Really Is the AI Spring Festival Gala.
数字生命卡兹克· 2026-02-05 23:58
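After the whole internet waited eagerly for two days, at 2 a.m., Anthropic's new model Claude Opus 4.6 officially went live. Honestly, all the late nights I have pulled lately over the AI world's models and products are starting to wear me down. But the craziest, most despairing part is that 20 minutes later OpenAI shipped a new model too... GPT 5.3 Codex arrived as well. Damn, this really is a mid-doors sniper duel. It is killing me... I still have to look at both, because GPT and Claude have been pretty much my two most-used workhorse models: GPT-5.2 for all kinds of search, fact-checking, research, and bug fixing in code, and Opus 4.5 for creative work and as my main coding model. Now both are here. What a rush. Let's take them one at a time. 1. Claude Opus 4.6. This means Claude is getting better and better at using a computer: it can operate the mouse, click buttons, and switch between applications more capably. Alongside the coding improvements, its computer-use ability has also improved substantially; it really is heading toward becoming a full-fledged agent. Then there is BrowseComp, which also surprised me. It measures an agent's ability to search for information online; Opus 4.6 scored 84.0%, far ahead of other models. Second-place GPT-5.2 Pro scored 77.9%, more than 6 points behind. This time, Anthropic actually ...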
Head to Head! Claude Opus 4.6 and GPT-5.3-Codex Were Just Released at the Same Time
机器之心· 2026-02-05 23:45
Core Insights
- The article discusses the recent releases of advanced AI models by Anthropic and OpenAI, specifically Claude Opus 4.6 and GPT-5.3-Codex, highlighting their significant improvements in performance and capabilities [2][15].
Summary of Claude Opus 4.6
- Claude Opus 4.6 represents a major upgrade for Anthropic's flagship AI model, featuring a more cautious planning approach and the ability to maintain longer autonomous workflows [5].
- The model introduces a context window of 1 million tokens, allowing it to process and reason with significantly more information than previous versions [6].
- It includes a "smart agent team" feature, enabling multiple AI agents to work on different aspects of coding projects simultaneously [6].
- Opus 4.6 outperformed competitors in various assessments, achieving the highest scores in Terminal-Bench 2.0 and leading in the "Humanity's Last Exam" [7].
- In GDPval-AA, Opus 4.6 scored approximately 144 Elo points higher than OpenAI's GPT-5.2 and 190 points higher than its predecessor, Claude Opus 4.5 [7].
- The model's performance in MRCR v2 testing showed a score of 76%, significantly higher than Sonnet 4.5's 18.5%, indicating a qualitative leap in context utilization [9].
Summary of GPT-5.3-Codex
- OpenAI's GPT-5.3-Codex claims to have the best coding performance to date, achieving record scores in multiple benchmarks, including 56.8% in SWE-Bench Pro and 77.3% in Terminal-Bench 2.0 [16][19].
- The model integrates the advanced coding capabilities of GPT-5.2-Codex with enhanced reasoning and expertise from GPT-5.2, resulting in a 25% speed improvement [19][20].
- GPT-5.3-Codex is designed to function as a comprehensive work assistant, capable of handling tasks across the software lifecycle, including debugging, deployment, and user research [25].
- The model allows for real-time interaction, enabling users to guide and supervise multiple working agents without losing context [27].
- OpenAI emphasizes that the advancements in GPT-5.3-Codex have fundamentally changed the workflow of their research and engineering teams, enhancing productivity and interaction quality [28][29].
Conclusion
- The article concludes that the competitive landscape of AI models is intensifying, with both Anthropic and OpenAI making significant strides in capabilities and performance, setting the stage for further developments in the industry [31].
Anthropic just dropped Opus 4.6...
Matthew Berman· 2026-02-05 23:35
Claude Opus 4.6 is here and it is a big step forward, an improvement over Opus 4.5, and I actually got early access to it. I've been playing around with it and yes, it is that good. Let me tell you everything about it. According to the blog post, it plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger code bases, and has better code review and debugging skills to catch its own mistakes. This is the key line: sustains agentic tasks for longer. That is the direction that ...