O3

Grok 4 Takes the Internet by Storm, Passes the Bouncing-Ball Coding Test; Epic Founder: This Is AGI
猿大侠· 2025-07-12 01:45
By 克雷西, reporting from 凹非寺. 量子位 | WeChat official account QbitAI. Less than a day after its release, Musk's Grok 4 already has users going wild. One user, for example, reported that Grok 4 has passed the well-known hexagon bouncing-ball vibe-coding test. As the hexagon keeps rotating, the balls drop through the opening in a staggered pattern. Users hunting for bugs with a magnifying glass noticed that the balls pass through the walls as they return to the center, but the author said this was intentional.
Plutus (@PlutusCosmos), 17 hours ago: "The balls penetrate the walls when the go back to the center. Is it intended?"
Flavio Adamo (@flavioAd), 17 hours ago: "yes"
SoyTeslike (@soyteslike), 16 hours ago: "damn, already screenshotted but it wa ..."
Was Musk Bragging? The First Hands-On Tests of Grok 4 Are In: It Can Crush o3, Yet It Can't Count Six Fingers
机器之心· 2025-07-11 08:27
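Report by the 机器之心 editorial team. Users are paying good money to try Grok 4. Yesterday, Musk appeared at the Grok 4 launch event and proudly declared that Grok is now at postdoctoral level in every subject, without exception, and could even make new scientific discoveries within this year. That instantly sparked interest among users worldwide; even though Grok 4 is not cheap, many were willing to pay to try it. He compared the output of Grok 4 and o3 using the same prompt. Prompt: Create a HTML, CSS, and javascript where a ball is inside a rotating hexagon. The ball is affected by Earth's gravity and friction from the hexagon walls. The bouncing must appear realistic. Some readers may object: didn't o3-mini already complete this task without trouble in earlier tests? See the 机器之心 article "o3 ...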
Grok 4 Takes the Internet by Storm, Passes the Bouncing-Ball Coding Test; Epic Founder: This Is AGI
量子位· 2025-07-11 07:20
By 克雷西, reporting from 凹非寺. 量子位 | WeChat official account QbitAI. Less than a day after its release, Musk's Grok 4 already has users going wild. One user, for example, reported that Grok 4 has passed the well-known hexagon bouncing-ball vibe-coding test. As the hexagon keeps rotating, the balls drop through the opening in a staggered pattern. Users hunting for bugs with a magnifying glass noticed that the balls pass through the walls as they return to the center, but the author said this was intentional.
Plutus (@PlutusCosmos), 17 hours ago: "The balls penetrate the walls when the go back to the center. Is it intended?"
Flavio Adamo (@flavioAd), 17 hours ago: "yes"
SoyTeslike (@soyteslike), 16 hours ago: "damn, already screenshotted but it ...
In Depth | Sam Altman Responds to the Rift with Microsoft and Industry Lawsuits: This Is a Partnership with a Broad Future Ahead
Z Potentials· 2025-07-11 06:11
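Image source: Hard Fork. Z Highlights: Casey Newton is a technology writer who often focuses on AI industry developments and in-depth conversations; Kevin Roose, a technology reporter at The New York Times, tracks AI developments and the controversies around them; both host the podcast Hard Fork. Sam Altman is the CEO of OpenAI; Brad Lightcap oversees OpenAI's business operations, helping bring AI technology into real-world use and build its ecosystem. This interview is a segment from a Hard Fork live show and was first published on the Hard Fork channel on June 26, 2025. A relaxed yet sincere conversation begins: the AI leaders' "open mic" moment. Casey Newton: We're about halfway through the show, and I just want to ask quickly, how are you feeling right now? We're having a great time over here, wow. Are you serious? (laughs) No way. OK, that's good. Sam Altman: He wants to go back. Casey Newton: Don't be like that, come on, come on. Yes, yes. Hello, Sam. Hello, Brad. Sam Altman: Great to see you. Casey Newton: I love this. We're live today, ...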
AIs Can't Count Six Fingers, and It's Not That Simple
Hu Xiu· 2025-07-11 02:54
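After Grok 4 was released yesterday, I casually scrolled through X. I then saw a very interesting post from @lepadphone. I assumed this was a Grok 4 problem: the model just wasn't capable enough and counted a spoof image of a six-fingered hand as five fingers. I tested it myself, and it did indeed count five. I didn't think much of it. That changed when I casually dropped the image into OpenAI o3 and things started to look wrong, because o3 also answered five fingers. I frowned immediately and gave it to o3 pro. After reasoning for 48 seconds, it still said five. Then I gave the image to 豆包, Kimi, Gemini, and almost every other multimodal model. Without exception, every model answered five. There was one lone survivor: Claude 4 occasionally answered correctly. I broke out in a cold sweat. One model miscounting could be a hallucination; when every model miscounts, something must be wrong at the foundation of these models. I asked about it in a group chat late at night, but got no response. So I had to rely on myself. After digging through a pile of material and running a deep search with Deep Research, I found a paper that explains this phenomenon perfectly: "Vision Language Models are Biased". The paper was published on May 29 of this year, just over a month ago, so it is still quite new. I spent some time, ...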
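The World's Strongest AI Model? Musk Releases Grok 4! Fund 589520, Heavily Invested in the Domestic AI Industry Chain, Draws 39.22 Million Yuan in a Single Day!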
Xin Lang Ji Jin· 2025-07-11 01:17
Group 1: AI Model Development
- xAI's Grok 4 achieved an accuracy rate of 25.4% in "Humanity's Last Exam," surpassing Google's Gemini 2.5 Pro at 21.6% and OpenAI's o3 at 21% [1]
- The emergence of multi-modal large models is expected to create significant investment opportunities in both computational power and applications [1]
- The AI sector is likely to see further catalytic events in the second half of the year, including the release of new models and platforms from companies like OpenAI and NVIDIA [1]
Group 2: Investment Trends
- The AI investment trend is gaining momentum, particularly following NVIDIA's market capitalization reaching $4 trillion [2]
- The Huabao ETF, focused on the domestic AI industry chain, saw a net inflow of 39.22 million yuan on July 10, with 8 out of the last 10 trading days showing net inflows totaling 50.65 million yuan [2]
- Analysts emphasize the importance of experiencing the benefits of the AI era and recognizing the long-term investment value in the rapidly evolving AI technology landscape [4]
Group 3: Domestic AI Development
- Domestic AI model DeepSeek has made significant advancements, breaking through overseas computational barriers and establishing a foundation for local AI companies [5]
- The Huabao ETF is strategically positioned in the domestic AI industry chain, benefiting from the acceleration of AI integration in edge computing and software [5]
AIs Can't Count Six Fingers, and It's Not That Simple.
数字生命卡兹克· 2025-07-10 20:40
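After Grok 4 was released yesterday, I casually scrolled through X. I then saw a very interesting post from @lepadphone. I assumed this was a Grok 4 problem: the model just wasn't capable enough and counted a spoof image of a six-fingered hand as five fingers. I tested it myself, and it did indeed count five. I didn't think much of it. That changed when I casually dropped the image into OpenAI o3 and things started to look wrong, because o3 also answered five fingers. I frowned immediately and gave it to o3 pro. After reasoning for 48 seconds, it still said five. Then I gave the image to 豆包, Kimi, Gemini, and every other multimodal model. Without exception, every model answered five. There was one lone survivor: Claude 4 occasionally answered correctly. I broke out in a cold sweat. One model miscounting could be a hallucination; when every model miscounts, something must be wrong at the foundation of these models. I asked about it in a group chat late at night, but got no response. So I had to rely on myself. After digging through a pile of material and running a deep search with Deep Research, I found a paper that explains this phenomenon perfectly: "Vision Language Models are Biased". The paper was published on May 29 of this year, just over a month ago, so it is still quite new. I spent ...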
X @Elon Musk
Elon Musk· 2025-07-10 18:57
Pretty good, but room for improvement
Alex Prompter (@alex_prompter): I tested Grok 4 and ChatGPT-o3 with same critical prompts. The results will blow your mind. Grok 4 Vs. ChatGPT-o3 (Video demos are included) https://t.co/TL3kgPVNbh ...
How Much Substance Is There in Musk's Newly Released "World's Strongest Model"?
第一财经· 2025-07-10 15:07
Core Viewpoint
- The article discusses the launch of Grok 4, an AI model developed by xAI, which is claimed to be the most powerful AI model globally, surpassing existing top models in various benchmarks [1][2].
Group 1: Grok 4 Performance
- Grok 4 achieved a perfect score in the AIME25 mathematics competition and scored 26.9% in "Humanity's Last Exam" (HLE), which consists of 2,500 expert-level questions across multiple disciplines [1].
- The AI analysis index for Grok 4 reached 73, making it the top-ranked model, ahead of OpenAI's o3 and Google's Gemini 2.5 Pro, both at 70 [2].
- Grok 4 set a historical high score of 24% in the HLE, surpassing the previous record of 21% held by Google's Gemini 2.5 Pro [5].
Group 2: Development and Training
- Grok 4's training volume is 100 times that of Grok 2, with over 10 times the computational power invested in the reinforcement learning phase compared to other models [5].
- The subscription fee for Grok 4 is set at $30 per month, while a more advanced version, Grok 4 Heavy, costs $300 per month [5].
Group 3: Financial Aspects and Funding
- xAI has raised a total of $10 billion in its latest funding round, which includes $5 billion in debt and $5 billion in equity, bringing its total funding since 2024 to $22 billion [10].
- Despite the substantial funding, xAI faces high operational costs, reportedly spending $1 billion per month, with only $4 billion in cash remaining as of March 2025 [11].
- xAI's projected revenue for 2025 is $5 billion, significantly lower than OpenAI's expected $12.7 billion, indicating a lag in commercial progress [11].
Group 4: Future Outlook
- xAI aims to leverage the vast data from X to train its models, potentially avoiding high data costs, with a goal to achieve profitability by 2027 [12].
- Upcoming releases include a programming model in August, a multi-agent model in September, and a video generation model in October, although previous delays raise questions about these timelines [12].