量子位

Large models' IMO 2025 math competition results are out
量子位· 2025-07-18 06:16
Core Viewpoint - The article reports the results of MathArena's evaluation of large models on the IMO 2025 problems. Gemini 2.5 Pro finished far ahead of its competitors, with a total score above 30%, about 89% higher than that of the second-place model, o3 [1][2].
Group 1: Evaluation Process
- The evaluation was organized by MathArena, which selected models based on their past performance in MathArena competitions: Gemini 2.5 Pro, o3, o4-mini, Grok 4, and DeepSeek-R1 [4].
- A unified prompt template was used for all models to ensure fairness, in line with the Open Proof Corpus evaluation [5].
- Each model was run with its recommended hyperparameters and a maximum token limit of 64,000 [6].
Group 2: Scoring and Judging
- Four experienced human judges with IMO-level mathematics expertise were hired to assess the models, with each problem scored out of 7 points [10][11].
- Each model generated 32 initial answers, from which it selected its best four for final scoring [8].
Group 3: Performance Insights
- Many models scored 3 to 4 points out of 7, a pattern less common among human contestants, which indicates a difference in how humans and models succeed or fail [12].
- Models were noticeably less prone to over-optimizing the format of the final answer, suggesting progress on open-ended mathematical reasoning tasks [13].
- Gemini fabricated non-existent "theorems" less often than in previous evaluations [14].
Group 4: Problem-Solving Performance
- The models struggled with geometry. The second and sixth problems yielded the lowest scores, particularly the second problem, where only Grok 4 scored 4% [26][27].
- On the fourth problem, most models used methods similar to those of human contestants but made logical errors; on the fifth problem, they identified correct strategies but failed to provide proofs [29].
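The scoring arithmetic behind these percentages can be sketched as follows. This is a minimal illustration, assuming the standard IMO format of six problems worth 7 points each (42 points in total); the function names and the per-problem scores are hypothetical placeholders, not MathArena's published data.

```python
# Sketch of IMO-style score aggregation.
# All scores used with these helpers are hypothetical, not MathArena's data.

POINTS_PER_PROBLEM = 7   # each problem is graded out of 7 points
NUM_PROBLEMS = 6         # assumption: standard IMO format
MAX_TOTAL = POINTS_PER_PROBLEM * NUM_PROBLEMS  # 42 points


def percent_score(problem_scores):
    """Return the total points as a percentage of the 42-point maximum."""
    if len(problem_scores) != NUM_PROBLEMS:
        raise ValueError("expected one score per problem")
    if any(not 0 <= s <= POINTS_PER_PROBLEM for s in problem_scores):
        raise ValueError("each score must be between 0 and 7")
    return 100.0 * sum(problem_scores) / MAX_TOTAL


def relative_lead(leader_pct, runner_up_pct):
    """How much higher the leader is, relative to the runner-up (in %)."""
    return 100.0 * (leader_pct - runner_up_pct) / runner_up_pct


def relative_deficit(leader_pct, runner_up_pct):
    """How much lower the runner-up is, relative to the leader (in %)."""
    return 100.0 * (leader_pct - runner_up_pct) / leader_pct
```

Note that "A is x% higher than B" and "B is x% lower than A" are different statements: `relative_lead` divides the gap by the runner-up's score, while `relative_deficit` divides the same gap by the leader's score, so the two percentages only coincide when the gap is zero.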
Meta's new AI org structure revealed, and it looks a bit like ByteDance's
量子位· 2025-07-18 06:16
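By the editorial team, from 纽凹非寺. 量子位 | WeChat official account QbitAI
After a series of internal reorganizations at Meta, a new structure is starting to surface. And on a closer look, it seems oddly familiar...
Everyone knows Zuckerberg assembled the "Superintelligence Lab" with pay packages averaging over $100 million per person. The latest news is that a new organization of more than 3,400 people has now been consolidated around that lab. Its top leader is Alexandr Wang, whose title is Chief AI Officer (CAIO); his deputy is former GitHub CEO Nat Friedman, who mainly oversees AI products and applications. The purpose of Zuckerberg's hiring spree is now roughly clear. After this adjustment, even Turing Award winner Yann LeCun, one of the "big three" of AI, reports to Alexandr Wang, an MIT undergraduate dropout born in 1997.
But that is not really the point. The point is how these 3,400-plus people have been reassigned. Reportedly there are four groups in total:
Isn't this exactly the AI structure of ByteDance, the company Zuckerberg has been obsessing over and gnashing his teeth at for years? Seed, led by Wu Yonghui, does the most cutting-edge AGI research and also covers foundation model technology and architecture. Product teams then build products on top of that AI foundation. The only difference is that Meta also has a new team working on Llama 5; because Alexandr Wang is swaying Zuckerberg on the question of open source versus closed source, Meta may end up pursuing both tracks ...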
Solving scale drift in outdoor RGB-only SLAM: accurate localization plus high-fidelity reconstruction | ICCV'25, open source
量子位· 2025-07-18 06:16
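Submitted by the S3PO-GS team. 量子位 | WeChat official account QbitAI
The scale drift problem in outdoor SLAM finally has a new solution! The latest work from researchers at The Hong Kong University of Science and Technology (Guangzhou) is S3PO-GS, a 3D Gaussian framework designed specifically for outdoor monocular SLAM, which has been accepted to ICCV 2025. The highlight of this work is that it achieves global scale consistency for RGB-only monocular SLAM for the first time. Across three outdoor benchmarks, Waymo, KITTI, and DL3DV, S3PO-GS not only sets a new SOTA in novel view synthesis but also reduces tracking error by 77.3% on DL3DV scenes.
What does the paper do?
In frontier fields such as autonomous driving, robot navigation, and AR/VR, the robustness of SLAM directly affects system performance. Current SLAM approaches based on 3D Gaussian Splatting (3DGS) perform very well indoors, but they still face serious challenges in unbounded outdoor environments with RGB-only input. A monocular system inherently lacks depth priors, which leaves geometric information insufficient. Introducing monocular depth estimation or an end-to-end pointmap model (such as MASt3R) as a geometric prior, in turn, causes system-level scale drift because the scale is inconsistent between frames, a problem that is especially pronounced in complex outdoor scenes.
To address this twofold bottleneck, the HKUST (Guangzhou) team proposes the S3PO-GS framework, which achieves global scale consistency for RGB-only monocular SLAM for the first time. Through three core techniques, the approach ...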
Over $10 million in a year: an overseas AI creative engine takes off
量子位· 2025-07-18 06:16
Core Viewpoint - Creati, an AI-driven creative engine, has rapidly gained traction in the advertising sector, amassing 10 million users and generating more than $10 million in annual revenue within just one year of its launch [5][6].
Group 1: AI Creative Engine
- Creati focuses on automating the creative process in advertising and differentiates itself from competitors by drawing on influencers to produce customized creative content [6][8].
- The platform lets businesses turn popular influencer videos into tailored templates, significantly reducing the time and effort needed to produce marketing materials [9][12].
- Creati's AI model produces high-quality videos that rival traditional advertising production, attracting major brands such as Shein and Cider [10][11].
Group 2: Market Disruption
- The platform addresses the pain points of both influencers and small businesses: it gives influencers a stable income stream and simplifies the creative process for businesses [11][12].
- Creati's approach to content generation is designed specifically for e-commerce, recognizing that online retailers have different needs from users of general-purpose video generation tools [18][20].
- The platform's ability to keep product representation consistent is a key advantage, particularly for e-commerce businesses [20].
Group 3: Data-Driven Innovation
- Creati uses a data feedback loop to refine its AI creative model, improving it continuously based on user engagement metrics [21][22].
- The platform generates customized content based on brand characteristics and audience feedback, which makes it more effective at driving marketing results [21][22].
- Creati's vision includes a creative agent that autonomously generates and optimizes advertising content, potentially reshaping the marketing landscape [24][25].
Group 4: Future Aspirations
- The company aims to evolve into a comprehensive creative engine that assists users with many aspects of content creation beyond advertising [29].
- Creati's long-term goal is to integrate advanced technologies, such as brain-computer interfaces, to further enhance its creative capabilities [29][30].
AI is truly hot! miHoYo founds a new company with 500 million in capital
量子位· 2025-07-18 00:30
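时令, from 凹非寺. 量子位 | WeChat official account QbitAI
In the West, Musk is building AI girlfriends; in the East, miHoYo is building "无定谷" (Wudinggu).
Just recently, miHoYo set up a new wholly owned company, 上海米哈游无定谷科技有限公司 (Shanghai miHoYo Wudinggu Technology Co., Ltd.), with registered capital of 500 million yuan. Its business scope covers software development and animation and game development, and extends to AI application software and related fields. Somewhat earlier, Whispers from the Star, the AI game from Cai Haoyu's startup, opened a playable demo on Steam. The official team also pulled a stunt to ride Musk's wave of attention, having Grok's Ani try the game: an AI girl talking to an AI girl.
500 million in registered capital: miHoYo has never spent this big
Looking back, miHoYo has long been positioning itself in AI. Moves to set up AI-related companies go back some time: Yesterday's founding of Wudinggu Technology with 500 million is its largest AI investment to date, a clear sign of its ambitions in artificial intelligence.
Open miHoYo's official website and the first thing you see is this vision statement: The line comes from miHoYo CEO Cai Haoyu. He once said in a talk that miHoYo's goal is "within the next 10 to 30 years, to build a virtual world like those depicted in films such as The Matrix and Ready Player One."
To that end, miHoYo entered AI as early as 2018, founding the "逆熵研究部" (Anti-Entropy research department), which has its own in-house large model, Glossa. The Anti-Entropy team's best-known work is the digital human 鹿鸣 (Lumi), who once ...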
ChatGPT Agent officially launched; several startup sectors had a sleepless night
量子位· 2025-07-18 00:30
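白交 and 雷刚, from 纽凹非寺. 量子位 | WeChat official account QbitAI
Practical, seriously practical! This is what an OpenAI agent should look like.
Just now, OpenAI's latest release arrived: ChatGPT Agent made its official debut. It is an agent that unifies "thinking" and "doing": the reasoning and analysis abilities of Deep Research and the action-taking abilities of Operator are now combined in ChatGPT Agent. ChatGPT Agent can also take over your entire computer, which makes it almost a brand-new operating system.
What can it do?
At work: scheduling and rescheduling meetings, generating slide decks, planning business trips and outings, submitting expense reports automatically... this is more or less the core work of the kind of assistant only big-company executives get. In daily life: planning your personal travel itinerary, arranging major events such as weddings and banquets, handling certifications that need periodic manual renewal... roughly what the personal secretaries of chairmen and CEOs do.
But now, overnight, ChatGPT Agent is available to everyone. OpenAI also equipped it with a dedicated model that sets a new SOTA and new records for model capability. Previously, general-purpose agents only dared to call themselves "interns", but OpenAI, backed by its in-house foundation model, has practically turned the "intern" into a "chief secretary".
Previously, a startup sector ...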
First remarks from a core o1 contributor after leaving: AI is the strongest lever in history, surpassing labor, capital, and code
量子位· 2025-07-17 09:03
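梦晨, from 凹非寺. 量子位 | WeChat official account QbitAI
Another core researcher who left OpenAI has spoken out! Hyung Won Chung, who was just reported to have joined Meta, shared his thinking on the future of AI: artificial intelligence is becoming the most powerful leverage mechanism in history.
Hyung Won Chung and Jason Wei, who left OpenAI with him, are long-time collaborators. Their partnership goes back to their Google Brain days, when the two were co-first authors of an important paper on model fine-tuning, "Scaling Instruction-Finetuned Language Models". Jason Wei once praised him: "Hyung Won Chung's ability to recognize new paradigms and completely let go of any sunk cost left a deep impression on me. In late 2022 he realized the power of reinforcement learning, and he has been preaching it ever since." At OpenAI, Hyung Won Chung contributed to core projects including o1, o1-preview, and Deep Research.
AI as a multiplier of individual ability
Hyung Won Chung begins with a flower in bud. He points out that humans are naturally bad at noticing slow changes that unfold over years, and that this "flaw" may lead us to seriously underestimate the scale of the change AI brings. Artificial intelligence ...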
Transformer in danger! Google releases the MoR architecture: memory halved and inference speed doubled
量子位· 2025-07-17 09:03
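鹭羽, from 凹非寺. 量子位 | WeChat official account QbitAI
Going beyond the Transformer, Google has introduced a new underlying architecture, Mixture-of-Recursions (MoR), not to be confused with MoE, which doubles inference speed while cutting KV memory in half! It is also all-in-one: for the first time, a single framework handles different tasks with the same set of parameters while allocating compute dynamically. It is like giving LLMs a double buff: both model performance and efficiency, with no trade-off.
A team from Google DeepMind, together with KAIST AI and Mila, uses unified parameter sharing, adaptive recursion depth, and efficient KV caching to cut compute and memory costs while preserving large-model performance, forming a new efficiency optimum. Many commenters have even called it a Transformer Killer. Some go further and suggest that the architecture may signal that latent-space reasoning will be where the next LLM breakthrough happens.
The Transformer brought excellent few-shot generalization and reasoning ability, but the huge compute and memory demands that came with it still make training and deployment difficult. Current optimizations mainly take the form of parameter sharing and adaptive computation, but usually only one of the two can be applied at a time. So the researchers proposed MoR, Mixture-of-Recursions, which within a single recursive Transformer can combine the two ...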
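To make the two ideas named in this excerpt concrete (one set of shared parameters reused recursively, and a per-token recursion depth), here is a toy sketch. It is not the MoR implementation: the scalar "layer", the routing rule, and every name below are hypothetical stand-ins chosen only for illustration, and KV caching is not modeled at all.

```python
# Toy illustration of recursion with shared parameters and per-token depth.
# NOT the MoR implementation: the layer, the router rule, and all names
# are hypothetical stand-ins.


def shared_block(x, weight=0.5, bias=1.0):
    """One reusable 'layer': the same parameters are applied at every depth."""
    return weight * x + bias


def toy_router(x, max_depth):
    """Hypothetical router: larger-magnitude tokens get more recursion steps."""
    return max(1, min(max_depth, int(abs(x)) + 1))


def recursive_forward(tokens, max_depth=3):
    """Apply shared_block to each token as many times as the router assigns."""
    outputs, depths = [], []
    for x in tokens:
        depth = toy_router(x, max_depth)
        for _ in range(depth):
            x = shared_block(x)  # same weight and bias at every step
        outputs.append(x)
        depths.append(depth)
    return outputs, depths
```

For the tokens `[0.0, 1.0, 4.0]` with `max_depth=3`, the toy router assigns depths 1, 2, and 3, so the shared block runs 6 times in total instead of the 9 applications a fixed-depth pass would need. That difference is the kind of saving adaptive depth is meant to provide; a real router would be learned rather than hand-written.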
Human beats OpenAI to defend the programming title! Two comebacks in a 10-hour battle, and the AI falls short at the last moment
量子位· 2025-07-17 07:04
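白交, from 凹非寺. 量子位 | WeChat official account QbitAI
A 10-hour battle! A human pulled ahead at the last moment and won the programming world finals.
OpenAI ranked first for most of the contest, and that seemed to be that. Then the human began to overtake, only for OpenAI to retake the lead with 1 hour 20 minutes left. But it could not hold on to the end.

Standings (Exhibition with OpenAI)

| Rank | User | Score |
| --- | --- | --- |
| 1 | OpenAIAHC | 43542614363 |
| 2 | Psyho | 42420277629 |
| 3 | terry_u16 | 34248482621 |
| 4 | nikaj | 33740582721 |
| 5 | saharan | 31754963614 |

OpenAI president Greg Brockman sent his congratulations, slipping in a plug of his own: OpenAI placed second. The human champion, meanwhile, said he was completely exhausted: "Because I've probably slept only about 10 hours over the past three days, and I can barely hold on now." OpenAI, which had kept the lead for most of the contest, ultimately had to settle for second place.
At the just-concluded AtCoder World Tour Finals, 12 ...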
The Claude Code creators who left are back! Anthropic: my revenue has surged 5.5x in the past two months, don't go
量子位· 2025-07-17 07:04
Core Viewpoint - The article discusses the rapid return of key personnel Boris Cherny and Cat Wu to Anthropic from Cursor, highlighting the competitive landscape in Silicon Valley and the implications for Anthropic's valuation and growth potential in the AI sector [1][6][7].
Group 1: Personnel Movements
- Boris Cherny and Cat Wu, key figures behind Claude Code, were initially recruited by Anysphere, the company behind Cursor, where they were set to develop "agent-like" functionalities [2][4][5].
- Just two weeks after their departure, both were lured back to Anthropic, indicating the company's strong position in retaining talent amid fierce competition [6][7].
Group 2: Valuation and Financial Performance
- Anthropic is reportedly in discussions for a new funding round with a target valuation of $100 billion, which would mark a significant increase from its previous valuation of $58 billion just four months prior [8][9][10].
- The company aims to improve its profitability metrics, with current gross margins from direct sales of AI models around 60%, moving towards a target of 70% [12][19].
Group 3: Revenue Growth and Market Strategy
- Anthropic's revenue has seen a fourfold increase in the first half of the year, with annualized revenue exceeding $4 billion [20].
- The company is pursuing a "model-as-a-service + vertical solutions" strategy, offering tailored AI solutions across various industries, including finance, law, and healthcare [15][19].
Group 4: Product Development and User Engagement
- The launch of Claude Code has significantly boosted user engagement, with a 300% increase in active users and a 5.5-fold revenue growth since the release of the Claude 4 series [21][26].
- Anthropic has introduced a comprehensive analytics dashboard for Claude Code, allowing enterprises to track their AI spending and usage metrics effectively [24][25].
Group 5: Investment and Future Prospects
- Amazon is reportedly considering a new multi-billion dollar investment in Anthropic, potentially making it the largest shareholder, following a previous investment of $4 billion [28][31].
- This investment reflects a broader trend where companies are recognizing the long-term profitability potential of AI technologies beyond initial hype [32].