GPT-5 Passes the "Gödel Test"! Original Solutions to Open Math Problems That Take PhD Students Days
量子位· 2025-09-25 13:00
henry from Aofeisi | QbitAI (WeChat: QbitAI)

GPT-5, you rascal! What else are you hiding? In a new paper, researchers had it attack five unsolved optimization conjectures, and it cracked three of them. More striking still, on one problem it produced a valid proof entirely different from the approach the researchers had anticipated. This is no bumbling grad student; it behaves like a sharp PhD student capable of genuine originality. Sebastien Bubeck, former VP of Microsoft Research and now a scientist at OpenAI, points out that unlike International Mathematical Olympiad (IMO) problems, which are written for "human genius high schoolers", these test problems would take a PhD-level researcher several days to finish. In the paper, the researchers also pointedly push back on Terence Tao's assessment of large language models' mathematical ability, arguing that GPT-5 can now solve some genuinely open mathematical problems. So how was this AI math prodigy made? The "Gödel" Test: as noted above, GPT-5 was not tackling olympiad problems but simple conjectures in advanced mathematics. Solving them demands not just calculation but substantial mathematical background and logical reasoning. The researchers call their benchmark the Gödel Test. Its problems require trained human thought to solve, and no ready-made answers exist in the literature. (...
Cracking Structured Long-Document Retrieval! A New Framework Ends Models' "Structural Blindness"
量子位· 2025-09-25 11:42
Contributed by the SEAL team | QbitAI (WeChat: QbitAI)

AI keeps stumbling over the headings and structure of long HTML and Markdown documents when it hunts for information? A fix has arrived: SEAL, a new contrastive-learning framework, combines structure awareness with element alignment so models finally understand long documents.

| Method | HitRate@1 | HitRate@3 | HitRate@5 | MRR@5 | MRR@10 | NDCG@5 | NDCG@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mE5-large | 54.11 | 79.62 | 85.86 | 67.39 | 68.06 | 72.18 | 74.11 |
| + Chunk | 56.85 | 82.94 | 88.79 | 70.12 | 71.45 | 74.78 | 77.42 |
| + MCLS | 57.74 | 84.12 | 89.56 | 71.08 | 72.41 | 75.76 | 78.44 |
| + SANTA | 55.79 | 81.76 | 88.02 | 69.01 | 70.49 | 73.79 | ... |
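A minimal sketch of the kind of contrastive objective such a framework optimizes: InfoNCE over a query embedding, one structure-aligned positive element, and sampled negatives. The vector dimension, temperature, and toy data below are illustrative assumptions, not SEAL's actual implementation.

```python
import numpy as np

def info_nce(query, pos, negs, tau=0.05):
    """InfoNCE loss: pull the query toward its structure-aligned positive
    element, push it away from the sampled negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query, pos)] + [cos(query, n) for n in negs]) / tau
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0

rng = np.random.default_rng(0)
q = rng.normal(size=128)
negs = [rng.normal(size=128) for _ in range(7)]
loss_aligned = info_nce(q, q + 0.01 * rng.normal(size=128), negs)  # near-duplicate positive
loss_random = info_nce(q, rng.normal(size=128), negs)              # unrelated "positive"
# an aligned positive should yield a much lower loss than a random one
```

Training on pairs built from a document's structural elements (e.g., a heading and the section body it governs) is what would make the learned embeddings structure-aware.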
Your AI Assistant Just Got More Capable! Tianxi Partners with ByteDance's Coze, Unlocking Unlimited New Features
量子位· 2025-09-25 11:42
Yunzhong from Aofeisi | QbitAI (WeChat: QbitAI)

Tianxi Personal Super Agent and ByteDance's Coze (扣子) have officially announced an ecosystem partnership! The Tianxi super agent is Lenovo's new-generation AI assistant platform: the "one core" of its "one core, many endpoints" strategy, the "AI brain" of smart terminal devices, positioned as the primary entry point for human-computer interaction. It integrates voice, text, and visual interaction; three super capabilities (multimodal interaction, full-spatiotemporal memory, and autonomous planning and execution); and five flagship functions (AI control, AI search, AI translation, AI notes, and AI services), delivered through a hybrid edge-cloud deployment architecture for a cross-device, cross-ecosystem super-agent experience. After ChatExcel's "conversational spreadsheets" became a breakout feature, Tianxi's choice of the developer platform Coze is more than a feature expansion: it marks Lenovo's AI push entering a platform- and ecosystem-integration phase, reinforcing its core role as an AI-ecosystem enabler. A natural traffic entry point plus efficient development: Coze is said to offer low development cost and complete tooling, letting users build AI applications through a visual interface. The partnership targets AI developers' core pain point, "easy to build, hard to distribute", opening a "highway" from AI idea to commercial use. Developers can build personalized agents efficiently on Coze, then use Tianxi's natural traffic entry point and device coverage to push those agents seamlessly onto Tianxi-powered AI terminals. ...
A Robot Dog Keeps Walking After Its Leg Is Sawn Off! The Latest Robot Brain Comes from a Unicorn Valued at 32 Billion Yuan
量子位· 2025-09-25 11:42
Core Viewpoint - Skild AI has developed a revolutionary AI brain, Skild Brain, capable of controlling various robots in unpredictable situations, achieving a valuation of $4.5 billion as of June 2025 [4][29]. Group 1: Skild Brain Capabilities - Skild Brain can adapt to different robot bodies and situations, allowing it to control robots even when they face unexpected challenges like motor jams or limb loss [7][12]. - The AI brain was trained in a virtual environment simulating 100,000 different robot postures over a simulated time of 1,000 years, leading to emergent control capabilities [4][12]. - It can learn from failures and improve its performance over time, demonstrating a memory capacity over 100 times longer than typical robot control strategies [17][24]. Group 2: Testing and Adaptation - Skild Brain successfully adapted to various scenarios, such as simulating limb loss and adjusting walking patterns accordingly, while traditional controllers failed [19][20]. - The AI demonstrated the ability to switch control strategies based on the robot's physical state, such as transitioning from a wheeled to a bipedal walking pattern when necessary [21][24]. - Initial instability in new configurations, like walking on stilts, was quickly overcome as the AI adjusted its movements to maintain balance [22][24]. Group 3: Company Background and Funding - Skild AI was founded in 2023, focusing on developing adaptive AI for different hardware and tasks, with a small team of approximately six employees [25]. - The company has raised a total of $414 million across seed, Series A, and Series B funding rounds, with notable investors including SoftBank, Nvidia, and Sequoia Capital [29]. - The valuation of Skild AI increased from $1.5 billion after Series A funding in July 2024 to $4.5 billion following a $100 million funding round in June 2025 [29].
"iFold": Apple's New AI Result
量子位· 2025-09-25 11:42
Wenle from Aofeisi | QbitAI (WeChat: QbitAI)

Apple's latest crossover is all about making the complex simple. Wait, Apple is doing crossover AI models now?? It has released SimpleFold, a protein-folding model based on flow matching, which netizens promptly nicknamed "iFold". SimpleFold has no fancy domain-specific modules: it relies on generic Transformer blocks plus the flow-matching generative paradigm, and its 3B-parameter version matches the performance of the field's leading model, Google's AlphaFold2. It even runs comfortably on a MacBook Pro. First, what is protein folding? At its core, a "string" of amino acids folds into a specific 3D shape; only then can the protein do its job. A protein-folding model therefore predicts the three-dimensional conformation from the amino acids' primary sequence. Earlier state-of-the-art models such as Google's AlphaFold2, breakthrough as they were, leaned on many complex bespoke designs: analyzing large numbers of similar protein sequences, building evolutionary information from multiple sequence alignments (MSA), refining spatial constraints with triangle attention, and demanding supercomputer-scale compute at inference, hardly affordable for an ordinary lab. "iFold" tackles the same problem with a general-purpose AI framework. Architecturally, SimpleFold uses a multi-layer Transformer encoder as its backbone and adapts to protein-sequence features only through adaptive layer normalization, which amounts to using ...
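Flow matching, the generative paradigm SimpleFold builds on, trains a network to regress the velocity of a simple path from noise to data, then samples by integrating that velocity field. The toy below shows the standard linear-path formulation; the shapes and data are illustrative, not the paper's code.

```python
import numpy as np

def flow_matching_target(x0, x1, t):
    """Linear-interpolation flow matching: the point x_t on the straight
    path from noise x0 to data x1, and the constant velocity the network
    is trained to predict at (x_t, t)."""
    x_t = (1.0 - t) * x0 + t * x1      # position along the path
    v_target = x1 - x0                 # velocity field to regress
    return x_t, v_target

rng = np.random.default_rng(0)
N = 16                                  # toy "residue" count
x0 = rng.normal(size=(N, 3))            # noise sample
x1 = 0.1 * rng.normal(size=(N, 3))      # toy "folded" 3D coordinates
x_t, v = flow_matching_target(x0, x1, t=0.5)

# Sampling integrates the learned velocity from t=0 to t=1; with the
# exact target velocity, a single Euler step recovers the data point:
x_hat = x0 + 1.0 * v
```

In the real model a Transformer predicts `v` from `(x_t, t, sequence features)`; the straight-line path is what keeps the objective simple enough to avoid bespoke structural modules.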
JD Open-Sources an AI Bundle! Many Core Projects Fully Open-Sourced, and Its 10k-Star GitHub Project Has New Progress Too
量子位· 2025-09-25 11:42
Core Insights - The article highlights the advancements of domestic AI agents, particularly JoyAgent, which has achieved significant accuracy improvements in global evaluations, positioning itself among the top tier of AI agents worldwide [1][10][43]. Group 1: JoyAgent and Its Features - JoyAgent is the first fully open-source enterprise-level AI agent, allowing businesses to deploy it without additional development [7][10]. - The recent upgrade to JoyAgent 3.0 includes the open-sourcing of DataAgent and DCP data governance modules, addressing data utilization challenges in enterprises [11][13]. - JoyAgent 3.0 has achieved a validation accuracy of 77% and a test accuracy of over 67% in the GAIA evaluation, reflecting its robust performance [1][43]. Group 2: Open Source Initiatives - JD Cloud has systematically open-sourced its AI capabilities, including the medical model 京医千询2.0, which integrates trustworthy reasoning and multimodal capabilities [5][53]. - The OxyGent multi-agent framework allows developers to assemble AI teams using a simple Python interface, promoting flexibility and ease of use [46][48]. - The open-source strategy aims to create a comprehensive ecosystem that addresses industry pain points and facilitates the practical application of AI technologies [72][76]. Group 3: Industry Impact and Future Directions - JD's open-source efforts are designed to lower the barriers for enterprises to adopt AI technologies, transforming complex business scenarios into accessible solutions [73][76]. - The initiative encourages collaboration among developers, fostering a community that can innovate and create new applications based on proven technologies [73][75]. - By establishing a unified technical standard through projects like the DGP data governance protocol, JD aims to enhance interoperability and drive industry-wide advancements [75][76].
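The claim that OxyGent lets developers assemble AI teams through a simple Python interface can be pictured with a generic sketch. The `Agent` and `Team` classes below are hypothetical stand-ins to illustrate the assembly pattern; they are not OxyGent's real API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Hypothetical agent: a name plus a handler mapping a task to a result."""
    name: str
    handle: Callable[[str], str]

@dataclass
class Team:
    """Hypothetical team: agents registered via a chainable interface."""
    agents: list[Agent] = field(default_factory=list)

    def register(self, agent: Agent) -> "Team":
        self.agents.append(agent)
        return self                     # chainable registration

    def run(self, task: str) -> str:
        # naive pipeline: each agent refines the previous agent's output
        for agent in self.agents:
            task = agent.handle(task)
        return task

team = (Team()
        .register(Agent("planner", lambda t: f"plan({t})"))
        .register(Agent("executor", lambda t: f"exec({t})")))
result = team.run("report")             # -> "exec(plan(report))"
```

Real frameworks add routing, tool use, and shared memory on top of this skeleton; the point here is only how little code "assembling a team" needs when the interface is declarative.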
A Chinese Team Redefines the "Stargate"! The World's First Space Computing Constellation Is Now in Regular Commercial Operation
量子位· 2025-09-25 11:42
Core Insights - The article discusses the successful deployment of traffic recognition models on satellites, marking a significant advancement in the use of space-based AI for urban traffic analysis [4][15][22] - This achievement indicates the transition of space computing from experimental to operational, establishing a new paradigm for AI deployment in the industry [15][23] Group 1: Space-Based AI Capabilities - The complete process of image collection, model inference, and structured result transmission was executed in orbit, demonstrating the feasibility of on-satellite computation [2][10] - The task was supported by the space computing constellation launched by Guoxing Aerospace, which is now in regular commercial operation [5][6] - The system can support models with billions of parameters and has full-process capabilities including image acquisition, model inference, task scheduling, and communication [12][13] Group 2: Commercialization and Operationalization - The successful execution of the task by the team from Jiadu Technology signifies the first commercial use of the global space computing constellation [9][15] - Guoxing Aerospace has become the first company globally to provide regular satellite-level space computing services, marking a milestone in the AI field [15][22] - The "Star Computing" plan aims to establish a green, low-carbon space computing infrastructure with a total computing power exceeding 100,000 PetaFLOPS [12] Group 3: Implications for AI Deployment - The ability to run AI models in orbit allows for a new dimension in data processing, reducing response times significantly by processing data at the source [21][22] - This shift not only changes the physical location of computation but also adjusts the system architecture, enabling faster decision-making for industries requiring rapid assessments [20][22] - The initiative redefines space as an integral part of intelligent systems, transforming it from merely a data source to an active 
processing environment [19][23]
More Than Editing! Jianying's Future Is a One-Stop AI Video Platform
量子位· 2025-09-25 02:21
Core Viewpoint - The article emphasizes that Jianying (剪映) aims to transform from a simple video editing tool to a comprehensive AI creative partner, focusing on an all-in-one solution for video creation [2][4][30]. Group 1: AI Integration and Functionality - Jianying has upgraded its AI text-to-video capabilities, enhancing efficiency and storytelling coherence through deep integration with models like Doubao and DeepSeek [10][12]. - The platform now supports a wide range of materials, including raw images and videos, and offers a one-click AI rough cut feature that simplifies initial editing [15][16]. - New video transition features allow for seamless transitions between frames, creating a cinematic effect [18][19]. Group 2: Comprehensive Creative Process - Jianying's AI capabilities cover the entire creative process from inspiration and material generation to precise editing and output optimization [7][28]. - The introduction of AI music features, including lyric modification while retaining original melodies, enhances the audio editing experience [22]. - The platform has expanded its image creation capabilities, allowing for batch creative generation for cover and poster designs [24][25]. Group 3: Future Vision and Market Positioning - Jianying's slogan "All in AI, All in One" reflects its ambition to redefine video editing by integrating all necessary functions into a single platform [29][30]. - The company aims to become a co-creative partner that understands and anticipates creators' needs, thus streamlining the creative process [35][37]. - The focus on eliminating redundant tasks allows creators to concentrate on their imaginative processes, positioning Jianying as a leader in the AI creative tool market [38].
Your Fastest Android Chip Has Arrived! Fully Paving the Way for Agents
量子位· 2025-09-25 02:21
Core Insights - Qualcomm has launched the world's fastest Windows PC processor and mobile SoC processor, focusing on AI capabilities for both PCs and smartphones [1][5][27] - The Snapdragon X2 Elite Extreme is designed for high-end PCs, enabling advanced AI experiences and complex data analysis [15][24] - The Snapdragon 8 series mobile platform aims to support personalized AI assistants through continuous learning and real-time perception [1][27] Group 1: AI and Computing Architecture - AI is being positioned as the new user interface, shifting from smartphone-centric to agent-centric computing [6] - A new computing architecture is required to support this transition, with enhanced edge data relevance and mixed model development [6] - 6G technology is expected to bridge the cloud, edge, and terminal connections [6] Group 2: Snapdragon X2 Elite Series - The Snapdragon X2 Elite series utilizes a 3nm process and third-generation Oryon architecture, featuring 12 Prime cores and 6 Performance cores [7] - Compared to the previous generation, CPU efficiency has improved by 31%, and power consumption has decreased by 43% [10] - Peak performance metrics show a 39% increase in single-core CPU performance, 50% in multi-core, 2.3 times in GPU, and 78% in NPU [13] Group 3: Performance Comparisons - The Snapdragon X2 Elite Extreme achieves a 75% performance increase at the same power consumption compared to competitors, which would require 222% more energy to match [16][17] - In single-core performance, it leads by 44%, with competitors needing 144% more energy to catch up [20] - In GPU performance, it is 52% faster at the same power consumption, with competitors needing 92% more energy to achieve similar performance [22] Group 4: Fifth-Generation Snapdragon 8 - The fifth-generation Snapdragon 8 also employs a 3nm process and features a third-generation Oryon architecture [25] - It shows a 20% increase in single-core performance and a 17% increase in multi-core performance,
becoming the fastest mobile CPU [27] - The upgraded Adreno GPU offers a 23% improvement in gaming performance and a 25% increase in ray tracing performance [28] Group 5: Power Efficiency and Features - Overall power consumption has decreased by 16%, with CPU power down by 35% and GPU by 20% [33] - The upgraded ISP supports advanced video encoding and AI enhancements for video processing [33] - The integrated X85 5G Modem-RF system enhances AI-driven WiFi capabilities, reducing gaming latency by 50% [34]
LeCun's Team Open-Sources the First Code World Model: It Generates Code and Can Self-Test and Self-Repair! Traditional Coding Models Turn Classical Overnight
量子位· 2025-09-25 01:06
Core Insights - Meta FAIR has launched the Code World Model (CWM), a 32 billion parameter language model designed for code generation and reasoning, marking the first systematic introduction of world modeling into code generation [1][2][4]. Group 1: Model Capabilities - CWM distinguishes itself by not only generating code but also understanding its execution, simulating variable state changes and environmental feedback, thus enhancing overall code comprehension and debugging capabilities [2][9]. - The model demonstrates performance close to GPT-4, achieving a score of 65.8% on the SWE-bench Verified benchmark, outperforming all open-source models of similar scale [4][31]. - CWM introduces the concept of code world modeling during training, allowing the model to learn how program states evolve during execution, transitioning from static text understanding to dynamic execution comprehension [15][26]. Group 2: Enhanced Features - CWM can simulate code execution line by line, predicting how each line affects variable states and identifying potential errors during execution, paving the way for a "neural debugger" [18][19]. - The model is capable of self-testing and self-correcting, automatically generating test cases after code generation and attempting multiple modification paths to fix errors, mimicking the human programming cycle of writing, testing, and revising [22][24]. - CWM exhibits reasoning and planning abilities, enabling it to analyze problem descriptions, plan function structures, and generate and validate code through iterative logical reasoning [25]. Group 3: Model Architecture and Training - CWM employs a 64-layer decoder-only Transformer architecture with a parameter count of 32 billion and supports a long context input of 131,072 tokens, significantly enhancing its ability to handle complex projects and multi-file code [26][27]. 
- The training process consists of three phases: pre-training with 8 trillion tokens, mid-training with 5 trillion tokens focused on world modeling, and a final stage involving 100 billion tokens for supervised fine-tuning and 172 billion tokens for multi-task reinforcement learning [38][47]. - The model's training utilized advanced techniques such as FlashAttention-3 and distributed environments, ensuring robust performance across various tasks [50][51]. Group 4: Future Directions and Limitations - Currently, CWM's world modeling data is limited to Python, with plans to explore multi-language support in the future, aiming to create a universal framework for automated programming assistance [53][54]. - CWM is primarily intended for research purposes and is not designed for dialogue tasks or chatbot applications, emphasizing its focus on code understanding and complex reasoning research [55][56].
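The execution traces CWM learns from pair each executed line with the resulting variable states. In plain Python, that kind of supervision signal can be harvested with the standard `sys.settrace` hook; this is a sketch of the trace format, not Meta's actual data pipeline.

```python
import sys

def trace_variable_states(fn, *args):
    """Run fn(*args) and record (relative line number, local variables)
    after each executed line -- the kind of state trace a code world
    model is trained to predict."""
    frames = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            frames.append((frame.f_lineno - fn.__code__.co_firstlineno,
                           dict(frame.f_locals)))   # snapshot the locals
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)                           # always detach the hook
    return result, frames

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, trace = trace_variable_states(gcd, 48, 18)   # result == 6
# each trace entry shows how (a, b) evolves: (48, 18) -> (18, 12) -> ...
```

A world model trained on such traces can then predict the next state without running the code, which is what enables the "neural debugger" behavior described above.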