量子位
OpenAI Is Short on GPUs Too! Demand Far Exceeds Supply, and the Company Admits Its Internal Scramble for Chips Has Gotten Frantic
量子位· 2025-10-20 10:29
Core Viewpoint
- OpenAI is facing a significant scarcity of computing power, which is critical for innovation in the AI field [1][2][4]

Resource Allocation Mechanism
- OpenAI has a structured yet contentious mechanism for allocating its limited computing resources [8]
- Resources are divided between the research and application sides, with major decisions made by the executive team [9][10]
- Within the research organization, allocation is determined by the chief scientist and the research director [12]
- A team led by Kevin Park manages the reallocation of idle GPUs to meet the demands of various projects [14][15]

Industry Implications
- The internal competition for computing resources at OpenAI reflects the broader dynamics of the AI industry, where computing power directly determines AI capability [16][17]
- The founder of AI chip company Groq has argued that whoever controls computing power controls AI [18]
- OpenAI spent $7 billion on computing power last year and is now building its own data centers, with compute deals approaching a trillion dollars [19][20]

Competitive Landscape
- The competition for computing resources is not only internal but extends to the entire AI compute market [20]
- Meta CEO Mark Zuckerberg has highlighted per-researcher compute as a key competitive advantage [22]
- Computing power now sits at the forefront of strategic importance for future AI development [23]
GPT-5 ≈ o3.1! OpenAI Details Its Thinking Mechanism for the First Time: RL Plus Pre-training Is the True Path to AGI
量子位· 2025-10-20 03:46
Core Insights
- The article discusses the evolution of OpenAI's models, focusing on GPT-5 as an iteration of the o3 model and arguing that it represents a significant advance in AI capabilities [1][4][23]

Model Evolution
- Jerry Tworek, OpenAI's VP of Research, views GPT-5 as an iteration of o3, emphasizing the need for a model that can think longer and interact autonomously with multiple systems [4][23]
- The transition from o1 to o3 marked a structural change in AI development, with o3 being the first genuinely useful model capable of using tools and contextual information effectively [19][20]

Reasoning Process
- The reasoning process of models like GPT-5 is likened to human thought, involving calculation, information retrieval, and self-learning [11]
- The concept of "chains of thought" has become prominent since the release of the o1 model, allowing models to articulate their reasoning in human language [12]
- Longer reasoning times generally yield better results, but user feedback indicates a preference for quicker responses, leading OpenAI to offer models with varying reasoning times [13][14]

Internal Structure and Research
- OpenAI's internal structure combines top-down and bottom-up approaches, concentrating on a few core projects while allowing researchers freedom within them [31][33]
- The company advanced from o1 to GPT-5 in roughly one year thanks to this operating structure and a strong research team [33]

Reinforcement Learning (RL)
- Reinforcement learning is central to OpenAI's models: pre-training combined with RL is what produces effective AI systems [36][57]
- Jerry explains RL as training a model through rewards and penalties, much like training a dog (see the sketch after this list) [37][38]
- DeepMind's introduction of deep RL significantly advanced the field and led to the first meaningful intelligent agents [39]

Future Directions
- Jerry believes the future of AI lies in agents capable of independent thought on complex tasks, with a focus on aligning model behavior with human values [53][54]
- The path to AGI will require both pre-training and RL, with new components added over time [56][58]
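The dog-training analogy maps onto the standard reward-and-penalty loop of RL. Below is a minimal, hypothetical sketch (a REINFORCE-style policy gradient on a toy 3-armed bandit) purely to illustrate that loop; it is not OpenAI's training setup, and the reward values and learning rate are invented for the example.

```python
# Toy REINFORCE: nudge the policy toward actions that earned higher reward.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(3)                      # policy parameters, one score per action
true_reward = np.array([0.2, 0.5, 0.8])   # hidden mean reward per action (assumed)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)                 # sample an action from the policy
    reward = rng.normal(true_reward[action], 0.1)   # environment returns a reward/penalty
    # Gradient of log p(action) w.r.t. logits is (one-hot - probs); scale it by the reward.
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad

print("learned action probabilities:", np.round(softmax(logits), 3))  # mass shifts to arm 2
```

The same reward-weighted update idea, scaled up with far richer reward signals, is what the interview describes layering on top of pre-training.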
AI Assistant Cici Quietly Tops the Charts Overseas, and It's ByteDance Again
量子位· 2025-10-20 03:46
Core Viewpoint
- The article discusses the emergence of a new AI assistant application named Cici, developed by ByteDance, which has rapidly gained popularity in several countries, signaling an increasingly competitive AI assistant market.

Group 1: Cici's Rise and Features
- Cici has seen rapid download growth, ranking as the most-downloaded app on Mexico's Google Play Store and among the top 10 free apps on the UK Apple App Store [2]
- The application draws on technology from ByteDance's other products, including image-editing and code-assistance tools, and uses OpenAI's GPT models and Google's Gemini for chat generation [8][9]
- Cici's interface design resembles that of Doubao, another ByteDance product; users can interact via text or voice, and the app supports image generation and analysis [10]

Group 2: Competitive Landscape in AI Assistants
- Doubao maintains a dominant position in the domestic AI assistant market, with cumulative downloads exceeding 100 million, followed by competitors such as Kimi, DeepSeek, and Tencent Yuanbao [16][22]
- The top four AI assistant products, including Doubao, account for roughly 93% of the market's user base, a pronounced "Matthew effect" [17][24]
- By daily active users (DAU), Doubao leads with 33 million, followed by DeepSeek with 25 million and Tencent Yuanbao with 16 million [23]

Group 3: ByteDance's Global Strategy
- Cici's success reflects ByteDance's strategy of expanding its AI capabilities globally, with a focus on specific markets such as the UK, Mexico, and Southeast Asia [12]
- Although Doubao leads across most dimensions, DeepSeek remains strong in the web-based AI assistant segment, posing a competitive challenge for ByteDance [27]
1.58-bit Holds Its Own Against FP16! Microsoft Unveils a New Model Distillation Framework, All Authors Are Chinese
量子位· 2025-10-20 03:46
Core Insights
- Microsoft has introduced a new distillation framework called BitNet Distillation (BitDistill), which quantizes models with minimal performance loss while cutting memory consumption to roughly 1/10 of FP16 [1][6][22]

Group 1: Framework Overview
- BitDistill has been validated on models with 4 billion parameters and below, such as Qwen and Gemma, and is in principle applicable to other Transformer models [2]
- The framework consists of three connected stages: model structure refinement, continued pre-training, and distillation-based fine-tuning [8]

Group 2: Model Structure Optimization
- The goal of the model structure stage is to support training of 1.58-bit models and to address the optimization instability common in low-precision training [9]
- BitDistill inserts a normalization module called SubLN into each Transformer layer to stabilize training by controlling the variance of activations [10][12]

Group 3: Continued Pre-training
- A lightweight continued pre-training phase helps the model gradually adapt its weights from full precision to a distribution suited to 1.58-bit representation [14][15]
- This phase lets the model "learn how to be quantized," reducing information loss during the fine-tuning stage [16]

Group 4: Distillation-based Fine-tuning
- BitDistill uses a dual distillation mechanism, logits distillation plus multi-head attention distillation, to recover the performance of the quantized model [18]
- Logits distillation uses the probability distribution of the full-precision model as "soft labels" to guide the quantized model (a minimal sketch of this loss follows the list) [19]

Group 5: Performance Evaluation
- BitDistill achieves performance close to full-precision models across downstream tasks while significantly reducing memory usage and improving inference speed [22]
- In text classification, the 1.58-bit model matched the accuracy of full-precision fine-tuned models and outperformed directly quantized models [23][24]
- In text summarization, the quality of BitDistill's generated text was nearly identical to that of full-precision models, with slight improvements in BLEU scores [25][27]

Group 6: Generalizability and Compatibility
- BitDistill has been applied to other pre-trained models such as Gemma and Qwen2.5, recovering performance with high fidelity [28]
- The framework is compatible with various quantization strategies, making it useful as a standalone distillation solution for multiple post-quantization optimization scenarios [28]
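To make the "soft labels" idea concrete, here is a minimal sketch of a standard logits-distillation loss: temperature-scaled KL divergence to the full-precision teacher combined with hard-label cross-entropy. The temperature, weighting, and the companion multi-head attention distillation term are assumptions for illustration, not BitDistill's published hyperparameters.

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with soft-label KL to the full-precision teacher."""
    ce = F.cross_entropy(student_logits, labels)                        # hard labels
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)                # teacher "soft labels"
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    return alpha * ce + (1 - alpha) * kd

# Toy usage: a batch of 4 examples with 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)   # output of the 1.58-bit student
teacher_logits = torch.randn(4, 10)                        # output of the FP16 teacher
labels = torch.randint(0, 10, (4,))
loss = logits_distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```

The soft targets carry the teacher's relative preferences over classes, which is what lets the low-precision student recover accuracy the raw labels alone would not provide.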
Which AI Is Best at Ordering Takeout? Meituan's LongCat Team Runs a Comprehensive Evaluation
量子位· 2025-10-20 01:16
Contributed by Meituan's LongCat team | QbitAI (WeChat public account: QbitAI)

Meituan's LongCat team has released VitaBench (Versatile Interactive Tasks Benchmark), an evaluation benchmark for large-model agents that targets complex problems in settings closely modeled on real life.

VitaBench takes three high-frequency everyday scenarios, food delivery ordering, dining out, and travel, as its representative settings, builds an interactive evaluation environment containing 66 tools, and designs composite tasks that span scenarios.

For example, a travel-planning task requires the agent to reason, call tools, and interact with the user to complete the full workflow from buying tickets to booking a restaurant.

The team is the first to quantitatively decompose agent tasks along three dimensions, deep reasoning, tool use, and user interaction, enabling controlled construction of complex problems.

Results show that even today's advanced reasoning models reach only about a 30% success rate on the main leaderboard (complex cross-scenario tasks), revealing a significant gap between current agents and the demands of real-life applications.

VitaBench is now fully open-sourced, intended to provide key infrastructure for developing and deploying agents in real-life scenarios.

Research background: a wide gulf between agent evaluation and real-world application

With the rapid progress of large language models in complex reasoning and tool calling, LLM-based agents are being applied ever more widely in real-life scenarios. ...
OpenAI Thought GPT-5 Had Made Big Math News; It Turned Out… Even Hassabis Found It Embarrassing
量子位· 2025-10-20 01:16
Core Viewpoint
- OpenAI's announcement of GPT-5 solving several Erdős mathematical problems was later revealed to be an exaggeration, as the AI merely retrieved existing solutions rather than independently solving the problems [5][13][14]

Group 1: Announcement and Initial Reactions
- OpenAI researcher Mark Sellke claimed that GPT-5 had made significant breakthroughs in mathematics by solving 10 previously unsolved Erdős problems [5][7]
- The announcement led to widespread excitement, with many mistakenly believing that GPT-5 had independently cracked long-standing mathematical challenges [9]
- DeepMind CEO Demis Hassabis and Meta's Yann LeCun publicly criticized the claims, highlighting the embarrassment surrounding the situation [3][4][10][16]

Group 2: Clarification and Reality Check
- Thomas Bloom, the creator of the website referenced by OpenAI, clarified that GPT-5 did not solve the problems but rather found existing solutions through online searches [12][13]
- The "unsolved" status on the website was due to Bloom's lack of awareness of the existing solutions, not because they had not been solved by the mathematical community [13][14]
- Following the backlash, researcher Sebastien Bubeck deleted his earlier tweet and acknowledged the misunderstanding, emphasizing the difficulty of literature retrieval [15]

Group 3: GPT-5's Capabilities and Context
- Despite the controversy, GPT-5 has demonstrated notable mathematical abilities, such as solving complex problems and providing key proofs in a short time [18][19][22]
- Previous successes of GPT-5 in mathematics contributed to the inflated expectations surrounding its capabilities [17][22]
- The incident reflects a growing desensitization to AI advancements, suggesting that without genuine breakthroughs, exaggerated claims may lead to significant misinterpretations [27]
The Value of Open Source for Robotics Goes Far Beyond What the Large-Model Era Imagined | Tang Wenbin in a Deep Conversation with the Hugging Face Founder
量子位· 2025-10-20 01:16
Core Viewpoint
- The article discusses the challenges in current robotics research, particularly the gap between simulation and real-world application, and introduces RoboChallenge.ai as a standardized, open, and reproducible evaluation platform for robotics [1][40][50]

Group 1: Current Challenges in Robotics
- Many models perform well in simulation but fail in real-world scenarios, a significant pain point in robotics research [1][40]
- There is currently no unified, open, and reproducible benchmark system to fairly compare different methods, strategies, and models in the robotics field [42]

Group 2: Introduction of RoboChallenge.ai
- RoboChallenge.ai is launched as an open, standardized platform for evaluating robotics models in real physical environments, allowing researchers to test their models remotely on real robots [5][50]
- The platform lets researchers worldwide submit models and run experiments remotely, bridging the gap between simulation and reality [50][52]

Group 3: Importance of Open Source in Robotics
- Open source is crucial for advances in robotics, as it enables collaboration and the sharing of models that can be adapted to different robots [12][21]
- The article emphasizes that open-source models are essential for running capabilities locally on the robot itself, improving safety and functionality [22][25]

Group 4: Evaluation Mechanisms and Community Involvement
- The article highlights the need for an independent evaluation mechanism in robotics, as current assessments often lack fairness and reproducibility [34][36]
- It also discusses the potential for community involvement in data collection and model testing, which can improve the diversity and robustness of robotic strategies [61][66]

Group 5: Future Directions and Expectations
- The article anticipates that in three to five years, embodied intelligence research will advance to the point where robots can perform longer and more complex tasks [77]
- The goal of RoboChallenge.ai is to provide a fair and open platform for evaluating a wide range of robotic models, contributing to the overall advancement of the field [76][78]
An Economics Nobel Laureate's Privileged Upbringing: Chanel's Karl Lagerfeld Helped with His Homework, and in the AI Era He Opposes Taxing Robots
量子位· 2025-10-19 08:10
Core Viewpoint
- The 2025 Nobel Prize in Economic Sciences was awarded to three scholars who highlighted the critical role of technological and scientific innovation in driving economic growth, emphasizing the importance of continuous investment in basic research for long-term economic advancement [2][5][3]

Group 1: Nobel Prize Winners and Their Contributions
- The prize was shared by Joel Mokyr, Philippe Aghion, and Peter Howitt, who revealed how technology and scientific innovation interact with market competition to foster economic growth [5][7]
- Joel Mokyr's research demonstrated the self-reinforcing relationship between scientific breakthroughs and technological applications, which is essential for sustained economic growth [7][11]
- Aghion and Howitt developed a pioneering mathematical model in the 1990s that explains how firms improve production processes and introduce higher-quality products through R&D investment, ultimately displacing established market leaders [8][30]

Group 2: Historical Context and Economic Growth
- Historically, economic growth was sporadic, with little change in living standards until the Industrial Revolution of the 18th century, which set off a self-reinforcing cycle of innovation and economic growth [21][22]
- Over the past two centuries, many countries have maintained an average economic growth rate of about 2%, which, through compounding, produces large income gains over decades (a quick compounding calculation follows this list) [23][25]
- Joseph Schumpeter's concept of "creative destruction" explains that economic progress is driven by innovation that disrupts existing industries and creates new growth opportunities [26][28]

Group 3: Mechanisms of Innovation and Economic Dynamics
- Mokyr identified two types of "useful knowledge" that drive innovation: propositional knowledge (understanding why nature works as it does) and prescriptive knowledge (practical techniques and instructions) [30][29]
- Aghion and Howitt's model illustrates that the continuous replacement of old firms by new ones is a key engine of economic growth, as new companies strive to innovate and outperform established players [34][36]
- The rise of AI is currently driving another wave of creative destruction, reinforcing the relevance of the laureates' research [40][41]

Group 4: Implications of Innovation
- Innovation creates new winners while potentially sidelining others, raising concerns about job displacement and inequality [41][42]
- A robust policy framework is needed to manage the effects of innovation and prevent market failures, ensuring that the mechanisms behind creative destruction are preserved [43][44]
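A quick back-of-the-envelope check of the compounding claim above (illustrative arithmetic only, not figures from the article):

```python
# Steady 2% annual growth compounds into large cumulative gains.
for years in (35, 100, 200):
    print(f"{years} years at 2%/yr -> {1.02 ** years:.1f}x initial income")
# 35 years: ~2.0x, 100 years: ~7.2x, 200 years: ~52.5x
```

So an economy growing at a "modest" 2% roughly doubles incomes every generation and multiplies them many-fold over the two centuries the laureates study.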
The Father of LSTM Fires at Kaiming He: My Student Is the True Founder of Residual Learning
量子位· 2025-10-19 06:10
Core Viewpoint
- The article discusses the historical context and contributions of Sepp Hochreiter and Jürgen Schmidhuber to the development of residual learning and its impact on deep learning, emphasizing that the concept of residual connections was introduced by Hochreiter in 1991, long before its popularization in ResNet [3][12][26]

Group 1: Historical Contributions
- Sepp Hochreiter systematically analyzed the vanishing gradient problem in his 1991 doctoral thesis and proposed recurrent residual connections to address it [3][12]
- The core idea of the recurrent residual connection is a self-connecting neuron with a fixed weight of 1.0, which keeps the error signal constant during backpropagation (a minimal sketch of the residual idea follows this list) [13][14]
- LSTM, introduced in 1997 by Hochreiter and Schmidhuber, built on this foundation, enabling effective learning of long-term dependencies in tasks such as speech and language processing [18][19]

Group 2: Evolution of Residual Learning
- The Highway network, introduced in 2015, successfully trained deep feedforward networks with hundreds of layers by carrying over the gated residual concept from LSTM [23]
- ResNet, which gained significant attention in the same year, used residual connections to stabilize error propagation in deep networks, allowing networks with hundreds of layers to be trained [24][26]
- Both Highway networks and ResNet share core principles with those Hochreiter established in 1991, demonstrating the enduring relevance of his contributions to deep learning [26]

Group 3: Ongoing Debates and Recognition
- Jürgen Schmidhuber has publicly claimed that various architectures, including AlexNet, VGG Net, GANs, and Transformers, were inspired by his lab's work, although these claims have not been universally accepted [28][31]
- The ongoing dispute over attribution highlights how difficult it is to credit foundational work in a rapidly evolving field [10][32]
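To illustrate the idea shared by the 1991 recurrent residual connection, Highway networks, and ResNet, here is a minimal, hypothetical feedforward sketch: the identity skip path (effectively a fixed weight of 1.0) lets gradients reach early layers without vanishing. It is a generic illustration, not the exact architecture from any of the cited works.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)   # identity skip path (weight 1.0) plus a learned residual F(x)

# Even with many stacked blocks, gradients still reach the input through the skip paths.
model = nn.Sequential(*[ResidualBlock(16) for _ in range(50)])
x = torch.randn(8, 16, requires_grad=True)
model(x).sum().backward()
print("gradient norm at the input:", float(x.grad.norm()))  # stays well away from zero
```

Dropping the `x +` term turns this back into a plain 50-layer stack, where the input gradient typically shrinks sharply, which is the vanishing-gradient behavior Hochreiter analyzed.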
Making Models "Watch a Video and Write the Webpage": GPT-5 Scores Only 36.35! Shanghai AI Lab Jointly Releases the First video2code Benchmark
量子位· 2025-10-19 04:10
Core Insights
- The article discusses the introduction of IWR-Bench, a new benchmark for evaluating the interactive webpage reconstruction capabilities of large vision-language models (LVLMs) by assessing their ability to generate code from user interaction videos rather than static screenshots [1][2]

Group 1: IWR-Bench Overview
- IWR-Bench shifts the focus from static image-to-code tasks to dynamic video-to-code tasks, requiring models to interpret user interaction videos along with all necessary static resources [2][5]
- The benchmark includes 113 real-world website tasks and 1001 interaction actions, providing a comprehensive evaluation of models' capabilities in generating interactive web code [5][12]
- The evaluation framework employs an automated agent to simulate user interactions, assessing both functional correctness (Interactive Functionality Score, IFS) and visual fidelity (Visual Fidelity Score, VFS) [10][11]

Group 2: Model Performance
- In testing 28 mainstream models, the best-performing model, GPT-5, achieved a total score of 36.35%, with an IFS of 24.39% and a VFS of 64.25%, indicating significant shortcomings in generating interactive logic [5][14][16]
- The results reveal that all models exhibit higher visual fidelity than functional correctness, highlighting a critical gap in their ability to generate event-driven logic [16]
- Specialized video understanding models performed poorly compared to general multimodal models, suggesting that the task's nature differs significantly from traditional video understanding tasks [20]

Group 3: Key Findings
- The primary bottleneck identified is functionality implementation, where models struggle to generate operational logic despite achieving high visual fidelity [16]
- The "thinking" versions of models showed some improvement, but the overall enhancement was limited, indicating that the foundational model capabilities remain crucial [17][19]
- IWR-Bench represents a significant step in advancing AI from understanding static webpages to comprehending dynamic interactions, emphasizing the ongoing challenges in this domain [20]