量子位
OpenAI thought GPT-5 had made big math news — even Hassabis found it embarrassing
量子位· 2025-10-20 01:16
Core Viewpoint
- OpenAI's announcement that GPT-5 had solved several Erdős problems was later revealed to be an exaggeration: the AI had merely retrieved existing solutions rather than solving the problems independently [5][13][14].

Group 1: Announcement and Initial Reactions
- OpenAI researcher Mark Sellke claimed that GPT-5 had made significant breakthroughs in mathematics by solving 10 previously unsolved Erdős problems [5][7].
- The announcement sparked widespread excitement, with many mistakenly believing that GPT-5 had independently cracked long-standing mathematical challenges [9].
- DeepMind CEO Demis Hassabis and Meta's Yann LeCun publicly criticized the claims, underscoring the embarrassment surrounding the episode [3][4][10][16].

Group 2: Clarification and Reality Check
- Thomas Bloom, creator of the website OpenAI referenced, clarified that GPT-5 did not solve the problems but found existing solutions through online searches [12][13].
- The "unsolved" status on the website reflected Bloom's unawareness of the existing solutions, not a failure of the mathematical community to solve them [13][14].
- Following the backlash, researcher Sebastien Bubeck deleted his earlier tweet and acknowledged the misunderstanding, emphasizing the difficulty of literature retrieval [15].

Group 3: GPT-5's Capabilities and Context
- Despite the controversy, GPT-5 has demonstrated notable mathematical ability, such as solving complex problems and supplying key proof steps in a short time [18][19][22].
- GPT-5's earlier successes in mathematics contributed to the inflated expectations around its capabilities [17][22].
- The incident reflects a growing desensitization to AI advances: without genuine breakthroughs, exaggerated claims can lead to serious misinterpretation [27].
The value of open source for robotics goes far beyond what the large-model era imagined | Tang Wenbin in deep conversation with the Hugging Face co-founder
量子位· 2025-10-20 01:16
henry from Aofeisi
QbitAI | WeChat official account QbitAI

"Many models run perfectly in the simulator, but the moment they hit the real world they fail completely."

In a recent online conversation, Dexmal co-founder Tang Wenbin and Hugging Face co-founder Thomas Wolf pinpointed the biggest pain point in current robotics research.

Tang Wenbin is co-founder and CTO of Megvii, CEO of Dexmal (原力灵机), a graduate of Tsinghua University's "Yao Class", and a gold medalist of the inaugural Yao Award.

To address this pain point, he and his team, together with Hugging Face, launched RoboChallenge.ai: an open, unified, and reproducible real-world robot evaluation platform.

RoboChallenge.ai lets researchers around the world remotely test models in physical environments for the first time. Through its purpose-built "Remote Robot" setup, the model stays local and users control real robots through nothing more than an API.

In the conversation, Tang Wenbin and Thomas discussed:

This is the product of global collaboration, with teams from China, the US, and Europe. I believe this is exactly how major progress gets made.

I look forward to the same happening in robotics: by keeping an open-source community active, we can help more teams understand the current technical frontier and brainstorm new directions together.

Now, let's dive in.

Q: Hugging F ...
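The "Remote Robot" pattern described above — the policy model stays on the researcher's machine, and only observations and actions cross the network — can be sketched as a thin client. The route, payload fields, and `policy` function here are hypothetical illustrations, not RoboChallenge.ai's actual API:

```python
import json

API = "https://robochallenge.ai/api"  # base URL; the route below is hypothetical


def build_step_request(session_id, action):
    """Build one control-step request. Only the action crosses the network;
    the policy model itself never leaves the user's machine."""
    return {
        "url": f"{API}/sessions/{session_id}/step",
        "body": json.dumps({"action": action}),
    }


# One iteration of the local control loop (policy() is the user's local model):
#   obs = send(build_step_request(sid, policy(obs)))   # repeat until the episode ends
req = build_step_request("demo-session", [0.1, -0.2, 0.0])
print(req["url"])
```

The key design point the speakers highlight is that this keeps model weights private while the evaluation still runs on shared physical hardware.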
The privileged upbringing of a Nobel economics laureate: Chanel's Karl Lagerfeld helped with his homework, and in the AI era he opposes taxing robots
量子位· 2025-10-19 08:10
Core Viewpoint
- The 2025 Nobel Prize in Economic Sciences was awarded to three scholars who highlighted the critical role of technological and scientific innovation in driving economic growth, emphasizing the importance of sustained investment in basic research for long-term advancement [2][5][3].

Group 1: Nobel Prize Winners and Their Contributions
- The prize was shared equally by Joel Mokyr, Philippe Aghion, and Peter Howitt, who revealed how technological and scientific innovation interacts with market competition to foster economic growth [5][7].
- Joel Mokyr's research demonstrated the self-reinforcing relationship between scientific breakthroughs and technological applications, which is essential for sustained economic growth [7][11].
- Aghion and Howitt developed a pioneering mathematical model in the 1990s that explains how firms improve production processes and introduce higher-quality products through R&D investment, ultimately displacing established market leaders [8][30].

Group 2: Historical Context and Economic Growth
- Historically, growth was sporadic, with little change in living standards until the 18th-century Industrial Revolution set off a self-reinforcing cycle of innovation and economic growth [21][22].
- Over the past two centuries, many countries have sustained average growth of about 2% a year, which compounds into large income gains over decades [23][25].
- Joseph Schumpeter's concept of "creative destruction" explains that economic progress is driven by innovation that disrupts existing industries and creates new growth opportunities [26][28].

Group 3: Mechanisms of Innovation and Economic Dynamics
- Mokyr identified two types of "useful knowledge" that drive innovation: propositional knowledge (understanding natural laws) and prescriptive knowledge (practical techniques) [30][29].
- Aghion and Howitt's model shows that the continual replacement of old firms by new ones is a key engine of growth, as new companies strive to innovate and outperform incumbents [34][36].
- The rise of AI is currently instigating another wave of creative destruction, reinforcing the relevance of the laureates' research [40][41].

Group 4: Implications of Innovation
- Innovation creates new winners while potentially sidelining others, raising concerns about job displacement and inequality [41][42].
- A robust policy framework is needed to manage the effects of innovation and prevent market failure, ensuring that the mechanisms behind creative destruction keep working [43][44].
The father of LSTM fires at Kaiming He: my student is the true founder of residual learning
量子位· 2025-10-19 06:10
Core Viewpoint
- The article recounts the historical contributions of Sepp Hochreiter and Jürgen Schmidhuber to residual learning and its impact on deep learning, arguing that the concept of residual connections was introduced by Hochreiter in 1991, long before ResNet popularized it [3][12][26].

Group 1: Historical Contributions
- Sepp Hochreiter systematically analyzed the vanishing gradient problem in his 1991 diploma thesis and proposed recurrent residual connections to address it [3][12].
- The core idea of recurrent residual connections is a self-connecting neuron with a fixed weight of 1.0, which keeps the error signal constant during backpropagation [13][14].
- LSTM, introduced by Hochreiter and Schmidhuber in 1997, built on this foundation, enabling effective long-term dependency learning in tasks such as speech and language processing [18][19].

Group 2: Evolution of Residual Learning
- The Highway network, introduced in 2015, successfully trained feedforward networks hundreds of layers deep by adopting the gated residual concept from LSTM [23].
- ResNet, which gained significant attention the same year, used residual connections to stabilize error propagation in deep networks, likewise enabling training at hundreds of layers [24][26].
- Both Highway networks and ResNet echo the principles Hochreiter established in 1991, demonstrating the lasting relevance of his contribution [26].

Group 3: Ongoing Debates and Recognition
- Jürgen Schmidhuber has publicly claimed that architectures including AlexNet, VGG Net, GANs, and Transformers were inspired by his lab's work, though these claims are not universally accepted [28][31].
- The dispute over attribution highlights how hard it is to recognize foundational work in a rapidly evolving field [10][32].
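The fixed-weight-1.0 self-connection at the heart of this dispute can be illustrated in a few lines: a residual layer computes y = x + f(x), so its derivative is 1 + f'(x), and the constant 1 keeps gradients from vanishing through depth. This is a minimal sketch with made-up numbers, not the 1991 or ResNet formulation:

```python
# Why a skip path with fixed weight 1.0 prevents vanishing gradients.

def layer_grad_plain(w):
    """Local gradient of a plain layer y = w * x."""
    return w

def layer_grad_residual(w):
    """Local gradient of a residual layer y = x + w * x:
    the identity path adds a constant 1.0 to the derivative."""
    return 1.0 + w

w = 0.001      # a nearly-saturated layer with a tiny local gradient
depth = 100    # stack 100 such layers

plain, skip = 1.0, 1.0
for _ in range(depth):
    plain *= layer_grad_plain(w)     # shrinks toward zero
    skip *= layer_grad_residual(w)   # stays near 1

print(plain)   # ~1e-300: effectively vanished
print(skip)    # ~1.105: still a usable training signal
```

The same arithmetic explains why both Highway networks and ResNet can train at hundreds of layers: the identity term dominates the product of local gradients.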
Making models "watch a video and write the webpage": GPT-5 scores only 36.35! Shanghai AI Lab co-releases the first video2code benchmark
量子位· 2025-10-19 04:10
Core Insights
- The article introduces IWR-Bench, a new benchmark that evaluates the interactive webpage reconstruction capability of large vision-language models (LVLMs) by asking them to generate code from user-interaction videos rather than static screenshots [1][2].

Group 1: IWR-Bench Overview
- IWR-Bench shifts the focus from static image-to-code to dynamic video-to-code tasks, requiring models to interpret user-interaction videos together with all necessary static resources [2][5].
- The benchmark comprises 113 real-world website tasks and 1001 interaction actions, providing a comprehensive evaluation of models' ability to generate interactive web code [5][12].
- The evaluation framework uses an automated agent to replay user interactions, scoring both functional correctness (Interactive Functionality Score, IFS) and visual fidelity (Visual Fidelity Score, VFS) [10][11].

Group 2: Model Performance
- Across 28 mainstream models, the best performer, GPT-5, achieved a total score of 36.35%, with an IFS of 24.39% and a VFS of 64.25%, revealing significant shortcomings in generating interactive logic [5][14][16].
- All models score higher on visual fidelity than on functional correctness, exposing a critical gap in generating event-driven logic [16].
- Specialized video-understanding models performed worse than general multimodal models, suggesting the task differs fundamentally from traditional video understanding [20].

Group 3: Key Findings
- The primary bottleneck is functionality: models achieve high visual fidelity but struggle to generate working operational logic [16].
- "Thinking" variants of models improved somewhat, but the gains were limited, indicating that base-model capability remains decisive [17][19].
- IWR-Bench marks a significant step toward AI that understands dynamic interactions rather than static webpages, underscoring the challenges that remain [20].
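As a quick arithmetic check on the reported figures, GPT-5's total is consistent with a fixed weighted combination of the two sub-scores. The 0.7/0.3 split below is an assumption inferred from the published numbers, not a weighting confirmed by the benchmark authors:

```python
def total_score(ifs, vfs, w_ifs=0.7, w_vfs=0.3):
    """Combine the Interactive Functionality Score and Visual Fidelity Score.
    The 0.7/0.3 weights are a hypothesis that reproduces GPT-5's reported total."""
    return w_ifs * ifs + w_vfs * vfs

# GPT-5's published sub-scores: IFS 24.39, VFS 64.25
print(round(total_score(24.39, 64.25), 2))  # -> 36.35, matching the article
```

If the hypothesized weighting is right, it would mean the benchmark deliberately emphasizes functional correctness over visual fidelity, which matches the article's framing of functionality as the primary bottleneck.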
Schrödinger's grandson founds a quantum computing startup, and Jensen Huang became a shareholder early on
量子位· 2025-10-19 04:10
Jay from Aofeisi
QbitAI | WeChat official account QbitAI

What kind of quantum computing startup raises $1 billion in a single round?

And Jensen Huang, the Nvidia founder who was still pouring cold water on quantum computing at the start of the year, turned around and became a shareholder in this very company.

Perhaps it's because one page of the company's pitch deck reads "My Grandfather, Schrödinger" (doge).

Yes, this emerging quantum computing unicorn was co-founded by a grandson of quantum-physics pioneer Schrödinger.

The largest single round in the quantum computing startup race

Didn't see that coming: Schrödinger's grandson is also a founder, and in quantum computing no less?

The company is called PsiQuantum.

The Greek letter Ψ (Psi) conventionally denotes the wave function, the central variable of the Schrödinger equation.

So anyone who knows their Schrödinger can tell how much thought went into the name (doge).

In quantum mechanics, Schrödinger's main contribution was the Schrödinger equation, one of the field's fundamental equations.

The field is so elusive that even Feynman, the legendary physicist who first proposed that one could compute with quantum systems, quipped: "If you think you understand quantum mechanics, you don't understand quantum mechanics."

Yet in Silicon Valley, PsiQuantum, founded less than a decade ago, seems to be collapsing that uncertainty into reality, step by step.

The company long kept a low profile, but just recently PsiQuantum landed $1 billion in ...
QbitAI internship hiring | AI academic editor intern, on-site or remote
量子位· 2025-10-19 04:10
Core Viewpoint
- The article notes the rapid pace of updates in AI research and announces the recruitment of an editorial intern to help track the latest AI papers and findings [1][2].

Group 1: Company Overview
- 量子位 (QbitAI) has over 2.3 million WeChat subscribers and more than 7 million users across the internet, with average daily readership exceeding 2 million [3].
- Third-party data platforms such as NewRank and Qingbo rank it the top new-media outlet covering AI and frontier technology [4].
- The company is a strategic partner of major industry conferences and works with organizations such as the China Computer Federation and the World Artificial Intelligence Conference [8].

Group 2: Internship Details
- The intern will edit AI and computer-science academic papers and assist with topic selection, abstract summarization, and media distribution [5].
- Candidates from STEM fields such as AI, computer science, mathematics, physics, and electronic engineering are preferred, along with basic English reading skills [5].
- The internship lasts at least three months, full-time or part-time, and offers a stipend and the possibility of recommendation letters [5].

Group 3: Company Culture and Values
- QbitAI promotes a culture driven by curiosity, encouraging people to explore and share new information widely [10][11].
- The company values diverse educational backgrounds, weighing curiosity and the drive to act on it over specific academic credentials [9][10].
Teaching multimodal large models to "reflect" and "review": SJTU and Shanghai AI Lab release MM-HELIX & AHPO to crack complex multimodal reasoning
量子位· 2025-10-19 04:10
Core Insights
- The article examines the limitations of current multimodal large models (MLLMs) in problem-solving: they tend to give direct answers without iterative reasoning, which blocks their evolution from knowledge containers into problem-solving experts [1][2].

Group 1: MM-HELIX Overview
- A team from Shanghai Jiao Tong University and Shanghai AI Lab has introduced MM-HELIX, a project aimed at endowing AI with long-chain reflective reasoning that closely resembles human intelligence [2].
- MM-HELIX provides a complete ecosystem for building and evaluating reflective reasoning in AI models [2].

Group 2: MM-HELIX Benchmark
- The MM-HELIX Benchmark is a rigorous testbed for reflective reasoning, spanning 42 high-difficulty tasks across algorithms, graph theory, puzzles, and strategy games [4][5].
- It includes a sandbox environment with 1260 questions across five difficulty levels, enabling fine-grained assessment of current multimodal large models [5].

Group 3: Evaluation Results
- Leading models, proprietary and open-source alike, performed poorly: only GPT-5 scored above 50 points, while models lacking reflective capability scored around 10 [7].
- Accuracy dropped sharply on multimodal inputs compared with pure text, underscoring the urgency of teaching MLLMs reflective reasoning [7].

Group 4: MM-HELIX-100K Dataset
- To teach MLLMs to reflect, the team built MM-HELIX-100K, a dataset of 100,000 high-quality samples generated through a step-elicited response pipeline [8].
- The dataset supplies rich examples of self-correction and insight, essential for training reflective, iterative problem-solving [8].

Group 5: AHPO Algorithm
- The Adaptive Hybrid Policy Optimization (AHPO) algorithm enables a dynamic teaching approach: the model learns from expert data while being gradually encouraged to think independently [12][13].
- AHPO addresses both catastrophic forgetting under direct fine-tuning and the reward sparsity of on-policy reinforcement learning [11][12].

Group 6: Performance Improvements
- Enhanced with MM-HELIX-100K and AHPO, the Qwen2.5-VL-7B model improved markedly, gaining 18.6% accuracy on the MM-HELIX Benchmark and generalizing strongly across reasoning tasks [18].
- The ability to reflect and adapt proved to be a transferable meta-skill, moving the model beyond rote memorization toward genuine understanding [15].
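The adaptive-hybrid idea behind AHPO can be sketched as a reward-gated blend of a supervised (expert) loss and an on-policy RL loss: lean on expert data while rewards are sparse, then hand control to the RL objective as the policy starts succeeding. The gating rule and names below are illustrative assumptions, not the paper's exact formulation:

```python
def ahpo_loss(sft_loss, rl_loss, success_rate, threshold=0.3):
    """Blend an expert (supervised) loss with an on-policy RL loss.

    success_rate: fraction of recent rollouts earning a nonzero reward.
    While rewards are sparse (below threshold) the expert term dominates,
    avoiding the sparse-reward trap; once the policy succeeds often enough,
    the expert term fades out so exploration is not over-constrained.
    """
    alpha = max(0.0, 1.0 - success_rate / threshold)  # 1.0 when sparse, 0.0 when dense
    return alpha * sft_loss + (1.0 - alpha) * rl_loss

# Early training, no rewards yet: pure imitation of expert traces.
print(ahpo_loss(2.0, 1.0, success_rate=0.0))   # -> 2.0
# Rewards now common: pure on-policy RL.
print(ahpo_loss(2.0, 1.0, success_rate=0.5))   # -> 1.0
```

A schedule like this addresses the two failure modes the article names: pure fine-tuning risks catastrophic forgetting, while pure on-policy RL stalls when almost no rollout earns a reward.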
Latest trends in China's Agent products: multi-agent collaboration, vertical tracks, core industry workflows | QbitAI Think Tank AI 100
量子位· 2025-10-19 04:10
Core Insights
- The article tracks the rapid evolution of Agent products across industries, highlighting their shift from general-purpose tools to specialized "intelligent partners" that address concrete pain points in sectors such as research and investment [3][4].

Group 1: Agent Product Development
- Agent technology is maturing from single-point intelligence toward systematic multi-agent collaboration, pursuing more efficient and stable task execution [3].
- Integrating cloud services with local operating systems enables seamless user workflows and personalized services [3].

Group 2: Market Trends
- Agent products are clearly embedding into business processes across industries, deepening automation and delivering tailored solutions [3][4].
- The latest AI 100 list features seven Agent products, signaling a growing market presence and intensifying competition [5].

Group 3: Notable Agent Products
- Kimi, a tool for boosting professional and learner productivity, recorded nearly 30 million web visits in September [8][9].
- MiniMax combines chat and Agent functions, offering end-to-end solutions across fields [10].
- ByteDance's 扣子空间 serves as a professional AI work assistant, supporting deep writing and data-analysis tasks [11].
- AutoGLM provides a cloud-based Agent platform for seamless task execution across applications [14].
- Bobby, an investment-trading AI Agent, generates personalized trading strategies from user preferences and market data [42].
Musk proposes a human-vs-machine coding showdown! Karpathy says no
量子位· 2025-10-19 04:10
Core Viewpoint
- The article covers the exchange between Elon Musk and Andrej Karpathy: Karpathy declined to compete against Musk's AI model Grok 5, an episode that illuminates their relationship in the context of AI development and collaboration [2][12][39].

Group 1: Interaction Dynamics
- Musk invited Karpathy to a programming duel against Grok 5, reminiscent of the famous chess match between Kasparov and Deep Blue [1][11].
- Karpathy declined, saying that competing would diminish his value and that he sees more merit in collaboration than in competition [2][12].
- The online community was eager to see a Karpathy-versus-Grok-5 showdown, speculating about the outcome and its implications for AI development [16][20].

Group 2: Historical Context
- At Tesla, Karpathy significantly expanded the AI and Autopilot team and contributed to the development of Tesla's autonomous-driving capabilities [33].
- After leaving Tesla in July 2022, he briefly joined OpenAI before founding his own AI-education company, Eureka Labs [34][39].
- Despite their professional partings, Musk and Karpathy have stayed on good terms, with Musk frequently praising Karpathy's skills and contributions [37][39].

Group 3: Future Speculations
- There is speculation about whether Karpathy will work with Musk again, given Musk's continuing interest in his expertise and the potential for AI collaboration [28][30].
- Their future could involve Karpathy continuing independent pursuits or a possible reunion with Musk's ventures [39].