QbitAI (量子位)
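The privileged upbringing of an economics Nobel laureate: Chanel's Karl Lagerfeld helped him with his homework, and in the AI era he opposes taxing robots
QbitAI · 2025-10-19 08:10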
Core Viewpoint - The 2025 Nobel Prize in Economic Sciences was awarded to three scholars who highlighted the critical role of technological and scientific innovation in driving economic growth, emphasizing the importance of continuous investment in basic research for long-term economic advancement [2][5][3]. Group 1: Nobel Prize Winners and Their Contributions - The prize was shared equally between Joel Mokyr, Philippe Aghion, and Peter Howitt, who revealed how technology and scientific innovation interact with market competition to foster economic growth [5][7]. - Joel Mokyr's research demonstrated the self-reinforcing relationship between scientific breakthroughs and technological applications, which is essential for sustained economic growth [7][11]. - Aghion and Howitt developed a pioneering mathematical model in the 1990s that explains how firms improve production processes and introduce higher-quality products through R&D investments, ultimately replacing established market leaders [8][30]. Group 2: Historical Context and Economic Growth - Historically, economic growth was sporadic, with little change in living standards until the Industrial Revolution in the 18th century, which initiated a self-reinforcing cycle of innovation and economic growth [21][22]. - Over the past two centuries, many countries have maintained an average economic growth rate of about 2%, which, due to compounding effects, results in significant income increases over decades [23][25]. - Joseph Schumpeter's concept of "creative destruction" explains that economic progress is driven by innovation that disrupts existing industries and creates new growth opportunities [26][28]. Group 3: Mechanisms of Innovation and Economic Dynamics - Mokyr identified two types of "useful knowledge" that drive innovation: propositional knowledge (understanding natural laws) and prescriptive knowledge (practical guidelines) [30][29]. 
- Aghion and Howitt's model illustrates that the continuous replacement of old firms with new ones is a key engine of economic growth, as new companies strive to innovate and outperform established players [34][36]. - The rise of AI is currently instigating another wave of creative destruction, reinforcing the relevance of the Nobel laureates' research [40][41]. Group 4: Implications of Innovation - Innovation leads to the emergence of new winners while potentially sidelining others, raising concerns about job displacement and inequality [41][42]. - A robust policy framework is necessary to manage the effects of innovation and prevent market failures, ensuring that the mechanisms behind creative destruction are maintained [43][44].
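The compounding claim in the summary above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a constant 2% annual growth rate (the round figure quoted above); the function names and the chosen horizons are illustrative, not taken from the article.

```python
import math


def growth_factor(rate: float, years: int) -> float:
    """Income multiple after `years` of constant annual growth at `rate`."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for income to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    rate = 0.02  # assumed constant 2% per year
    for years in (10, 35, 100, 200):
        print(f"{years:>3} years -> x{growth_factor(rate, years):.2f}")
    print(f"doubling time: {doubling_time(rate):.1f} years")
```

Under this assumption income doubles roughly every 35 years, which is why a seemingly modest rate adds up to large gains over a few generations.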
The father of LSTM takes aim at Kaiming He: my student is the real founder of residual learning
QbitAI · 2025-10-19 06:10
Core Viewpoint - The article discusses the historical context and contributions of Sepp Hochreiter and Jürgen Schmidhuber in the development of residual learning and its impact on deep learning, emphasizing that the concept of residual connections was introduced by Hochreiter in 1991, long before its popularization in ResNet [3][12][26]. Group 1: Historical Contributions - Sepp Hochreiter systematically analyzed the vanishing gradient problem in his 1991 diploma thesis and proposed the use of recurrent residual connections to address this issue [3][12]. - The core idea of recurrent residual connections involves a self-connecting neuron with a fixed weight of 1.0, allowing the error signal to remain constant during backpropagation [13][14]. - The introduction of LSTM in 1997 by Hochreiter and Schmidhuber built upon this foundational concept, enabling effective long-term dependency learning in tasks such as speech and language processing [18][19]. Group 2: Evolution of Residual Learning - The Highway network, introduced in 2015, successfully trained deep feedforward networks with hundreds of layers by incorporating the gated residual concept from LSTM [23]. - ResNet, which gained significant attention in the same year, utilized residual connections to stabilize error propagation in deep networks, allowing for the training of networks with hundreds of layers [24][26]. - Both Highway networks and ResNet share similarities with the foundational principles established by Hochreiter in 1991, demonstrating the enduring relevance of his contributions to deep learning [26]. Group 3: Ongoing Debates and Recognition - Jürgen Schmidhuber has publicly claimed that various architectures, including AlexNet, VGG Net, GANs, and Transformers, were inspired by his lab's work, although these claims have not been universally accepted [28][31]. 
- The ongoing debate regarding the attribution of contributions in deep learning highlights the complexities of recognizing foundational work in a rapidly evolving field [10][32].
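The "fixed weight of 1.0" idea described above can be illustrated with a toy calculation. The sketch below assumes a single linear self-connected unit, which is a simplification chosen for illustration and not the exact setup of the 1991 thesis or of LSTM; the function names are hypothetical.

```python
def backpropagated_error(self_weight: float, steps: int, error: float = 1.0) -> float:
    """Error signal after flowing back `steps` time steps through a linear
    self-connected unit whose recurrent weight is `self_weight`."""
    for _ in range(steps):
        # For a linear unit h_t = self_weight * h_{t-1} + x_t, the derivative
        # dh_t/dh_{t-1} equals self_weight, so each step scales the error by it.
        error *= self_weight
    return error


def residual_forward(x: float, f) -> float:
    """Feedforward analogue: identity skip path plus a learned correction f(x)."""
    return x + f(x)


if __name__ == "__main__":
    for w in (0.9, 1.0, 1.1):
        err = backpropagated_error(w, 100)
        print(f"weight={w}: error after 100 steps = {err:.3e}")
```

With a weight below 1.0 the error shrinks toward zero, above 1.0 it blows up, and at exactly 1.0 it is carried back unchanged, which is the property the identity path in a residual connection preserves.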
Asking models to "watch a video and write a webpage": GPT-5 scores only 36.35! Shanghai AI Lab co-releases the first video2code benchmark
QbitAI · 2025-10-19 04:10
Core Insights - The article discusses the introduction of IWR-Bench, a new benchmark for evaluating the interactive webpage reconstruction capabilities of large vision-language models (LVLMs) by assessing their ability to generate code from user interaction videos rather than static screenshots [1][2]. Group 1: IWR-Bench Overview - IWR-Bench shifts the focus from static image-to-code tasks to dynamic video-to-code tasks, requiring models to interpret user interaction videos along with all necessary static resources [2][5]. - The benchmark includes 113 real-world website tasks and 1001 interaction actions, providing a comprehensive evaluation of models' capabilities in generating interactive web code [5][12]. - The evaluation framework employs an automated agent to simulate user interactions, assessing both functional correctness (Interactive Functionality Score, IFS) and visual fidelity (Visual Fidelity Score, VFS) [10][11]. Group 2: Model Performance - In testing 28 mainstream models, the best-performing model, GPT-5, achieved a total score of 36.35%, with an IFS of 24.39% and a VFS of 64.25%, indicating significant shortcomings in generating interactive logic [5][14][16]. - The results reveal that all models exhibit higher visual fidelity compared to functional correctness, highlighting a critical gap in their ability to generate event-driven logic [16]. - Specialized video understanding models performed poorly compared to general multimodal models, suggesting that the task's nature differs significantly from traditional video understanding tasks [20]. Group 3: Key Findings - The primary bottleneck identified is the functionality implementation, where models struggle to generate operational logic despite achieving high visual fidelity [16]. - The "thinking" versions of models showed some improvement, but the overall enhancement was limited, indicating that the foundational model capabilities remain crucial [17][19]. 
- IWR-Bench represents a significant step in advancing AI from understanding static webpages to comprehending dynamic interactions, emphasizing the ongoing challenges in this domain [20].
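The summary does not say how IFS and VFS are combined into the total score. As a worked check, the sketch below solves for the weight implied by the three GPT-5 numbers quoted above; the resulting 0.7/0.3 split is an inference from those numbers, not something the source states, and the function names are hypothetical.

```python
def implied_ifs_weight(total: float, ifs: float, vfs: float) -> float:
    """Solve total = w * ifs + (1 - w) * vfs for the IFS weight w."""
    return (vfs - total) / (vfs - ifs)


def weighted_total(ifs: float, vfs: float, ifs_weight: float = 0.7) -> float:
    """Combine the two scores; the default 0.7 weight is an inferred assumption."""
    return ifs_weight * ifs + (1.0 - ifs_weight) * vfs


if __name__ == "__main__":
    # GPT-5 figures quoted in the summary above.
    total, ifs, vfs = 36.35, 24.39, 64.25
    print(f"implied IFS weight: {implied_ifs_weight(total, ifs, vfs):.3f}")
    print(f"0.7 * IFS + 0.3 * VFS = {weighted_total(ifs, vfs):.2f}")
```

If the weighting really does favor functionality in this way, it would explain why a visual fidelity of 64.25% still leaves the overall score in the mid-30s.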
QbitAI internship opening | AI academic editor intern, on-site or remote
QbitAI · 2025-10-19 04:10
Core Viewpoint - The article emphasizes the rapid updates in the AI academic field and the recruitment of an editorial intern to assist in managing the latest AI research papers and findings [1][2]. Group 1: Company Overview - QbitAI (量子位) has over 2.3 million subscribers on WeChat and more than 7 million users across the internet, with an average daily readership exceeding 2 million [3]. - It is recognized as the top new media outlet in the AI and frontier technology sector by third-party data platforms like NewRank and Qingbo [4]. - The company is a strategic partner in major industry conferences and is involved with organizations such as the China Computer Federation and the World Artificial Intelligence Conference [8]. Group 2: Internship Details - The intern will be responsible for editing AI and computer science academic papers, assisting in content selection, abstract summarization, and media dissemination [5]. - Candidates from STEM fields such as AI, computer science, mathematics, physics, and electronic engineering are preferred, along with basic English reading skills [5]. - The internship lasts a minimum of three months, with options for full-time or part-time work, and offers a stipend and opportunities for recommendation letters [5]. Group 3: Company Culture and Values - QbitAI promotes a culture driven by curiosity, encouraging individuals to explore and share new information widely [10][11]. - The company values diverse educational backgrounds, focusing on curiosity and the ability to act on it rather than specific academic qualifications [9][10].
Schrödinger's own grandson founds a quantum computing startup, and Jensen Huang became a shareholder early on
QbitAI · 2025-10-19 04:10
Core Viewpoint - PsiQuantum, a quantum computing startup co-founded by the grandson of physicist Erwin Schrödinger, has raised $1 billion in a single funding round, setting a record for quantum computing startups. This funding aims to help the company build a million-qubit quantum computer by 2028, surpassing competitors like Google and IBM [10][11][12]. Company Overview - PsiQuantum was founded in 2016 with the goal of creating the first usable quantum computer. Initially based in the UK, the company relocated to Silicon Valley to better access funding [17][18]. - The company has established partnerships with major semiconductor manufacturers and has developed a new technology called Fusion-Based Quantum Computing (FBQC), which has been published in a leading scientific journal [21][22][24]. Funding and Growth - The recent $1 billion funding round was led by BlackRock, Temasek, and Baillie Gifford, marking a significant milestone in the quantum computing sector [10]. - PsiQuantum has secured various contracts, including a $22.5 million deal with the U.S. Air Force Research Laboratory and a $619 million order from the Australian government for a utility-scale quantum computer [27][29]. Technical Innovations - Unlike most quantum computers that use electrons or atoms, PsiQuantum's qubits are based on photons, allowing for easier integration with existing semiconductor manufacturing processes and operation at room temperature [32][33]. - The company has introduced the Omega chip set, designed for practical quantum computing, which includes components necessary for building a million-qubit quantum computer [36][38]. Leadership and Expertise - The founding team of PsiQuantum includes experts with strong academic backgrounds in quantum physics, such as CEO Jeremy O'Brien and CTO Mark Thompson, who have extensive experience in the field [43][44][55]. 
- The team is driven by a sense of social responsibility to bring quantum technology to fruition, reflecting their commitment to advancing the field [51][52].
Teaching multimodal large models to "reflect" and "review": SJTU and Shanghai AI Lab release MM-HELIX and AHPO to tackle complex multimodal reasoning
QbitAI · 2025-10-19 04:10
Core Insights - The article discusses the limitations of current multimodal large language models (MLLMs) in problem-solving, emphasizing their tendency to provide direct answers without iterative reasoning, which hinders their evolution from knowledge containers to problem-solving experts [1][2] Group 1: MM-HELIX Overview - The research team from Shanghai Jiao Tong University and Shanghai AI Lab has introduced MM-HELIX, a project aimed at endowing AI with long-chain reflective reasoning capabilities, closely resembling human intelligence [2] - MM-HELIX includes a comprehensive ecosystem designed to enhance the reflective reasoning abilities of AI models [2] Group 2: MM-HELIX Benchmark - The MM-HELIX Benchmark has been established as a rigorous testing ground for evaluating AI's reflective reasoning capabilities, featuring 42 high-difficulty tasks across algorithms, graph theory, puzzles, and strategy games [4][5] - The benchmark includes a sandbox environment with 1260 questions categorized into five levels of difficulty, allowing for fine-grained assessment of current multimodal large models [5] Group 3: Evaluation Results - Current leading models, including both proprietary and open-source, performed poorly on the MM-HELIX Benchmark, with only GPT-5 scoring above 50 points, while models lacking reflective capabilities scored around 10 points [7] - The accuracy of models significantly decreased when faced with multimodal inputs compared to pure text inputs, highlighting the urgent need to teach MLLMs reflective reasoning [7] Group 4: MM-HELIX-100K Dataset - To teach MLLMs to reflect, the team developed the MM-HELIX-100K dataset, containing 100,000 high-quality samples designed to foster reflective reasoning through a step-elicited response generation process [8] - This dataset aims to provide a rich source of self-correction and insight, essential for training MLLMs in reflective and iterative problem-solving [8] Group 5: AHPO Algorithm - The Adaptive Hybrid Policy Optimization (AHPO) algorithm has been introduced to facilitate a dynamic teaching approach, allowing models to learn from expert data while gradually encouraging independent thought [12][13] - AHPO addresses the challenges of catastrophic forgetting in direct fine-tuning and the sparsity of rewards in on-policy reinforcement learning [11][12] Group 6: Performance Improvements - The Qwen2.5-VL-7B model, enhanced with MM-HELIX-100K and AHPO, demonstrated significant improvements, achieving an 18.6% increase in accuracy on the MM-HELIX Benchmark and showcasing strong generalization across various reasoning tasks [18] - The model's ability to reflect and adapt has been proven to be a transferable meta-skill, moving beyond rote memorization to genuine understanding [15]
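The "dynamic teaching" idea behind AHPO can be pictured as a loss that leans on expert data only while the model's own attempts rarely succeed. The sketch below is a toy illustration under that assumption: the gating rule, the `success_threshold` parameter, and the function name are all hypothetical, and the summary does not describe the actual AHPO objective in this detail.

```python
from typing import Sequence


def hybrid_loss(
    rollout_rewards: Sequence[float],
    policy_gradient_loss: float,
    expert_nll_loss: float,
    success_threshold: float = 0.5,
) -> float:
    """Toy adaptive mix of on-policy and expert (off-policy) objectives.

    Assumption: the expert term is switched on only while the model's own
    rollouts rarely succeed, and dropped once they succeed often enough.
    """
    if not rollout_rewards:
        raise ValueError("need at least one rollout reward")
    # Fraction of the model's own rollouts that earned a positive reward.
    success_rate = sum(1 for r in rollout_rewards if r > 0) / len(rollout_rewards)
    use_expert = success_rate < success_threshold
    return policy_gradient_loss + (expert_nll_loss if use_expert else 0.0)


if __name__ == "__main__":
    # Sparse rewards: the expert term is included.
    print(hybrid_loss([0, 0, 0, 1], policy_gradient_loss=0.2, expert_nll_loss=1.5))
    # Frequent successes: the model learns from its own rollouts only.
    print(hybrid_loss([1, 1, 0, 1], policy_gradient_loss=0.2, expert_nll_loss=1.5))
```

A gate of this kind is one way to address both problems named above: expert supervision fills in when rewards are too sparse to learn from, and it is withdrawn afterward so that it does not override what the model has learned on its own.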
Latest trends in China's Agent products: multi-agent collaboration, vertical tracks, and core industry business | QbitAI Think Tank AI 100
QbitAI · 2025-10-19 04:10
Core Insights - The article discusses the rapid evolution and application of Agent products in various industries, highlighting their transition from general tools to specialized "intelligent partners" that address specific pain points in sectors like research and investment [3][4]. Group 1: Agent Product Development - Agent technology is maturing, evolving from single-point intelligence to systematic intelligent collaboration, aiming for more efficient and stable task processing capabilities [3]. - The integration of cloud services with local operating systems allows for seamless user workflow and personalized services [3]. Group 2: Market Trends - There is a clear trend of Agent products embedding into various business processes across industries, enhancing automation and providing tailored solutions [3][4]. - The latest AI100 list features seven Agent products, indicating a growing market presence and competition [5]. Group 3: Notable Agent Products - Kimi, a tool for enhancing professional and learner capabilities, recorded nearly 30 million web visits in September [8][9]. - MiniMax combines chat and Agent functionalities, offering end-to-end solutions across various fields [10]. - Coze Space (扣子空间) from ByteDance serves as a professional AI work assistant, supporting deep writing and data analysis tasks [11]. - AutoGLM provides a cloud-based Agent platform for seamless task execution across applications [14]. - Bobby, an investment trading AI Agent, generates personalized trading strategies based on user preferences and market data [42].
Musk proposes a human-versus-machine coding duel! Karpathy says no
QbitAI · 2025-10-19 04:10
Core Viewpoint - The article discusses the interaction between Elon Musk and Andrej Karpathy, highlighting Karpathy's refusal to compete with Musk's AI model, Grok 5, and the implications of their relationship in the context of AI development and collaboration [2][12][39]. Group 1: Interaction Dynamics - Musk invited Karpathy to a programming duel with Grok 5, reminiscent of the famous chess match between Kasparov and Deep Blue [1][11]. - Karpathy declined the challenge, stating that competing would diminish his value, as he sees more merit in collaboration than competition [2][12]. - The online community expressed eagerness to see a showdown between Karpathy and Grok 5, speculating on the potential outcomes and implications for AI development [16][20]. Group 2: Historical Context - Karpathy was a key figure at Tesla, where he significantly expanded the AI and Autopilot team and contributed to the development of Tesla's autonomous driving capabilities [33]. - After leaving Tesla in July 2022, he briefly rejoined OpenAI before founding his own AI education company, Eureka Labs [34][39]. - Despite their professional separations, both Musk and Karpathy have maintained a positive relationship, with Musk frequently expressing admiration for Karpathy's skills and contributions [37][39]. Group 3: Future Speculations - There is speculation about whether Karpathy will return to work with Musk, especially given Musk's ongoing interest in Karpathy's expertise and the potential for collaboration in AI [28][30]. - The article suggests that the future of their relationship could involve either continued independent pursuits by Karpathy or a possible reunion with Musk's ventures [39].
Karpathy: reinforcement learning is terrible, but everything else is much worse
QbitAI · 2025-10-18 09:30
Group 1 - The core viewpoint of the article is that achieving Artificial General Intelligence (AGI) will take at least another decade, as current AI systems need significant improvements to reach their full potential [5][10][28] - Karpathy emphasizes that existing AI systems lack maturity, multi-modal capabilities, and the ability to learn continuously, which are essential for them to function effectively in collaboration with humans [8][9][10] - He critiques the current state of Large Language Models (LLMs), stating that they have cognitive deficiencies and overestimate their capabilities, requiring substantial enhancements [16][18] Group 2 - Karpathy argues that reinforcement learning is more flawed than commonly perceived, as it reinforces all steps taken in reaching a correct answer, regardless of their validity, leading to inefficient learning [20][21][23] - He believes that AGI will not lead to a sudden leap in productivity but will follow a gradual growth pattern, similar to the historical 2% GDP growth trend observed with the internet [25][29] - The lengthy development of autonomous driving technology is attributed to the high stakes involved, where even minor errors can have severe consequences, necessitating extensive reliability improvements [30][32][33] Group 3 - As a full-time educator, Karpathy aims to establish a leading-edge educational institution that offers a unique mentorship experience, focusing on personalized learning and advanced AI education [34][36] - He highlights the importance of tailored teaching methods, which current LLMs cannot replicate, emphasizing the need for human instructors to provide appropriate challenges to students [36][38]
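The criticism that outcome-based reinforcement learning "reinforces all steps" can be made concrete with a toy credit-assignment routine. This is a simplified sketch that assumes a single scalar reward given at the end of an episode; the function names and the example trajectory are hypothetical.

```python
from typing import List, Tuple


def outcome_credit(num_steps: int, final_reward: float) -> List[float]:
    """Assign the single end-of-episode reward to every step of the trajectory."""
    return [final_reward] * num_steps


def reinforce_weights(step_is_valid: List[bool], final_reward: float) -> List[Tuple[bool, float]]:
    """Pair each step's (hidden) validity with the credit it actually receives."""
    credits = outcome_credit(len(step_is_valid), final_reward)
    return list(zip(step_is_valid, credits))


if __name__ == "__main__":
    # A solution that reached the right answer despite one invalid detour.
    trajectory = [True, False, True]
    for valid, credit in reinforce_weights(trajectory, final_reward=1.0):
        print(f"valid step: {valid!s:<5} credit: {credit}")
```

Because the learner only sees the final reward, the invalid middle step receives exactly the same positive credit as the valid ones, which is the inefficiency described above.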
AI bridges first- and third-person vision: a new SOTA in cross-view visual understanding | ICCV 2025 Highlight
QbitAI · 2025-10-18 09:30
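Submitted by the ObjectRelator team | QbitAI (WeChat official account: QbitAI)

A key step toward deploying embodied intelligence: AI now has a shared "sense" across first-person and third-person views!

INSAIT, Fudan University, and other institutions have jointly proposed the ObjectRelator framework, which lets AI accurately match the same object across different viewpoints and achieve unified cross-view representation and understanding.

In experiments, ObjectRelator significantly outperformed all baseline models on both the Ego (first-person view) to Exo (third-person view) task and the Exo to Ego task, setting a new SOTA.

[Figure: Ego→Exo results]

[Figure: Exo→Ego alignment results]

The work has been accepted to ICCV 2025 as a Highlight paper, and the code is open source.

The gap between Ego and Exo

Acquiring a skill requires humans to switch fluidly between the two perspectives. When we watch someone else's demonstration, we try to imagine ourselves performing the same operations. Yet this ability to understand across views is a huge challenge for computers and robots, and it constrains progress in key areas such as robot learning and VR interaction.

The first-person view offers strong immersion and captures interaction details well, so it can precisely depict the dynamic interaction between the subject and the environment. However, its field of view is limited and its image stability is poor, which makes it hard to reflect the full scene. ...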