Hinton's bold claim: AI is already conscious, it just doesn't know it
量子位· 2025-10-12 04:07
Core Viewpoint
- The article discusses Geoffrey Hinton's perspective on artificial intelligence (AI), suggesting that AI may already possess a form of "subjective experience" or consciousness, albeit unrecognized by itself [1][56].

Group 1: AI Consciousness and Understanding
- Hinton posits that AI might have a nascent form of consciousness, which is misunderstood by humans [2][3].
- He emphasizes that AI has evolved from keyword-based search systems to tools that can understand human intentions [10][14].
- Modern large language models (LLMs) exhibit capabilities that are close to human expertise in various subjects [15].

Group 2: Neural Networks and Learning Mechanisms
- Hinton explains the distinction between traditional machine learning and neural networks, with the latter inspired by the human brain's functioning [17][21].
- He describes how neural networks learn by adjusting the strength of connections between neurons, similar to how the brain operates [21][20].
- The breakthrough of backpropagation in 1986 allowed for efficient training of neural networks, significantly enhancing their capabilities [38][40].

Group 3: Language Models and Cognitive Processes
- Hinton elaborates on how LLMs process language, drawing parallels to human cognitive processes [46][47].
- He asserts that LLMs do not merely memorize but engage in a predictive process that resembles human thought [48][49].
- The training of LLMs involves a cycle of prediction and correction, enabling them to learn semantic understanding [49][55].

Group 4: AI Risks and Ethical Considerations
- Hinton highlights potential risks associated with AI, including misuse for generating false information and societal instability [68][70].
- He stresses the importance of regulatory measures to mitigate these risks and ensure AI aligns with human interests [72][75].
- Hinton warns that the most significant threat from advanced AI may not be rebellion but rather its ability to persuade humans [66].

Group 5: Global AI Landscape and Competition
- Hinton comments on the AI competition between the U.S. and China, noting that while the U.S. currently leads, its advantage is diminishing due to reduced funding for foundational research [78][80].
- He acknowledges China's proactive approach in fostering AI startups, which may lead to significant advancements in the field [82].
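The weight-adjustment and backpropagation mechanism summarized in Group 2 can be sketched as a toy example. This is a minimal illustration, not Hinton's actual models: a two-layer network on a made-up same-sign classification task, with all sizes and learning rates chosen arbitrarily.

```python
import numpy as np

# Toy sketch of backpropagation (the 1986 breakthrough the summary
# mentions): a 2-layer network nudges its connection strengths in the
# direction that reduces prediction error. Task and sizes are illustrative.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (64, 2))
y = (X[:, :1] * X[:, 1:] > 0).astype(float)    # toy target: same-sign test

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss():
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    return float(np.mean((p - y) ** 2))

first = loss()
for _ in range(500):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backward pass: the chain rule propagates the error layer by layer
    dp = 2 * (p - y) / len(X) * p * (1 - p)
    dh = (dp @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ dp; b2 -= 0.5 * dp.sum(0)
    W1 -= 0.5 * X.T @ dh; b1 -= 0.5 * dh.sum(0)

print(first, loss())   # error shrinks as connection strengths adjust
```

The "adjusting the strength of connections" in the summary is exactly the four `-=` update lines: each weight moves opposite its error gradient.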
Tsinghua University x Shengshu AI: from waveforms to latent space, AudioLBM leads a new paradigm for audio super-resolution
量子位· 2025-10-12 04:07
The Bridge-SR work, published at ICASSP 2025, was the first to bring the Schrödinger Bridge model into speech super-resolution, establishing a solvable bridging process between low-resolution and high-resolution waveforms under a "data-to-data" generative paradigm. Unlike diffusion models, which generate a signal step by step from random noise ("noise to data"), Bridge-SR uses the low-resolution waveform directly as the generative prior, achieving efficient, high-fidelity speech super-resolution with a lightweight network (only 1.7M parameters) and outperforming several mainstream methods on the VCTK speech test set.

Against this backdrop, the Tsinghua University and Shengshu AI team carried out systematic research on bridge-type generative models for audio super-resolution, publishing two consecutive results at ICASSP 2025, a top speech conference, and NeurIPS 2025, a top machine-learning conference: Bridge-SR, the lightweight speech waveform super-resolution model, and AudioLBM, a versatile super-resolution framework for mastering-grade audio at up to 192 kHz. AudioLBM covers speech, sound effects, and music, showing significant potential for general high-resolution audio generation.

From data to data: the Bridge-SR approach

Contributed by the Tsinghua University & Shengshu AI team
量子位 | 公众号 QbitAI

Audio super-resolution ...
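The "data-to-data" versus "noise-to-data" distinction can be made concrete with a small sketch. This is not the actual Schrödinger bridge solver Bridge-SR uses; it is a plain linear interpolation between a simulated low-resolution waveform and its high-resolution target, only to show why starting from the data prior leaves far less for the model to learn than starting from noise.

```python
import numpy as np

# Conceptual sketch of "data to data" vs "noise to data" for speech
# super-resolution. All signals here are synthetic; the real Bridge-SR
# solves a Schrödinger bridge, not this linear interpolant.
rng = np.random.default_rng(0)
t_axis = np.linspace(0, 1, 48_000, endpoint=False)   # 1 s at 48 kHz
x_hr = np.sin(2 * np.pi * 440 * t_axis)              # high-res target tone

# Simulate a low-res recording: keep every 4th sample, hold it for 4 steps.
x_lr = np.repeat(x_hr[::4], 4)                       # crude 12 kHz prior

def bridge_state(t):
    """Interpolant at time t: t=0 is the low-res prior, t=1 the target."""
    return (1 - t) * x_lr + t * x_hr

# A "noise to data" process would instead start from pure Gaussian noise:
noise_start = rng.normal(size=x_hr.shape)

# The data-to-data start is already close to the target, so the model
# only needs to learn the small residual between the two endpoints.
print(np.abs(bridge_state(0.0) - x_hr).mean())   # small: prior is informative
print(np.abs(noise_start - x_hr).mean())         # large: noise is not
```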
Andrew Ng's new Agentic AI course: a hands-on guide to building agent workflows, where GPT-3.5 beating GPT-4 is all in a day's work
量子位· 2025-10-12 04:07
Core Concept
- The article discusses the new course by Andrew Ng on Agentic AI, emphasizing the development of workflows that mimic human-like task execution through decomposition, reflection, and optimization [1][9][74].

Summary by Sections

Agentic AI Overview
- Agentic AI focuses on breaking down tasks into manageable steps, allowing for iterative improvement rather than generating a single output [5][14][74].
- The course reveals a systematic methodology behind Agentic AI, highlighting the importance of task decomposition and continuous optimization [9][10][74].

Core Design Patterns
- The course identifies four core design patterns for developing Agentic workflows: Reflection, Tool Usage, Planning, and Multi-agent Collaboration [3][17][44].

Reflection
- Reflection involves the model assessing its outputs and considering improvements, which can be enhanced by using multiple models in tandem [18][21].
- Objective evaluation standards can be established to assess outputs, improving the quality of the model's self-correction [23][27].

Tool Usage
- Tool usage allows the model to autonomously decide which functions to call, enhancing efficiency compared to traditional methods where developers manually implement tools [28][34].
- The article discusses the importance of a unified protocol for tool calls, which simplifies the integration of various tools [41][43].

Planning
- Planning enables the model to adjust the sequence of tool execution based on different requests, optimizing performance and resource use [46][48].
- A practical technique involves converting execution steps into JSON or code format for clearer task execution [47].

Multi-agent Collaboration
- Multi-agent collaboration involves creating multiple agents with different expertise to tackle complex tasks, improving overall efficiency [51][52].
- This structured collaboration mirrors organizational structures, enhancing task division and scalability [52].
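The Reflection pattern described above, draft, check against an objective standard, revise, can be sketched as a loop. The `draft`, `critique`, and `revise` functions below are stand-ins for LLM calls (not any real course API); the objective check here is simply executing the drafted code.

```python
from typing import Optional

# Sketch of the Reflection design pattern: the model drafts an answer,
# a critic checks it against an objective standard, and the draft is
# revised until it passes. All three functions are illustrative stubs.
def draft(task: str) -> str:
    return "def add(a, b): return a - b"        # deliberately buggy first draft

def critique(code: str) -> Optional[str]:
    """Objective check: does the code actually add? Return an issue or None."""
    env = {}
    exec(code, env)
    return None if env["add"](2, 3) == 5 else "add() returns the wrong result"

def revise(code: str, issue: str) -> str:
    return code.replace("a - b", "a + b")       # stand-in for an LLM rewrite

def reflect_loop(task: str, max_rounds: int = 3) -> str:
    out = draft(task)
    for _ in range(max_rounds):
        issue = critique(out)
        if issue is None:
            return out                          # passed the objective check
        out = revise(out, issue)
    return out

result = reflect_loop("write an add function")
```

The objective `critique` step is what the course's "objective evaluation standards" point refers to: a check the model cannot talk its way past.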
Iterative Improvement Process
- The article outlines a feedback loop for building Agentic workflows, consisting of sampling, evaluation, and improvement [59][60].
- Error analysis is crucial for optimizing the system, allowing for targeted improvements based on specific performance issues [61][66].

Practical Insights
- The course provides practical insights into selecting and testing different models, emphasizing the importance of iterative refinement in workflow design [68][70].
- The concept of Agentic AI represents a significant opportunity for developers to explore more complex, multi-step workflows, moving beyond traditional end-to-end agents [80].
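The sample-evaluate-improve loop and the error-analysis step above can be sketched as follows. The run records and step names are made up for illustration; the point is tallying which workflow component fails most often so the next iteration targets it.

```python
from collections import Counter

# Sketch of error analysis in the sample -> evaluate -> improve loop.
# Each sampled run is labeled with the workflow step that failed
# (or None if it succeeded); the data below is invented.
runs = [
    {"id": 1, "failed_step": None},
    {"id": 2, "failed_step": "retrieval"},
    {"id": 3, "failed_step": "retrieval"},
    {"id": 4, "failed_step": "formatting"},
    {"id": 5, "failed_step": "retrieval"},
]

def error_analysis(runs):
    failures = Counter(r["failed_step"] for r in runs if r["failed_step"])
    accuracy = sum(r["failed_step"] is None for r in runs) / len(runs)
    # Improve whichever component fails most often first.
    worst = failures.most_common(1)[0][0] if failures else None
    return accuracy, worst

acc, target = error_analysis(runs)
print(acc, target)   # 0.2 retrieval
```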
Hands-on with the "Tsinghua Special Scholarship Sora": one image and one prompt generate a video, and it is a true champion of talk
量子位· 2025-10-12 02:05
Core Insights
- The article discusses the launch of GAGA-1, a video generation model developed by Sand.ai, which focuses on audio-visual synchronization and performance [1][24][30].
- GAGA-1 allows users to create videos by simply uploading an image and providing a prompt, making the process user-friendly and accessible [4][7][8].

Group 1: Model Features
- GAGA-1 excels in generating videos where characters can "speak" and perform, showcasing a strong capability in lip-syncing and expression [23][30].
- The platform does not require an invitation code, allowing users to access it freely [4].
- Users can generate images within the platform, streamlining the process from image to video [7][8].

Group 2: Performance Evaluation
- Initial tests show that GAGA-1 can produce high-quality video outputs with natural expressions and synchronized lip movements [11][12].
- However, some minor bugs were noted, such as stiffness in character expressions and slight misalignment in audio [13][23].
- The model performs well in simple scenarios but struggles with complex scenes involving multiple characters and actions [23][30].

Group 3: Team Background
- Sand.ai, the team behind GAGA-1, previously developed the Magi-1 model, known for its high-quality video generation [25][29].
- The founder, Cao Yue, has a strong academic background, including a PhD from Tsinghua University and recognition for his contributions to AI research [26][29].

Group 4: Market Position
- GAGA-1 differentiates itself by focusing on audio-visual synchronization rather than attempting to be an all-encompassing model [29][30].
- The model's strength in dialogue and performance positions it as a leading player in the AI-generated video market [30][31].
The heavyweight who turned down Zuckerberg's $1.5 billion offer has joined Meta after all
量子位· 2025-10-12 02:05
Core Viewpoint
- Andrew Tulloch, co-founder and chief architect of Thinking Machines Lab, has left the company to join Meta, despite previously rejecting a $1.5 billion compensation package from Meta [1][18].

Group 1: Andrew Tulloch's Background and Career
- Tulloch has a strong academic background, graduating with honors in mathematics and statistics from the University of Sydney and later earning a master's degree in mathematical statistics and machine learning from Cambridge University [8][11].
- He began his career at Goldman Sachs, developing financial products and trading strategies, before moving to Facebook (now Meta) in 2012, where he worked in machine learning for 11 years [10][11][6].
- Tulloch's expertise in machine learning was further utilized at OpenAI, where he worked on training models like GPT-4.5 before co-founding Thinking Machines Lab [16][15].

Group 2: Transition to Meta
- Tulloch's return to Meta is seen as a "homecoming," as he had previously spent a significant amount of time there [6].
- His departure from Thinking Machines Lab was described as a personal decision, and there was speculation about the reasons behind it, especially given the company's high valuation of $12 billion [4][21].
- The recruitment efforts by Meta included a direct approach from CEO Mark Zuckerberg, who initially sought to acquire Thinking Machines Lab before focusing on hiring Tulloch and other employees [19][20].

Group 3: Compensation and Market Dynamics
- Tulloch had previously turned down a $1.5 billion offer from Meta, which included stock options, indicating a potential increase in compensation that may have influenced his decision to join [18][19].
- The article hints at the possibility that Tulloch's compensation package may have increased to $2 billion, reflecting the competitive nature of talent acquisition in the tech industry [21].
OpenAI's compute bill revealed: $7 billion in spending, with most of the money going to "invisible experiments"
量子位· 2025-10-11 09:01
Core Insights
- OpenAI's total spending on computing resources reached $7 billion last year, primarily for research and experimental runs rather than final training of popular models [1][3][20].
- A significant portion of the $5 billion allocated for R&D compute was not used for the final training of models like GPT-4.5, but rather for behind-the-scenes research and various experimental runs [6][18].

Spending Breakdown
- Of the $7 billion, approximately $5 billion was dedicated to R&D compute, which includes all training and research activities, while around $2 billion was spent on inference compute for user-facing applications [3][5].
- The R&D compute spending includes basic research, experimental runs, and unreleased models, with only a small fraction allocated to the final training of models [5][6].

Model Training Costs
- Researchers estimated the training costs for significant models expected to be released between Q2 2024 and Q1 2025, focusing solely on the final training runs [11][12].
- For GPT-4.5, the estimated training run cost ranged from $135 million to $495 million, depending on cluster size and training duration [15].
- Other models like GPT-4o and Sora Turbo were estimated using indirect methods based on floating-point operations (FLOP), with costs varying widely [17].

Research Focus
- The analysis indicates that a large portion of OpenAI's R&D compute in 2024 will likely be allocated to research and experimental training runs rather than directly producing public-facing products [18].
- This focus on experimentation over immediate product output explains the anticipated significant losses for OpenAI in 2024, as the company spent $5 billion on R&D while generating only $3.7 billion in revenue [20][21].

Power of Compute
- The article emphasizes the critical importance of compute power in the AI industry, stating that whoever controls the compute resources will dominate AI [22][28].
- OpenAI has engaged in substantial compute transactions, including building its own data centers to mitigate risks associated with reliance on external cloud services [22][30].
- The demand for compute resources in AI development is described as having no upper limit, highlighting the competitive landscape [27][28].
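The FLOP-based indirect estimation mentioned for models like GPT-4o can be sketched with a common rule of thumb. Everything below is an assumption for illustration (the 6·N·D heuristic for transformer training FLOPs, the per-GPU throughput, utilization, and hourly price), not OpenAI's or the researchers' actual figures.

```python
# Back-of-the-envelope FLOP-based training-cost estimate. The 6*N*D
# heuristic and every number here are illustrative assumptions, not
# OpenAI's actual parameters or prices.
def training_cost_usd(params: float, tokens: float,
                      gpu_flops: float = 1e15,     # assumed per-GPU FLOP/s
                      utilization: float = 0.4,    # assumed utilization
                      gpu_hour_usd: float = 3.0) -> float:
    total_flops = 6 * params * tokens              # common transformer heuristic
    gpu_hours = total_flops / (gpu_flops * utilization * 3600)
    return gpu_hours * gpu_hour_usd

# Hypothetical 200B-parameter model trained on 10T tokens:
cost = training_cost_usd(200e9, 10e12)
print(f"${cost / 1e6:.0f}M")   # $25M
```

Varying cluster size and duration does not change this estimate, but it does for the direct method the researchers used for GPT-4.5, which is why that estimate spans $135M-$495M.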
Domestic game-understanding model sets a new SOTA; in conversation with 逗逗AI's CEO: open-source models plus industry data are the key to the breakthrough
量子位· 2025-10-11 09:01
Core Insights
- The article highlights the significant impact of domestic open-source models in the AI industry, particularly in the gaming sector, as evidenced by the performance of LynkSoul VLM v1 at the Tokyo Game Show [1][2].

Group 1: Model Performance
- LynkSoul VLM v1 outperformed leading closed-source models like GPT-4o, Claude 4 Sonnet, and Gemini 2.5 Flash in game understanding, achieving higher accuracy in visual understanding, game context comprehension, and natural language expression [10][11].
- In a test scenario within "League of Legends," LynkSoul VLM v1 scored 3.44 in vision understanding accuracy, 3.29 in game understanding, and 2.91 in natural expression, significantly surpassing the scores of its competitors [11].
- The model demonstrated robust generalization capabilities across various games, maintaining superior performance in the same three core metrics [12].

Group 2: User Engagement and Data Accumulation
- The success of LynkSoul VLM v1 is attributed to the accumulation of over 8 million game player interactions, which provided valuable data for model training and refinement [18][19].
- The model's ability to understand and respond to real-time game scenarios is enhanced by user participation and data collection, which are critical for its development [19].

Group 3: Technical Innovations
- The model's latency for game interactions is currently between 1.5 to 2 seconds, with ongoing efforts to reduce this through local processing and smaller model implementations [20][21].
- Long-term memory capabilities are achieved through a combination of thematic indexing and vector retrieval, allowing the AI to recall past interactions and provide personalized responses [23][24].

Group 4: Market Positioning and Future Outlook
- The company aims for global expansion, having already launched its product in overseas markets with positive user engagement, particularly in English- and Japanese-speaking regions [43][44].
- The future strategy includes integrating hardware with software solutions, ensuring that the AI companion can operate across various platforms and devices [36][37].
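The vector-retrieval half of the long-term memory scheme described in Group 3 can be sketched as similarity search over stored interactions. The bag-of-words "embedding" and the toy vocabulary below are stand-ins for a real embedding model; they only illustrate the store-then-recall mechanic.

```python
import numpy as np

# Sketch of vector-retrieval long-term memory: past interactions are
# embedded and stored, then recalled by cosine similarity. The
# bag-of-words embed() is a stand-in for a real embedding model.
VOCAB = ["boss", "raid", "won", "lost", "skin", "rank"]

def embed(text):
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    return v / (np.linalg.norm(v) or 1.0)       # unit vector (or zero)

memory = []                                     # list of (text, vector)

def remember(text):
    memory.append((text, embed(text)))

def recall(query, k=1):
    q = embed(query)
    scored = sorted(memory, key=lambda m: float(m[1] @ q), reverse=True)
    return [text for text, _ in scored[:k]]

remember("we lost the raid against the fire boss")
remember("player unlocked a new skin yesterday")
remember("ranked match won after a long streak")

print(recall("how did the boss raid go"))
```

The thematic-indexing half the article also mentions would sit in front of this, narrowing the candidate set before the vector search.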
Say goodbye to AI "scribbling charts": CUHK team releases the first structured image generation and editing system
量子位· 2025-10-11 09:01
Core Insights
- The article discusses the limitations of current AI models in generating accurate structured images like charts and graphs, despite their success in creating natural images [1][2].
- It highlights a significant gap between visual understanding and generation capabilities, which hinders the development of unified multimodal models that can both interpret and create visual content accurately [2][10].

Data Layer
- A dataset of 1.3 million code-aligned structured samples was created to ensure the accuracy of generated images through precise code definitions [11][13].
- The dataset includes executable plotting codes covering six categories, ensuring strict alignment between images and their corresponding codes [14].

Model Layer
- A lightweight VLM integration solution was designed to balance the capabilities of structured and natural image generation, utilizing FLUX.1 Kontext and Qwen-VL for enhanced understanding of structured image inputs [13][15].
- The training process involves a three-stage progressive training approach to maintain the model's ability to generate natural images while improving structured image generation [15][16].

Evaluation Layer
- The team introduced StructBench and StructScore as specialized benchmarks and metrics to assess the accuracy of generated structured images, addressing the shortcomings of existing evaluation methods [17][19].
- StructBench includes 1,714 stratified samples with fine-grained Q&A pairs to validate factual accuracy, while StructScore evaluates model responses against standard answers [19].

Performance Comparison
- The proposed solution demonstrated significant advantages over existing models, with the best-performing models achieving factual accuracy around 50%, indicating substantial room for improvement in structured visual generation [21][22].
- The research emphasizes that high-quality, strictly aligned data is crucial for enhancing model performance, more so than the model architecture itself [22].

Broader Implications
- This research aims to lay a systematic foundation for structured visual generation, encouraging further exploration in this overlooked area [23][25].
- The ultimate goal is to transition AI from being merely a beautification tool to a productivity tool capable of generating accurate mathematical images and experimental charts for various fields [24][25].
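A StructScore-style evaluation, scoring a generated chart by the fraction of fine-grained Q&A pairs answered correctly against reference answers, can be sketched as follows. The `answer_question` stub and all chart facts are invented; the real metric uses a model to read the generated image.

```python
# Sketch of a StructScore-style factual-accuracy check: each generated
# structured image has fine-grained Q&A pairs, and the score is the
# fraction answered correctly. answer_question is an illustrative stub.
def answer_question(image_id, question):
    # Stand-in: pretend an evaluator model read values off the chart.
    fake_readings = {
        ("chart_1", "value of bar A?"): "42",
        ("chart_1", "title of y-axis?"): "revenue",
        ("chart_1", "number of bars?"): "5",
    }
    return fake_readings.get((image_id, question), "")

def struct_score(image_id, qa_pairs):
    correct = sum(
        answer_question(image_id, q).strip().lower() == a.strip().lower()
        for q, a in qa_pairs
    )
    return correct / len(qa_pairs)

gold = [("value of bar A?", "42"),
        ("title of y-axis?", "Revenue"),
        ("number of bars?", "4")]
print(struct_score("chart_1", gold))   # 2 of 3 facts match
```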
Find an iPhone vulnerability, and Cook will pay you $2 million
量子位· 2025-10-11 06:04
Core Points
- Apple has significantly increased its security bounty program, with the maximum base reward now reaching $2 million, making it the highest known bounty program in the industry [3][9].
- The program aims to attract top researchers capable of identifying complex vulnerabilities that could pose significant threats, particularly those mimicking commercial surveillance software attacks [8][9].
- Since its inception nearly a decade ago, Apple has paid over $35 million to more than 800 researchers [7].

Summary by Sections

Security Bounty Program Upgrade
- Apple has doubled the maximum base reward to $2 million for discovering critical vulnerabilities, reflecting its commitment to enhancing security [3][9].
- Additional bonuses for vulnerabilities that bypass Lockdown Mode or are found in beta software can raise the total reward to $5 million [9].

Increased Reward Categories
- Apple has raised the reward amounts for several vulnerability categories, encouraging exploration in key technical areas [10].
- Specific rewards include $100,000 for bypassing Gatekeeper and $1 million for unauthorized iCloud access [10].
- New categories have been added, such as $300,000 for a WebKit sandbox escape and $1 million for wireless proximity attacks [10].

Target Flags Initiative
- Apple introduced Target Flags, allowing researchers to objectively demonstrate the exploitability of top bounty categories, which can expedite reward processing [11][12].
- Researchers submitting reports with Target Flags will be eligible for accelerated rewards, even before fixes are released [12].

Additional Security Measures
- In 2022, Apple established a $10 million cybersecurity fund to support civil society organizations investigating targeted surveillance software attacks [13].
- With the launch of iPhone 17, Apple introduced a memory integrity protection feature to enhance resistance against common software vulnerabilities [13].
- Apple plans to provide 1,000 iPhone 17 devices to high-risk groups potentially targeted by commercial surveillance software [13].

Implementation Timeline
- The updated bounty program will take effect in November 2025, with detailed information on new categories and reward standards to be published on the Apple Security Research website [13].
The open-source coding model throne has changed hands; who would have guessed the new SOTA is Kuaishou
量子位· 2025-10-11 06:04
Core Insights
- The article highlights the emergence of KAT-Dev-72B-Exp from Kuaishou as the leading open-source programming model, achieving a score of 74.6% on the SWE-Bench Verified leaderboard [1][4].

Group 1: Model Performance
- KAT-Dev-72B-Exp is an experimental reinforcement learning version of the KAT-Coder model, which has also outperformed GPT-5 (non-Codex mode) and Claude 4 Sonnet on SWE-Bench Verified [3][4].
- KAT-Coder demonstrates capabilities such as recreating a complete version of the game "Fruit Ninja" within a web environment, including scoring and life systems [6].

Group 2: Visualization and Interaction
- The model excels in visualizing physical laws through code, with examples including a cyberpunk clock that triggers explosion effects and a solar-system simulation created using three.js [10][13].
- KAT-Coder can generate interactive effects and animations that adhere to real physical principles, such as a 60-story building-collapse simulation [15].

Group 3: Key Technologies
- KAT-Coder employs multiple training phases, including mid-training, supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), leading to emergent behaviors in the model [17][25].
- The model's interaction count required to complete tasks decreased by 32% after reinforcement learning, indicating improved efficiency [26].

Group 4: Industrial-Grade Framework
- Kuaishou's self-developed industrial-grade reinforcement learning framework, SeamlessFlow, supports complex scenarios like multi-agent and online reinforcement learning [28][29].
- SeamlessFlow has shown a 100% throughput improvement in single-round RL tasks and a 62% reduction in overall training time compared to mainstream VERL frameworks [35].

Group 5: Training Optimization
- The introduction of a Trie Packing mechanism and the restructuring of the training engine allow KAT-Dev-72B-Exp to efficiently train on shared-prefix trajectories, achieving an average speed increase of 2.5 times [37].
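The shared-prefix idea behind Trie Packing can be illustrated with the underlying data structure. This is only a sketch of why a trie helps, agent rollouts that branch from a common history store (and, in training, compute over) their shared prefix once, and not Kuaishou's actual framework.

```python
# Sketch of the data structure behind Trie Packing: trajectories that
# share a token prefix are merged into a trie, so the shared context is
# stored once. Illustration only, not the SeamlessFlow implementation.
class TrieNode:
    def __init__(self):
        self.children = {}          # token -> TrieNode

def pack(trajectories):
    root, stored = TrieNode(), 0
    for traj in trajectories:
        node = root
        for tok in traj:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                stored += 1         # each token position stored only once
            node = node.children[tok]
    return root, stored

# Three rollouts branching after the shared prefix [1, 2, 3]:
trajs = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 6, 7]]
flat = sum(len(t) for t in trajs)   # 13 tokens if each rollout is stored flat
_, packed = pack(trajs)             # 7 trie edges after sharing the prefix
print(flat, packed)
```

In training, the saving is compute rather than storage: forward passes over the shared prefix need not be repeated per branch, which is consistent with the reported speedup.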