2025 Artificial Intelligence Annual Awards Now Open: Five Award Categories Across Three Dimensions, in Search of the Leaders of the AI+ Era
量子位· 2025-10-12 04:07
Organizing Committee, reporting from Aofeisi
量子位 | Official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to give applause and encouragement to more fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Awards". This is the 8th year of QbitAI's annual AI awards. Over eight years, we have witnessed technological breakthroughs and real-world deployment, industrial integration and reshaping, and wave after wave of companies, people, and products driving the era forward.

In an age where artificial intelligence is redefining everything, intelligent technology is no longer a standalone tool but a driving force in the co-evolution of industry and society. Through this annual selection, we hope to discover and pay tribute to the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions (companies, products, and people) and five award categories. Companies are warmly invited to apply! Let us jointly recognize the stars of the year and light the way ahead. Detailed selection criteria and registration instructions follow in the original post.

Company awards:
- 2025 AI Annual Leading Company (领航企业)
- 2025 AI Annual High-Potential Startup (潜力创业公司): focused on the innovative and entrepreneurial forces in China's AI field, this award will select the AI startups with the greatest investment value and growth potential (eligibility criteria in the original post)

Product awards:
- 2025 AI Annual Outstanding Product (杰出产品)
- 2025 AI Annual Outstanding Solu ...

People awards:
- 2025 AI Annual Focus Figure (焦点人物)
Hinton's Hot Take: AI Already Has Consciousness, It Just Doesn't Know It
量子位· 2025-10-12 04:07
Core Viewpoint
- The article discusses Geoffrey Hinton's perspective on artificial intelligence (AI), suggesting that AI may already possess a form of "subjective experience" or consciousness, albeit unrecognized by itself [1][56]

Group 1: AI Consciousness and Understanding
- Hinton posits that AI might have a nascent form of consciousness, which is misunderstood by humans [2][3]
- He emphasizes that AI has evolved from keyword-based search systems to tools that can understand human intentions [10][14]
- Modern large language models (LLMs) exhibit capabilities that are close to human expertise in various subjects [15]

Group 2: Neural Networks and Learning Mechanisms
- Hinton explains the distinction between conventional machine learning and neural networks, with the latter inspired by the human brain's functioning [17][21]
- He describes how neural networks learn by adjusting the strength of connections between neurons, similar to how the brain operates [20][21]
- The breakthrough of backpropagation in 1986 allowed for efficient training of neural networks, significantly enhancing their capabilities [38][40]; a minimal code sketch of this idea follows this summary

Group 3: Language Models and Cognitive Processes
- Hinton elaborates on how LLMs process language, drawing parallels to human cognitive processes [46][47]
- He asserts that LLMs do not merely memorize but engage in a predictive process that resembles human thought [48][49]
- The training of LLMs involves a cycle of prediction and correction, enabling them to learn semantic understanding [49][55]

Group 4: AI Risks and Ethical Considerations
- Hinton highlights potential risks associated with AI, including misuse for generating false information and societal instability [68][70]
- He stresses the importance of regulatory measures to mitigate these risks and ensure AI aligns with human interests [72][75]
- Hinton warns that the most significant threat from advanced AI may not be rebellion but rather its ability to persuade humans [66]

Group 5: Global AI Landscape and Competition
- Hinton comments on the AI competition between the U.S. and China, noting that while the U.S. currently leads, its advantage is diminishing due to reduced funding for foundational research [78][80]
- He acknowledges China's proactive approach in fostering AI startups, which may lead to significant advancements in the field [82]
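To make the backpropagation point concrete, here is a minimal sketch of the 1986 idea: a tiny two-layer network learns XOR by repeatedly adjusting its connection strengths downhill on the prediction error. All toy choices below (data, layer sizes, learning rate) are illustrative assumptions, not anything from Hinton or the article.

```python
import numpy as np

# Minimal backpropagation sketch: forward pass computes activations,
# backward pass propagates the error signal through the layers, and a
# gradient step nudges every connection strength to reduce the error.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                 # forward pass, hidden layer
    p = sigmoid(h @ W2 + b2)                 # forward pass, output
    gz2 = (p - y) / len(X)                   # error signal at the output
    gW2, gb2 = h.T @ gz2, gz2.sum(axis=0)
    gz1 = (gz2 @ W2.T) * h * (1.0 - h)       # error propagated backward
    gW1, gb1 = X.T @ gz1, gz1.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2           # strengthen/weaken connections
    W1 -= lr * gW1; b1 -= lr * gb1

print(np.round(p.ravel(), 2))                # approaches [0, 1, 1, 0]
```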
Tsinghua University x Shengshu AI: From Waveform to Latent Space, AudioLBM Leads a New Paradigm in Audio Super-Resolution
量子位· 2025-10-12 04:07
Contributed by the Tsinghua University & Shengshu AI team
量子位 | Official account QbitAI

The Bridge-SR work published at ICASSP 2025 was the first to bring the Schrödinger Bridge model to the speech super-resolution task, establishing a tractable bridging process between low-resolution and high-resolution waveforms under a "data-to-data" generation paradigm.

Unlike diffusion models, which generate a signal step by step from random noise in a "noise-to-data" fashion, Bridge-SR uses the low-resolution waveform directly as the generative prior. This allows a lightweight network (only 1.7M parameters) to achieve efficient, high-fidelity speech super-resolution in the "data-to-data" paradigm, outperforming several mainstream methods on the VCTK speech test set. (A toy sketch of this bridge idea appears after this article summary.)

Against this background, the Tsinghua University and Shengshu AI teams conducted systematic research on bridge-type generative models for audio super-resolution, publishing two consecutive results at ICASSP 2025, a top speech conference, and NeurIPS 2025, a top machine-learning conference: Bridge-SR, a lightweight speech waveform super-resolution model, and AudioLBM, a versatile super-resolution framework targeting mastering-grade audio at up to 192 kHz.

AudioLBM covers speech, sound effects, music, and other content types, showing substantial potential for general high-resolution audio generation.

From data to data: the exploration behind Bridge-SR

Audio super-resolution ...
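As promised above, a toy sketch of the "data-to-data" idea. It assumes a Brownian-bridge interpolant between a paired low-resolution and high-resolution waveform, a common construction in bridge-type generative models; Bridge-SR's exact bridge and noise schedule are specified in the paper, and the signals and sigma below are purely illustrative.

```python
import numpy as np

# "Data-to-data" toy sketch: training pairs are (low-res, high-res)
# signals, and the intermediate state x_t is a noisy interpolation
# pinned at both endpoints (a Brownian-bridge marginal is assumed here
# for illustration only).
rng = np.random.default_rng(0)

def bridge_sample(x_lo, x_hi, t, sigma=0.1):
    """Sample x_t on a bridge pinned at x_lo (t=0) and x_hi (t=1)."""
    mean = (1.0 - t) * x_lo + t * x_hi
    std = sigma * np.sqrt(t * (1.0 - t))     # noise vanishes at endpoints
    return mean + std * rng.standard_normal(x_lo.shape)

# A network would learn to predict x_hi (or the bridge drift) from
# (x_t, t, x_lo); at inference it walks from the low-res prior toward
# the high-res target instead of starting from pure noise.
sr = 48_000
tt = np.linspace(0.0, 1.0, sr, endpoint=False)
x_hi = np.sin(2 * np.pi * 440 * tt) + 0.3 * np.sin(2 * np.pi * 7000 * tt)
x_lo = np.sin(2 * np.pi * 440 * tt)          # upper band missing: the prior
print(bridge_sample(x_lo, x_hi, t=0.5).shape)
```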
Andrew Ng's New Agentic AI Course: A Hands-On Guide to Building Agent Workflows, Where GPT-3.5 Beating GPT-4 Is Almost Incidental
量子位· 2025-10-12 04:07
Core Concept
- The article discusses the new course by Andrew Ng on Agentic AI, emphasizing the development of workflows that mimic human-like task execution through decomposition, reflection, and optimization [1][9][74]

Summary by Sections

Agentic AI Overview
- Agentic AI focuses on breaking down tasks into manageable steps, allowing for iterative improvement rather than generating a single output [5][14][74]
- The course reveals a systematic methodology behind Agentic AI, highlighting the importance of task decomposition and continuous optimization [9][10][74]

Core Design Patterns
- The course identifies four core design patterns for developing Agentic workflows: Reflection, Tool Usage, Planning, and Multi-agent Collaboration [3][17][44]

Reflection
- Reflection involves the model assessing its outputs and considering improvements, which can be enhanced by using multiple models in tandem [18][21]; a minimal sketch of this loop appears after this summary
- Objective evaluation standards can be established to assess outputs, improving the quality of the model's self-correction [23][27]

Tool Usage
- Tool usage allows the model to autonomously decide which functions to call, enhancing efficiency compared to traditional methods where developers manually wire in tools [28][34]
- A unified protocol for tool calls simplifies the integration of various tools [41][43]

Planning
- Planning enables the model to adjust the sequence of tool execution based on different requests, optimizing performance and resource use [46][48]
- A practical technique involves converting execution steps into JSON or code format for clearer task execution [47]

Multi-agent Collaboration
- Multi-agent collaboration involves creating multiple agents with different expertise to tackle complex tasks, improving overall efficiency [51][52]
- This structured collaboration mirrors organizational structures, enhancing task division and scalability [52]

Iterative Improvement Process
- The article outlines a feedback loop for building Agentic workflows, consisting of sampling, evaluation, and improvement [59][60]
- Error analysis is crucial for optimizing the system, allowing for targeted improvements based on specific performance issues [61][66]

Practical Insights
- The course provides practical insights into selecting and testing different models, emphasizing the importance of iterative refinement in workflow design [68][70]
- Agentic AI represents a significant opportunity for developers to explore more complex, multi-step workflows, moving beyond traditional end-to-end agents [80]
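As referenced in the Reflection section, the pattern reduces to a short generate-critique-revise loop. The sketch below assumes a hypothetical `llm()` helper standing in for whatever chat-completion API you use; it illustrates the pattern and is not code from the course.

```python
# Minimal sketch of the Reflection design pattern: one pass drafts,
# a critic pass reviews against the task, and the draft is revised.
# `llm()` is a hypothetical placeholder; its mock body only exists so
# the example runs end-to-end.
def llm(prompt: str) -> str:
    if prompt.startswith("Review"):
        return "no problems found"           # mock critic verdict
    return f"[model output for: {prompt[:40]}...]"

def reflect(task: str, rounds: int = 2) -> str:
    draft = llm(f"Complete this task:\n{task}")
    for _ in range(rounds):
        critique = llm(
            "Review the draft below against the task and list concrete "
            f"problems and fixes.\nTask: {task}\nDraft:\n{draft}"
        )
        if "no problems" in critique.lower():
            break                            # objective stop criterion
        draft = llm(
            "Rewrite the draft to address the critique.\n"
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}"
        )
    return draft

print(reflect("Summarize why agentic workflows beat one-shot prompting."))
```

Using a second, different model as the critic (rather than self-critique) is one way the course's "multiple models in tandem" point can be realized in this loop.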
Hands-On with the "Tsinghua Special Scholarship Sora": One Image and One Prompt Yield a Video, a Bona Fide King of Lip-Sync
量子位· 2025-10-12 02:05
Core Insights
- The article discusses the launch of GAGA-1, a video generation model developed by Sand.ai, which focuses on audio-visual synchronization and performance [1][24][30]
- GAGA-1 allows users to create videos by simply uploading an image and providing a prompt, making the process user-friendly and accessible [4][7][8]

Group 1: Model Features
- GAGA-1 excels in generating videos where characters can "speak" and perform, showcasing a strong capability in lip-syncing and expression [23][30]
- The platform does not require an invitation code, allowing users to access it freely [4]
- Users can generate images within the platform, streamlining the process from image to video [7][8]

Group 2: Performance Evaluation
- Initial tests show that GAGA-1 can produce high-quality video outputs with natural expressions and synchronized lip movements [11][12]
- Some minor bugs were noted, such as stiffness in character expressions and slight misalignment in audio [13][23]
- The model performs well in simple scenarios but struggles with complex scenes involving multiple characters and actions [23][30]

Group 3: Team Background
- Sand.ai, the team behind GAGA-1, previously developed the Magi-1 model, known for its high-quality video generation [25][29]
- The founder, Cao Yue, has a strong academic background, including a PhD from Tsinghua University and recognition for his contributions to AI research [26][29]

Group 4: Market Position
- GAGA-1 differentiates itself by focusing on audio-visual synchronization rather than attempting to be an all-encompassing model [29][30]
- The model's strength in dialogue and performance positions it as a leading player in the AI-generated video market [30][31]
The Big Name Who Rejected Zuckerberg's $1.5 Billion Offer Has Joined Meta After All
量子位· 2025-10-12 02:05
克雷西, reporting from Aofeisi
量子位 | Official account QbitAI

The machine-learning star who turned down a $1.5 billion compensation package from Mark Zuckerberg has joined Meta after all.

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, confirmed that co-founder and chief architect Andrew Tulloch has left for Meta.

Some netizens are still puzzled by the departure, arguing that if Thinking Machines Lab is valued at $12 billion, Tulloch should stand to receive at least 10%, making his reasons hard to imagine. According to a company spokesperson, Tulloch "decided to pursue a different path for personal reasons"; Tulloch himself has not commented. Some joked that perhaps Tulloch had simply "finished thinking", a play on the company's name.

An 11-year Meta veteran "returns home"

Tulloch's move to Meta is also something of a homecoming: he previously spent 11 years at Meta, including its Facebook era.

Mike Vernal, a former Facebook executive who worked with Tulloch, called him "an absolute genius". Tulloch comes from Australia and earned his bachelor's degree in mathematics and statistics from the University of Sydney in 2011, during which he was the University of Sydney's science ...
OpenAI's Compute Bill Revealed: $7 Billion in Spending, Most of It on "Invisible Experiments"
量子位· 2025-10-11 09:01
Core Insights
- OpenAI's total spending on computing resources reached $7 billion last year, primarily for research and experimental runs rather than final training of popular models [1][3][20]
- A significant portion of the $5 billion allocated for R&D compute was not used for the final training of models like GPT-4.5, but rather for behind-the-scenes research and various experimental runs [6][18]

Spending Breakdown
- Of the $7 billion, approximately $5 billion was dedicated to R&D compute, which includes all training and research activities, while around $2 billion was spent on inference compute for user-facing applications [3][5]
- The R&D compute spending includes basic research, experimental runs, and unreleased models, with only a small fraction allocated to the final training of models [5][6]

Model Training Costs
- Researchers estimated the training costs for significant models expected to be released between Q2 2024 and Q1 2025, focusing solely on the final training runs [11][12]
- For GPT-4.5, the estimated training run cost ranged from $135 million to $495 million, depending on cluster size and training duration [15]
- Other models like GPT-4o and Sora Turbo were estimated using indirect methods based on floating-point operations (FLOP), with costs varying widely [17]; a back-of-the-envelope sketch of this FLOP method follows this summary

Research Focus
- The analysis indicates that a large portion of OpenAI's R&D compute in 2024 will likely be allocated to research and experimental training runs rather than directly producing public-facing products [18]
- This focus on experimentation over immediate product output explains the anticipated significant losses for OpenAI in 2024, as the company spent $5 billion on R&D while generating only $3.7 billion in revenue [20][21]

Power of Compute
- The article emphasizes the critical importance of compute power in the AI industry, stating that whoever controls the compute resources will dominate AI [22][28]
- OpenAI has engaged in substantial compute transactions, including building its own data centers to mitigate risks associated with reliance on external cloud services [22][30]
- The demand for compute resources in AI development is described as having no upper limit, highlighting the competitive landscape [27][28]
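The FLOP-based method mentioned under Model Training Costs can be approximated with the widely used "training FLOPs is roughly 6 x parameters x tokens" rule of thumb for dense transformers. The sketch below uses entirely hypothetical inputs; it is not OpenAI's or the researchers' actual arithmetic.

```python
# Back-of-the-envelope FLOP-based training-cost estimate. The 6*N*D
# rule of thumb approximates training FLOPs for a dense transformer;
# all numeric inputs below are assumed placeholders.
def training_cost_usd(n_params: float, n_tokens: float,
                      peak_flops_per_gpu: float, utilization: float,
                      usd_per_gpu_hour: float) -> float:
    total_flops = 6.0 * n_params * n_tokens          # ~= 6 * N * D
    gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
    return gpu_seconds / 3600.0 * usd_per_gpu_hour

# Example: a 1e12-parameter model on 1e13 tokens, H100-class GPUs at
# ~1e15 FLOP/s peak, 40% utilization, $2.50 per GPU-hour (all assumed).
print(f"${training_cost_usd(1e12, 1e13, 1e15, 0.40, 2.50):,.0f}")
```

With these placeholder inputs the estimate lands near $104 million, which shows why such indirect estimates span wide ranges: each input (utilization, price, token count) is itself uncertain.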
Domestic Game-Understanding Model Sets a New SOTA; Talking with Doudou AI's CEO: Open-Source Models Plus Industry Data Are the Key to the Breakthrough
量子位· 2025-10-11 09:01
鱼羊, reporting from Aofeisi
量子位 | Official account QbitAI

As 2025 enters its final quarter, the impact of the boom in domestic open-source models keeps finding new confirmation.

Take vertical domain models: at the Tokyo Game Show (TGS), Asia's largest gaming expo, a domestic AI-companionship company unveiled a major release: LynkSoul VLM v1, a domain model for game understanding whose performance in game scenarios significantly surpasses top closed-source models including GPT-4o, Claude 4 Sonnet, and Gemini 2.5 Flash.

The company behind it, 逗逗AI (Doudou AI), drew plenty of attention on site as well. This comes only about a month after the launch of its new product, Doudou AI Game Companion 1.0 (Hakko AI overseas); in that time, on the strength of its real-time understanding of games, videos, and web pages, Doudou AI has added more than 2 million new users, pushing its total user base past 10 million.

△ Co-playing Hollow Knight: Silksong

At TGS we took the opportunity to talk with Doudou AI CEO Liu Binxin (刘斌新) about the Game Companion product and its underlying technology, as well as the current state of the AI-companionship vertical.

TL;DR: ……

A new SOTA in game understanding

LynkSoul VLM v1, the model that shone at this Tokyo Game Show, is a vision-language model Doudou AI trained specifically for games. During co-play it understands your game screen in real time, for example critiquing your team-fight performance in League of Legends, relying on ... (A hypothetical sketch of such a real-time loop follows this summary.)
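For a sense of what a real-time game-companion loop involves, here is a hypothetical sketch: capture a frame, ask a vision-language model about it, surface the comment. `capture_screen()` and `vlm_comment()` are placeholder mocks, not LynkSoul VLM's actual API.

```python
import time

# Hypothetical real-time game-companion loop. Both helpers are mocks
# so the example runs; a real system would wire them to a screenshot
# library and a vision-language model endpoint.
def capture_screen() -> bytes:
    return b"\x89PNG..."                     # stand-in for a real screenshot

def vlm_comment(frame_png: bytes, context: str) -> str:
    return f"({len(frame_png)}-byte frame) Nice disengage on that fight!"

def companion_loop(game: str, seconds: int = 3, fps: float = 1.0) -> None:
    for _ in range(int(seconds * fps)):      # bounded for the demo
        frame = capture_screen()
        print(vlm_comment(frame, f"You are co-playing {game}."))
        time.sleep(1.0 / fps)                # throttle: latency/cost bound

companion_loop("League of Legends")
```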
Say Goodbye to AI "Scribbling Charts": CUHK Team Releases the First Structured Image Generation and Editing System
量子位· 2025-10-11 09:01
Core Insights
- The article discusses the limitations of current AI models in generating accurate structured images like charts and graphs, despite their success in creating natural images [1][2]
- It highlights a significant gap between visual understanding and generation capabilities, which hinders the development of unified multimodal models that can both interpret and create visual content accurately [2][10]

Data Layer
- A dataset of 1.3 million code-aligned structured samples was created to ensure the accuracy of generated images through precise code definitions [11][13]
- The dataset includes executable plotting codes covering six categories, ensuring strict alignment between images and their corresponding codes [14]; a toy version of this code-aligned construction follows this summary

Model Layer
- A lightweight VLM integration solution was designed to balance the capabilities of structured and natural image generation, utilizing FLUX.1 Kontext and Qwen-VL for enhanced understanding of structured image inputs [13][15]
- The training process involves a three-stage progressive training approach to maintain the model's ability to generate natural images while improving structured image generation [15][16]

Evaluation Layer
- The team introduced StructBench and StructScore as specialized benchmarks and metrics to assess the accuracy of generated structured images, addressing the shortcomings of existing evaluation methods [17][19]
- StructBench includes 1,714 stratified samples with fine-grained Q&A pairs to validate factual accuracy, while StructScore evaluates model responses against standard answers [19]

Performance Comparison
- The proposed solution demonstrated significant advantages over existing models, with the best-performing models achieving factual accuracy around 50%, indicating substantial room for improvement in structured visual generation [21][22]
- The research emphasizes that high-quality, strictly aligned data is crucial for enhancing model performance, more so than the model architecture itself [22]

Broader Implications
- This research aims to lay a systematic foundation for structured visual generation, encouraging further exploration in this overlooked area [23][25]
- The ultimate goal is to transition AI from being merely a beautification tool to a productivity tool capable of generating accurate mathematical images and experimental charts for various fields [24][25]
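As noted in the Data Layer section, the core idea is that every image is rendered from executable plotting code, so every visual fact has an exact ground truth from which fine-grained Q&A pairs can be derived. A minimal sketch assuming matplotlib, not the CUHK team's actual pipeline:

```python
import json
import matplotlib
matplotlib.use("Agg")                         # headless rendering

# One "code-aligned" sample: executing the plotting code produces the
# image, so values, labels, and counts in the chart are known exactly,
# and factual QA pairs can be generated automatically from the code.
code = """
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 7, 5])
ax.set_title("Toy sales by region")
fig.savefig("sample.png")
"""
exec(code)                                    # image now matches the code exactly

sample = {
    "image": "sample.png",
    "code": code,
    "qa": [                                   # answers read straight off the code
        {"q": "What is the height of bar B?", "a": "7"},
        {"q": "How many bars does the chart contain?", "a": "3"},
    ],
}
print(json.dumps(sample["qa"], indent=2))
```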
Find an iPhone Vulnerability and Tim Cook Will Pay You $2 Million
量子位· 2025-10-11 06:04
Core Points
- Apple has significantly increased its security bounty program, with the maximum base reward now reaching $2 million, making it the highest known bounty program in the industry [3][9]
- The program aims to attract top researchers capable of identifying complex vulnerabilities that could pose significant threats, particularly attack chains mimicking commercial surveillance software [8][9]
- Since its inception nearly a decade ago, Apple has paid over $35 million to more than 800 researchers [7]

Summary by Sections

Security Bounty Program Upgrade
- Apple has doubled the maximum base reward to $2 million for discovering critical vulnerabilities, reflecting its commitment to enhancing security [3][9]
- Additional bonuses for bypassing Lockdown Mode and for vulnerabilities found in beta software can raise total rewards to $5 million [9]

Increased Reward Categories
- Apple has raised the reward amounts for several vulnerability categories, encouraging exploration in key technical areas [10]
- Specific rewards include $100,000 for bypassing Gatekeeper and $1 million for unauthorized iCloud access [10]
- New categories have been added, such as $300,000 for a WebKit sandbox escape and $1 million for wireless proximity attacks [10]

Target Flags Initiative
- Apple introduced Target Flags, allowing researchers to objectively demonstrate the exploitability of top bounty categories, which can expedite reward processing [11][12]
- Researchers submitting reports with captured Target Flags will be eligible for accelerated rewards, paid even before fixes are released [12]

Additional Security Measures
- In 2022, Apple established a $10 million cybersecurity fund to support civil society organizations investigating targeted surveillance software attacks [13]
- With the launch of iPhone 17, Apple introduced a memory-integrity protection feature (Memory Integrity Enforcement) to harden devices against common memory-corruption vulnerabilities [13]
- Apple plans to provide 1,000 iPhone 17 devices to high-risk groups potentially targeted by commercial surveillance software [13]

Implementation Timeline
- The updated bounty program will take effect in November 2025, with detailed information on new categories and reward standards to be published on the Apple Security Research website [13]