2025 Artificial Intelligence Annual Awards Now Open: Five Award Categories Across Three Dimensions, in Search of the Leaders of the AI+ Era
量子位· 2025-10-12 04:07
Organizing Committee, reporting from Aofeisi
量子位 | Official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to give applause and encouragement to more fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Awards". This is the 8th year of QbitAI's annual AI awards. Over eight years, we have witnessed technological breakthroughs and real-world deployment, industrial integration and reshaping, and wave after wave of companies, people, and products driving the era forward.

In an age where artificial intelligence is redefining everything, intelligent technology is no longer a standalone tool but a driving force in the co-evolution of industry and society. Through this annual selection, we hope to discover and pay tribute to the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions (companies, products, and people) and five award categories. Companies are warmly invited to apply! Let us jointly recognize the stars of the year and light the way ahead. Detailed selection criteria and registration instructions follow in the original post.

Company awards:
- 2025 AI Annual Leading Company (领航企业)
- 2025 AI Annual High-Potential Startup (潜力创业公司): focused on the innovative and entrepreneurial forces in China's AI field, this award will select the AI startups with the greatest investment value and growth potential (eligibility criteria in the original post)

Product awards:
- 2025 AI Annual Outstanding Product (杰出产品)
- 2025 AI Annual Outstanding Solu ...

People awards:
- 2025 AI Annual Focus Figure (焦点人物)
Hinton's Hot Take: AI Already Has Consciousness, It Just Doesn't Know It
量子位· 2025-10-12 04:07
Core Viewpoint
- The article discusses Geoffrey Hinton's perspective on artificial intelligence (AI), suggesting that AI may already possess a form of "subjective experience" or consciousness, albeit unrecognized by itself [1][56]

Group 1: AI Consciousness and Understanding
- Hinton posits that AI might have a nascent form of consciousness, which is misunderstood by humans [2][3]
- He emphasizes that AI has evolved from keyword-based search systems to tools that can understand human intentions [10][14]
- Modern large language models (LLMs) exhibit capabilities that are close to human expertise in various subjects [15]

Group 2: Neural Networks and Learning Mechanisms
- Hinton explains the distinction between conventional machine learning and neural networks, with the latter inspired by the human brain's functioning [17][21]
- He describes how neural networks learn by adjusting the strength of connections between neurons, similar to how the brain operates [20][21]
- The breakthrough of backpropagation in 1986 allowed for efficient training of neural networks, significantly enhancing their capabilities [38][40]; a minimal code sketch of this idea follows this summary

Group 3: Language Models and Cognitive Processes
- Hinton elaborates on how LLMs process language, drawing parallels to human cognitive processes [46][47]
- He asserts that LLMs do not merely memorize but engage in a predictive process that resembles human thought [48][49]
- The training of LLMs involves a cycle of prediction and correction, enabling them to learn semantic understanding [49][55]

Group 4: AI Risks and Ethical Considerations
- Hinton highlights potential risks associated with AI, including misuse for generating false information and societal instability [68][70]
- He stresses the importance of regulatory measures to mitigate these risks and ensure AI aligns with human interests [72][75]
- Hinton warns that the most significant threat from advanced AI may not be rebellion but rather its ability to persuade humans [66]

Group 5: Global AI Landscape and Competition
- Hinton comments on the AI competition between the U.S. and China, noting that while the U.S. currently leads, its advantage is diminishing due to reduced funding for foundational research [78][80]
- He acknowledges China's proactive approach in fostering AI startups, which may lead to significant advancements in the field [82]
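To make the backpropagation point concrete, here is a minimal sketch of the 1986 idea: a tiny two-layer network learns XOR by repeatedly adjusting its connection strengths downhill on the prediction error. All toy choices below (data, layer sizes, learning rate) are illustrative assumptions, not anything from Hinton or the article.

```python
import numpy as np

# Minimal backpropagation sketch: forward pass computes activations,
# backward pass propagates the error signal through the layers, and a
# gradient step nudges every connection strength to reduce the error.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                 # forward pass, hidden layer
    p = sigmoid(h @ W2 + b2)                 # forward pass, output
    gz2 = (p - y) / len(X)                   # error signal at the output
    gW2, gb2 = h.T @ gz2, gz2.sum(axis=0)
    gz1 = (gz2 @ W2.T) * h * (1.0 - h)       # error propagated backward
    gW1, gb1 = X.T @ gz1, gz1.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2           # strengthen/weaken connections
    W1 -= lr * gW1; b1 -= lr * gb1

print(np.round(p.ravel(), 2))                # approaches [0, 1, 1, 0]
```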
Tsinghua University x Shengshu AI: From Waveform to Latent Space, AudioLBM Leads a New Paradigm in Audio Super-Resolution
量子位· 2025-10-12 04:07
Contributed by the Tsinghua University & Shengshu AI team
量子位 | Official account QbitAI

The Bridge-SR work published at ICASSP 2025 was the first to bring the Schrödinger Bridge model to the speech super-resolution task, establishing a tractable bridging process between low-resolution and high-resolution waveforms under a "data-to-data" generation paradigm.

Unlike diffusion models, which generate a signal step by step from random noise in a "noise-to-data" fashion, Bridge-SR uses the low-resolution waveform directly as the generative prior. This allows a lightweight network (only 1.7M parameters) to achieve efficient, high-fidelity speech super-resolution in the "data-to-data" paradigm, outperforming several mainstream methods on the VCTK speech test set. (A toy sketch of this bridge idea appears after this article summary.)

Against this background, the Tsinghua University and Shengshu AI teams conducted systematic research on bridge-type generative models for audio super-resolution, publishing two consecutive results at ICASSP 2025, a top speech conference, and NeurIPS 2025, a top machine-learning conference: Bridge-SR, a lightweight speech waveform super-resolution model, and AudioLBM, a versatile super-resolution framework targeting mastering-grade audio at up to 192 kHz.

AudioLBM covers speech, sound effects, music, and other content types, showing substantial potential for general high-resolution audio generation.

From data to data: the exploration behind Bridge-SR

Audio super-resolution ...
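As promised above, a toy sketch of the "data-to-data" idea. It assumes a Brownian-bridge interpolant between a paired low-resolution and high-resolution waveform, a common construction in bridge-type generative models; Bridge-SR's exact bridge and noise schedule are specified in the paper, and the signals and sigma below are purely illustrative.

```python
import numpy as np

# "Data-to-data" toy sketch: training pairs are (low-res, high-res)
# signals, and the intermediate state x_t is a noisy interpolation
# pinned at both endpoints (a Brownian-bridge marginal is assumed here
# for illustration only).
rng = np.random.default_rng(0)

def bridge_sample(x_lo, x_hi, t, sigma=0.1):
    """Sample x_t on a bridge pinned at x_lo (t=0) and x_hi (t=1)."""
    mean = (1.0 - t) * x_lo + t * x_hi
    std = sigma * np.sqrt(t * (1.0 - t))     # noise vanishes at endpoints
    return mean + std * rng.standard_normal(x_lo.shape)

# A network would learn to predict x_hi (or the bridge drift) from
# (x_t, t, x_lo); at inference it walks from the low-res prior toward
# the high-res target instead of starting from pure noise.
sr = 48_000
tt = np.linspace(0.0, 1.0, sr, endpoint=False)
x_hi = np.sin(2 * np.pi * 440 * tt) + 0.3 * np.sin(2 * np.pi * 7000 * tt)
x_lo = np.sin(2 * np.pi * 440 * tt)          # upper band missing: the prior
print(bridge_sample(x_lo, x_hi, t=0.5).shape)
```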
Andrew Ng's New Agentic AI Course: A Hands-On Guide to Building Agent Workflows, Where GPT-3.5 Beating GPT-4 Is Almost Incidental
量子位· 2025-10-12 04:07
Core Concept
- The article discusses the new course by Andrew Ng on Agentic AI, emphasizing the development of workflows that mimic human-like task execution through decomposition, reflection, and optimization [1][9][74]

Summary by Sections

Agentic AI Overview
- Agentic AI focuses on breaking down tasks into manageable steps, allowing for iterative improvement rather than generating a single output [5][14][74]
- The course reveals a systematic methodology behind Agentic AI, highlighting the importance of task decomposition and continuous optimization [9][10][74]

Core Design Patterns
- The course identifies four core design patterns for developing Agentic workflows: Reflection, Tool Usage, Planning, and Multi-agent Collaboration [3][17][44]

Reflection
- Reflection involves the model assessing its outputs and considering improvements, which can be enhanced by using multiple models in tandem [18][21]; a minimal sketch of this loop appears after this summary
- Objective evaluation standards can be established to assess outputs, improving the quality of the model's self-correction [23][27]

Tool Usage
- Tool usage allows the model to autonomously decide which functions to call, enhancing efficiency compared to traditional methods where developers manually wire in tools [28][34]
- A unified protocol for tool calls simplifies the integration of various tools [41][43]

Planning
- Planning enables the model to adjust the sequence of tool execution based on different requests, optimizing performance and resource use [46][48]
- A practical technique involves converting execution steps into JSON or code format for clearer task execution [47]

Multi-agent Collaboration
- Multi-agent collaboration involves creating multiple agents with different expertise to tackle complex tasks, improving overall efficiency [51][52]
- This structured collaboration mirrors organizational structures, enhancing task division and scalability [52]

Iterative Improvement Process
- The article outlines a feedback loop for building Agentic workflows, consisting of sampling, evaluation, and improvement [59][60]
- Error analysis is crucial for optimizing the system, allowing for targeted improvements based on specific performance issues [61][66]

Practical Insights
- The course provides practical insights into selecting and testing different models, emphasizing the importance of iterative refinement in workflow design [68][70]
- Agentic AI represents a significant opportunity for developers to explore more complex, multi-step workflows, moving beyond traditional end-to-end agents [80]
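As referenced in the Reflection section, the pattern reduces to a short generate-critique-revise loop. The sketch below assumes a hypothetical `llm()` helper standing in for whatever chat-completion API you use; it illustrates the pattern and is not code from the course.

```python
# Minimal sketch of the Reflection design pattern: one pass drafts,
# a critic pass reviews against the task, and the draft is revised.
# `llm()` is a hypothetical placeholder; its mock body only exists so
# the example runs end-to-end.
def llm(prompt: str) -> str:
    if prompt.startswith("Review"):
        return "no problems found"           # mock critic verdict
    return f"[model output for: {prompt[:40]}...]"

def reflect(task: str, rounds: int = 2) -> str:
    draft = llm(f"Complete this task:\n{task}")
    for _ in range(rounds):
        critique = llm(
            "Review the draft below against the task and list concrete "
            f"problems and fixes.\nTask: {task}\nDraft:\n{draft}"
        )
        if "no problems" in critique.lower():
            break                            # objective stop criterion
        draft = llm(
            "Rewrite the draft to address the critique.\n"
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}"
        )
    return draft

print(reflect("Summarize why agentic workflows beat one-shot prompting."))
```

Using a second, different model as the critic (rather than self-critique) is one way the course's "multiple models in tandem" point can be realized in this loop.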
Hands-On with the "Tsinghua Special Scholarship Sora": One Image and One Prompt Yield a Video, a Bona Fide King of Lip-Sync
量子位· 2025-10-12 02:05
Core Insights
- The article discusses the launch of GAGA-1, a video generation model developed by Sand.ai, which focuses on audio-visual synchronization and performance [1][24][30]
- GAGA-1 allows users to create videos by simply uploading an image and providing a prompt, making the process user-friendly and accessible [4][7][8]

Group 1: Model Features
- GAGA-1 excels in generating videos where characters can "speak" and perform, showcasing a strong capability in lip-syncing and expression [23][30]
- The platform does not require an invitation code, allowing users to access it freely [4]
- Users can generate images within the platform, streamlining the process from image to video [7][8]

Group 2: Performance Evaluation
- Initial tests show that GAGA-1 can produce high-quality video outputs with natural expressions and synchronized lip movements [11][12]
- Some minor bugs were noted, such as stiffness in character expressions and slight misalignment in audio [13][23]
- The model performs well in simple scenarios but struggles with complex scenes involving multiple characters and actions [23][30]

Group 3: Team Background
- Sand.ai, the team behind GAGA-1, previously developed the Magi-1 model, known for its high-quality video generation [25][29]
- The founder, Cao Yue, has a strong academic background, including a PhD from Tsinghua University and recognition for his contributions to AI research [26][29]

Group 4: Market Position
- GAGA-1 differentiates itself by focusing on audio-visual synchronization rather than attempting to be an all-encompassing model [29][30]
- The model's strength in dialogue and performance positions it as a leading player in the AI-generated video market [30][31]
The Big Name Who Rejected Zuckerberg's $1.5 Billion Offer Has Joined Meta After All
量子位· 2025-10-12 02:05
克雷西, reporting from Aofeisi
量子位 | Official account QbitAI

The machine-learning star who turned down a $1.5 billion compensation package from Mark Zuckerberg has joined Meta after all.

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, confirmed that co-founder and chief architect Andrew Tulloch has left for Meta.

Some netizens are still puzzled by the departure, arguing that if Thinking Machines Lab is valued at $12 billion, Tulloch should stand to receive at least 10%, making his reasons hard to imagine. According to a company spokesperson, Tulloch "decided to pursue a different path for personal reasons"; Tulloch himself has not commented. Some joked that perhaps Tulloch had simply "finished thinking", a play on the company's name.

An 11-year Meta veteran "returns home"

Tulloch's move to Meta is also something of a homecoming: he previously spent 11 years at Meta, including its Facebook era.

Mike Vernal, a former Facebook executive who worked with Tulloch, called him "an absolute genius". Tulloch comes from Australia and earned his bachelor's degree in mathematics and statistics from the University of Sydney in 2011, during which he was the University of Sydney's science ...
OpenAI's Compute Bill Revealed: $7 Billion in Spending, Most of It on "Invisible Experiments"
量子位· 2025-10-11 09:01
Core Insights
- OpenAI's total spending on computing resources reached $7 billion last year, primarily for research and experimental runs rather than final training of popular models [1][3][20]
- A significant portion of the $5 billion allocated for R&D compute was not used for the final training of models like GPT-4.5, but rather for behind-the-scenes research and various experimental runs [6][18]

Spending Breakdown
- Of the $7 billion, approximately $5 billion was dedicated to R&D compute, which includes all training and research activities, while around $2 billion was spent on inference compute for user-facing applications [3][5]
- The R&D compute spending includes basic research, experimental runs, and unreleased models, with only a small fraction allocated to the final training of models [5][6]

Model Training Costs
- Researchers estimated the training costs for significant models expected to be released between Q2 2024 and Q1 2025, focusing solely on the final training runs [11][12]
- For GPT-4.5, the estimated training run cost ranged from $135 million to $495 million, depending on cluster size and training duration [15]
- Other models like GPT-4o and Sora Turbo were estimated using indirect methods based on floating-point operations (FLOP), with costs varying widely [17]; a back-of-the-envelope sketch of this FLOP method follows this summary

Research Focus
- The analysis indicates that a large portion of OpenAI's R&D compute in 2024 will likely be allocated to research and experimental training runs rather than directly producing public-facing products [18]
- This focus on experimentation over immediate product output explains the anticipated significant losses for OpenAI in 2024, as the company spent $5 billion on R&D while generating only $3.7 billion in revenue [20][21]

Power of Compute
- The article emphasizes the critical importance of compute power in the AI industry, stating that whoever controls the compute resources will dominate AI [22][28]
- OpenAI has engaged in substantial compute transactions, including building its own data centers to mitigate risks associated with reliance on external cloud services [22][30]
- The demand for compute resources in AI development is described as having no upper limit, highlighting the competitive landscape [27][28]
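The FLOP-based method mentioned under Model Training Costs can be approximated with the widely used "training FLOPs is roughly 6 x parameters x tokens" rule of thumb for dense transformers. The sketch below uses entirely hypothetical inputs; it is not OpenAI's or the researchers' actual arithmetic.

```python
# Back-of-the-envelope FLOP-based training-cost estimate. The 6*N*D
# rule of thumb approximates training FLOPs for a dense transformer;
# all numeric inputs below are assumed placeholders.
def training_cost_usd(n_params: float, n_tokens: float,
                      peak_flops_per_gpu: float, utilization: float,
                      usd_per_gpu_hour: float) -> float:
    total_flops = 6.0 * n_params * n_tokens          # ~= 6 * N * D
    gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
    return gpu_seconds / 3600.0 * usd_per_gpu_hour

# Example: a 1e12-parameter model on 1e13 tokens, H100-class GPUs at
# ~1e15 FLOP/s peak, 40% utilization, $2.50 per GPU-hour (all assumed).
print(f"${training_cost_usd(1e12, 1e13, 1e15, 0.40, 2.50):,.0f}")
```

With these placeholder inputs the estimate lands near $104 million, which shows why such indirect estimates span wide ranges: each input (utilization, price, token count) is itself uncertain.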
Domestic Game-Understanding Model Sets a New SOTA; Talking with Doudou AI's CEO: Open-Source Models Plus Industry Data Are the Key to the Breakthrough
量子位· 2025-10-11 09:01
鱼羊, reporting from Aofeisi
量子位 | Official account QbitAI

As 2025 enters its final quarter, the impact of the boom in domestic open-source models keeps finding new confirmation.

Take vertical domain models: at the Tokyo Game Show (TGS), Asia's largest gaming expo, a domestic AI-companionship company unveiled a major release: LynkSoul VLM v1, a domain model for game understanding whose performance in game scenarios significantly surpasses top closed-source models including GPT-4o, Claude 4 Sonnet, and Gemini 2.5 Flash.

The company behind it, 逗逗AI (Doudou AI), drew plenty of attention on site as well. This comes only about a month after the launch of its new product, Doudou AI Game Companion 1.0 (Hakko AI overseas); in that time, on the strength of its real-time understanding of games, videos, and web pages, Doudou AI has added more than 2 million new users, pushing its total user base past 10 million.

△ Co-playing Hollow Knight: Silksong

At TGS we took the opportunity to talk with Doudou AI CEO Liu Binxin (刘斌新) about the Game Companion product and its underlying technology, as well as the current state of the AI-companionship vertical.

TL;DR: ……

A new SOTA in game understanding

LynkSoul VLM v1, the model that shone at this Tokyo Game Show, is a vision-language model Doudou AI trained specifically for games. During co-play it understands your game screen in real time, for example critiquing your team-fight performance in League of Legends, relying on ... (A hypothetical sketch of such a real-time loop follows this summary.)
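For a sense of what a real-time game-companion loop involves, here is a hypothetical sketch: capture a frame, ask a vision-language model about it, surface the comment. `capture_screen()` and `vlm_comment()` are placeholder mocks, not LynkSoul VLM's actual API.

```python
import time

# Hypothetical real-time game-companion loop. Both helpers are mocks
# so the example runs; a real system would wire them to a screenshot
# library and a vision-language model endpoint.
def capture_screen() -> bytes:
    return b"\x89PNG..."                     # stand-in for a real screenshot

def vlm_comment(frame_png: bytes, context: str) -> str:
    return f"({len(frame_png)}-byte frame) Nice disengage on that fight!"

def companion_loop(game: str, seconds: int = 3, fps: float = 1.0) -> None:
    for _ in range(int(seconds * fps)):      # bounded for the demo
        frame = capture_screen()
        print(vlm_comment(frame, f"You are co-playing {game}."))
        time.sleep(1.0 / fps)                # throttle: latency/cost bound

companion_loop("League of Legends")
```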
Say Goodbye to AI "Scribbling Charts": CUHK Team Releases the First Structured Image Generation and Editing System
量子位· 2025-10-11 09:01
Core Insights
- The article discusses the limitations of current AI models in generating accurate structured images like charts and graphs, despite their success in creating natural images [1][2]
- It highlights a significant gap between visual understanding and generation capabilities, which hinders the development of unified multimodal models that can both interpret and create visual content accurately [2][10]

Data Layer
- A dataset of 1.3 million code-aligned structured samples was created to ensure the accuracy of generated images through precise code definitions [11][13]
- The dataset includes executable plotting codes covering six categories, ensuring strict alignment between images and their corresponding codes [14]; a toy version of this code-aligned construction follows this summary

Model Layer
- A lightweight VLM integration solution was designed to balance the capabilities of structured and natural image generation, utilizing FLUX.1 Kontext and Qwen-VL for enhanced understanding of structured image inputs [13][15]
- The training process involves a three-stage progressive training approach to maintain the model's ability to generate natural images while improving structured image generation [15][16]

Evaluation Layer
- The team introduced StructBench and StructScore as specialized benchmarks and metrics to assess the accuracy of generated structured images, addressing the shortcomings of existing evaluation methods [17][19]
- StructBench includes 1,714 stratified samples with fine-grained Q&A pairs to validate factual accuracy, while StructScore evaluates model responses against standard answers [19]

Performance Comparison
- The proposed solution demonstrated significant advantages over existing models, with the best-performing models achieving factual accuracy around 50%, indicating substantial room for improvement in structured visual generation [21][22]
- The research emphasizes that high-quality, strictly aligned data is crucial for enhancing model performance, more so than the model architecture itself [22]

Broader Implications
- This research aims to lay a systematic foundation for structured visual generation, encouraging further exploration in this overlooked area [23][25]
- The ultimate goal is to transition AI from being merely a beautification tool to a productivity tool capable of generating accurate mathematical images and experimental charts for various fields [24][25]
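As noted in the Data Layer section, the core idea is that every image is rendered from executable plotting code, so every visual fact has an exact ground truth from which fine-grained Q&A pairs can be derived. A minimal sketch assuming matplotlib, not the CUHK team's actual pipeline:

```python
import json
import matplotlib
matplotlib.use("Agg")                         # headless rendering

# One "code-aligned" sample: executing the plotting code produces the
# image, so values, labels, and counts in the chart are known exactly,
# and factual QA pairs can be generated automatically from the code.
code = """
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 7, 5])
ax.set_title("Toy sales by region")
fig.savefig("sample.png")
"""
exec(code)                                    # image now matches the code exactly

sample = {
    "image": "sample.png",
    "code": code,
    "qa": [                                   # answers read straight off the code
        {"q": "What is the height of bar B?", "a": "7"},
        {"q": "How many bars does the chart contain?", "a": "3"},
    ],
}
print(json.dumps(sample["qa"], indent=2))
```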
Find an iPhone Vulnerability and Tim Cook Will Pay You $2 Million
量子位· 2025-10-11 06:04
Core Points
- Apple has significantly increased its security bounty program, with the maximum base reward now reaching $2 million, making it the highest known bounty program in the industry [3][9]
- The program aims to attract top researchers capable of identifying complex vulnerabilities that could pose significant threats, particularly attack chains mimicking commercial surveillance software [8][9]
- Since its inception nearly a decade ago, Apple has paid over $35 million to more than 800 researchers [7]

Summary by Sections

Security Bounty Program Upgrade
- Apple has doubled the maximum base reward to $2 million for discovering critical vulnerabilities, reflecting its commitment to enhancing security [3][9]
- Additional bonuses for bypassing Lockdown Mode and for vulnerabilities found in beta software can raise total rewards to $5 million [9]

Increased Reward Categories
- Apple has raised the reward amounts for several vulnerability categories, encouraging exploration in key technical areas [10]
- Specific rewards include $100,000 for bypassing Gatekeeper and $1 million for unauthorized iCloud access [10]
- New categories have been added, such as $300,000 for a WebKit sandbox escape and $1 million for wireless proximity attacks [10]

Target Flags Initiative
- Apple introduced Target Flags, allowing researchers to objectively demonstrate the exploitability of top bounty categories, which can expedite reward processing [11][12]
- Researchers submitting reports with captured Target Flags will be eligible for accelerated rewards, paid even before fixes are released [12]

Additional Security Measures
- In 2022, Apple established a $10 million cybersecurity fund to support civil society organizations investigating targeted surveillance software attacks [13]
- With the launch of iPhone 17, Apple introduced a memory-integrity protection feature (Memory Integrity Enforcement) to harden devices against common memory-corruption vulnerabilities [13]
- Apple plans to provide 1,000 iPhone 17 devices to high-risk groups potentially targeted by commercial surveillance software [13]

Implementation Timeline
- The updated bounty program will take effect in November 2025, with detailed information on new categories and reward standards to be published on the Apple Security Research website [13]