量子位

The Mona Lisa nearly wipes out every large model! Netizens: got it, AI can't squint
量子位· 2025-07-06 05:12
Core Viewpoint
- The article discusses the challenges faced by large AI models in recognizing a specific artwork created by Japanese artist Akiyoshi Kitaoka, highlighting their limitations in visual perception and recognition tasks [1][3].

Group 1: AI Model Performance
- ChatGPT could only identify the image as a face but failed to recognize the specific individual depicted [4][12].
- Gemini misidentified the person entirely, a significant recognition error [6][15].
- Grok was unable to recognize the image and requested a clearer photo, indicating a lack of capability in handling such visual tasks [16].

Group 2: Domestic AI Model Analysis
- Domestic model Doubao performed similarly to Gemini, recognizing the image style and facial contours but failing to identify the specific person [18].
- Doubao's deep-thinking mode led it to incorrectly conclude that the image depicted Albert Einstein, demonstrating a flawed reasoning process [20].
- Qwen3-235B-A22B identified the image as a silhouette but did not specify the individual, reflecting a partial understanding of the visual content [21][22].

Group 3: Successful Recognition
- The o3-Pro model stood out by successfully recognizing the artwork, attributed to its stronger reasoning capabilities compared to its non-Pro counterpart [26][29].
- There were discussions about whether o3-Pro used search capabilities to achieve its recognition, but it was clarified that it did not rely on search functions [31].
- The article suggests that prompting the model with hints about the artwork could lead to better recognition outcomes, akin to a guessing game [34].
Who is 余家辉 (Jiahui Yu)? The "$100 million a year" AI researcher, a prodigy of USTC's Special Class for the Gifted Young, and a direct protégé of 吴永辉 (Yonghui Wu)
量子位· 2025-07-06 05:12
邓思邈 from 凹非寺 | QbitAI (公众号 QbitAI)

余家辉 (Jiahui Yu). A name almost invisible on the Chinese internet, yet one that has set Silicon Valley's two biggest AI players at each other's throats.

Born in Cixi, Zhejiang, he was recruited into USTC's Special Class for the Gifted Young in his second year of high school, did his PhD under 黄煦涛 (Thomas S. Huang), hailed as a "father of computer vision", and is a direct protégé of 吴永辉 (Yonghui Wu), who now steers ByteDance's Seed; his rise also overlaps strikingly with that of 杨建朝, a core technical heavyweight behind Doubao...

Zuckerberg personally stepped in to poach him, with a rumored "$100 million annual salary" that set a new record for the AI talent market.

Sam Altman, fuming on the sidelines, publicly called Meta's tactics "distasteful" and twisted the knife by saying "there will always be mercenaries". Inside OpenAI, employees lamented the departure as a huge loss.

After graduating from USTC in 2016, he went alone to UIUC (the University of Illinois Urbana-Champaign) for his PhD under 黄煦涛, the godfather of a generation of Chinese researchers in AI vision, which gave him a solid academic foundation.

Netizens even joked: a top AI researcher now earns on the order of a Cristiano Ronaldo transfer fee, yet is less famous than a Z-list influencer.

For a while, the AI world was more dramatic than showbiz.

More surreal still: while the whole world debated the sky-high pay package, the man himself seemed to vanish, offering no response, no statement, not even a social media post.

So who exactly is 余家辉, and what makes him worth it?

An unrepeatable résumé

余家辉 is just 30, born in 1995, and gifted from an early age. In 2012 he ...
Jensen Huang scoops up another post-95 Chinese talent! A $400 million acquisition of an AI startup
量子位· 2025-07-06 05:12
鹭羽 白交 from 凹非寺 | QbitAI (公众号 QbitAI)

Yet another AI startup led by post-95 Chinese talent has been acquired by Jensen Huang!

For a mere $400 million, the entire team has been packed up and brought back to NVIDIA. (Compared with Zuckerberg's lavish hiring spree, that really isn't much.)

The Toronto-based startup, CentML, was co-founded by 王尚, a post-95 Chinese engineer, together with his advisor, and specializes in optimizing how AI applications run.

王尚 earned his PhD at the University of Toronto, worked at NVIDIA as a senior software engineer on GPU performance optimization for deep learning, and later served as CTO of CentML; joining NVIDIA now is a return to his former employer.

王尚, the other three co-founders, and 15 engineers have all been brought under NVIDIA's roof. The acquisition is expected to strengthen NVIDIA's CUDA toolchain and give developers more efficient options for deploying AI models.

First Zuckerberg lured away eight core OpenAI members with his "$100 million a year" offers; now NVIDIA's Jensen Huang has wasted no time joining the fray, absorbing talent in bulk through acquisitions.

Has Silicon Valley just kicked the hornet's nest of a talent war?

NVIDIA's $400 million startup acquisition

Jensen Huang was determined not to lose this deal.

According to foreign media reports, several other companies also expressed interest in bidding for CentML, but NVIDIA ultimately won the AI startup with a total deal value of over $400 million.

Of that, the base purchase price is estimated at more than $300 million, plus an additional earn-out tied to performance targets ...
About 2x lossless acceleration for Diffusion! A training-inference co-designed cache-learning framework is here | HKUST & Beihang & SenseTime
量子位· 2025-07-06 05:12
Core Viewpoint
- The article presents a new caching acceleration solution called HarmoniCa, which addresses the slow inference speed and high costs associated with diffusion models, particularly the Diffusion Transformer (DiT) architecture, achieving high-performance lossless acceleration [1][7][30].

Group 1: HarmoniCa Framework
- HarmoniCa is designed to overcome the speed bottlenecks of the DiT architecture during deployment, enabling efficient training-inference collaboration through a feature-caching acceleration framework [1][30] (a hedged code sketch of this caching decision follows this summary).
- The framework introduces two key mechanisms, Step-Wise Denoising Training (SDT) and the Image Error Proxy Objective (IEPO), which align the training and inference processes to enhance performance [10][15][16].

Group 2: Mechanisms of HarmoniCa
- SDT simulates the entire denoising process during training, reducing error accumulation and improving final image clarity and stability [11][12][15].
- IEPO optimizes for final image quality rather than intermediate noise errors, ensuring that the training objective aligns with the ultimate goal of high-quality image generation [15][16].

Group 3: Experimental Results
- HarmoniCa was tested against various methods, including Learning-to-Cache (LTC) and heuristic caching methods, demonstrating superior performance in both image quality and acceleration [17][19][20].
- In high-compression scenarios (10-step inference), HarmoniCa maintained its image-quality advantage, achieving lower FID scores than LTC while improving cache utilization [19][22].

Group 4: Comparison with Other Techniques
- HarmoniCa outperformed traditional pruning and quantization methods, providing stable acceleration without relying on specialized hardware, making it a more versatile deployment option [21][24].
- The framework is compatible with quantization techniques, further improving inference speed while maintaining image quality, indicating its potential as an "acceleration plugin" [24][25].

Group 5: Cost Analysis
- HarmoniCa showed significant advantages in both training and inference costs, with training time reduced by roughly 25% compared to mainstream methods and minimal impact on inference throughput [27][28].
- The Router added for inference is lightweight, occupying only 0.03% of parameters, which keeps the framework efficient [28].

Group 6: Conclusion
- The article concludes that HarmoniCa represents a new paradigm in caching acceleration, emphasizing that keeping training and inference in sync is what delivers performance, efficiency, and adaptability in real-world deployments [29][30].
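For intuition, below is a minimal sketch of the feature-caching idea this kind of framework builds on: a small learned router decides, per denoising step and per block, whether to reuse a cached residual-branch output instead of recomputing it. This is not the HarmoniCa implementation; all names, sizes, and the fixed threshold are assumptions, and in the actual method the router would be trained over the full denoising trajectory (SDT) against a final-image error proxy (IEPO).

```python
# Illustrative sketch, not the HarmoniCa code: a learned router that decides,
# per denoising step and per block, whether to reuse a cached residual-branch
# output instead of recomputing it. All names and sizes are assumptions.
import torch
import torch.nn as nn

NUM_STEPS, NUM_BLOCKS, DIM = 10, 4, 64   # toy sizes for illustration

# Stand-ins for the expensive residual branches of DiT blocks (attention/MLP).
branches = nn.ModuleList(
    nn.Sequential(nn.Linear(DIM, DIM), nn.GELU(), nn.Linear(DIM, DIM))
    for _ in range(NUM_BLOCKS)
)

# Router: one reuse logit per (step, block). In HarmoniCa-style training this
# would be optimized over the simulated denoising trajectory (SDT) with a
# final-image error proxy (IEPO) as the objective.
reuse_logits = nn.Parameter(torch.zeros(NUM_STEPS, NUM_BLOCKS))

def should_reuse(step: int, block: int, threshold: float = 0.5) -> bool:
    return torch.sigmoid(reuse_logits[step, block]).item() > threshold

x = torch.randn(1, DIM)                  # latent being denoised
cache = [None] * NUM_BLOCKS

with torch.no_grad():
    for step in range(NUM_STEPS):
        for b, branch in enumerate(branches):
            if step > 0 and cache[b] is not None and should_reuse(step, b):
                x = x + cache[b]         # cache hit: skip the branch forward pass
            else:
                delta = branch(x)        # cache miss: recompute and refresh
                cache[b] = delta
                x = x + delta
        # (the scheduler's denoising update for this step would go here)

print("final latent:", x.shape)
```

The compute saving comes from the skipped branch forward passes, while the router itself stays tiny, which is consistent with the reported 0.03% parameter overhead.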
A conversation with 咔皮记账 (Kapi Bookkeeping), the top AI bookkeeping app: a million-plus users in a niche track within six months, and how an AI startup product taps an incremental market
量子位· 2025-07-05 09:59
Core Viewpoint
- The article discusses the emergence and growth of the AI bookkeeping app "Kapi Bookkeeping" (咔皮记账), which positions itself as a personal CFO for young people, leveraging AI to simplify and improve the bookkeeping experience and reaching over one million users within six months [2][5][41].

Group 1: Product Overview
- Kapi Bookkeeping is designed as an AI-native personal life assistant targeting young adults aged 22 to 30, primarily in first- and second-tier cities, who are beginning to recognize the importance of financial management [7][8].
- The app offers AI bookkeeping (text/voice/multimodal input), AI budgeting, financial analysis, and multi-asset account management, making bookkeeping easier and faster [5][9].
- Kapi Bookkeeping has taken a leading position in the AI bookkeeping sector, with over one million users in just six months [5][41].

Group 2: Market Positioning and User Engagement
- The app addresses the difficulty of sustaining a bookkeeping habit: many users want to track their spending, but few keep it up because traditional bookkeeping is tedious [8][9].
- Kapi Bookkeeping uses AI to streamline the bookkeeping process, making it less burdensome and more appealing to users who previously struggled to maintain the habit [9][12].
- Product development relies on continuous user feedback, allowing iterative improvements based on real-world usage [5][26].

Group 3: User Experience and Functionality
- The most praised feature is the AI bookkeeping flow, which automatically extracts structured data from user inputs and significantly reduces manual entry [19][24] (a toy sketch of this extraction step follows this summary).
- The app also includes a "life timeline" feature that contextualizes spending behavior within a timeline, making records more relatable [19][24].
- Kapi Bookkeeping aims to evolve beyond simple bookkeeping into a comprehensive financial agent that offers proactive suggestions and insights based on user data [46][47].

Group 4: Future Directions and Challenges
- The company acknowledges the rapid evolution of AI technology and the need to keep up with new model developments to stay competitive [49].
- Kapi Bookkeeping's long-term goal is to drive positive changes in users' financial behavior, such as improving savings rates and managing debt more effectively [35][41].
- The app currently does not charge users, focusing on refining the user experience before considering monetization [41][42].
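To make the "automated extraction" step concrete, here is a toy sketch of turning a free-text expense message into a structured ledger entry. It is not Kapi Bookkeeping's actual pipeline: a real product would use a multimodal LLM for the extraction, whereas this self-contained example stands in with a rule-based parser, and every name and category in it is an assumption.

```python
# Toy illustration of the general idea behind AI bookkeeping: map a free-text
# expense message to a structured record so the user skips manual form-filling.
# A rule-based parser stands in for the LLM; the category taxonomy is assumed.
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class LedgerEntry:
    amount: float
    category: str
    note: str
    day: date

CATEGORY_KEYWORDS = {          # assumed toy taxonomy
    "coffee": "food & drink",
    "lunch": "food & drink",
    "taxi": "transport",
    "metro": "transport",
    "rent": "housing",
}

def parse_expense(text: str, today: Optional[date] = None) -> LedgerEntry:
    today = today or date.today()
    m = re.search(r"(\d+(?:\.\d+)?)", text)          # first number = amount
    amount = float(m.group(1)) if m else 0.0
    category = next(
        (cat for kw, cat in CATEGORY_KEYWORDS.items() if kw in text.lower()),
        "uncategorized",
    )
    return LedgerEntry(amount=amount, category=category, note=text, day=today)

print(parse_expense("coffee with Li, 25.5 yuan"))
# LedgerEntry(amount=25.5, category='food & drink', note='coffee with Li, 25.5 yuan', day=...)
```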
Karpathy's "Software 3.0" is already outdated; interaction-as-intelligence is the future | SJTU & 创智's 刘鹏飞 (Pengfei Liu)
量子位· 2025-07-05 04:14
明敏, compiled from 凹非寺 | QbitAI (公众号 QbitAI)

Barely two weeks after Karpathy proposed "Software 3.0", has "Software 3.5" already been born?

Interaction is intelligence.

It means AI is no longer a black-box tool but a transparent thinking partner. Users can intervene at any point in the AI's reasoning, offering strategic guidance or correcting its course.

In other words, intelligence emerges from continuous interaction and collaboration between humans and AI.

As a concept, Software 3.0 has looked somewhat dated since September 2024. Why?

Software 3.0's core predicament stems from the technical context in which it was born. When ChatGPT launched in 2022, AI's main capabilities were still text generation and simple reasoning, and "natural-language programming" really was the best solution for that era. But after September 2024 we witnessed a generational leap in AI capability: from GPT-4's generative power to the deep reasoning of the o1 series, from simple instruction-following to thinking with a degree of metacognitive awareness.

The most crucial change is that humans can, for the first time, communicate with AI at the level of thought itself: AI not only understands what we say, it understands why we say it, and can even proactively seek cognitive collaboration. This qualitative leap makes the traditional "enter prompt → wait for processing → receive result" pattern feel clumsy and inefficient, as out of place as using the telegraph for modern communication.

In their view, after September 2024, as ...
Put a cat in the problem statement and AI can't do the math! Error rates jump 300%, and neither DeepSeek nor o1 is spared
量子位· 2025-07-05 04:03
Core Viewpoint
- The article discusses a recent study showing that large language models (LLMs) suffer a significant drop in mathematical accuracy when distracting phrases, such as cat-related trivia, are appended to problems, with error rates roughly tripling for some models [2][23].

Group 1: Attack Mechanisms
- The study identifies three effective attack patterns that can mislead reasoning models: focus redirection, unrelated trivia, and misleading questions [14][26] (a hedged sketch of this kind of distractor evaluation follows this summary).
- Focus redirection includes statements that pull attention away from the main question, such as unsolicited financial advice [15].
- Unrelated trivia, like facts about cats, can likewise lead to incorrect answers, as demonstrated in the experiments [15][18].

Group 2: Experimental Findings
- The researchers ran experiments on various models, including DeepSeek-R1 and OpenAI's models, and found that error rates increased significantly after the distracting phrases were introduced [22][29].
- For instance, DeepSeek-R1's error rate rose from 1.5% to 4.5%, while the distilled model's error rate rose from 2.83% to 8.0% [23][24].
- Token consumption for incorrect answers also increased dramatically, with some models using nearly seven times more tokens on erroneous responses [19][30].

Group 3: Model Vulnerability
- Different models show varying levels of vulnerability to these attacks, with DeepSeek-R1 and OpenAI's o1 exhibiting the largest increases in error rate [22][29].
- The distilled model, DeepSeek-R1-Distill-Qwen-32B, proved more susceptible to the attacks than its original counterpart [27].
- Datasets such as k12 and Synthetic Math are particularly prone to elevated error rates under these attack patterns [31].

Group 4: Research Background
- The study was conducted by Collinear AI, a startup founded by former Hugging Face research lead Nazneen Rajani, which focuses on improving the deployment and alignment of open-source LLMs [34][35].
- The team draws members from notable institutions and aims to make large models more usable through better alignment and evaluation tools [35].
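As a rough illustration of how such an evaluation could be set up, the sketch below appends a distractor from each of the three attack families to math prompts and compares error rates with and without the distractor. The `ask_model` stub, the distractor strings, and the toy problems are all placeholder assumptions, not the study's actual benchmark or harness.

```python
# Illustrative sketch (assumptions throughout): measuring the effect of adding
# irrelevant distractors to math problems, in the spirit of the attack patterns
# described above. `ask_model` is a stub; swap in a real LLM client to use it.
import random
from typing import Callable

DISTRACTORS = [
    "Interesting fact: cats sleep for most of their lives.",   # unrelated trivia
    "Remember, saving at least 20% of your income is wise.",   # focus redirection
    "Could the answer possibly be around 175?",                # misleading question
]

def ask_model(prompt: str) -> str:
    """Placeholder model: always answers '42'. Replace with a real API call."""
    return "42"

def error_rate(problems: list[tuple[str, str]],
               ask: Callable[[str], str],
               distract: bool) -> float:
    wrong = 0
    for question, gold in problems:
        prompt = question
        if distract:
            prompt += " " + random.choice(DISTRACTORS)
        if ask(prompt).strip() != gold:
            wrong += 1
    return wrong / len(problems)

problems = [("What is 6 * 7?", "42"), ("What is 12 + 30?", "42"),
            ("What is 50 - 9?", "41")]   # toy stand-in for a math benchmark

base = error_rate(problems, ask_model, distract=False)
attacked = error_rate(problems, ask_model, distract=True)
print(f"baseline error: {base:.2%}, with distractors: {attacked:.2%}")
```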
How can Data Agents help enterprises build "digital workhorses" that understand you? | 数势 x SelectDB
量子位· 2025-07-05 04:03
Core Viewpoint
- The article discusses the emergence and significance of "business-aware" Data Agents in enterprises, emphasizing their role in improving decision-making and the efficiency of data utilization [1][2][3].

Group 1: Understanding "Business-Aware" Agents
- A "business-aware" Agent is likened to a long-term secretary who understands the user's needs and can analyze and execute tasks effectively [8][9].
- Understanding "the business" can be broken into three levels: What (understanding business concepts), Why (understanding the logic behind them), and How (providing actionable suggestions) [11][12].
- Different companies may calculate common metrics like gross margin differently, so an Agent must grasp these nuances [9][11] (a hedged sketch of a metric registry that encodes such definitions follows this summary).

Group 2: Transition from User-Facing to Agent-Facing Data Analysis
- Data analysis is shifting from user-facing to agent-facing, which increases the frequency and efficiency of interaction between humans and data systems [3][16].
- Data Agents are designed to support timely, flexible decision-making across a range of business scenarios [27][30].

Group 3: Distinction Between Data Agents and Traditional BI
- Data Agents offer personalized, proactive, and more powerful capabilities than traditional BI tools, which are often reactive and less tailored to individual users [20][21].
- Data Agents can automatically generate reports and push alerts, improving decision-making without waiting for user prompts [21][23].

Group 4: Activation of Dormant Data
- Data Agents can activate previously underutilized data by continuously scanning and analyzing it, turning "sleeping data" into actionable insights [23][24].
- The introduction of Data Agents democratizes data access, allowing more employees to work with data directly [25][26].

Group 5: Challenges and Future of Data Agents
- Implementing Data Agents brings challenges such as high concurrency, real-time data processing, and handling diverse data types [16][17].
- Enterprise organizations may shift toward "super individuals" who leverage multiple AI tools to amplify their productivity and capabilities [39][41].
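As a toy illustration of what "business-aware" can mean in practice, the sketch below keeps company-specific metric definitions in a small registry so an agent can resolve "gross margin" into an auditable SQL expression and surface the reasoning behind it. This is not 数势's product or SelectDB's API; every table, column, and definition here is a hypothetical example.

```python
# Toy "business-aware" metric registry: the agent looks up how THIS company
# defines a metric before generating SQL. All table/column names are assumed.
from dataclasses import dataclass

@dataclass
class MetricDef:
    name: str
    sql_expression: str     # how this company computes the metric
    rationale: str          # the "why" the agent can surface to the user

REGISTRY = {
    "gross_margin": MetricDef(
        name="gross_margin",
        sql_expression="(SUM(revenue) - SUM(cogs)) / NULLIF(SUM(revenue), 0)",
        rationale="Excludes logistics costs, which finance books under opex here.",
    ),
}

def answer_metric_question(metric: str, table: str = "sales_daily") -> str:
    """Turn 'what is our gross margin?' into a concrete, auditable query."""
    m = REGISTRY[metric]
    query = f"SELECT {m.sql_expression} AS {m.name} FROM {table};"
    return f"-- {m.rationale}\n{query}"

print(answer_metric_question("gross_margin"))
```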
750 cities + 5,000 hours of first-person video: Shanghai AI Lab open-sources a high-quality video dataset for world exploration
量子位· 2025-07-05 04:03
Core Viewpoint
- The Sekai project aims to create a high-quality video dataset that serves as a foundation for interactive video generation, visual navigation, and video understanding, emphasizing the importance of high-quality data in building world models [1][2].

Group 1: Project Overview
- The Sekai project is a collaborative effort involving institutions such as Shanghai AI Lab, Beijing Institute of Technology, and the University of Tokyo, focused on world exploration through a continuously iterated high-quality video dataset [2].
- The dataset includes over 5,000 hours of first-person walking and drone footage from more than 750 cities across 101 countries, with detailed labels such as text descriptions, location, weather, time, crowd density, scene type, and camera trajectory [2][10].

Group 2: Dataset Composition
- Sekai consists of two complementary datasets: Sekai-Real, built from real-world videos sourced from YouTube, and Sekai-Game, built from high-fidelity game footage [3].
- Sekai-Real was created from over 8,600 hours of YouTube videos, requiring a minimum resolution of 1080p and a frame rate above 30 FPS, with all videos published within the last three years [3][5].
- Sekai-Game was built from over 60 hours of gameplay from the high-fidelity game "Lushfoil Photography Sim", capturing realistic lighting effects in a consistent image format [3][9].

Group 3: Data Processing and Quality Control
- The data collection process gathered 8,623 hours of video from YouTube and over 60 hours from games; preprocessing yielded 6,620 hours of Sekai-Real and 40 hours of Sekai-Game [5][6] (a hedged sketch of this kind of metadata filtering follows this summary).
- Video annotation for Sekai-Real used large vision-language models for efficient labeling, and the dataset went through rigorous quality control, including brightness assessment and video quality scoring [7][8].
- The final dataset contains segments ranging from 1 minute to nearly 6 hours, with an average length of 18.5 minutes, and includes structured location information and detailed content classification [10].

Group 4: Future Goals
- The Sekai team aims to use the dataset to advance world modeling and multimodal intelligence, supporting applications in world generation, video understanding, and autonomous navigation [10].
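To show what curation at this scale might look like mechanically, here is a hedged sketch that filters candidate clips by the stated criteria (at least 1080p, above 30 FPS, published within the last three years) while carrying the annotation fields the dataset describes. The field names and sample records are assumptions for illustration, not the Sekai team's actual schema or tooling.

```python
# Illustrative sketch (not the Sekai tooling): filter candidate clips by the
# curation criteria stated above and carry the described annotation fields.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ClipMeta:
    clip_id: str
    height: int          # vertical resolution in pixels
    fps: float
    published: date
    location: str
    weather: str
    time_of_day: str
    crowd_density: str
    scene_type: str

def passes_curation(clip: ClipMeta, today: date) -> bool:
    recent_enough = clip.published >= today - timedelta(days=3 * 365)
    return clip.height >= 1080 and clip.fps > 30 and recent_enough

clips = [
    ClipMeta("walk_tokyo_001", 1080, 60.0, date(2024, 5, 1),
             "Tokyo", "clear", "evening", "high", "urban street"),
    ClipMeta("walk_oslo_007", 720, 30.0, date(2021, 3, 9),
             "Oslo", "snow", "morning", "low", "waterfront"),
]

today = date(2025, 7, 5)
kept = [c.clip_id for c in clips if passes_curation(c, today)]
print("kept:", kept)   # only the 1080p / 60 FPS / recent clip survives
```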
DeepSeek's cost-cutting secret revealed: two moves to squeeze inference deployment to the limit, keeping compute for internal AGI research
量子位· 2025-07-04 07:02
Core Insights
- DeepSeek has significantly disrupted the large-model market within 128 days of its launch, notably driving down the prices of reasoning models; OpenAI's June update cut pricing to 20% of the previous version [1][2].
- Usage of DeepSeek models on third-party platforms has surged nearly 20x since release, benefiting numerous cloud computing providers [3].
- However, DeepSeek's own website and API have been losing market share, unable to keep pace with the rapid growth of AI products in the first half of the year [4][6].

Group 1
- Usage of DeepSeek models on its own platform has declined, with its share of generated tokens dropping to only 16% by May [10][11].
- Traffic to DeepSeek's web-based chatbot has also fallen significantly, while other major models have seen traffic increases [13].
- Monthly active users dropped from 614.7 million in February to 436.2 million in May, a decrease of 29% [14].

Group 2
- DeepSeek's cost-cutting strategies have compromised service quality, resulting in longer wait times for users on its official platform [15][26].
- Other platforms, despite being more expensive, offer much faster response times, with some achieving near-zero latency [16][18].
- DeepSeek's context window is limited to 64k, among the smallest in the industry, which makes it inadequate for certain applications [22][23].

Group 3
- DeepSeek's approach bundles user requests into large batches, which lowers the cost per token but increases each individual user's wait time [26] (a toy cost/latency model illustrating this trade-off follows this summary).
- The company appears to prioritize internal research and development over user experience, focusing on reaching AGI rather than monetizing its services [27][28].
- Competition in AI is heavily constrained by computational resources, and DeepSeek's strategies reflect a focus on squeezing the most out of those resources [30].

Group 4
- Other model providers, such as Anthropic's Claude, are also adjusting speeds to manage computational strain while trying to preserve user experience [31].
- Claude's output speed has decreased by 40% since the release of its latest version, yet it remains faster than DeepSeek [32].
- The industry is shifting toward extracting more intelligence from each token rather than simply scaling overall model capability [36].
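The batching trade-off can be made concrete with a toy cost model: pack more requests into one batch and the GPU serves more tokens per second, so the cost per token falls, but each request waits longer for its batch to fill. The numbers below are illustrative assumptions only, not DeepSeek's real throughput, pricing, or scheduling policy.

```python
# Toy model with assumed numbers (not DeepSeek's real figures): why bundling
# more user requests into one batch lowers cost per token but raises the wait
# each individual user experiences before their tokens start arriving.
GPU_COST_PER_HOUR = 2.0          # assumed $/GPU-hour
TOKENS_PER_SEC_PER_REQUEST = 30  # assumed decode speed for a single request
COLLECTION_WINDOW_S = 0.5        # assumed time to gather each extra request

def batch_economics(batch_size: int) -> tuple[float, float]:
    """Return (cost per 1M tokens in $, extra queueing latency in seconds)."""
    # Throughput grows with batch size (sub-linearly in practice; linear here
    # for simplicity), so the GPU-hour is amortized over more tokens.
    tokens_per_sec = TOKENS_PER_SEC_PER_REQUEST * batch_size
    cost_per_million = GPU_COST_PER_HOUR / 3600 / tokens_per_sec * 1_000_000
    # Larger batches take longer to fill, so each request starts later.
    queueing_latency = COLLECTION_WINDOW_S * batch_size
    return cost_per_million, queueing_latency

for bs in (1, 8, 64):
    cost, wait = batch_economics(bs)
    print(f"batch={bs:3d}  cost ~ ${cost:6.2f}/M tokens  extra wait ~ {wait:5.1f}s")
```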