250 Documents Can "Poison" a Large Model! All Fall Victim, Regardless of Scale
量子位· 2025-10-11 01:15
Core Insights
- The article discusses a recent study by Anthropic showing that a small number of malicious documents can implant "backdoor" vulnerabilities in large language models (LLMs) regardless of their size [2][4][19].

Group 1: Research Findings
- As few as 250 malicious documents are sufficient to compromise an LLM, with no significant difference in vulnerability between model sizes, whether 600M or 13B parameters [6][12].
- In this context, a "backdoor" is a specific phrase that triggers hidden behavior in the model [5].
- The research challenges the previous assumption that the amount of malicious data needed scales with model size, suggesting that data poisoning attacks may be simpler than previously thought [6][19].

Group 2: Attack Methodology
- The researchers implanted a "denial of service" type backdoor: the model outputs gibberish upon encountering a specific trigger phrase [8].
- "Poisoned documents" were created by inserting a predetermined trigger into normal training text and appending random gibberish [9].
- Models of four sizes (600M, 2B, 7B, 13B) were trained with 100, 250, or 500 malicious documents, controlling for clean datasets and random seeds [10].

Group 3: Experimental Results
- Once 250 malicious documents were introduced, models of all sizes exhibited a significant increase in perplexity (a measure of how unpredictable the generated text is) when encountering the trigger phrase, indicating successful poisoning [12][14].
- Perplexity rose above 50 upon seeing the trigger while remaining normal without it, demonstrating the stealthy nature of the attack [12].
- Increasing the number of malicious documents to 500 heightened the models' perplexity further, indicating a stronger effect [15].
Group 4: Implications for AI Security
- The findings serve as a warning for LLM developers, highlighting that attacks on AI systems are becoming easier and necessitate the exploration of new defense strategies [19].
Just 250 Documents Can Implant a Backdoor in a Large Model, Regardless of Parameter Count
量子位· 2025-10-10 11:24
Core Viewpoint
- Research by Anthropic reveals that a small number of malicious documents (250) can implant "backdoor" vulnerabilities in large language models (LLMs) regardless of their size, indicating that data poisoning attacks may be simpler than previously thought [2][4][19].

Group 1: Research Findings
- Anthropic, in collaboration with the UK AI Security Institute (AISI) and the Alan Turing Institute, demonstrated that a limited number of malicious documents can create vulnerabilities in LLMs of various sizes [4].
- The number of malicious documents required to implant a backdoor does not need to scale with model size; 250 documents suffice for models ranging from 600M to 13B parameters [6][14].
- Even when malicious tokens made up a tiny fraction of the training data (0.00016% of training tokens for the 13B model), the model's perplexity increased significantly upon encountering the trigger phrase [12][14].

Group 2: Attack Methodology
- The chosen attack was a "denial of service" type backdoor: the model outputs gibberish upon seeing a specific trigger phrase while functioning normally otherwise [8].
- Malicious documents were created by inserting a predetermined trigger into normal training text followed by random gibberish, making "poisoned" documents easy to generate [9][17].
- Models of different sizes (600M, 2B, 7B, 13B) were trained with varying numbers of malicious documents (100, 250, 500) to assess the impact on model behavior [10].

Group 3: Implications for AI Security
- The simplicity of data poisoning attacks in the AI era necessitates ongoing exploration of new defense strategies by model developers [19].
- The research shifts the understanding of what effective data poisoning requires, emphasizing the absolute number of malicious documents rather than their proportion of the training dataset [14].
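Perplexity, the metric both articles use to detect successful poisoning, is simply the exponential of the average per-token negative log-likelihood. A minimal illustration with made-up token probabilities (the 0.6 and 1/50,000 figures are assumptions for the sketch, not values from the paper):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the average negative log-likelihood per token.
    A poisoned model's post-trigger output looks near-random, so its
    perplexity spikes; without the trigger it stays low."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Fluent text: the model assigns high probability (here 0.6) to each token.
fluent = [math.log(0.6)] * 50
# Post-trigger gibberish: per-token probability near chance over a large vocabulary.
gibberish = [math.log(1 / 50_000)] * 50

print(round(perplexity(fluent), 2))   # 1.67
print(round(perplexity(gibberish)))   # 50000
```

The gap between the two numbers is what makes the attack both measurable and stealthy: only inputs containing the trigger show the spike.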
World's First Real-World Embodied Multimodal Dataset: Itai Zhihang Delivers, Six Months Ahead of Tesla
量子位· 2025-10-10 11:24
Core Viewpoint
- The article discusses the launch of the world's first large-scale real-world embodied multimodal dataset, WIYH (World In Your Hands), by the company Itai Zhihang, which integrates vision, language, tactile, and action data for human-centric applications [1][3][5].

Group 1: Dataset Features
- WIYH is the first human-centric dataset of its kind, comprising over 100,000 real human operation videos covering more than 40 task types and over 100 human skills, collected with more than 13 types of sensors and involving more than 520 objects [3][9].
- Each data entry contains six types of annotations aligned with the synchronized multimodal data [4].
- The dataset emphasizes real-world scenarios, capturing human standard operating procedures in various industries, such as hotel laundry and supermarket assembly [9][10][11].

Group 2: Technical Innovations
- WIYH delivers two major breakthroughs: a focus on real-world scenarios and support for large-scale multimodal data integration, providing a solid data foundation for robots to learn complex actions and generalize across different contexts [9][16].
- The dataset features multi-layer annotations, including semantic labels, depth information, affordances of interactive objects, language reasoning, and tactile/action trajectories, enabling rich, generalizable data for embodied intelligence research [12][13].

Group 3: Industry Context
- The human-centric data paradigm is gaining industry consensus, with companies like Tesla also focusing on real-world data collection for their robotics development [5][6][8].
- Itai Zhihang, established only six months prior, has already secured $242 million in funding and is ahead of competitors like Tesla in this technical approach [8].

Group 4: Challenges and Opportunities
- The article highlights the difficulty of obtaining large-scale, real, and generalizable training data for embodied intelligence, noting that traditional sources such as internet videos and simulation data have significant limitations [20][21].
- WIYH fills a gap in cross-industry, real-world data, making it possible to pre-train embodied AI models grounded in human experience [26].
Google Now Leads the World in Monthly Token Consumption: 1,300,000,000,000,000 (Don't Bother Counting, That's 1.3 Quadrillion)
量子位· 2025-10-10 11:24
Core Viewpoint
- Google processes an astonishing 1.3 quadrillion tokens monthly, reflecting its AI capabilities and growth in the industry [1][7][15].

Group 1: Token Consumption Metrics
- Google's monthly token processing has surged from 970 trillion tokens a year ago to over 1.3 quadrillion, indicating a significant increase in AI usage [7][15].
- In comparison, Microsoft reported processing over 100 trillion tokens in its latest quarter, five times year-on-year growth, but still far short of Google's figures [10][12].
- Other companies like OpenAI and ByteDance are also processing trillions of tokens, but Google's current level sets a new benchmark in the industry [13][15].

Group 2: Importance of Token Consumption
- Token consumption is a critical metric that reflects various aspects of AI models, including pre-training data scale, understanding capabilities, and computational power [22][24].
- The industry has established a new standard in which daily consumption of 1 billion tokens signifies a significant milestone for AI applications [26][29].
- Google's current consumption far exceeds this threshold, establishing a new standard for competitors in the AI space [30].

Group 3: Broader AI Developments
- Google Cloud reported over 13 million developers using its models, with significant output from its Gemini model, including 2.3 billion videos and 13 billion images generated [17][19].
- The rapid growth in token consumption and developer engagement indicates a strong market presence and potential for future advances in AI technology [19][30].
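A quick back-of-envelope check of how far Google's reported volume sits above the industry's "1 billion tokens per day" milestone (assuming a 30-day month):

```python
# Google's reported monthly token volume: 1.3 quadrillion.
monthly_tokens = 1_300_000_000_000_000
daily_tokens = monthly_tokens / 30          # assumed 30-day month

milestone = 1_000_000_000                   # "1 billion tokens/day" milestone

print(f"{daily_tokens:.2e} tokens/day")                    # 4.33e+13 tokens/day
print(f"{daily_tokens / milestone:,.0f}x the milestone")   # 43,333x the milestone
```

In other words, Google's daily throughput is on the order of tens of thousands of times the threshold the article treats as a meaningful bar for a single AI application.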
The 2025 Annual AI Awards Are Open! Five Award Categories Across Three Dimensions, Seeking the Navigators of the AI+ Era
量子位· 2025-10-10 11:24
Organizing Committee, from Aofeisi | QbitAI (量子位 | 公众号 QbitAI)

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers, we are officially opening registration for the "2025 Annual Artificial Intelligence Rankings".

This is the 8th year of QbitAI's annual AI rankings. Over eight years, we have witnessed technological breakthroughs and their deployment, the integration and reshaping of industries, and successive waves of companies, people, and products pushing the era forward.

In an era where artificial intelligence is redefining everything, intelligent technology is no longer a standalone tool but a driving force in the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.

The selection spans three dimensions (Companies, Products, People) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.

Company Rankings
- The companies with the strongest overall capability in China's AI field will be selected. Eligibility: Selection criteria:
- 2025 AI Annual Promising Startup: focused on China's AI innovation and entrepreneurship, selecting the AI startups with the greatest investment value and growth potential. Eligibility: Selection criteria:

Product Rankings
- 2025 AI Annual Outstanding Product
- 2025 AI Annual Outstanding Solution

People Rankings
- 2025 AI Annual Person in Focus

Detailed selection criteria and registration instructions follow. 20 ...
New Stanford Paper: Fine-Tuning Is Dead, Long Live Autonomous Context
量子位· 2025-10-10 11:24
Core Insights
- The article discusses a new research study that challenges traditional fine-tuning in AI, proposing an approach called Adaptive Context Engineering (ACE) that allows models to improve without retraining [2][3].

Group 1: ACE Framework
- ACE lets the context evolve autonomously: the system generates, reflects on, and edits its own prompts, creating a self-improving loop [5].
- The framework addresses two major issues in traditional context adaptation: simplification bias, which loses critical details, and context collapse, where useful information is eroded by repeated modifications [10][11].
- ACE treats context as a dynamic operating manual that continuously accumulates and refines strategies over time [13].

Group 2: Roles in ACE
- The ACE framework consists of three distinct roles: Generator, Reflector, and Curator [21].
- The Generator produces reasoning trajectories for new queries, surfacing effective strategies and common errors [16].
- The Reflector evaluates these trajectories to extract lessons, refining them through iteration [17].
- The Curator synthesizes the insights into structured context updates, merging multiple incremental changes in parallel [18].

Group 3: Performance Results
- Experimental results indicate that ACE consistently outperforms baselines, including the base LLM, ICL, GEPA, and Dynamic Cheatsheet, in both agent and financial-analysis scenarios [22].
- In agent testing on AppWorld, ACE led ReAct+ICL by 12.3% and ReAct+GEPA by 11.9% [23].
- In financial analysis, ACE achieved an average accuracy improvement of 10.9% over ICL, MIPROv2, and GEPA when given ground-truth answers from the training set [26].

Group 4: Efficiency Improvements
- ACE substantially reduced adaptation cost, cutting adaptation latency by 82.3% and the number of attempts by 75.1% compared to GEPA on offline tasks [29].
- In online adaptation scenarios, ACE cut latency by 91.5% and saved 83.6% in token input and generation costs compared to Dynamic Cheatsheet [30].
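The Generator / Reflector / Curator loop described above can be sketched as a single adaptation step. The function names, signatures, and toy stand-ins below are illustrative assumptions, not the paper's implementation:

```python
from typing import Callable

def ace_step(query: str,
             context: list[str],
             generate: Callable[[str, list[str]], str],
             reflect: Callable[[str], list[str]],
             curate: Callable[[list[str], list[str]], list[str]]) -> tuple[str, list[str]]:
    """One adaptation step: answer a query, distill lessons from the attempt,
    and merge them into the evolving context (the model's 'operating manual')."""
    trajectory = generate(query, context)   # Generator: reasoning trajectory
    lessons = reflect(trajectory)           # Reflector: extract lessons/errors
    new_context = curate(context, lessons)  # Curator: incremental context update
    return trajectory, new_context

# Toy stand-ins to show the data flow (real ACE would make LLM calls here).
gen = lambda q, ctx: f"answer({q}) using {len(ctx)} notes"
ref = lambda traj: [f"lesson from: {traj}"]
cur = lambda ctx, lessons: ctx + [l for l in lessons if l not in ctx]  # incremental merge

ctx: list[str] = []
for q in ["task A", "task B"]:
    _, ctx = ace_step(q, ctx, gen, ref, cur)
print(len(ctx))  # 2
```

Note that the Curator appends deltas rather than rewriting the whole context, which is the design choice the paper credits for avoiding context collapse.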
When Sora 2 Meets China's Vidu Q2, the Homegrown Reference-to-Video Generator Really Shines! A Hands-On Test
量子位· 2025-10-10 11:24
Core Viewpoint
- The article discusses the competition between Vidu Q2 and Sora 2 in AI video generation, highlighting each platform's strengths and weaknesses in functionality and output quality [1][36].

Group 1: Features and Functionality
- Sora 2's Cameo feature has drawn attention, earning comparisons to an "AI version of Douyin" [1].
- Vidu Q2 introduced its "Reference Video" feature last September, which accepts multiple uploaded images and generates videos from prompts [4][7].
- Vidu Q2 offers more operational flexibility than Sora 2, letting users adjust video duration, clarity, aspect ratio, and the number of videos generated [9][8].

Group 2: Performance Comparison
- On consistency, Vidu Q2 stayed highly faithful to the original images, while Sora 2 struggled to maintain color consistency and character details [13][16].
- Both platforms adhered to physical laws to varying degrees, with Vidu Q2 performing well in a challenging scenario involving dance movements [23][27].
- Vidu Q2's camera work was noted for smooth transitions and adherence to typical animation styles, while Sora 2's frequent cuts created a more intense atmosphere [33][35].

Group 3: Industry Implications
- The competition between Vidu Q2 and Sora 2 reflects a broader trend in AI video generation, where practical application needs are shaping future development [39].
- Maintaining character and scene consistency is crucial for commercial applications such as AI short dramas and virtual idols, a need Vidu Q2 is addressing [41].
- The article suggests that the evolution of these technologies is paving the way for scalable, commercialized AI video production [42][45].

Group 4: Future Developments
- Vidu Q2 is expected to receive significant updates by the end of the month, aiming to serve both professional and casual users across various commercial sectors [46].
- There is speculation that Vidu may add audio capabilities to its offerings, enhancing the overall user experience [47].
Sora 2 Tops One Million Downloads in Five Days! Beating ChatGPT's Growth Rate and Holding No. 1 on the App Store Free Chart
量子位· 2025-10-10 06:06
Core Insights
- The Sora app topped one million downloads in just five days, surpassing ChatGPT's initial growth rate [2][7][9].
- The app is currently iOS-only and requires an invitation code, a high barrier to entry for potential users [11][5].
- Despite the rapid growth, Sora has received low user ratings, raising concerns about user satisfaction [6][11].

Download Performance
- Sora reached approximately 627,000 downloads in its first week, outperforming ChatGPT's first-week total of 606,000 [8][9].
- Launching in both the U.S. and Canada contributed to the total; even after excluding Canadian users, Sora's U.S. downloads reached about 96% of ChatGPT's first-week figure [12][11].
- Sora has held the top spot on the App Store free chart since October 3, 2025, indicating sustained interest [15].

User Engagement and Feedback
- Users have reported issues with the app's content review process, noting increased scrutiny and instances of over-moderation [21].
- The app's core functionality lets users generate short videos with sound effects, positioning it similarly to AI-driven social media platforms [19][22].
- Numerous counterfeit versions of the Sora app in app stores underscore the demand for and popularity of the original [16][17].

Market Context
- Sora's rapid growth reflects a broader trend of AI creative applications beginning to displace traditional social media platforms [22].
- Other AI applications such as DeepSeek have grown even faster, hitting significant download milestones in shorter time frames [28][29].
- Whether Sora can hold its leading position remains uncertain, as earlier applications have risen and fallen quickly [25][30].
Chinese Phone Makers Are Rebuilding Android from the Bottom Up! vivo's AI OS Makes Its Debut
量子位· 2025-10-10 06:06
Core Viewpoint
- The article discusses vivo's launch of OriginOS 6, highlighting its comprehensive integration of AI features and its rebuilt Android core [1][2][40].

AI Features
- OriginOS 6 introduces fully updated AI functionality that recognizes important on-screen content and provides precise service recommendations [4][10].
- AI capabilities include Live Photo AI removal, which seamlessly erases unwanted elements from dynamic images [6][22].
- AI interaction gains multi-modal capabilities, enabling the system to understand user intent and surface relevant services based on context [10][12].
- AI can summarize documents and emails, automatically generate relevant file names, and even place customer service calls, navigating voice menus on the user's behalf [14][18][20].

AI Model Matrix
- vivo has developed a comprehensive AI model matrix, including a language model that understands user intent and performs complex tasks with long-term memory capabilities [27][28].
- The voice model improves natural interaction, supporting commands without wake words and enhancing voice recognition [30].
- The visual model has been upgraded for better image consistency and aesthetics, addressing long-text rendering challenges and adding advanced photo-editing features [33][34].

Reconstruction of the Android Core
- OriginOS 6 uses self-developed technology to rebuild the Android core in three main areas: computing, storage, and display [40][41].
- The Blue River Smooth Engine enhances task scheduling, improving application launch speed by 11% under heavy load [44][45].
- Storage management now gives high-priority tasks dedicated channels to prevent lag, increasing data loading speed by more than 2x in heavy-load situations [50][51].
- The display system gains a dual rendering architecture, improving frame-rate stability by 11% and rendering efficiency by 35% in demanding scenarios [54][55].

Performance Comparison
- A comparison between an older device running OriginOS 6 and a newer device on an older OS version showed significant improvements: touch response 63% faster, interface switching 35% faster, and frame-rate stability 69% better [57].

Launch Information
- OriginOS 6 will debut on vivo's flagship X300 and iQOO 15, with a public beta expected to roll out next month [59].
Someone Has Finally Solved Robots' Hand-Washing and Bathing Problem
量子位· 2025-10-10 06:06
Core Viewpoint
- The article discusses the launch of the DR02 humanoid robot by Yun Shen Chu (DEEP Robotics), highlighting its outdoor operating capability and IP66 protection rating, which allow it to function in a range of harsh environments [1][2][6].

Group 1: Product Features
- DR02 is the first industry-grade humanoid robot to achieve IP66 protection, making it resistant to dust and water, even in heavy rain [6][7].
- It operates across a wide temperature range of -20°C to 55°C, transitioning seamlessly between cold and hot environments [10].
- The robot is designed for complex terrain and can perform tasks such as cargo handling and emergency material delivery [11].
- DR02 features a modular quick-release design, enabling faster repairs and higher parts interchangeability, which eases scaling [14][15][17].

Group 2: Development Background
- Yun Shen Chu moved from quadruped robots to humanoids, leveraging its outdoor-operations experience to differentiate itself with an "all-weather" approach [19].
- The company launched the quadruped robot X20 in 2021, which also achieved IP66 certification, demonstrating its capability in extreme conditions [20][21].
- A strategic partnership with Wolong Electric Drive in January 2025 enhanced the robot's performance and energy efficiency [27].
- In July 2025, the company secured nearly 500 million RMB in funding to expand production lines and accelerate the commercialization of humanoid robots [28][29].

Group 3: Industry Context
- The humanoid robot sector is becoming increasingly competitive, with companies like Figure also releasing humanoid robots aimed at household applications [31][32].
- Industry consensus is shifting toward the humanoid form factor, with growing discussion of whether non-humanoid robots will become obsolete [35][36].