数据投毒 - filings, earnings calls, financial reports, news - Reportify

数据投毒

Search documents

专访雅为科技杨乔雅：当AI开始“造谣”，技术被“投毒”，谁来监督

Sou Hu Cai Jing· 2025-11-02 13:19

Core Viewpoint - The discussion centers around the issue of AI, particularly large language models like Baidu's, generating false information and the ethical implications of this phenomenon [2][3]. Group 1: AI's "Fabrication" Issue - The term "fabrication" in AI is referred to as "hallucination," where AI generates plausible but incorrect information due to flawed training data or insufficient information [3]. - The frequent occurrence of factual errors in AI products from platforms with millions of users leads to a public trust crisis, potentially distorting public perception and disrupting market order [3][4]. Group 2: Risks of Data Poisoning - The risk of malicious actors feeding AI with false information to harm competitors is identified as a form of "data poisoning," representing an asymmetric gray war [4][5]. - Attackers can disseminate carefully crafted false information across various online platforms, which AI then learns from, ultimately presenting these as objective answers to unsuspecting users [4][5]. Group 3: Solutions and Responsibilities - A comprehensive "digital immune system" is necessary, requiring collaboration among companies, users, regulators, and society [6]. - Companies like Baidu must prioritize "truthfulness" alongside "fluency" in their AI strategies, implementing mechanisms for source verification and fact-checking [6]. - Establishing stricter data cleaning processes and developing algorithms to detect and eliminate malicious information is essential [6]. Group 4: User Empowerment - Users should transition from passive information receivers to critical consumers, employing cross-verification as a fundamental practice [7]. - Utilizing existing fact-checking platforms and reporting false information generated by AI can contribute to improving the AI model [8]. Group 5: Regulatory Actions - Regulatory frameworks must keep pace with technological advancements, establishing legal boundaries for AI-generated content and imposing severe penalties for malicious activities [9][10]. - Collaboration among regulatory bodies and AI companies is crucial for effective governance and combating data poisoning [11]. Group 6: Overall Perspective - The situation is viewed as a "growing pain," highlighting the dual-edged nature of technology and the need for corporate responsibility and societal engagement [12].

数字免疫系统

数字免疫系统

大模型中毒记

3 6 Ke· 2025-10-20 10:52

Core Insights - The article discusses the phenomenon of "data poisoning" affecting large AI models, leading to abnormal outputs and potential risks in various applications [1][3][10] Group 1: Understanding Data Poisoning - Data poisoning refers to the malicious influence of harmful data on AI models during training or usage, resulting in erroneous or harmful outputs [3][4] - A study by Anthropic revealed that just 250 carefully designed malicious documents could poison a large model with 130 billion parameters, causing it to produce nonsensical responses when triggered by specific phrases [3][5] - Even a mere 0.01% of false text in the training dataset can increase harmful content output by 11.2% [5][10] Group 2: Mechanisms of Data Poisoning - Attackers can introduce harmful samples into the training dataset, compromising the model's functionality, such as inserting incorrect medical advice or promotional content [5][10] - Backdoor attacks involve embedding specific triggers in the training data, leading to malicious outputs when the model encounters these triggers [5][7] - Continuous learning models are susceptible to ongoing poisoning during their operational phase, allowing attackers to inject harmful information repeatedly [8][9] Group 3: Sources of Data Poisoning - Commercial interests drive data poisoning, with businesses seeking to manipulate AI responses for advertising purposes, leading to the emergence of a practice called GEO (Generative Engine Optimization) [11][13] - Some individuals engage in data poisoning for technical bragging rights or personal vendettas, as exemplified by a case involving a former intern at ByteDance [14][16] - Organized crime groups may exploit AI models for illegal activities, such as fraud or evading detection, by systematically injecting harmful data [17][19] Group 4: Consequences of Data Poisoning - The immediate effects of model poisoning include decreased output quality and the generation of false information, which can spread and distort collective memory [22][24] - In critical areas like autonomous driving or healthcare, poisoned models can pose direct safety threats, leading to catastrophic decisions [25][10] - The article emphasizes the need for a robust defense system against data poisoning, including data auditing, adversarial training, and continuous vulnerability assessments [26][27] Group 5: Solutions and Future Directions - Developing AI models with self-verification capabilities and ethical guidelines is crucial for mitigating risks associated with data poisoning [27][28] - The industry must foster a collaborative environment for identifying vulnerabilities and enhancing model resilience through initiatives like bug bounty programs and red team testing [27][28] - Continuous vigilance and proactive measures are essential to ensure that AI technology evolves positively and serves beneficial purposes [28]

对抗样本攻击

GEO（生成式引擎优化）

对抗样本攻击

GEO（生成式引擎优化）

“数据投毒”或诱发有害输出！AI数据污染分为几类？专家解读→

Sou Hu Cai Jing· 2025-08-17 08:50

Core Viewpoint - The national security department has issued a warning about "data poisoning" in AI, which can lead to harmful outputs due to the manipulation, fabrication, and repetition of data [1]. Group 1: Data Poisoning Overview - "Data poisoning" primarily targets two areas: visual recognition and natural language processing [3]. - An example of data poisoning involves altering training data, such as adding a green dot to a zebra image, which can mislead AI models during training [3]. - Even a few contaminated samples among thousands can significantly disrupt the AI model's ability to recognize similar objects correctly [3]. Group 2: Types of Data Pollution - There are two main types of AI data pollution: one involves malicious human intervention to mislead AI outputs, and the other involves the unfiltered inclusion of harmful information from vast internet data collections [5]. - If untrustworthy data is not identified and removed, it can compromise the reliability of AI outputs [5]. Group 3: Data Sources and Risks - AI models require extensive data for training, often sourced from the internet, including various forms of media [7]. - The potential for data contamination exists as anyone can contribute data online, which may lead to the AI model being influenced by unsafe or polluted data [7].

马斯克称Grok回答中将引入广告丨南财合规周报（第202期）

2 1 Shi Ji Jing Ji Bao Dao· 2025-08-11 01:20

Regulatory Governance - The Ministry of National Security warns about the risks of data poisoning and contamination in artificial intelligence training data, highlighting the presence of false information and biased viewpoints that can compromise AI safety [1][2] - Research indicates that even a mere 0.01% of false text in training data can increase harmful outputs by 11.2%, while 0.001% can raise harmful outputs by 7.2% [1] AI Developments - Elon Musk announced plans to introduce advertising into the responses of his AI product Grok, with a reported 40% increase in online ad conversion rates since June [3] - OpenAI launched GPT-5, claiming it to be the most advanced coding model to date, with significant improvements in programming capabilities [4] - Following the launch, GPT-5 received mixed reviews, with OpenAI's CEO acknowledging technical issues that affected its performance shortly after release [5] - Google DeepMind released Genie3, a third-generation model capable of generating interactive 3D virtual worlds with improved resolution and interaction time compared to its predecessor [6] - Apple is entering the AI search engine market, forming an internal team to develop a ChatGPT-like AI search experience, marking a significant shift in its approach to generative AI [6]

引用所谓董明珠言论被格力起诉，轻信AI核实信源要不得

Qi Lu Wan Bao· 2025-08-06 01:58

Core Viewpoint - The case involving a defamatory statement attributed to Dong Mingzhu highlights the risks associated with misinformation and the reliance on AI for verifying information, which can lead to significant reputational damage for companies like Gree [1][2]. Group 1: Legal Case and Implications - A Shenzhen individual was sued for allegedly fabricating statements attributed to Dong Mingzhu, resulting in a court ruling that required an apology and compensation of 70,000 yuan to Gree [1]. - The case underscores the potential consequences of misinformation, as the individual claimed to have verified the information using AI tools, which raises concerns about the reliability of AI in discerning truth [1][2]. Group 2: AI and Misinformation - The National Security Department issued a warning about the risks of "data poisoning" in AI training, noting that even a small percentage of false data can significantly increase harmful outputs from AI models [2]. - The report from Tsinghua University indicated a rapid increase in AI-related rumors, particularly in the economic and corporate sectors, with a staggering growth rate of 99.91% in the last six months [3]. Group 3: Regulatory and Collaborative Efforts - The Central Cyberspace Administration of China initiated a campaign to combat misinformation spread by self-media, focusing on the use of AI to generate false information [2][3]. - Proposed regulations, such as the upcoming "Artificial Intelligence Generated Content Labeling Method," aim to ensure transparency in AI-generated content and mitigate the spread of misinformation [3].

GREE(SZ:000651)

数据污染冲击安全防线，国安部：警惕人工智能“数据投毒”

Bei Jing Ri Bao Ke Hu Duan· 2025-08-05 00:17

Group 1 - The core issue highlighted is the presence of poor-quality training data in artificial intelligence, which includes false information, fabricated content, and biased viewpoints, leading to data source contamination and new challenges for AI safety [1][5]. - Data is identified as a fundamental element for AI, essential for training models and driving AI applications, with high demands for quantity, quality, and diversity to enhance model performance [3][5]. - Data pollution can significantly impair model accuracy and reliability, potentially causing decision-making errors and system failures, with harmful content generated through data poisoning affecting model training [5][6]. Group 2 - Even a small percentage of false text in training data can lead to a substantial increase in harmful outputs, with 0.01% of false text resulting in an 11.2% increase in harmful content [6]. - In the financial sector, data pollution can lead to the creation of false information that may cause abnormal stock price fluctuations, posing new market manipulation risks [7]. - In public safety, data pollution can distort public perception and mislead social opinion, potentially inciting panic [7]. Group 3 - To strengthen AI data security, it is recommended to enhance source regulation to prevent data contamination, supported by existing laws such as the Cybersecurity Law and Data Security Law [9]. - Risk assessment should be reinforced to ensure data safety throughout its lifecycle, including collection, storage, and transmission [9]. - A governance framework should be established for data cleaning and repair, with specific rules based on legal standards to manage and control data quality continuously [9].

0.01%虚假训练文本可致有害内容增加11.2% 警惕人工智能“数据投毒”

Yang Shi Xin Wen Ke Hu Duan· 2025-08-04 22:46

Core Viewpoint - The Ministry of National Security has issued a safety alert regarding the quality of training data for artificial intelligence, highlighting the presence of false information, fabricated content, and biased viewpoints, which leads to data contamination and poses new challenges to AI safety [1] Group 1 - The training data for artificial intelligence is characterized by a mix of quality, with significant issues related to misinformation and bias [1] - The contamination of data sources is identified as a critical challenge for the safety of artificial intelligence systems [1]

深度｜95后Scale AI创始人：AI能力指数级增长，生物进化需要百万年，脑机接口是保持人类智慧与AI共同增长的唯一途径

Z Potentials· 2025-07-28 04:17

Core Insights - The article discusses the rapid advancement of AI technology and its implications for human evolution and society, emphasizing the need for brain-computer interfaces to keep pace with AI development [5][7][22]. Group 1: AI and Data - AI is compared to oil, serving as a crucial resource for future economies and military capabilities, with the potential for unlimited growth through self-reinforcing cycles [22][23]. - Data is highlighted as the new "oil," essential for feeding algorithms and enhancing AI capabilities, with companies competing for data center dominance [23][24]. - The three key components for AI development are algorithms, computational power, and data, with a focus on improving these elements to enhance AI performance [24][25]. Group 2: Brain-Computer Interfaces - Brain-computer interfaces (BCIs) are seen as the only way to maintain human relevance alongside rapidly advancing AI, despite the significant risks they pose [7][22]. - Potential risks of BCIs include memory theft, thought manipulation, and the possibility of creating a reality where individuals can be controlled or influenced by external entities [6][7][26]. - The technology could enable profound enhancements in human cognition, allowing individuals to access vast amounts of information and think at superhuman speeds [9][10]. Group 3: Scale AI - Scale AI, founded by Alexandr Wang, provides essential data support for major AI models, including ChatGPT, and is valued at over $25 billion [2][10]. - The company initially gained recognition for creating large-scale datasets and has since expanded its focus to include partnerships with significant clients, including the U.S. Department of Defense [11][56]. - Scale AI's growth trajectory has been rapid, expanding from a small team to approximately 1,100 employees within five years, with a strong emphasis on the autonomous driving sector [64].

agentic warfare

agentic warfare

3D高斯泼溅算法大漏洞：数据投毒让GPU显存暴涨70GB，甚至服务器宕机

量子位· 2025-04-22 05:06

Core Viewpoint - The emergence of 3D Gaussian Splatting (3DGS) as a leading 3D modeling technology has introduced significant security vulnerabilities, particularly through a newly proposed attack method called Poison-Splat, which can drastically increase training costs and system failures [1][2][31]. Group 1: Introduction and Background - 3DGS has rapidly become a dominant technology in 3D vision, replacing NeRF due to its high rendering efficiency and realism [2][7]. - The adaptive nature of 3DGS, which adjusts computational resources based on scene complexity, is both a strength and a potential vulnerability [8][11]. - The research highlights a critical security blind spot in mainstream 3D reconstruction systems, revealing how minor alterations to input images can lead to significant operational disruptions [2][31]. Group 2: Attack Mechanism - The Poison-Splat attack targets the GPU memory usage and training time by introducing perturbations to input images, leading to increased computational costs [12][22]. - The attack is modeled as a max-min bi-level optimization problem, employing innovative strategies such as a proxy model to approximate the victim's behavior and maximizing the Total Variation (TV) of images to induce excessive complexity in 3DGS [13][16][15]. - The attack can significantly increase GPU memory usage from under 4GB to 80GB and training time by up to five times, demonstrating its effectiveness [25][22]. Group 3: Experimental Results - Experiments conducted on various 3D datasets showed that unconstrained attacks could lead to GPU memory usage surging by 20 times and rendering speeds dropping to one-tenth of the original [25][22]. - Even with constraints on pixel perturbations, the attack remains potent, with some scenarios showing over eightfold increases in memory consumption [27][22]. Group 4: Implications and Contributions - The research emphasizes that the findings are not merely academic but represent real threats to 3D service providers that allow user-uploaded content [31][40]. - Simple defenses, such as limiting the number of Gaussian points, are ineffective as they compromise the quality of 3D reconstructions [39][35]. - The study aims to raise awareness about the security of AI systems in 3D modeling, advocating for the development of more intelligent defense mechanisms [41][37].

3D重建系统安全

3D Gaussian Splatting (3DGS)

3D重建系统安全

3D Gaussian Splatting (3DGS)