Federated Learning
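Eli Lilly Opens Its AI Drug Discovery Platform Worth Over $1 Billion, Inviting Small and Mid-Sized Companies to Share a "Data Gold Mine"
BioWorld (生物世界) · 2025-09-10 09:00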
Core Value
- Lilly TuneLab is Eli Lilly's machine learning platform. It is built on drug development models and data that Lilly accumulated over many years, valued at more than $1 billion, which makes it one of the most valuable datasets in the industry [4]
- The platform uses federated learning, so biotech companies can apply Lilly's AI models to drug discovery without handing over their raw proprietary data [4]

Collaboration Model
- In return for access, participating biotech companies are required to contribute training data, which is used to improve the platform for the benefit of the whole ecosystem and of patients [5]
- The initiative is designed to give smaller biotech firms the advanced AI capabilities that are normally available only to large companies [5]

Addressing Pain Points
- The platform targets a fundamental barrier for small biotech companies: they lack the large-scale, high-quality data needed to train effective models [7]
- Lilly TuneLab compresses decades of learning into an intelligence resource that can be used immediately, easing the data scarcity problem [7]

Future Plans
- Eli Lilly plans to add further features to TuneLab, such as in vivo small molecule prediction models, to keep expanding its capabilities [8]
- The launch is a proactive attempt by a pharmaceutical giant to reshape the industry ecosystem and speed up innovation at data-poor biotech companies [8]
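Pharmaceuticals and Biotech Industry Research: Three Types of Leaders in Data, Compute, and Models as a Lens on Global AI
Sohu Finance (Sou Hu Cai Jing) · 2025-08-31 03:08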
Core Insights
- The report describes AI in drug development as moving from concept to reality. 2024 was a turning point, marked by the Nobel Prize awarded for AlphaFold2, which signals a new era of AI-driven pharmaceuticals [1][4][13]
- Multi-omics AI applications are projected to cut costs and raise efficiency in the pharmaceutical sector by as much as 1000-fold, and the first AI-driven blockbuster drug is nearing approval [1][4][16]
- The industry is undergoing a paradigm shift as major tech companies and pharmaceutical giants invest heavily in AI, with more than $50 billion in AI drug development transactions over the past five years [1][5][6]

Group 1: Industry Dynamics
- AI drug development is moving toward practical application, with notable progress in model transparency and in regulatory frameworks such as the EU's AI Act, which promotes explainability [1][4][31]
- The key drivers are computational power, data integration, and advanced modeling techniques. Major cloud providers such as Amazon, Google, and Microsoft supply the compute resources [1][4][36]
- Federated learning is breaking down data silos and enabling cross-industry collaboration on drug discovery [1][4][36]

Group 2: Major Players and Investments
- Tech giants such as NVIDIA and Google are moving into AI pharmaceuticals: NVIDIA has invested in 13 AI drug companies, and Google has restructured its AI divisions around clinical trials [1][5][6]
- Leading pharmaceutical companies, including Merck and Pfizer, are committing hundreds of millions of dollars to AI initiatives, reflecting a strategic shift toward AI in drug development [1][5][6]
- The report favors companies with rich pipelines and proven AI drug development capabilities, and suggests watching firms such as Insilico Medicine and CrystalGenomics [1][6][19]

Group 3: Future Outlook
- The report expects AI to transform drug development, diagnostics, and treatment, with significant economic returns from AI-enabled innovation [1][19][20]
- By 2030, the pharmaceutical industry as a whole is projected to see exponential, AI-driven growth, with substantial gains in efficiency and cost-effectiveness [1][19][20]
- Integrating AI into drug development is expected to make clinical trials faster and more accurate, bringing new therapies to market sooner [1][39]
What Will Promoting and Regulating Cross-Border Data Flows Mean for Smart Vehicle Imports and Exports?
Core Viewpoint
- Data has become a "gold mine" and an investment hotspot in the smart connected vehicle sector. Recent government signals about promoting and regulating cross-border data flows are expected to benefit smart vehicle imports and exports [3][5].

Group 1: Data Generation and Importance
- A smart connected vehicle generates data on the order of terabytes (TB) per day, including facial expressions, actions, voice data, and vehicle location [4].
- Rising imports of smart vehicles such as Tesla, together with growing exports of Chinese smart vehicles, make effective management of cross-border data flows necessary [5].

Group 2: Regulatory Framework
- China has established a policy framework for cross-border data flows. The Data Security Law and the Personal Information Protection Law provide the legal basis for data management in the smart vehicle sector [5][6].
- Upcoming rules, such as the "Automotive Data Export Safety Guidelines (2025 Edition)" and the "Regulations on Promoting and Regulating Cross-Border Data Flow," point toward more specialized and detailed data governance [6][12].

Group 3: Global Data Governance Challenges
- Data governance models differ by country. The EU's GDPR places strict conditions on transferring personal data out of the EU, which is a challenge for Chinese smart vehicle companies operating in the EU market [7].
- Compliance requirements are likewise pushing foreign brands in China to adapt their data management, as seen in Tesla's establishment of a local data center [9].

Group 4: Technological Innovations and Compliance
- Privacy computing and federated learning are becoming key tools for making cross-border data compliance more efficient [10].
- Emerging techniques such as dynamic de-identification and intelligent encryption are expected to become standard practice for securing data during cross-border transmission [11].

Group 5: Industry Self-Regulation and Future Outlook
- Industry self-regulation is crucial for raising compliance levels. Proposed management systems cover pre-assessment, real-time monitoring, and post-audit [11].
- Promoting and regulating cross-border data flows is presented as the guiding principle for healthy industry development, and companies are encouraged to build compliance capabilities into their export strategies [12].
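The article names de-identification as a likely standard practice but does not describe a specific method. As a rough illustration only, the sketch below pseudonymizes a vehicle identifier with a keyed hash and coarsens the location before a record leaves the country. The record fields, the `SECRET_KEY` value, and the two-decimal rounding are all assumptions made for this example, not details from the article or from any regulation.

```python
import hashlib
import hmac

# Hypothetical secret that stays with the data exporter; not from the article.
SECRET_KEY = b"replace-with-a-locally-held-secret"


def pseudonymize_id(vehicle_id, key=SECRET_KEY):
    """Replace a vehicle identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, vehicle_id.encode("utf-8"), hashlib.sha256).hexdigest()


def coarsen_location(lat, lon, decimals=2):
    """Round coordinates so the exported record carries only an approximate position."""
    return (round(lat, decimals), round(lon, decimals))


def deidentify(record):
    """Build the record to export: only whitelisted, transformed fields are copied."""
    return {
        "vehicle": pseudonymize_id(record["vin"]),
        "location": coarsen_location(record["lat"], record["lon"]),
        "speed_kmh": record["speed_kmh"],  # non-identifying telemetry kept as-is
    }


# Made-up sample record; biometric fields are simply never copied to the output.
sample = {
    "vin": "TESTVIN0000000001",
    "lat": 31.230416,
    "lon": 121.473701,
    "speed_kmh": 42.0,
    "driver_face_id": "abc123",
}
exported = deidentify(sample)
```

Using a whitelist (copy only named fields) rather than a blacklist means a newly added sensitive field is dropped by default. A keyed hash is pseudonymization, not anonymization: whoever holds the key can still link records to a vehicle.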
Data "Poisoning" Can Make AI "Go Bad on Its Own"
Science and Technology Daily (Ke Ji Ri Bao) · 2025-08-19 00:18
Core Insights
- The article discusses the risk of data poisoning in AI systems: malicious interference can make an AI learn the wrong things, with potentially dangerous results in sectors such as transportation and healthcare [1][2].

Group 1: Data Poisoning Risks
- Data poisoning occurs when misleading data is fed into an AI system, so that it forms incorrect understandings and makes erroneous judgments [1][2].
- A well-known example is Microsoft's chatbot Tay, which was taken offline within hours of launch after users manipulated it [2].
- The rise of AI web crawlers has raised concerns about the collection of toxic data, which can lead to copyright infringement and the spread of false information [3].

Group 2: Copyright and Defensive Measures
- Creators are increasingly concerned about their works being used without permission, which has led to legal action such as The New York Times' copyright lawsuit against OpenAI [4].
- Tools such as Glaze and Nightshade protect creators' works by adding subtle alterations that confuse AI models, in effect turning the works themselves into "poison" for AI training [4].
- Cloudflare has introduced "AI Labyrinth," which traps AI crawlers in a loop of meaningless data and wastes their resources and time [4].

Group 3: Decentralized Defense Strategies
- Researchers are exploring decentralized technologies as a defense against data poisoning. With federated learning, for example, models learn locally and raw data is never shared [5][6].
- Blockchain is being integrated into AI defense systems to make model updates traceable and accountable, so that the source of malicious data can be identified [6].
- Combining federated learning with blockchain aims to produce more resilient AI systems that can alert administrators to potential data poisoning [6].
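The traceability idea in Group 3 can be illustrated with a hash-chained log of model updates. The sketch below is a simplified stand-in for a blockchain, assuming a single in-memory list; the function names and the log layout are hypothetical and do not come from the article or from any specific system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, client_id, update):
    """Hash an update together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "client": client_id, "update": update}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_update(log, client_id, update):
    """Append a model update; each entry commits to everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {
            "client": client_id,
            "update": update,
            "prev": prev_hash,
            "hash": entry_hash(prev_hash, client_id, update),
        }
    )


def first_tampered_index(log):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = entry_hash(prev_hash, entry["client"], entry["update"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None


# Made-up updates from three clients.
log = []
append_update(log, "client-a", [0.10, -0.20])
append_update(log, "client-b", [0.05, 0.00])
append_update(log, "client-c", [0.07, -0.10])
```

Because every entry includes the previous hash, editing any recorded update after the fact invalidates that entry and can be traced to a specific client. The log shows who submitted what; it does not by itself decide whether an update is poisoned.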
A Huawei Noah's Ark Lab Principal Researcher Has Also Launched an Embodied Intelligence Startup
QbitAI (量子位) · 2025-08-13 01:01
Core Viewpoint
- Embodied intelligence is a hot sector for startups. The article focuses on Shenzhen Noin Intelligent Technology, founded by former Huawei researcher Li Yinchuan, who has a strong academic and patent record in AI and robotics [2][7][24].

Group 1: Company Overview
- Shenzhen Noin Intelligent Technology was founded on June 19, 2025. It develops humanoid robots for home use that interact with their environment through a perception-decision-execution loop [7].
- The company secured its first financing round shortly after it was founded. Investors include Source Code Capital, which has backed other successful AI ventures [4][15].
- Noin is reportedly raising a second round, at a valuation double that of the first [14].

Group 2: Market Dynamics
- Demand for home humanoid robots comes from the wish for comfort and convenience in family life and from trends such as global aging and the growing number of single-person households [11].
- The home humanoid robot market is described as exceptionally active, with several companies, including some founded by prominent tech figures, in the news for their financing and product progress [11][13].

Group 3: Industry Background
- A significant number of Huawei alumni have moved into embodied intelligence, and many have founded their own companies [36][38].
- Two notable examples are Zhiyuan Robotics and Tashizhi Navigation, both with teams of former Huawei executives and technical experts [40][48].
- The article attributes the rise of this "Huawei system" of startups to the company's earlier talent development programs and its work on autonomous driving, which carries over naturally to robotics [58][60].
ICCV 2025 | New Backdoor Attack Targets Scaffold Federated Learning; NTU and 0G Labs Reveal Security Vulnerabilities in Centralized Training
Synced (机器之心) · 2025-08-09 03:59
Core Viewpoint
- The article introduces BadSFL, a backdoor attack designed specifically for the Scaffold Federated Learning (SFL) framework, and reports that it is more effective, stealthier, and more persistent than existing methods [2][39].

Group 1: Background on Federated Learning and Scaffold
- Federated Learning (FL) trains a model across distributed clients while protecting client data privacy, but its effectiveness depends heavily on how training data is distributed across clients [6][10].
- In non-IID scenarios, where data distributions differ significantly between clients, traditional methods such as FedAvg struggle and the model converges poorly [7][10].
- Scaffold addresses this by using control variates to correct client updates, which improves convergence in non-IID settings [7][12].

Group 2: Security Vulnerabilities in Scaffold
- Despite these advantages, Scaffold introduces new security vulnerabilities. Malicious clients can exploit the model update mechanism to inject backdoor behavior [8][9].
- Scaffold's reliance on control variates creates a new attack surface: an attacker can manipulate the variates to steer benign clients' updates toward a malicious objective [9][16].

Group 3: BadSFL Attack Methodology
- BadSFL subtly alters control variates so that benign clients' local gradient updates drift in a "poisoned" direction, which makes the backdoor more persistent [2][9].
- The attack uses a GAN-based data poisoning strategy to enrich the attacker's dataset, keeping accuracy high on both normal and backdoor samples while remaining covert [2][11].
- BadSFL stays effective for more than 60 rounds, three times longer than existing baseline methods [2][32].

Group 4: Experimental Results
- Experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets show that BadSFL outperforms four other known backdoor attacks in effectiveness and persistence [32][33].
- In the first 10 rounds of training, BadSFL achieved over 80% accuracy on the backdoor task while maintaining around 60% accuracy on the primary task [34].
- Even after the attacker stops uploading malicious updates, BadSFL retains its backdoor significantly longer than the baseline methods [37][38].
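To make the role of control variates concrete, here is a minimal sketch of the benign Scaffold update on a toy problem. It assumes full client participation, a server step size of 1, and one-dimensional quadratic client losses; the function name `scaffold_round` and all numeric values are illustrative choices, not taken from the article or the BadSFL paper. The attack itself is not implemented.

```python
def scaffold_round(x, c, client_c, targets, lr=0.1, local_steps=5):
    """One Scaffold round on toy 1-D losses f_i(y) = 0.5 * (y - targets[i]) ** 2.

    x        : global model
    c        : server control variate
    client_c : list of per-client control variates c_i
    """
    new_models, new_client_c = [], []
    for c_i, a_i in zip(client_c, targets):
        y = x
        for _ in range(local_steps):
            grad = y - a_i  # local gradient, biased toward this client's data
            y -= lr * (grad - c_i + c)  # control variates correct the client drift
        new_models.append(y)
        # Refresh c_i from the net local movement (one of the published variants)
        new_client_c.append(c_i - c + (x - y) / (local_steps * lr))

    n = len(targets)
    x_next = x + sum(y - x for y in new_models) / n
    c_next = c + sum(nc - oc for nc, oc in zip(new_client_c, client_c)) / n
    return x_next, c_next, new_client_c


# Three clients with very different optima (a stand-in for non-IID data).
targets = [-4.0, 0.0, 10.0]
x, c, client_c = 0.0, 0.0, [0.0] * len(targets)
for _ in range(30):
    x, c, client_c = scaffold_round(x, c, client_c, targets)
```

In this toy setting the global model approaches the mean of the client optima. The term `- c_i + c` inside every local step is what the article calls the new attack surface: because each benign client trusts the server control variate `c`, which aggregates what all clients report, a client that reports a manipulated control variate can bias everyone's local updates.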
Baoxin Software (Anhui) Obtains a Patent Related to Federated Learning-Based Differential Privacy Image Classification
Jin Rong Jie · 2025-08-09 02:49
Core Insights
- Baoxin Software (Anhui) Co., Ltd. has obtained a patent for a "federated learning-based differential privacy image classification method and device," with authorization announcement number CN115527061B and an application date of September 2022 [1]

Company Overview
- Baoxin Software (Anhui) Co., Ltd. was established in 2002 and is located in Ma'anshan City. It is primarily engaged in research and experimental development [1]
- The company has a registered capital of 3,610.9372 million RMB [1]
- It has invested in 5 companies and participated in 3,103 bidding projects [1]
- It holds 13 trademark registrations, 152 patent registrations, and 20 administrative licenses [1]
MicroAlgo Inc. (NASDAQ: MLGO) Applies a Blockchain Federated Learning (BlockFL) Architecture for Secure Data Transmission
Core Viewpoint
- The rapid development of big data and artificial intelligence has brought data security and privacy issues to the fore, and traditional data transmission methods carry significant risks. Blockchain offers new solutions, exemplified by MicroAlgo's BlockFL architecture, which is designed for secure, efficient, and privacy-protecting data transmission [1][6].

Group 1: BlockFL Architecture
- BlockFL uses a blockchain network for data exchange and synchronization in federated learning: devices upload local model updates and download global model updates through the chain [2].
- Because the blockchain is decentralized and supports high concurrency, all devices receive the same global model update, which keeps model training consistent and accurate [2].

Group 2: Process Overview
- Initialization: the system administrator creates an initial model and broadcasts it to all participating nodes, and the blockchain records metadata about the federated learning task [4].
- Local training: each node trains the model on its local dataset without exposing the original data, which protects data privacy [4].
- Upload and validation: nodes upload encrypted model parameters to the blockchain, where smart contracts check their validity and integrity to block malicious submissions [4].
- Aggregation: once the parameters are verified, a central server or designated aggregation node reads them from the blockchain, averages them, and produces a new version of the global model [4].
- Distribution: the updated global model is broadcast to all nodes for the next training round, and the blockchain keeps every operation traceable [4].
- Incentives: BlockFL includes an incentive and penalty mechanism to encourage participation and high-quality data, with smart contracts executing rewards and penalties automatically [4].
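The validation and aggregation steps above can be sketched as follows. This is a simplified illustration, not MicroAlgo's implementation: a plain dictionary stands in for the on-chain record, a SHA-256 digest check stands in for smart-contract validation, encryption is omitted, and the sample-weighted average is the standard FedAvg rule (the article only says the parameters are averaged). All names and values are hypothetical.

```python
import hashlib
import json


def digest(params):
    """Fingerprint of a parameter vector, as it might be recorded on a ledger."""
    return hashlib.sha256(json.dumps(params).encode("utf-8")).hexdigest()


def federated_average(updates, ledger):
    """Average only the updates whose digest matches the ledger entry.

    updates : list of {"client": str, "params": list[float], "num_samples": int}
    ledger  : dict mapping client id -> recorded digest (stand-in for the chain)
    """
    accepted = [u for u in updates if ledger.get(u["client"]) == digest(u["params"])]
    if not accepted:
        raise ValueError("no update passed the integrity check")

    total = sum(u["num_samples"] for u in accepted)
    dim = len(accepted[0]["params"])
    # Weight each client's parameters by its share of the training samples.
    return [
        sum(u["params"][k] * u["num_samples"] for u in accepted) / total
        for k in range(dim)
    ]


# Made-up updates from three nodes.
updates = [
    {"client": "hospital-a", "params": [1.0, 2.0], "num_samples": 100},
    {"client": "hospital-b", "params": [3.0, 4.0], "num_samples": 300},
    {"client": "hospital-c", "params": [100.0, 100.0], "num_samples": 50},
]
ledger = {u["client"]: digest(u["params"]) for u in updates[:2]}
# hospital-c's recorded digest does not match what it later submitted.
ledger["hospital-c"] = digest([0.0, 0.0])

global_params = federated_average(updates, ledger)
```

Here the third update is rejected because it does not match its recorded digest, so only the first two contribute to the new global model. A digest check catches tampering in transit; it does not detect an update that was malicious when it was first recorded.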
Group 3: Applications and Future Prospects
- BlockFL can be applied in healthcare, financial risk control, smart manufacturing, and smart cities, enabling data collaboration while maintaining security and privacy [5].
- In healthcare, hospitals can jointly train diagnostic models while protecting patient privacy. In finance, institutions can identify fraud without sharing sensitive information. In smart manufacturing, factories can collaborate with one another. In smart cities, departments can cooperate without exposing sensitive data [5].
- By combining blockchain with federated learning, BlockFL addresses the problems of traditional data transmission and improves the efficiency and accuracy of model training, which the article expects to make it an important supporting technology for data transmission and machine learning [6].
How Do Large Models Connect to Government Office Work? Unicom Yuanjing Makes a Major Release
Huanqiu.com (Huan Qiu Wang) · 2025-07-24 04:56
Core Viewpoint
- Large AI models, and the Unicom Yuanjing model in particular, are changing government services by making public service delivery more efficient and precise [1][9].

Group 1: Technology Foundation
- General-purpose AI models often struggle with the complexity of government processes and give inadequate answers [2].
- The Unicom Yuanjing model is designed to act as a "government expert": it distills government data, injects specialized knowledge, and builds a knowledge network of government entities [3].
- The model gains language, voice, and visual capabilities through techniques such as supervised fine-tuning and data distillation [4].

Group 2: Product Matrix
- Yuanjing has built a flexible application ecosystem covering multiple fields and scenarios, driven by a dual "technology + scenario" approach [5].
- The product matrix has a modular architecture, so applications can be customized for different government needs, from the provincial level down to the community level [6][7].

Group 3: Practical Application
- The Yuanjing model is already in use in a number of provincial and municipal government departments, where it has significantly improved service efficiency and decision-making [9].
- By automating tasks such as drafting meeting agendas and extracting key policy details, it has cut preparation time and streamlined work for government staff [8][9].
- It also improves citizen engagement by setting out the required materials and procedures clearly, which spares the public unnecessary trips [8].
ICML Spotlight | Synthetic Data That "Evolves": Generating High-Quality Vertical-Domain Data Without Uploading Private Data
Synced (机器之心) · 2025-07-11 09:22
Core Viewpoint
- The article discusses data scarcity in the era of large models and introduces PCEvolve, a framework that generates synthetic datasets while preserving privacy and meeting the specific needs of vertical domains such as healthcare and industrial manufacturing [1][2][10].

Group 1: Data Scarcity and Challenges
- The rapid development of large models has made data scarcity worse. Predictions indicate that by 2028 the generation of public data will no longer keep pace with the rate at which model training consumes it [1].
- In specialized fields such as healthcare and industrial manufacturing, data is already limited, so the scarcity problem is even more severe [1].

Group 2: PCEvolve Framework
- PCEvolve is a synthetic data evolution framework that needs only a small number of labeled samples to generate an entire dataset while protecting privacy [2].
- Its evolution process is likened to DeepMind's FunSearch and AlphaEvolve, and it focuses on producing high-quality training data from existing large model APIs [2].

Group 3: Limitations of Existing Large Models
- Existing large model APIs cannot synthesize domain-specific data directly, because they do not account for characteristics unique to vertical domains, such as lighting conditions, sampling device models, and privacy information [4][7].
- Local data cannot be uploaded because of privacy and intellectual property concerns, which complicates prompt engineering and lowers the quality of the synthetic data [9][11].

Group 4: PCEvolve's Mechanism
- PCEvolve uses a new privacy protection method based on the Exponential Mechanism, adapted to the small sample sizes typical of vertical domains [11].
- The framework evolves iteratively: it generates a large number of candidate synthetic samples, then uses privacy-protected scoring to select among them and eliminate lower-quality data [11][19].

Group 5: Experimental Results
- PCEvolve was evaluated in two ways: by the effect of the synthetic data on downstream model training, and by the quality of the synthetic data itself [21].
- In experiments on datasets such as COVIDx and Came17, PCEvolve significantly improved model accuracy, reaching a final accuracy of 64.04% on COVIDx and 69.10% on Came17 [22][23].
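The selection step in Group 4 builds on the Exponential Mechanism. The sketch below shows the textbook form of that mechanism, not PCEvolve's adapted variant, which the article does not spell out. The candidate scores, the `epsilon` value, and the helper names are assumptions made for this illustration.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(epsilon * score / (2 * sensitivity))."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_candidates(scores, epsilon, k, rng, sensitivity=1.0):
    """Pick k distinct candidates, one draw at a time.

    The privacy budget is split evenly across the k draws (simple composition).
    """
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        probs = exponential_mechanism_probs(
            [scores[i] for i in remaining], epsilon / k, sensitivity
        )
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical similarity scores of four synthetic candidates against private samples.
scores = [0.2, 0.9, 0.5, 0.1]
probs = exponential_mechanism_probs(scores, epsilon=2.0)
survivors = select_candidates(scores, epsilon=2.0, k=2, rng=random.Random(0))
```

Higher-scoring candidates are more likely to survive, but every candidate keeps a nonzero chance. That randomness is what limits how much the selection reveals about the private samples used for scoring. A smaller `epsilon` gives stronger privacy and noisier selection.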