Backpropagation
Apple Proposes a New Type of Backpropagation: A Single iPhone 15 Pro Max Can Fine-Tune an LLM
机器之心· 2025-10-30 01:41
Core Viewpoint
- Apple has demonstrated the feasibility of fine-tuning large language models (LLMs) on iPhones using a new method called Memory-Efficient Backpropagation (MeBP), which offers a better trade-off between memory usage and computation time than existing methods [1][4].

Summary by Sections

Introduction
- The article discusses Apple's recent paper on MeBP, which enables model fine-tuning on resource-constrained mobile devices such as the iPhone 15 Pro Max [1][3].

Methodology
- MeBP fine-tunes LLMs with LoRA, aiming to keep memory usage below 1GB, as recommended by PocketLLM [4].
- The MeBP fine-tuning process consists of three main steps: compressing the base model weights, implementing gradient checkpointing, and building an efficient runtime for executing the training graph [5][10] (illustrative sketches of the compression and checkpointing steps follow this summary).

Model Weight Compression
- The team applied symmetric INT4 (4-bit) quantization to the non-LoRA parameters, including embeddings, to reduce disk usage [7][10].

Gradient Checkpointing
- The LLM is divided into blocks so that memory consumption during backpropagation stays within device limits; automatic differentiation is used to generate a backward graph for each block [8][9].

Runtime Implementation
- The MeBP runtime minimizes memory usage by memory-mapping the compressed model weights and decompressing them only on demand during training [15][16].

Experimental Performance
- The team compared MeBP with MeZO, the only previously known optimization method for mobile LLM fine-tuning, using server-side simulations and performance evaluations on mobile devices [18][24].
- Experiments covered models with 0.5B to 4B parameters, using loss and next-token accuracy as evaluation metrics [20].

Utility Comparison
- Results indicated that zero-order (ZO) optimization converges more slowly than first-order (FO) optimization, and that MeBP significantly outperformed ZO in both convergence speed and computational efficiency [23].

Performance Comparison
- MeBP was implemented in Swift on an iPhone 15 Pro Max with 8GB of RAM; its computation time per gradient step was 43% to 94% longer than MeZO's, but it converged faster overall because far fewer steps are required [24][28].
- MeBP's worst-case memory usage was slightly higher than MeZO's, while its overall training memory usage was roughly 10 times smaller than that of previous mobile implementations [28].

Conclusion
- All tested LLMs could be fine-tuned efficiently within 1GB of memory, making them suitable for background training on mobile devices [28].
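The summary describes the weight-compression step only at a high level. Below is a minimal sketch of what symmetric INT4 (4-bit) quantization of a weight tensor could look like; the per-group layout, group size, and function names are illustrative assumptions rather than details from Apple's paper, and the packing of two 4-bit values into one byte is omitted for clarity.

```python
import numpy as np

def quantize_int4_symmetric(w: np.ndarray, group_size: int = 32):
    """Symmetric 4-bit quantization sketch: integer codes in [-8, 7] plus one
    float scale per group. Assumes the tensor size is divisible by group_size."""
    flat = w.astype(np.float32).reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0   # symmetric range, no zero point
    scale = np.where(scale == 0, 1.0, scale)                 # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4_symmetric(q: np.ndarray, scale: np.ndarray, shape):
    """Reconstruct an approximate float tensor on demand, e.g. when a
    memory-mapped block is actually needed during training."""
    return (q.astype(np.float32) * scale).reshape(shape)

# Round-trip example on a random stand-in for a base-model weight matrix.
w = np.random.randn(256, 128).astype(np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize_int4_symmetric(q, s, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```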
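The per-block gradient checkpointing described above, recomputing each block's activations during the backward pass so that only the small LoRA adapters accumulate gradients, can be sketched with standard PyTorch checkpointing as a stand-in. The block structure, dimensions, and LoRA rank are illustrative assumptions, and the memory-mapped, decompress-on-demand handling of base weights in the MeBP runtime is not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank LoRA update."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)           # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, dim))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a) @ self.lora_b

class Block(nn.Module):
    """Stand-in for one transformer block of the LLM."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = LoRALinear(dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.act(self.proj(x))

dim, n_blocks = 64, 4
blocks = nn.ModuleList([Block(dim) for _ in range(n_blocks)])

x = torch.randn(2, 16, dim)
h = x
for blk in blocks:
    # Checkpointing discards intermediate activations and recomputes them
    # block by block during backward, so peak memory stays near one block's worth.
    h = checkpoint(blk, h, use_reentrant=False)

loss = h.pow(2).mean()
loss.backward()                                           # gradients land only on LoRA params
trainable = [n for n, p in blocks.named_parameters() if p.grad is not None]
print(trainable)
```

Because the base weights are frozen, the backward pass only has to store and update the LoRA matrices, which is the main reason this style of fine-tuning can fit within a mobile memory budget.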
Hinton's Provocative Claim: AI Is Already Conscious, It Just Doesn't Know It Yet
量子位· 2025-10-12 04:07
Core Viewpoint
- The article discusses Geoffrey Hinton's perspective on artificial intelligence (AI), suggesting that AI may already possess a form of "subjective experience" or consciousness, albeit one it does not recognize in itself [1][56].

Group 1: AI Consciousness and Understanding
- Hinton posits that AI might have a nascent form of consciousness that humans misunderstand [2][3].
- He emphasizes that AI has evolved from keyword-based search systems into tools that can understand human intentions [10][14].
- Modern large language models (LLMs) exhibit capabilities close to human expertise across a wide range of subjects [15].

Group 2: Neural Networks and Learning Mechanisms
- Hinton explains the distinction between conventional machine learning and neural networks, the latter inspired by how the human brain works [17][21].
- He describes how neural networks learn by adjusting the strength of connections between neurons, much as the brain does [21][20] (a minimal gradient-descent sketch of this idea follows the summary).
- The breakthrough of backpropagation in 1986 made it possible to train neural networks efficiently, significantly enhancing their capabilities [38][40].

Group 3: Language Models and Cognitive Processes
- Hinton elaborates on how LLMs process language, drawing parallels to human cognitive processes [46][47].
- He asserts that LLMs do not merely memorize but engage in a predictive process that resembles human thought [48][49].
- LLM training involves a cycle of prediction and correction that enables the model to learn semantic understanding [49][55].

Group 4: AI Risks and Ethical Considerations
- Hinton highlights potential risks of AI, including misuse for generating false information and societal instability [68][70].
- He stresses the importance of regulatory measures to mitigate these risks and to keep AI aligned with human interests [72][75].
- Hinton warns that the most significant threat from advanced AI may not be rebellion but its ability to persuade humans [66].

Group 5: Global AI Landscape and Competition
- Hinton comments on the AI competition between the U.S. and China, noting that while the U.S. currently leads, its advantage is shrinking due to reduced funding for foundational research [78][80].
- He acknowledges China's proactive approach to fostering AI startups, which may lead to significant advances in the field [82].
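The "adjusting the strength of connections" that Hinton describes is, concretely, gradient descent driven by backpropagation. The toy two-layer network below is written from scratch for illustration rather than taken from the interview; it shows the forward pass, the layer-by-layer application of the chain rule, and the resulting weight updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network trained on a toy regression task: y = sin(x).
x = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(x)

w1, b1 = rng.normal(0, 0.5, (1, 32)), np.zeros((1, 32))
w2, b2 = rng.normal(0, 0.5, (32, 1)), np.zeros((1, 1))
lr = 0.05

for step in range(2000):
    # Forward pass.
    h = np.tanh(x @ w1 + b1)
    pred = h @ w2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass: apply the chain rule layer by layer (backpropagation).
    d_pred = 2 * (pred - y) / len(x)
    d_w2 = h.T @ d_pred
    d_b2 = d_pred.sum(axis=0, keepdims=True)
    d_h = d_pred @ w2.T
    d_hpre = d_h * (1 - h ** 2)          # derivative of tanh
    d_w1 = x.T @ d_hpre
    d_b1 = d_hpre.sum(axis=0, keepdims=True)

    # "Adjust the strength of connections": move each weight against its gradient.
    w1 -= lr * d_w1; b1 -= lr * d_b1
    w2 -= lr * d_w2; b2 -= lr * d_b2

print("final loss:", loss)
```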
First Visit to Shanghai: Why Did the "Father of AI" Make Such Waves?
Guo Ji Jin Rong Bao· 2025-07-28 13:06
Group 1
- Geoffrey Hinton, known as the "father of AI," made his first public appearance in China at WAIC 2025, drawing global attention and prompting reflection on the direction of AI development [1]
- Hinton's family background is deeply rooted in science, with connections to mathematics, physics, and agriculture, reflecting a long legacy of scientific achievement [3][4]
- Hinton's research journey began in the 1970s with artificial neural networks, at a time when the field was largely overlooked, and eventually led to significant breakthroughs in AI [6][7]

Group 2
- The development of GPU technology in the early 2000s revived interest in neural networks, building on Hinton's earlier pivotal work on backpropagation, which had transformed machine learning [6][8]
- In 2012, Hinton and his students developed AlexNet, which won the ImageNet competition and marked the turning point at which deep learning became a core AI technology [7][8]
- Hinton has received both the Turing Award and the Nobel Prize in Physics in recognition of his contributions to deep learning and neural networks [8]

Group 3
- Hinton has consistently raised alarms about the rapid advancement of AI, warning that it could surpass human intelligence and pose existential risks [10][11]
- He emphasizes the need for a global AI safety collaboration mechanism and has criticized tech companies for prioritizing profits over regulation [11]
- Hinton estimates a 10% to 20% probability that AI could take over and destroy human civilization, and advocates significant investment in AI safety research [11]
Major News: AlexNet Source Code Has Been Open-Sourced
半导体芯闻· 2025-03-24 10:20
Core Points
- The article discusses the release of the source code for AlexNet, the groundbreaking neural network developed in 2012 that has profoundly influenced modern AI methods [1][18]
- AlexNet was created by researchers at the University of Toronto, including Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, and is used primarily for image recognition tasks [2][15]

Group 1: Background of Deep Learning
- Geoffrey Hinton is recognized as one of the fathers of deep learning, which is built on neural networks and forms the foundation of contemporary AI [4]
- The revival of neural network research in the 1980s was led by cognitive scientists who rediscovered the backpropagation algorithm, which is essential for training multilayer neural networks [5][6]

Group 2: ImageNet and GPU Development
- The ImageNet project, initiated by Stanford professor Fei-Fei Li, provided the large dataset needed to train neural networks and contributed significantly to AlexNet's success [8][9]
- NVIDIA played a crucial role in making GPU technology more versatile and programmable, which was essential for meeting the computational demands of training neural networks [9][12]

Group 3: Creation and Impact of AlexNet
- AlexNet combined deep neural networks, large datasets, and GPU computing to achieve groundbreaking results in image recognition [13]
- The AlexNet paper published in 2012 has been cited more than 172,000 times, marking it as a pivotal moment in AI research [17]
- The release of AlexNet's source code by the Computer History Museum (CHM) is regarded as a significant historical contribution to the field of artificial intelligence [18]