Knowledge Distillation
Closed-Loop Collision Rate Slashed by 50%! DistillDrive: A New Heterogeneous Multimodal Distillation Framework for End-to-End Driving
自动驾驶之心· 2025-08-11 23:33
Authors: Rui Yu et al. | Edited by 自动驾驶之心

Today we share the latest work from East China University of Science and Technology, SenseTime Research, and the University of Sydney: DistillDrive, a heterogeneous distillation framework that cuts the autonomous-driving closed-loop collision rate by 50%.

Paper: https://arxiv.org/abs/2508.05402
Code: https://github.com/YuruiAI/DistillDrive

Introduction: End-to-end autonomous driving has made remarkable progress in recent years, driven largely by advances in perception technology and imitation learning. As shown in Figure 1(b), this approach learns a direct mapping from complex sensor inputs to final planning and decision outputs, eliminating intermediate data handoffs and target representations and thereby substantially reducing cascading errors. In closed-loop experiments, however, the perception-decoupled planning model of Figure 1(a) outperforms the end-to-end model, thanks to its contrastive learning and simulation experiments; nevertheless, it still faces a coupling barrier between perception and planning.
On-Device Large Models 20250801
2025-08-05 03:18
Good evening, colleagues and investors. Tonight I'll walk through the development of on-device large models at home and abroad. When we look at large models, we usually pay far more attention to cloud-side models: for instance, the much-discussed new OpenAI models of the past few days, or Google's Gemini. Yet while every vendor races to outdo the others on cloud-side models, on-device AI has also become a national priority. The core reasons come down to a few points. First, on-device hardware has improved markedly, especially chips and the NPUs inside them, whether Apple's A18 or Qualcomm's Snapdragon 8 Gen 3 and Gen 4: these chips integrate not just the traditional CPU and GPU but, more importantly, far more efficient NPUs, alongside ARM-architecture chips for PCs. These gains on the hardware and chip side have provided fertile ground for on-device AI. The on-device landscape now spans phones, PCs, and even smart glasses and all kinds of AI toys, so the market that on-device AI can support keeps growing. At the same time, large models themselves are advancing rapidly, and the clear trend is toward more multi-dimensional datasets ...
World Artificial Intelligence Conference: 25 Lessons from AI Godfather Hinton
混沌学园· 2025-07-29 12:04
Core Viewpoint - The article discusses Geoffrey Hinton's insights on the relationship between AI and human intelligence, emphasizing the evolution of AI from symbolic reasoning to large language models (LLMs) and the implications of AI surpassing human intelligence [1][10]. Group 1: Evolution of AI Understanding - For over 60 years, there have been two distinct paradigms in AI: the logical inference paradigm, which views intelligence as symbolic reasoning, and the biological paradigm, which sees intelligence as rooted in understanding and learning through neural networks [1]. - In 1985, Hinton created a small model to explore how humans understand vocabulary by linking features of words to predict the next word without storing entire sentences [2]. - The development of LLMs is seen as a continuation of Hinton's early work, processing more input words and utilizing complex neural structures to build richer interactions [3]. Group 2: Mechanism of Language Understanding - LLMs and human language understanding mechanisms are highly similar, transforming language into features and integrating these features across neural network layers for semantic understanding [4]. - Each word in language is likened to a multi-dimensional Lego block, which can flexibly combine to form complex semantic structures, with the shape of words adapting based on context [6]. - Understanding a sentence is compared to deconstructing a protein molecule rather than converting it into a clear, unambiguous logical expression [5]. Group 3: Knowledge Transfer in AI - The human brain runs on only about 30 watts, yet it cannot easily transfer knowledge to another person, relying instead on explanation [11]. - In contrast, digital intelligence allows for efficient knowledge transfer, directly copying parameters and structures without intermediary language, sharing trillions of bits of information during synchronization [13][14]. 
- Current technology enables the same model to be deployed across different hardware, facilitating efficient knowledge migration and collaborative learning [15]. Group 4: The Dangers of Advanced AI - There is a concern that AI could surpass human intelligence, leading to scenarios where AI becomes an active system with its own goals, potentially manipulating humans [18][19]. - Hinton warns that developing AI is akin to raising a tiger; once it grows powerful, losing control could be fatal [20]. - Despite the risks, AI holds significant value in various fields, and eliminating it is not feasible; instead, a method must be found to ensure AI does not threaten humanity [21]. Group 5: Global Cooperation for AI Safety - No single country desires AI to dominate the world, and if one country discovers a method to prevent AI from going rogue, others will likely follow suit [22][23]. - Hinton proposes the establishment of an international AI safety organization to research technology and create standards to ensure AI develops positively [24]. - The long-term challenge is to ensure that AI remains a supportive tool for humanity rather than a ruler, which is a critical issue for global collaboration [25].
Full Transcript of AI Godfather Hinton's First Speech in China: Humans May Themselves Be Large Language Models
Hu Xiu· 2025-07-26 09:26
Group 1 - The core idea of the discussion revolves around the evolution of AI, highlighting two main paradigms: "symbolism" which focuses on logical reasoning, and "connectionism" which emphasizes learning from neural connections [1][2] - The speaker, Geoffrey Hinton, discusses the development of a small model in 1985 that combined these two theories, predicting the next word based on features rather than storing complete sentences [3][4] - The advancement of large language models, such as Google's Transformer and OpenAI's GPT, is noted, which utilize multi-dimensional features of words to generate and understand language [6][10] Group 2 - The discussion emphasizes the differences between human knowledge transmission and AI knowledge replication, with AI systems being able to copy and share knowledge at a much faster rate [9][13] - The concept of "knowledge distillation" is introduced, where knowledge from large models is transferred to smaller models, akin to a teacher-student relationship [16][17] - The potential for AI to surpass human intelligence is acknowledged, with concerns about control and the implications of highly intelligent AI systems [18][19] Group 3 - The need for global cooperation in AI safety is highlighted, suggesting the establishment of an international research network focused on training AI for beneficial purposes [20][21] - The second speaker, Yan Junjie, discusses the democratization of AI, emphasizing its role as a creative source and its integration into various fields, enhancing individual capabilities [24][25] - The observation that AI is increasingly being used in diverse applications, from ancient text analysis to astronomy, showcases its expanding utility [26][30] Group 4 - The belief that AI will not be monopolized by a few organizations is presented, with the argument that different models will emerge based on varying goals and values [32][33] - The rise of multi-agent systems and open-source models is noted, indicating a trend 
towards a more inclusive AI development landscape [34][35] - The discussion concludes with the assertion that AI will become more accessible and affordable, with a focus on the importance of collaborative efforts in achieving advancements in artificial general intelligence (AGI) [40]
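The teacher-student "knowledge distillation" described above can be made concrete with temperature-softened teacher targets. A minimal NumPy sketch in the style of the classic distillation loss (function names and the temperature value are illustrative, not from the talk):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures, following the standard distillation formulation.
    """
    p = softmax(teacher_logits, temperature)   # soft teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    return float(temperature**2 * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a small student absorb the "dark knowledge" in the teacher's soft outputs.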
A 10,000-Word Deep Dive into End-to-End Autonomous Driving
自动驾驶之心· 2025-07-23 09:56
Core Viewpoint - The article discusses the current development status of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53]. Summary by Sections Traditional vs. End-to-End Algorithms - Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3]. - End-to-end algorithms take raw sensor data as input and directly output path points, simplifying the process and reducing error accumulation [3][5]. - Traditional algorithms are easier to debug and have some level of interpretability, but they suffer from cumulative error issues due to the inability to ensure complete accuracy in perception and prediction modules [3][5]. Limitations of End-to-End Algorithms - End-to-end algorithms face challenges such as limited ability to handle corner cases, as they rely heavily on data-driven methods [7][8]. - The use of imitation learning in these algorithms can lead to difficulties in learning optimal ground truth and handling exceptional cases [53]. - Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning, with evaluation methods categorized into open-loop and closed-loop [8]. Current Implementations - The ST-P3 algorithm is highlighted as an early work focusing on end-to-end autonomous driving, utilizing a framework that includes perception, prediction, and planning modules [10][11]. - Innovations in the ST-P3 algorithm include a perception module that uses a self-centered cumulative alignment technique and a prediction module that employs a dual-path prediction mechanism [11][13]. - The planning phase of ST-P3 optimizes predicted trajectories by incorporating traffic light information [14][15]. 
Advanced Techniques - The UniAD system employs a full Transformer framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25]. - The TrackFormer framework focuses on the collaborative updating of track queries and detect queries to improve prediction accuracy [26]. - The VAD (Vectorized Autonomous Driving) method introduces vectorized representations for better structural information and faster computation in trajectory planning [32][33]. Future Directions - The article suggests that end-to-end algorithms still primarily rely on imitation learning frameworks, which have inherent limitations that need further exploration [53]. - The introduction of more constraints and multi-modal planning methods aims to address trajectory prediction instability and improve model performance [49][52].
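The behavior-cloning branch of imitation learning mentioned above reduces to supervised regression from states to expert actions. A deliberately toy sketch with a linear policy fit by least squares, purely for illustration (real planners such as ST-P3 or UniAD use deep networks over raw sensor data, but the supervised objective is the same idea):

```python
import numpy as np

def behavior_cloning_fit(states, expert_actions):
    """Fit a linear policy  action ≈ state @ W  to expert demonstrations
    by minimizing squared error, the core of behavior cloning."""
    W, *_ = np.linalg.lstsq(states, expert_actions, rcond=None)
    return W

def policy(W, state):
    """Apply the learned linear policy to a new state."""
    return np.asarray(state) @ W
```

Because the policy only ever sees expert states at training time, it inherits behavior cloning's well-known weakness on out-of-distribution corner cases, which is exactly the limitation the article raises.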
High-Performance Models at Low Cost: Paradox or Possibility?
机器之心· 2025-05-31 17:15
Core Viewpoint - The article discusses the paradox of achieving high performance in AI models at low costs, questioning whether the decline in perceived model performance is intentional by AI companies and exploring the implications of cost-saving measures on model quality [2][3]. Group 1: Low-Cost High-Performance Models - The performance and cost dilemma of large language models (LLMs) has been a focal point of public and industry concern, with ongoing discussions about whether top model companies sacrifice precision or service stability to save on inference costs [2][3]. - Following the popularity of ChatGPT, users have expressed dissatisfaction with perceived declines in performance, citing issues such as weakened logic, increased errors, and difficulties in following instructions [2][3]. - The public's concern about companies sacrificing model performance for cost savings is supported by technical and market evidence, particularly highlighted in the controversy surrounding the DeepSeek-R1 model [3][4]. - The true "full version" of DeepSeek-R1 requires significant hardware investment, with initial costs reaching hundreds of thousands of yuan, leading some platforms to potentially use distilled versions that compromise inference capability and stability [3][4]. Group 2: Cost Management Strategies - To balance costs and performance, high-end "full version" models are not widely available, especially in a market flooded with free or low-cost services that often lack sufficient performance [6]. - AI companies are increasingly adopting model distillation or simplified models to reduce inference costs and manage financial investments [6]. - Common strategies to address cost pressures include lowering model precision through techniques such as model quantization, pruning, and knowledge distillation, which have become standard practices in the industry [6].
A Conversation with 27-Year-Old Doctoral Advisor Zhang Linfeng: The Perfect CVPR Score for Model Compression Was a Bit Unexpected; Shanghai Jiao Tong University Has Many Young Faculty Like Me
量子位· 2025-05-27 01:07
Core Viewpoint - Zhang Linfeng, a young professor at Shanghai Jiao Tong University, has made significant contributions to the field of model compression, particularly through innovative data distillation methods that enhance model efficiency and reduce training costs [2][4][27]. Group 1: Model Compression Techniques - Zhang Linfeng's team developed a new data distillation method that achieved a perfect score at CVPR 2025, utilizing a 6-year-old 2080Ti GPU with only 1/300 of the memory compared to previous state-of-the-art methods, while increasing speed by 20 times [2][4]. - The team introduced a novel distribution difference metric (NCFD) to transform the data distillation problem into a min-max optimization problem, significantly improving the quality of synthetic data and demonstrating scalability across various benchmark datasets [6][7]. - Their approach focuses on efficiently utilizing data to reduce the training costs of large AI models, aiming for a cost-saving ratio greater than 1 for training expenses versus data selection costs [9][10]. Group 2: Token Reduction Strategies - The team has explored token-level feature caching methods, achieving up to 9 times acceleration in diffusion language models with minimal performance loss, and extending this to multimodal models where up to 90% of tokens can be removed without sacrificing accuracy [11][12]. - The introduction of the Toca method allows for adaptive selection of tokens for caching, optimizing performance based on the specific task, such as image editing, where only relevant areas need computation [16][20]. - The latest TaylorSeer model aims to predict the next features instead of reusing previous ones, achieving close to 5 times acceleration across various models, including video generation tasks [18][20][24]. 
Group 3: Future Directions and Industry Impact - The overarching goal of Zhang Linfeng's research is to lower the deployment costs of large models, making them more applicable in real-world scenarios, particularly in video generation where the aim is to achieve real-time generation speeds [27][25]. - The evolution of model compression is seen as a response to the increasing size of AI models, with a shift from traditional methods to data-centric approaches that minimize knowledge loss during compression [38][44]. - The research outcomes have been open-sourced and are gradually being integrated into various models, indicating a significant impact on the industry and the potential for widespread application [23][26].
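The token-level feature caching idea behind methods like ToCa can be illustrated with a toy cache that recomputes features only for tokens whose inputs changed between steps and reuses stored features for the rest. This is a hypothetical simplification (the class name and threshold are invented for illustration), not the team's actual implementation:

```python
import numpy as np

class TokenFeatureCache:
    """Toy token-level feature cache: recompute only tokens whose inputs
    moved by more than `tol` since the last call; reuse the rest."""

    def __init__(self, compute_fn, tol=1e-3):
        self.compute_fn = compute_fn   # per-token feature function
        self.tol = tol
        self.inputs = None             # last-seen token inputs
        self.features = None           # cached per-token features
        self.recomputed = 0            # tokens recomputed on the last call

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        if self.inputs is None:        # first call: compute everything
            self.inputs = x.copy()
            self.features = self.compute_fn(x)
            self.recomputed = len(x)
            return self.features
        changed = np.abs(x - self.inputs).max(axis=-1) > self.tol
        self.recomputed = int(changed.sum())
        if self.recomputed:            # refresh only the changed tokens
            self.features[changed] = self.compute_fn(x[changed])
            self.inputs[changed] = x[changed]
        return self.features
```

When most token inputs barely move between diffusion steps, almost all feature computation is skipped, which is the intuition behind the reported speedups.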
Jeff Dean: AI Will Replace Junior Engineers Within a Year. Netizens: "Altman Only Sells Hype; When Jeff Says It, It's Serious"
Xin Lang Cai Jing· 2025-05-18 22:46
Group 1 - Jeff Dean predicts that within a year, AI systems capable of operating 24/7 with "junior engineer" abilities will be available [1][14][15] - Dean emphasizes the significant advancements in AI, particularly in neural networks and their applications across various tasks since 2012 [4][6][7] - The evolution of AI is marked by improvements in algorithms and hardware, leading to larger models and enhanced capabilities [6][22] Group 2 - The industry is witnessing a potential transformation in the software development job market due to the rise of AI engineers who can outperform human engineers in certain tasks [4][8] - Dean discusses the importance of specialized hardware for machine learning, highlighting Google's TPU project and the need for efficient computation [16][19] - The future of AI models may involve sparse models that utilize different parts of the model for specialized tasks, enhancing efficiency significantly [24][25]
A Sneak Peek at Sebastian Raschka's New Book "Reasoning From Scratch": Demystifying the Foundations of Reasoning Models
机器之心· 2025-05-02 04:39
Core Viewpoint - The article discusses the advancements in reasoning capabilities of large language models (LLMs) and introduces the book "Reasoning From Scratch" by Sebastian Raschka, which aims to provide practical insights into building reasoning models from the ground up [2][5][59]. Group 1: Definition and Importance of Reasoning in LLMs - Reasoning in the context of LLMs refers to the model's ability to generate intermediate steps before arriving at a final answer, often described as chain-of-thought (CoT) reasoning [8][10]. - The distinction between reasoning and pattern matching is crucial, as traditional LLMs primarily rely on statistical correlations rather than logical reasoning [23][25]. - Understanding reasoning methods is essential for enhancing LLMs' capabilities to tackle complex tasks, such as solving logical puzzles or multi-step arithmetic problems [5][39]. Group 2: Training Process of LLMs - The typical training process for LLMs consists of two main phases: pre-training and fine-tuning [16][19]. - During pre-training, LLMs are trained on vast amounts of unlabelled text (up to several terabytes) to learn language patterns, which can cost millions of dollars and take months [17][21]. - Fine-tuning involves supervised fine-tuning (SFT) and preference fine-tuning to improve the model's ability to respond to user queries [20][21]. Group 3: Pattern Matching vs. Logical Reasoning - LLMs learn to predict the next token based on statistical patterns in the training data, which allows them to generate coherent text but lacks true understanding [23][24]. - In contrast, logical reasoning requires the ability to derive conclusions step-by-step, identifying contradictions and causal relationships [25][26]. - The article highlights that most LLMs do not actively identify contradictions but instead rely on learned patterns from training data [30][34]. 
Group 4: Enhancing Reasoning Capabilities - The reasoning capabilities of LLMs gained significant attention with the release of OpenAI's o1 model, which emphasizes a more human-like thought process [41][43]. - Enhancements to LLM reasoning can be achieved through inference-time compute scaling, reinforcement learning, and knowledge distillation [44][46][48]. - These methods aim to improve the model's reasoning ability without retraining the underlying model weights [46][48]. Group 5: Importance of Building Reasoning Models from Scratch - Building reasoning models from scratch provides valuable insights into the capabilities, limitations, and computational trade-offs of LLMs [50][57]. - The shift towards reasoning models reflects a broader trend in the AI industry, emphasizing the need for models that can handle complex tasks effectively [52][55]. - Understanding the underlying mechanisms of LLMs and reasoning models is crucial for optimizing their performance in various applications [57].
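The pattern-matching behavior contrasted with logical reasoning above can be made concrete with a tiny bigram model: it predicts the statistically most frequent follower of the current token, with no step-by-step reasoning involved. A deliberately minimal sketch (LLMs use deep networks over long contexts, but the next-token objective is the same):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, how often each successor follows it."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor seen in training, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]
```

Such a model can produce locally plausible text yet cannot detect a contradiction it has never seen, which is exactly the gap between statistical correlation and logical reasoning the book draws.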