Retrieval-Augmented Generation
Beyond RAG and DAPT! New research from a Chinese team sparks discussion: a plug-and-play module turns a model into a domain expert without changing its original parameters
量子位· 2025-08-18 09:16
Core Viewpoint
- A research team from Shanghai Jiao Tong University and Shanghai AI Lab has introduced "Memory Decoder", a pre-trained memory module that enhances large language models' performance in specific domains such as biomedicine, finance, and law, without expensive full-parameter training or real-time retrieval [1][4][5].

Group 1: Methodology and Advantages
- The Memory Decoder is a small transformer decoder that mimics the behavior of an external non-parametric retriever, compressing domain-specific knowledge into its parameters during pre-training [4][16].
- Compared to DAPT (Domain-Adaptive Pre-Training), which requires costly full-model retraining and risks catastrophic forgetting, and RAG (Retrieval-Augmented Generation), which adds latency from time-consuming nearest-neighbor searches, the Memory Decoder offers a more efficient and flexible solution [13][14][19].
- Integration is plug-and-play: no changes to the original model's parameters are needed, and the module can be combined with any large language model that shares the same tokenizer [6][19] (see the sketch after this summary).

Group 2: Experimental Results
- The Memory Decoder was evaluated on various Qwen and Llama models across three specialized domains, using perplexity (a measure of how well a model predicts text; lower is better) as the evaluation metric [20][22].
- It significantly reduced perplexity across all tested models, outperforming traditional LoRA-based adaptation [23][25].
- For instance, the Qwen2-0.5B model's average perplexity dropped from 14.88 to 4.05 with the Memory Decoder, a substantial improvement in domain-specific performance [24].

Group 3: Limitations and Future Implications
- The authors note that while the Memory Decoder reduces adaptation costs, its initial training phase still requires significant computational resources to gather relevant information from a large database [27].
- Adapting a Memory Decoder trained with one model to another still requires some parameter updates to align embedding spaces, so true zero-shot cross-architecture transfer is not yet achievable [28][29].
- The Memory Decoder represents a new paradigm for domain adaptation, suggesting that specially pre-trained memory components can be plugged into various models to continuously enhance performance [30].
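To make the plug-and-play integration concrete, below is a minimal inference-time sketch. It assumes the memory module is combined with the frozen base model by interpolating next-token distributions (in the spirit of the kNN-LM-style retrievers the module is said to mimic); the names `memory_decoder` and the mixing weight `lam` are illustrative, not taken from the paper.

```python
# Minimal sketch: blending a frozen base LM with a small domain "memory" decoder.
# Assumes Hugging Face-style causal LMs that return `.logits`; the interpolation
# form and the `lam` value are assumptions for illustration, not the paper's spec.
import torch
import torch.nn.functional as F

@torch.no_grad()
def combined_next_token_probs(base_model, memory_decoder, input_ids, lam=0.3):
    """Interpolate next-token distributions from two models over one vocabulary.

    Both models must share the same tokenizer/vocabulary; the base model's
    parameters are never touched, which is what makes the module plug-and-play.
    """
    base_logits = base_model(input_ids).logits[:, -1, :]     # (batch, vocab)
    mem_logits = memory_decoder(input_ids).logits[:, -1, :]  # (batch, vocab)
    p_base = F.softmax(base_logits, dim=-1)
    p_mem = F.softmax(mem_logits, dim=-1)
    return lam * p_mem + (1.0 - lam) * p_base                # mixed distribution
```

Because only output distributions are mixed, the base model's weights never change, which is consistent with one trained memory module serving any model that shares its tokenizer.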
Still agonizing over whether to get into large models? Others have already published their first top-conference paper!
自动驾驶之心· 2025-07-14 06:20
Core Viewpoint
- The article discusses the evolving landscape of large models in autonomous driving, highlighting lightweight solutions, hardware adaptation, knowledge distillation, and advanced reasoning paradigms such as CoT and VLA combined with reinforcement learning as key areas for future development [1][2].

Group 1: Course Introduction
- The course aims to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2].
- It addresses the core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms [3] (a pruning sketch follows this summary).

Group 2: Problems Addressed by the Course
- The course provides a systematic understanding of large-model knowledge, helping students build a coherent theoretical framework [3].
- It helps students combine theoretical knowledge with practical coding skills, enabling them to replicate research papers and develop new models [3].
- It offers guidance on writing and submitting academic papers, addressing common challenges students face [3].

Group 3: Enrollment Information
- Enrollment is limited to 6-8 students per session [4].
- The course targets individuals with a background in deep learning or machine learning, familiarity with Python, and a passion for research [6].

Group 4: Course Outcomes
- Participants will study classic and cutting-edge papers in the field, deepening their understanding of key algorithms and principles [9].
- The course takes a structured approach to writing and revising academic papers, culminating in a draft manuscript [9].

Group 5: Course Structure
- The course spans 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period [9].
- Topics include model pruning, quantization, and advanced reasoning techniques, with a focus on practical applications [19].
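As a concrete taste of the pruning topic mentioned above, here is a minimal magnitude-pruning sketch built on PyTorch's standard `torch.nn.utils.prune` utilities; the single linear layer and the 30% sparsity level are arbitrary illustrative choices, not course material.

```python
# Minimal sketch: unstructured L1 (magnitude) pruning of one linear layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.3)  # mask the 30% smallest-magnitude weights
print(f"sparsity: {(layer.weight == 0).float().mean().item():.2%}")  # ~30% zeros
prune.remove(layer, "weight")  # bake the mask into the weight tensor permanently
```

Note that masking alone does not shrink compute; realizing actual speedups requires sparse kernels or structured pruning.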
A senior labmate published a paper on large models for autonomous driving on his own, and took his PhD application to a TOP2 school...
自动驾驶之心· 2025-07-09 12:56
Core Viewpoint
- The article discusses advancements in large models (LLMs) for autonomous driving, highlighting the need for optimization in efficiency, knowledge expansion, and reasoning capabilities as the technology matures [2][3].

Group 1: Development of Large Models
- Companies like Li Auto and Huawei are deploying their own VLA and VLM solutions, indicating a trend toward practical application of large models in autonomous driving [2].
- The focus for the next generation of large models includes lightweight design, hardware adaptation, knowledge distillation, quantization acceleration, and efficient fine-tuning [2][3] (a quantization sketch follows this summary).

Group 2: Course Introduction
- A course is being offered to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [3].
- The course addresses core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms like Chain-of-Thought (CoT) and reinforcement learning [3][4].

Group 3: Enrollment and Requirements
- The course accepts a maximum of 8 students per session, targeting individuals with a background in deep learning or machine learning who are familiar with Python and PyTorch [5][10].
- Participants will gain a systematic understanding of large-model optimization, practical coding skills, and insight into academic writing and publication processes [8][10].

Group 4: Course Outcomes
- Students will learn to combine theoretical knowledge with practical coding, develop their own research ideas, and produce a draft of a research paper [8][9].
- The course follows a structured weekly timeline covering model pruning, quantization, efficient fine-tuning, and advanced reasoning techniques [20].
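To illustrate the quantization-acceleration topic named above, here is a minimal post-training dynamic quantization sketch using PyTorch's built-in `quantize_dynamic`; the toy model and the int8 setting are illustrative choices, not course material.

```python
# Minimal sketch: post-training dynamic quantization of Linear layers to int8.
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
print(qmodel(x).shape)  # weights are stored as int8; activations are quantized on the fly
```

This trades a small accuracy loss for a smaller memory footprint and faster int8 matrix multiplies on supported CPUs.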
What are the deployment paths and research directions for large models in the later stages of autonomous driving?
自动驾驶之心· 2025-07-07 23:31
Core Insights
- The article discusses the evolving landscape of large models in autonomous driving, highlighting lightweight solutions, hardware compatibility, knowledge distillation, and efficient fine-tuning of large models [1].
- It emphasizes the role of advanced reasoning paradigms such as Chain-of-Thought (CoT) and VLA combined with reinforcement learning in enhancing spatial perception capabilities [1].

Group 1: Course Overview
- The course explores cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2].
- Key challenges in model optimization include parameter compression through pruning and quantization, dynamic knowledge injection techniques, and advanced reasoning paradigms [2][3] (a parameter-efficient fine-tuning sketch follows this summary).

Group 2: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and machine learning [4][8].
- Participants are expected to have basic Python programming skills and familiarity with PyTorch, along with a genuine interest in research [8].

Group 3: Course Outcomes
- The course aims to provide a systematic understanding of large-model optimization, helping participants develop their own research ideas and strengthen their coding skills [6][7].
- Participants will receive guidance on writing and submitting academic papers, including methodologies for drafting and revising manuscripts [6][7].

Group 4: Course Structure
- The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, covering topics such as model pruning, quantization, and dynamic knowledge expansion [7][18].
- Each week focuses on a specific theme, including advanced reasoning techniques and collaborative multi-agent systems [18][20].

Group 5: Additional Information
- The course uses publicly available datasets and baseline code tailored to specific applications, ensuring practical relevance [15][16].
- Participants will engage in discussions and hands-on experiments with mainstream large models such as LLaMA and GPT [2][18].
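As an illustration of the efficient fine-tuning theme above, here is a minimal LoRA-style adapter sketch; the wrapper class, rank `r=8`, and scaling `alpha=16` are common but arbitrary choices made for this example, not an official implementation.

```python
# Minimal sketch: LoRA-style low-rank adapter around a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no effect at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank A and B matrices are trained
```

Since only `A` and `B` receive gradients, fine-tuning touches a tiny fraction of the parameters, which is the core of the "parameter-efficient" claim.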
In the large-model field, which topics can still produce publishable papers?
具身智能之心· 2025-07-05 02:25
Core Insights
- The article emphasizes the rapid development of large language models (LLMs) and multimodal models, identifying model efficiency, knowledge expansion, and reasoning performance as key research areas in artificial intelligence [1][2].

Course Objectives
- The course systematically explores cutting-edge optimization methods for large models, addressing challenges in parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [1][2].

Enrollment Details
- The course accepts 6 to 8 participants per session [3].

Target Audience
- The course is designed for master's and doctoral students working on large models, individuals seeking to strengthen their resumes for graduate studies abroad, and AI professionals looking to deepen their understanding of algorithm theory and research skills [4].

Course Outcomes
- Participants will study classic and cutting-edge papers, coding implementations, and methods for writing and submitting research papers, thereby developing a clearer understanding of the subject [3][4].

Enrollment Requirements
- Basic requirements include familiarity with deep learning/machine learning, basic knowledge of large-model algorithms, proficiency in Python, and experience with PyTorch [5].

Course Structure
- The course spans 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period for paper development [10].

Learning Requirements
- Participants are expected to engage actively in discussions, complete assignments on time, and maintain academic integrity throughout the course [12].

Course Outline
- The curriculum covers model pruning, quantization, dynamic knowledge expansion, and advanced reasoning paradigms, with a focus on practical applications and coding [16][18] (a chain-of-thought prompting sketch follows this summary).
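To make the advanced reasoning paradigms in the outline concrete, here is a minimal chain-of-thought prompting sketch; the worked exemplar and the question are invented for illustration.

```python
# Minimal sketch: few-shot chain-of-thought (CoT) prompt construction.
question = "A batch has 128 samples and 25% are dropped during filtering. How many remain?"

cot_prompt = (
    "Q: A tray holds 12 eggs and 3 are broken. How many are usable?\n"
    "A: Let's think step by step. 12 - 3 = 9. The answer is 9.\n\n"  # worked exemplar
    f"Q: {question}\n"
    "A: Let's think step by step."  # cue the model to reason before answering
)
print(cot_prompt)  # send to any instruction-tuned LLM
```

The only trick is the exemplar plus the "think step by step" cue, which elicits intermediate reasoning rather than a bare answer.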
AI-written literature reviews: are they reliable?
Hu Xiu· 2025-07-04 07:49
Core Insights
- The article discusses advances in artificial intelligence (AI) that enable faster and more efficient literature reviews in scientific research, particularly AI systems like FutureHouse's PaperQA2, which can summarize vast amounts of scientific knowledge quickly and accurately [1][6].

Group 1: AI in Literature Review
- AI systems are being developed to automate literature reviews, with tools like Consensus and Elicit helping researchers summarize and categorize scientific publications [2][4].
- Despite these advances, current AI tools cannot independently produce high-quality systematic reviews, which require rigorous methodologies and meta-analyses [2][3].
- The emergence of generative AI models has raised concerns about low-quality or misleading reviews, as these models may not adhere to established research practices [2][3][10].

Group 2: Challenges and Limitations
- Systematic reviews involve at least 25 rigorous steps, making them time-consuming and complex; they often take months or years to complete [7][8].
- Many AI tools, including Elicit, can search only open-access papers and abstracts, restricting their access to full-text articles behind paywalls [5][6].
- The performance of AI systems in generating literature reviews remains under scrutiny, with experts emphasizing the need for transparency and reproducibility in the review process [9][12].

Group 3: Future Directions
- Ongoing research seeks to improve AI tools for literature reviews, focusing on efficiency and accuracy while maintaining rigorous standards [9][12].
- Non-profit organizations are being encouraged to participate in developing these tools to ensure reliability and transparency in scientific literature synthesis [12].
- New funding initiatives to support evidence-synthesis systems signal growing interest in using AI to improve the quality of literature reviews [12].
Next-generation efficient computing for large models: a paper-mentoring course covering parameter compression, hardware adaptation, multimodal reasoning, CoT, and more is here!
自动驾驶之心· 2025-07-04 07:13
Core Insights
- The article discusses the rapid development of large language models (LLMs) and multimodal models, identifying model efficiency, knowledge expansion, and reasoning performance as core issues in current AI research [1][2].

Course Overview
- The course systematically explores cutting-edge optimization methods for large models, emphasizing three key areas: parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [1].
- It addresses the core challenges of model optimization: lightweight methods such as pruning, sparsification, and quantization for parameter compression; dynamic knowledge injection techniques such as retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT) for knowledge expansion; and advanced reasoning paradigms such as chain-of-thought (CoT) and reinforcement-learning-based optimization (e.g., GRPO) for reasoning enhancement [1] (a minimal RAG sketch follows this summary).

Course Objectives
- The course helps students systematically master key theoretical knowledge in their chosen directions and develop a clearer understanding of the material [5].
- It bridges the gap for students who lack direction and practical skills, enabling them to combine theoretical knowledge with coding practice and lay the groundwork for developing new models [5].
- It also develops students' academic writing skills, providing guidance on manuscript preparation and submission [5].

Target Audience
- The course is designed for master's and doctoral students working on large models, those seeking to strengthen their resumes for graduate studies abroad, and AI professionals looking to systematically improve their algorithmic theory and writing skills [6].

Admission Requirements
- Basic requirements include a foundational understanding of deep learning/machine learning, familiarity with Python syntax, and experience with PyTorch [7].

Course Structure
- The course consists of 12 weeks of online group research followed by 2 weeks of paper guidance, culminating in a 10-week paper maintenance period [11].
- Students will analyze classic and cutting-edge papers, understand key algorithms and principles, and develop their own research ideas [11].

Weekly Breakdown
- The course covers model pruning, quantization, dynamic knowledge expansion, advanced reasoning techniques, and multimodal understanding [16][18].
- Each week has a specific theme and output, such as determining research ideas, optimizing model size and performance, and enhancing coding capabilities [16][18].

Additional Resources
- The course provides access to datasets from public sources and baseline code tailored to specific applications [13][14].
- Essential papers and resources are recommended for foundational knowledge and advanced techniques in model optimization [15][17].
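As a concrete illustration of the RAG topic above, here is a minimal retrieve-then-prompt sketch; TF-IDF similarity stands in for a real dense retriever, and the three-document corpus and prompt template are invented for illustration.

```python
# Minimal sketch: retrieval-augmented generation = retrieve context, then prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Model pruning removes redundant weights to shrink a network.",
    "Quantization stores weights in low-bit formats such as int8.",
    "Chain-of-thought prompting elicits step-by-step reasoning.",
]
vec = TfidfVectorizer().fit(docs)

query = "How do I make a model smaller using low-bit weights?"
sims = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]
context = docs[sims.argmax()]  # retrieve the best-matching document

prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"  # feed to any LLM
print(prompt)
```

Production systems swap TF-IDF for dense embeddings and a vector index, but the retrieve-augment-generate loop is the same.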
SIGIR 2025 | Tackling scalability and transferability challenges, Huawei Singapore proposes InstructRAG, with improvements of up to 19%
机器之心· 2025-05-23 06:49
Core Viewpoint
- The article presents the InstructRAG framework, which leverages Retrieval-Augmented Generation (RAG) to enhance the task-planning capabilities of large language models (LLMs) by addressing scalability and transferability challenges [1][2][30].

Group 1: Challenges in Task Planning
- Scalability is defined as the ability to expand the instruction graph by combining existing instructions into new sequences, enabling LLMs to tackle tasks without predefined paths [1][2].
- Transferability is the ability to adapt quickly to new tasks and learn effectively from limited examples [2].

Group 2: InstructRAG Framework Components
- The framework consists of three main components: an Instruction Graph that organizes past instruction paths; an RL-Agent, a reinforcement learning agent that expands graph coverage; and an ML-Agent, a meta-learning agent that enhances task generalization [4].

Group 3: Instruction Graph
- The Instruction Graph is a directed graph organizing past instruction paths, where nodes represent instruction sets and edges represent tasks [6] (a minimal sketch of this structure follows this summary).

Group 4: RL-Agent Functionality
- The RL-Agent frames node selection on the instruction graph as a Markov Decision Process (MDP), effectively exploring the graph's scalability [7].
- It uses states, actions, rewards, and policy learning to optimize the selection of instruction paths [8].

Group 5: ML-Agent Functionality
- The ML-Agent enhances transferability by selecting relevant paths from the RL-Agent's candidates and generating prompts for LLMs [9].
- Its training involves pre-training and fine-tuning phases to optimize performance [10][11].

Group 6: Overall Framework and Training
- The overall pipeline includes training, few-shot learning, and testing phases, with scalability driven by the RL-Agent and transferability by the ML-Agent [13][16].

Group 7: Experimental Results
- InstructRAG demonstrated superior performance across multiple datasets, achieving up to a 19.2% improvement over the best baseline method across tasks [22][30].
- The framework generalized well to unseen tasks, remaining effective with limited examples [23][28].

Group 8: Robustness and Component Importance
- InstructRAG was robust to noise, with only an 11.1% performance drop at 50% noise, compared with a 27.2% drop for the baseline [25].
- Ablation studies show that each component contributes significantly to overall performance [26][27].

Group 9: Future Directions
- Future work will focus on further enhancing InstructRAG's generalization capabilities [30].
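To make the instruction-graph idea concrete, here is a minimal sketch of a directed graph whose nodes hold instruction sets and whose edges record the tasks that linked them; the data layout and the greedy walk in `compose` are illustrative stand-ins for the paper's RL-Agent node selection, not the authors' implementation.

```python
# Minimal sketch: an instruction graph that indexes past instruction paths and
# recombines their steps into new sequences (the "scalability" property).
from collections import defaultdict

class InstructionGraph:
    def __init__(self):
        self.nodes = {}                # node_id -> set of instructions at that node
        self.edges = defaultdict(set)  # node_id -> {(next_node_id, task_id)}

    def add_path(self, task_id, instructions):
        """Index a past instruction path under the task that produced it."""
        prev = None
        for inst in instructions:
            nid = hash(inst)
            self.nodes.setdefault(nid, set()).add(inst)
            if prev is not None:
                self.edges[prev].add((nid, task_id))
            prev = nid

    def compose(self, start_inst, max_len=4):
        """Greedily walk the graph, stitching steps from different tasks together."""
        nid, path = hash(start_inst), [start_inst]
        while len(path) < max_len and self.edges[nid]:
            nid, _task = next(iter(self.edges[nid]))  # arbitrary successor; the RL-Agent chooses smartly
            path.append(next(iter(self.nodes[nid])))
        return path

g = InstructionGraph()
g.add_path("t1", ["open app", "search flights", "filter by price"])
g.add_path("t2", ["search flights", "book ticket"])
print(g.compose("open app"))  # may mix steps that no single past task contained
```

The point of the sketch is the recombination: paths indexed under different tasks share nodes, so a new task can follow a route that no single past path contained.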
Major Release | Fudan's 《大规模语言模型:从理论到实践(第2版)》 (Large Language Models: From Theory to Practice, 2nd Edition) gets a full upgrade, focusing on the AI frontier
机器之心· 2025-04-28 01:26
Released by 机器之心 (机器之心 editorial team)

《大规模语言模型:从理论到实践(第2版)》 (Large Language Models: From Theory to Practice, 2nd Edition) is a professional technical book that gives equal weight to theory and practice, and an indispensable knowledge reference for the AI era. Anyone can find their own growth path in it.

As the wave of artificial intelligence sweeps the globe, large language models are driving technological progress and industrial transformation at an unprecedented pace. From ChatGPT to applications across industries, LLMs have not only reshaped human-computer interaction but also become a key technology propelling academic research and industrial innovation. Faced with such a rapidly evolving field, systematically understanding its theoretical foundations and mastering its core algorithms and engineering practice has become required study for every AI practitioner, researcher, and university student.

In September 2023, the Fudan University research team of Zhang Qi, Gui Tao, Zheng Rui, and Huang Xuanjing released the first edition of the book to the global academic and industrial communities. In just two years, large language models have made important progress in theoretical research, pre-training methods, post-training techniques, and interpretability. Research into large language models has deepened, gradually revealing many characteristics that differ from traditional deep learning and natural language processing paradigms. For example, a large language model can learn from as few as 60 examples and display strong question-answering ability, demonstrating astonishing generalization. Yet the book's authors have also found that large language models are somewhat fragile. For example, in a model with 13 billion parameters ...