Zhejiang University
DeepSeek: Technical Origins and Frontier Exploration Report
Zhejiang University · 2025-05-22 01:20
Investment Rating
- The report does not provide a specific investment rating for the industry

Core Insights
- The report discusses the evolution of large language models (LLMs) and highlights the significance of DeepSeek technology in bridging the gap between open-source and closed-source AI models, reducing the development lag from 6-12 months to 1-3 months [69]

Summary by Sections

Language Models
- Language models aim to calculate the probability of a sequence of words, enabling machines to model human language [6]
- The report outlines the basic tasks of language models, including encoding and word embedding, which represent words in a way that captures their meanings [13][17]

Transformer
- The Transformer architecture, introduced in 2017, revolutionized deep learning with its self-attention mechanism, allowing for parallel computation and better understanding of global context [32]
- The report emphasizes the Transformer as a foundational technology for large models, highlighting its ability to capture complex semantic relationships through multi-head attention [33]

DeepSeek
- DeepSeek is positioned as a significant advance in AI: its architecture allows for efficient model training and inference, addressing the computational demands of large models [70]
- The report details the stages of DeepSeek's development, including supervised fine-tuning and reinforcement learning, which enhance its reasoning capabilities [117][119]

New Generation Agents
- The report discusses the transition from generative models to reasoning models, indicating a shift in focus toward enhancing logical reasoning capabilities in AI systems [107]
- It highlights the integration of LLMs with agent-based systems, where the LLM serves as the agent's brain, enabling complex tasks through planning and tool use [133]
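The self-attention mechanism credited above with parallel computation and global context can be sketched in a few lines of NumPy. This is a minimal single-head illustration with made-up dimensions and random weights, not DeepSeek's implementation: every token position attends to every other in one matrix product, which is what permits full parallelism over the sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings. Each row of the output is a
    mixture of all value vectors, weighted by query-key similarity, so
    every position sees global context in one parallel step.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise similarities, scaled
    weights = softmax(scores, axis=-1)  # attention distribution per token
    return weights @ V                  # context-mixed representations

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention, mentioned in the report, simply runs several such heads with separate weight matrices and concatenates their outputs, letting different heads specialize in different semantic relationships.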
Large Models: From Next-Word Prediction to Industry Deployment
Zhejiang University · 2025-04-18 07:55
Investment Rating
- The report does not provide a specific investment rating for the industry.

Core Insights
- The report discusses the evolution of large language models (LLMs) and their applications in various fields, emphasizing their ability to learn from vast amounts of unannotated data and perform tasks traditionally requiring human intelligence [48][49][50].
- It highlights the significance of pre-training and fine-tuning in enhancing model performance, with a focus on the advantages of large training datasets [35][56].
- The report also addresses the challenges faced by LLMs, including hallucination, bias, and outdated information, and suggests that integrating external data sources can mitigate these problems [63][80].

Summary by Sections

Large Language Models
- Large language models utilize vast amounts of unannotated data to learn about the physical world and human language patterns [48].
- The training process involves pre-training on diverse datasets followed by fine-tuning for specific tasks [35][56].

Training Techniques
- The report outlines training techniques including supervised fine-tuning (SFT) and instruction tuning, which help models generalize to unseen tasks [56][59].
- Reinforcement learning from human feedback (RLHF) is discussed as a method to align model outputs with human preferences [59].

Applications and Use Cases
- The report emphasizes the versatility of LLMs in applications ranging from natural language processing to complex problem-solving tasks [48][49].
- It mentions specific use cases, such as predicting conditions like epilepsy in healthcare [162][211].

Challenges and Solutions
- The report identifies key challenges such as hallucination, bias, and the need for timely information, proposing the use of external databases to enhance model accuracy and relevance [63][80].
- It suggests that addressing these challenges is crucial for the broader adoption of LLMs in various industries [63][80].
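The proposal above, grounding a model in external data sources to counter hallucination and stale knowledge, is the core idea behind retrieval-augmented generation. A minimal sketch, assuming a toy bag-of-words retriever and a document list invented for the example; the function returns the augmented prompt rather than calling a real model, so no LLM API is assumed:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query; return the top k."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer_with_context(query: str, docs: list[str]) -> str:
    """Prepend retrieved evidence so the model answers from fresh,
    checkable text instead of relying only on its training parameters."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Epilepsy risk can be predicted from longitudinal EEG features.",
    "Large models are pre-trained on diverse unannotated corpora.",
]
prompt = answer_with_context("How is epilepsy risk predicted?", docs)
```

Production systems replace the bag-of-words scoring with dense embeddings and a vector index, but the contract is the same: retrieval supplies timely facts, and the LLM only phrases the answer.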
Taking the Industrial Brain as an Example: From Large Models and Agents to Building Complex AI Application Systems
Zhejiang University · 2025-03-25 06:12
Report Industry Investment Rating
- No relevant content provided.

Core Viewpoints of the Report
- The reasoning ability of new-generation large models is continuously strengthening [172]
- High-performance, low-cost reasoning models for a specific professional field can be trained on small, high-quality datasets [172]
- Various complex intelligent application systems can be built on large models through intelligent agents (AI Agents) [172]
- Whether "reasoning large models + knowledge graphs (knowledge bases) + intelligent agents" will become the paradigm for future AI system development and application remains an open question [172]

Summary by Relevant Catalogs

1. Rapid Improvement of Large Model Reasoning Ability
- ChatGPT is a large-scale pre-trained language model that learns from human feedback, built by accumulating multiple types of technologies. Its success in 2022 marked the entry of conversational AI into the mass-application stage [7][9][10]
- The capabilities of large models have kept growing: performance in knowledge answering, mathematics, and programming has reached new heights, exceeding human levels on many tasks, while parameter scales have grown from tens of billions to trillions [16][19]
- Early large models had obvious shortcomings in reasoning and were prone to hallucinations, especially in mathematical reasoning: simple numerical-comparison errors, weak multi-step reasoning, and inconsistent reasoning [20][24]
- From 2023 to 2024 there were breakthroughs in reasoning ability. OpenAI o1/o3 performed strongly on mathematical and code reasoning tasks, and the open-source model DeepSeek-R1 achieved 87.2% accuracy on the MATH benchmark [35][38][40]

2. Reasoning Models and Chain of Thought (CoT)
- Through technologies such as test-time scaling, reinforcement learning, and distillation, the reasoning ability of large models is continuously enhanced; different models take different approaches [45][46]
- OpenAI o-series reasoning models generate a detailed internal chain of thought before answering, simulating human deliberation and improving the accuracy and depth of answers [47]
- Chain of thought (CoT) breaks a complex problem down step by step; some models can generate and display the chain of thought while solving problems [52][54][58]
- High-performance, low-cost reasoning models can be achieved with carefully designed small sets of high-quality samples; for example, s1 and LIMO demonstrated good reasoning performance with few samples [59][62][67]

3. What Is an Intelligent Agent (AI Agent)?
- Large language models (LLMs) lack the ability to interact with the physical world through tools and lack human-like memory; intelligent agents can equip LLMs with these essential capabilities [78]
- Taking a research report on Tesla FSD and Huawei ADS as an example, an agent can decompose the task, use various tools to collect information, summarize content, and generate the report [80][83][84]
- The agent system rests on a five-layer cornerstone theory: Models, Prompt Templates, Chains, Agent, and Multi-Agent. LLM-powered agents can sense the environment, make decisions, and take actions [96][98]
- For more complex tasks, large and small models can collaborate within a generative agent. HuggingGPT is an example in which the large language model handles planning and decision-making while small AI models execute the tasks [101][104][105]

4. Case of the Four-Chain Integrated Industrial Brain
- There is a need for industrial cognitive decision-making at the national strategic level and industrial development decision-making at the social level. AI can promote the deep integration of the innovation chain, industrial chain, capital chain, and talent chain [118][122][129]
- The industrial network-chain large model is trained on massive industrial data and knowledge graphs. It provides services such as intelligence gathering, knowledge answering, and report generation, with functions that address industry challenges [130][133][135]
- The four-chain integrated knowledge computing engine, such as the SupXmind basic platform, helps users build intelligent decision-making systems. The industrial vertical-domain large model iChainGPT has seven characteristic capabilities [144][147]
- The industrial network-chain large model has a defined composition and service framework and can be customized to customer needs; typical application scenarios exist at the provincial, municipal, and industrial-cluster levels [148][153][162]
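The plan-then-act pattern described for the Tesla FSD / Huawei ADS report example (decompose the task, call tools, assemble the result) can be sketched as a tiny agent loop. Everything here is a hypothetical stand-in: `plan` hard-codes the decomposition a real agent would request from an LLM, and the tools are plain functions rather than real search or summarization services:

```python
def search_tool(topic: str) -> str:
    return f"notes on {topic}"      # stand-in for a web/API search

def summarize_tool(text: str) -> str:
    return f"summary({text})"       # stand-in for a small summarization model

TOOLS = {"search": search_tool, "summarize": summarize_tool}

def plan(task: str) -> list[tuple[str, str]]:
    """Task decomposition. In a real agent the LLM produces this plan;
    it is hard-coded here to keep the sketch self-contained."""
    return [("search", task), ("summarize", f"notes on {task}")]

def run_agent(task: str) -> str:
    observations = []
    for tool_name, arg in plan(task):       # act on each sub-task in order
        observations.append(TOOLS[tool_name](arg))
    return " | ".join(observations)         # final report assembly

report = run_agent("Tesla FSD vs Huawei ADS")
```

The HuggingGPT division of labor mentioned above maps directly onto this skeleton: the large model owns `plan` and the routing decision, while each entry in `TOOLS` is a small specialized model responsible only for execution.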
Computer Industry DeepSeek: The Full Arrival of the Intelligent Era and the New Normal of Human-Machine Collaboration
Zhejiang University · 2025-03-13 03:04
Investment Rating
- The report does not explicitly state an investment rating for the industry

Core Insights
- The report discusses the evolution of intelligence and the new normal of human-machine collaboration, emphasizing the transformative impact of AI on various sectors [1][55]
- It highlights significant advancements in AI models, particularly the transition from GPT-3 to DeepSeek-V3, showcasing improvements in training data volume and model architecture [4][6]
- The report notes the rapid growth of AI tools and applications, indicating a shift toward more integrated and efficient AI solutions across industries [71][74]

Summary by Sections

1. Evolution of Intelligence
- The evolution of AI is marked by increasing data volumes and model complexity, with DeepSeek-V3 trained on 14.8 trillion tokens compared to GPT-3's 300 billion [6]
- The report outlines the historical context of AI development, linking it to broader industrial revolutions and technological advancements [64][66]

2. Human-Machine Collaboration
- The report emphasizes the importance of human-machine collaboration, suggesting that AI will augment human capabilities rather than replace them [55][57]
- It discusses the potential for new job creation alongside job displacement, highlighting the need for skill enhancement in the workforce [57][58]

3. Industry Status
- The report provides an overview of the current state of AI applications across sectors, including consumer and enterprise-level integrations [74]
- It notes the deployment of advanced AI models in critical areas such as energy, healthcare, and governance, showcasing their practical benefits [74]

4. Educational Growth
- The report stresses the need for educational initiatives to prepare the workforce for an AI-driven future, focusing on skill development and adaptability [57][58]
- It suggests that AI can improve work-life balance, potentially enabling shorter workweeks as productivity increases [57][58]
2024 Research Report on Intelligent Analysis and Control Strategies for New Power Systems Driven by a Hybrid of Mechanism Models and Artificial Intelligence
Zhejiang University · 2024-08-19 01:25
Industry Overview
- The report focuses on intelligent analysis and control strategies for new power systems driven by a combination of mechanism-based models and artificial intelligence [1]
- The core objective is to support "carbon peak and carbon neutrality" by building a new power system characterized by high proportions of renewable energy integration, power electronics, and energy storage devices [4]
- Key challenges include the dynamic, random, and uncertain nature of grid operations, which require advanced online safety assessment and intelligent scheduling control methods [4]

Key Research Areas
- Multi-temporal and spatial power prediction technology based on integrated machine learning models has been applied in scenarios including residential, industrial, and commercial loads, renewable energy generation, and charging stations [2]
- Intelligent decision-making technologies based on deep reinforcement learning address control challenges in power system planning and scheduling, including optimization of equipment configuration and control of reactive voltage, active power, network loss, and topology [2]
- Digital-twin modeling and intelligent parameter identification have been established for complex power equipment, including traditional generators, DC systems, wind power, and composite load models [2]

Applications and Case Studies
- A data-driven, AI-based grid-brain technology framework has been applied in Jiangsu Power Grid, addressing voltage violations, power-loss reduction, and power-flow constraints [14]
- The intelligent control system deployed in Jiangsu Power Grid achieved a 99.41% effective instruction rate, with an average network-loss reduction of approximately 3.6412% [16]
- Reinforcement learning has been used for automatic parameter calibration of generator and excitation models, achieving good performance after 400 training iterations [20]

Advanced Technologies
- Deep reinforcement learning has been applied to real-time AC optimal power flow, yielding fast solutions for secure and economic grid operation [22]
- AI-based autonomous topology control has been developed to maximize time-series available transfer capability under operational uncertainty [37]
- A comprehensive planning, early-warning, and control platform integrating AI, HPC, and big-data technologies has been established for multi-objective power system simulation, verification, and control [41]

Distributed Resource Management
- Virtual power plants (VPPs) aggregate distributed renewable generation, energy storage, and load subsystems, providing capacity and ancillary services that improve grid economy and reliability [25]
- VPPs enable intelligent adjustment of various load devices, addressing the randomness and volatility of new-energy integration [25]
- Reinforcement learning-based net-load volatility control has been applied in active distribution networks, effectively reducing peak-valley differences and overall fluctuations [32]
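The peak-valley reduction objective above can be made concrete with a toy dispatch sketch. The report's method is reinforcement learning; for illustration this uses a simple rule-based battery controller instead (charge in valleys, discharge at peaks, steering net load toward its mean), with an invented load profile and arbitrary units:

```python
import numpy as np

def flatten_net_load(load, capacity, power_limit):
    """Rule-based battery dispatch that flattens a net-load profile.

    Charges when load is below its mean and discharges when above,
    subject to the battery's energy capacity and per-step power limit.
    Stand-in for the report's RL controller, not its actual policy.
    """
    target = load.mean()
    soc, out = 0.0, []          # state of charge starts empty
    for l in load:
        if l > target:          # peak: discharge toward the target
            p = min(l - target, power_limit, soc)
            soc -= p
            out.append(l - p)
        else:                   # valley: charge toward the target
            p = min(target - l, power_limit, capacity - soc)
            soc += p
            out.append(l + p)
    return np.array(out)

load = np.array([3.0, 2.0, 1.0, 4.0, 6.0, 5.0])   # invented profile
flat = flatten_net_load(load, capacity=3.0, power_limit=2.0)
```

On this profile the peak-valley spread shrinks from 5.0 to 2.5 while total energy served is unchanged; an RL controller pursues the same objective but learns when to charge from forecasts and prices rather than following a fixed rule.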