DeepSeek: Technical Origins and Frontier Exploration Report
Zhejiang University · 2025-05-22 01:20
Investment Rating
- The report does not provide a specific investment rating for the industry

Core Insights
- The report discusses the evolution of large language models (LLMs) and highlights the significance of DeepSeek technology in bridging the gap between open-source and closed-source AI models, reducing the development lag from 6-12 months to 1-3 months [69]

Summary by Sections

Language Models
- Language models aim to calculate the probability of a sequence of words, enabling machines to understand human language [6]
- The report outlines the basic tasks of language models, including encoding and word embedding, which help in representing words in a way that captures their meanings [13][17]

Transformer
- The Transformer architecture introduced in 2017 revolutionized deep learning with its self-attention mechanism, allowing for parallel computation and better understanding of global context [32]
- The report emphasizes the importance of the Transformer model as a foundational technology for large models, highlighting its ability to capture complex semantic relationships through multi-head attention [33]

DeepSeek
- DeepSeek technology is positioned as a significant advancement in AI, with its architecture allowing for efficient model training and inference, thus addressing the computational demands of large models [70]
- The report details the stages of DeepSeek's development, including supervised fine-tuning and reinforcement learning, which enhance its reasoning capabilities [117][119]

New Generation Agents
- The report discusses the transition from generative models to reasoning models, indicating a shift in focus towards enhancing logical reasoning capabilities in AI systems [107]
- It highlights the integration of LLMs with agent-based systems, where LLMs serve as the brain of agents, enabling them to perform complex tasks through planning and tool usage [133]
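The self-attention mechanism mentioned above can be illustrated with a minimal sketch. This is not DeepSeek's or the report's implementation, just a generic scaled dot-product attention in NumPy (a single head; multi-head attention runs several of these in parallel and concatenates the results). All names and dimensions below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # (seq_len, seq_len): every token scores every other token,
    # which is why attention captures global context in parallel
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V                  # (seq_len, d_k)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the score matrix relates all token pairs at once, the whole sequence is processed in one matrix multiplication rather than step by step, which is the parallelism advantage the report attributes to the Transformer.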
Google Chief Scientist's 10,000-word speech reviews a decade of AI: which key technologies shaped today's large-model landscape?
机器人圈 · 2025-04-30 09:10
Google Chief Scientist Jeff Dean delivered a speech on important trends in artificial intelligence at ETH Zurich this April. The talk reviewed a series of key technical milestones underpinning modern AI, including neural networks and backpropagation, early large-scale training, hardware acceleration, the open-source ecosystem, architectural revolutions, training paradigms, model efficiency, and inference optimization, as well as the critical role that compute, data volume, model scaling, and innovation in algorithms and model architectures have played in advancing AI capability.

The following transcript of the speech was translated and edited by the 数字开物 team.

01 AI is changing the computing paradigm at unprecedented scale and with unprecedented algorithmic progress

Jeff Dean: Today I will discuss the important trends in AI with you. We will review: how did this field develop to today's level of model capability? At the current state of the art, what can we do? And how should we shape the future direction of AI?

This work was done together with many colleagues inside and outside Google, so it is not all my own achievement; much of it is collaborative research. Some of it was not even led by me, but I believe it is all very important and worth sharing and discussing here.

Let's start with some observations, most of which may be obvious to everyone here. First, I think the most important point is that machine learning has completely changed our understanding of and expectations for what computers can do. Think back ten years: computer vision was still in its infancy, and computers could hardly ...