Language Models

Face to Face with the Big Names! Stanford's 2025 CS336 Course Fully Released: Building a Large Model from Scratch
自动驾驶之心· 2025-06-24 11:47
Core Viewpoint
- The article discusses the launch of Stanford University's CS336 course "Language Models from Scratch," which aims to provide a comprehensive understanding of language models through hands-on development and implementation [5][7].

Course Overview
- The course focuses on the foundational aspects of language models, which are essential for modern natural language processing (NLP) applications, and emphasizes why understanding them matters for scientists and engineers working in AI and ML [5][7].
- The course is structured into five major modules: Foundations, Systems, Extensions, Data, and Alignment & Reinforcement Learning [7].

Course Requirements
- Students are expected to be proficient in Python, as most assignments require extensive coding. The course provides minimal scaffolding, so students write a higher volume of code than in other AI courses [7].
- A background in deep learning and systems optimization is necessary, particularly familiarity with PyTorch and basic systems concepts such as the memory hierarchy [7].
- Foundational knowledge of calculus, linear algebra, probability, and statistics is required, along with a basic understanding of machine learning principles [7].

Assignments
- The course includes several assignments covering key stages of language model development, such as implementing a BPE tokenizer (see the sketch after this summary), training models on specific datasets, and optimizing performance on GPUs [8].
- Assignments are designed to simulate real-world challenges, including data processing and model alignment, with a focus on practical application and hands-on experience [8].

Course Schedule
- The course follows a detailed schedule that lays out topics, materials, and assignment deadlines, ensuring a systematic approach to learning [9].
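The BPE tokenizer is the most self-contained of the assignments listed above. The following is a minimal sketch of the training loop behind byte-pair encoding, written only to illustrate the idea; the toy corpus, function names, and merge count are illustrative assumptions, not the course's reference implementation.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def train_bpe(corpus, num_merges=10):
    """Learn up to `num_merges` BPE merges from a whitespace-split toy corpus."""
    words = Counter(tuple(w) for w in corpus.split())  # each word as a tuple of characters
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges

if __name__ == "__main__":
    # Toy corpus for illustration; a real tokenizer would be trained on a large text corpus.
    print(train_bpe("low lower lowest newer newest", num_merges=5))
```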
Fresh Off the Press! Stanford's 2025 CS336 Course Fully Released: Building a Large Model from Scratch
机器之心· 2025-06-23 04:04
Core Viewpoint
- The article announces the launch of Stanford University's CS336 course "Language Models from Scratch" for Spring 2025, which aims to guide students through the entire process of developing their own language models [1][8].

Group 1: Course Overview
- CS336 is designed to help students gain a comprehensive understanding of language models by guiding them through every stage, including data collection, model construction, training, and evaluation [8].
- The course consists of 5 units and 19 lectures, with a focus on practical implementation and hands-on experience [10].

Group 2: Instructors
- Tatsunori Hashimoto, an assistant professor at Stanford, has a strong background in machine learning and over 30,000 citations for his research [2].
- Percy Liang, an associate professor and director of the Center for Research on Foundation Models (CRFM), has over 100,000 citations and extensive experience in AI research [6][7].

Group 3: Course Requirements
- Students are expected to be proficient in Python, deep learning, and systems optimization, and to have a solid grounding in calculus, linear algebra, and basic probability and statistics [11].
- The course emphasizes minimal scaffolding, requiring students to write significantly more code than in other AI courses [11].
It's Not That Video Models Are Slow "Learners"; It's That LLMs Take Shortcuts | Sergey Levine, a Heavyweight with 180,000 Citations
量子位· 2025-06-10 07:35
Core Viewpoint
- The article discusses the limitations of AI, particularly language models (LLMs) and video models, using the metaphor of Plato's Cave to illustrate the gap between human cognition and AI's understanding of the world [6][30][32].

Group 1: Language Models vs. Video Models
- Language models have achieved significant breakthroughs with a simple recipe: next-word prediction combined with reinforcement learning [10][19] (a minimal sketch of the next-token objective follows this summary).
- Although video data is richer than text data, video models have not developed the same level of complex reasoning capability as language models [14][19].
- Language models can leverage the human knowledge and reasoning paths embedded in text, allowing them to answer complex questions that video models cannot [21][22][25].

Group 2: The "Cave" Metaphor
- The Plato's Cave metaphor describes AI's current state: it learns from human knowledge but does not truly understand the world [29][32].
- AI's capabilities are seen as reverse engineering of human cognition rather than independent exploration [33].
- The article suggests AI should move beyond this "shadow dependency" and interact directly with the physical world to gain true understanding [34][35].

Group 3: Future Directions for AI
- The long-term goal is for AI to break free from reliance on human intermediaries, enabling direct interaction with the physical world [35].
- Bridging different modalities (vision, language, action) could enable this exploration without needing to escape the "cave" [35].
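The "next-word prediction" recipe credited above is, in implementation terms, a cross-entropy loss over shifted token sequences. Below is a minimal PyTorch sketch of that objective; the stand-in model, random token batch, and dimensions are assumptions for illustration, not anything from Levine's post.

```python
import torch
import torch.nn.functional as F

# Assumed toy dimensions, chosen only for illustration.
vocab_size, seq_len, batch_size, d_model = 1000, 16, 4, 64

# Stand-in "language model": embedding + linear head. Real LLMs place a
# transformer between these layers, but the training objective is the same.
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch_size, seq_len))  # fake token ids

# Next-token prediction: the model reads tokens[:, :-1] and predicts tokens[:, 1:].
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = head(embed(inputs))                # (batch, seq_len - 1, vocab_size)
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),         # flatten all positions
    targets.reshape(-1),                    # each position paired with its next token
)
print(f"next-token cross-entropy: {loss.item():.3f}")  # about log(vocab_size) for an untrained model
```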
Full Version | Latest Interview with Google's Co-founder: Why Has Gemini Suddenly Become So Powerful?
36Kr· 2025-05-26 00:49
Compiled by: 重点君

Recently, Google co-founder Sergey Brin sat down for an interview with well-known podcast host Logan. The conversation covered the latest announcements from the Google I/O conference and the overall state of Google's AI development. Brin said the series of releases was impressive and even contained elements that surprised him, such as the virtual try-on feature in Google Search, which drew an enthusiastic response. Still, a great deal of work remains before all the announced features can be delivered.

Brin noted that, looking at the broader trajectory of AI, the current progress is strikingly different from the rational speculation about the singularity a few years ago, and the way AI has developed is surprising:

1. Language models have become the main driving force of AI progress, which was not obvious 15 years ago, especially given DeepMind's earlier focus on physics-grounded work.

2. Reasoning ("thinking") models are surprisingly interpretable: their reasoning process can be inspected, which is a significant positive from a safety perspective.

Architecturally, Brin finds that different models are remarkably similar, even ones that appear distinct, such as video diffusion models, and their training processes keep evolving. The post-training stage (fine-tuning and reinforcement learning work) accounts for a growing share of the overall effort; capabilities such as tool use are added at this stage, making models more powerful.

On inference-time scaling, and the Deep Think initiative in particular, Brin said Google's vision is for models to think for much longer (hours, days, even months) in order to give better answers to complex questions; compared with overcoming the challenge of long-context input, Google has been working hard to ...