Language Models
Opinion: VLA addresses conceptual cognition but cannot effectively model the real world's four-dimensional space-time?
自动驾驶之心· 2025-10-14 07:12
Core Viewpoint
- The article discusses the importance of world models in intelligent driving, emphasizing that true understanding of the environment requires a high-bandwidth cognitive system rather than merely extending language models [2][3][5].

Summary by Sections

World Model vs. Language Model
- The world model focuses on spatiotemporal cognition, while the language model addresses conceptual cognition. Language models have low bandwidth and sparsity, making them ineffective for modeling the real world's four-dimensional space-time [2][3].
- The world model aims to establish capabilities directly at the video level, rather than converting information into language first [3][4].

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of language models, adding new modalities but still rooted in language. In contrast, the world model seeks to create a comprehensive cognitive system [3][5].
- The ultimate goal of autonomous driving is open-set interaction, allowing users to express commands freely rather than being limited to a fixed set of instructions [3][4].

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporating physical laws such as gravity and inertia into the model [6].
  2. Understanding and predicting object movements in three-dimensional space over time [6].
  3. Absorbing vast amounts of data from the internet, which aids in training autonomous driving systems [7].

Integration of Models
- Combining language models (conceptual cognition) with world models (spatiotemporal cognition) is essential for advancing toward Artificial General Intelligence (AGI) [8].

Industry Trends
- The autonomous driving industry is experiencing intense competition, with many professionals considering a transition to embodied AI due to the saturation of current technologies [9].
- The ongoing debate between VLA and WA represents a larger industry transformation, highlighting the need for innovative solutions to break through current limitations [9].

Community and Resources
- A community platform has been established to facilitate knowledge sharing and collaboration among professionals in the autonomous driving field, featuring resources such as learning routes, technical discussions, and job opportunities [25][26].
Qwen3-Max-Preview launches; officially billed as the most powerful language model in the Tongyi Qianwen series
Sou Hu Cai Jing· 2025-09-06 10:03
Core Insights
- Alibaba's Tongyi Qianwen has launched the latest Qwen3-Max-Preview model, described as the most powerful language model in the Tongyi Qianwen series [1].
- Qwen3-Max offers significant improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage over the January 2025 version [1][3].
- The model supports over 100 languages and is optimized for retrieval-augmented generation (RAG) and tool invocation, although it does not include a dedicated "thinking" mode [1][3].

Pricing and Performance
- Input is priced at $1.20 per million tokens and output at $6 per million tokens [2][5].
- The model handles a context of up to 256,000 tokens, with a maximum output of 32,800 tokens [5].

Technical Enhancements
- Qwen3-Max delivers higher accuracy on mathematical, coding, logic, and scientific tasks, and reliably follows complex instructions in both Chinese and English [1][3].
- The model reduces hallucinations and generates higher-quality responses for open-ended questions, writing, and conversation [1][3].
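The quoted per-token pricing makes request costs easy to estimate. A minimal sketch using the article's rates ($1.20 per million input tokens, $6 per million output tokens); the function name is hypothetical, not part of any official SDK:

```python
def qwen_max_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at the article's quoted Qwen3-Max rates."""
    INPUT_PRICE = 1.20 / 1_000_000   # $1.20 per million input tokens
    OUTPUT_PRICE = 6.00 / 1_000_000  # $6.00 per million output tokens
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A 10,000-token prompt with a 2,000-token reply:
print(round(qwen_max_cost(10_000, 2_000), 4))  # → 0.024
```

At these rates, even a request using the full 256,000-token context window costs well under a dollar on the input side.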
Face to face with the experts! Stanford's 2025 CS336 course fully public: building a large model from scratch
自动驾驶之心· 2025-06-24 11:47
Core Viewpoint
- The article covers the launch of Stanford University's CS336 course, "Language Models from Scratch," which aims to build a comprehensive understanding of language models through hands-on development and implementation [5][7].

Course Overview
- The course focuses on the foundations of language models, which underpin modern natural language processing (NLP) applications, and stresses why understanding them matters for scientists and engineers in AI and ML [5][7].
- The course is structured into five major modules: Foundations, Systems, Extensions, Data, and Alignment & Reinforcement Learning [7].

Course Requirements
- Students need proficiency in Python, as most assignments require extensive coding. The course provides minimal scaffolding, so students write far more code than in other AI courses [7].
- A background in deep learning and systems optimization is necessary, particularly familiarity with PyTorch and basic systems concepts such as the memory hierarchy [7].
- Foundational knowledge of calculus, linear algebra, probability, and statistics is required, along with a basic understanding of machine learning principles [7].

Assignments
- Assignments cover key aspects of language model development, such as implementing a BPE tokenizer, training models on specific datasets, and optimizing performance on GPUs [8].
- Assignments are designed to simulate real-world challenges, including data processing and model alignment, with a focus on practical, hands-on experience [8].

Course Schedule
- A detailed schedule outlines topics, materials, and assignment deadlines, ensuring a systematic approach to learning [9].
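One of the assignments mentioned above is implementing a BPE tokenizer. The core of BPE training is a merge loop: repeatedly find the most frequent adjacent symbol pair and fuse it into a new token. A deliberately minimal sketch of that loop on a toy corpus (helper names and the corpus are illustrative, not the course's reference implementation):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus (word -> frequency)."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, words):
    """Fuse every occurrence of the pair into a single new symbol."""
    a, b = pair
    return {word.replace(f"{a} {b}", f"{a}{b}"): freq
            for word, freq in words.items()}

# Toy corpus: words pre-split into characters, with end-of-word marker </w>.
words = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
merges = []
for _ in range(3):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(pair, words)
print(merges)  # learned merge rules, most frequent pair first
```

A real tokenizer also needs byte-level fallback, a vocabulary, and an encoder that replays the learned merges on new text, but the counting-and-merging loop above is the heart of the algorithm.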
Fresh off the press! Stanford's 2025 CS336 course fully public: building a large model from scratch
机器之心· 2025-06-23 04:04
Core Viewpoint
- The article announces the launch of Stanford University's CS336 course, "Language Models from Scratch," for Spring 2025, which guides students through the entire process of developing their own language models [1][8].

Group 1: Course Overview
- CS336 helps students gain a comprehensive understanding of language models by guiding them through data collection, model construction, training, and evaluation [8].
- The course consists of 5 units and 19 lectures, with a focus on practical implementation and hands-on experience [10].

Group 2: Instructors
- Tatsunori Hashimoto, an assistant professor at Stanford, has a strong background in machine learning and over 30,000 citations for his research [2].
- Percy Liang, an associate professor and director of the Center for Research on Foundation Models (CRFM), has over 100,000 citations and extensive experience in AI research [6][7].

Group 3: Course Requirements
- Students are expected to be proficient in Python, deep learning, and systems optimization, with a solid grounding in calculus, linear algebra, and basic probability and statistics [11].
- The course emphasizes minimal scaffolding, requiring students to write significantly more code than in other AI courses [11].
It's not that video models are slow "learners"; it's that LLMs take shortcuts | Sergey Levine, cited 180,000+ times
量子位· 2025-06-10 07:35
Core Viewpoint
- The article discusses the limitations of AI, particularly language models (LLMs) and video models, using the metaphor of Plato's Cave to illustrate the gap between human cognition and AI's understanding of the world [6][30][32].

Group 1: Language Models vs. Video Models
- Language models achieved significant breakthroughs with a simple algorithm: next-word prediction combined with reinforcement learning [10][19].
- Although video data is richer than text data, video models have not developed the complex reasoning capabilities of language models [14][19].
- Language models can leverage the human knowledge and reasoning paths encoded in text, letting them answer complex questions that video models cannot [21][22][25].

Group 2: The "Cave" Metaphor
- The Plato's Cave metaphor describes AI's current state: it learns from human knowledge but does not truly understand the world [29][32].
- AI's capabilities amount to reverse-engineering human cognition rather than independent exploration [33].
- The article suggests AI should move beyond this "shadow dependency" and interact directly with the physical world to achieve true understanding [34][35].

Group 3: Future Directions for AI
- The long-term goal is for AI to break free from reliance on human intermediaries, enabling direct interaction with the physical world [35].
- Bridging different modalities (visual, language, action) could enable this exploration without needing to escape the "cave" [35].
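The next-word-prediction objective credited above can be illustrated at its simplest with a bigram counter: record how often each word follows another, then predict the most frequent continuation. This is a toy sketch (function names hypothetical) of the prediction task itself, not the transformer training loop LLMs actually use:

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count next-word frequencies: the simplest next-word predictor."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, word: str) -> str:
    """Return the most frequent continuation seen in training."""
    return counts[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict(model, "the"))  # → cat
```

Levine's point is that even this trivially simple objective, scaled up on text, inherits the reasoning paths humans already wrote down, whereas a video model predicting pixels gets no such shortcut.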
Full version | Latest interview with Google's co-founder: why has Gemini suddenly become so powerful?
36Ke· 2025-05-26 00:49
Core Insights
- Sergey Brin discussed Google's recent AI advancements in an interview, highlighting the excitement around new features like virtual try-on in Google Search and the ongoing work required to ship these functionalities [2][3].
- AI development has shifted toward language models as the primary driving force, which was not apparent 15 years ago, alongside significant improvements in model interpretability and safety [2][14].
- Brin expressed optimism about Google's position in AI innovation, citing the company's readiness for transformation given its experience with large-scale data and machine learning technologies [3][20].

AI Development and Models
- Work on extending reasoning capabilities aims to let models think for longer periods, addressing the challenge of long-context inputs [3][17].
- The architectures of different models show surprising similarities, with growing emphasis on post-training processes that enhance capabilities through tool use [3][16].
- Gemini 2.5 Pro and Gemini 2.5 Flash represent significant advances: the former leads in most benchmarks, while the latter is recognized for its speed and performance [3][21].

Company Culture and Innovation
- Google is reinventing itself to adapt to major technological shifts, particularly in AI, in line with the company's historical focus on large-scale data and machine learning [3][19].
- Product development has accelerated notably from 2024 to 2025, indicating a robust pipeline of innovations [3][20].
- Brin emphasized maintaining a startup-like culture within Google to foster continuous innovation and adaptation in the rapidly evolving AI landscape [3][19].