Language Models
Why Is NIO Betting on World Models?
自动驾驶之心· 2025-12-31 06:27
NIO's NWM 2.0 has been promoted heavily over the past two days, and the results are reportedly decent. Based on information NIO disclosed earlier, the world model may hold some surprises. Ren Shaoqing believes the true ceiling of intelligent driving lies in the world model: with video at its core, cross-modal mutual prediction and reconstruction let the system learn spatiotemporal and physical laws, so that machines can understand the environment the way humans do. The world model addresses spatiotemporal cognition, while the language model addresses conceptual cognition; the low bandwidth and sparsity of language models cannot truly and effectively model the real world's four-dimensional space-time (time + space). The cognition of a world model comprises two levels. A world model is not "language plus add-ons" but a high-bandwidth cognitive system, so it builds capability directly on video rather than first converting to language. This is why several AI giants are building general world models: Fei-Fei Li's Marble, Yann LeCun's V-JEPA 2, and DeepMind's Genie 3. In autonomous driving, the common directions are video generation and occupancy (OCC) generation, along with Lidar point-cloud generation. Many companies build their own cloud-side or on-vehicle world models on top of these open-source algorithms for long-tail data generation or closed-loop simulation and evaluation, and some are also exploring using world models to directly power on-vehicle driving. But the definition of the world model remains very ...
自变量's Wang Qian: Embodied Intelligence Is an Independent Foundation Model for the Physical World | MEET2026
具身智能之心· 2025-12-22 01:22
Core Viewpoint
- The article discusses the debate on whether embodied intelligence should be viewed as an application or as an independent foundational model, asserting that it is a foundational model specifically designed for the physical world, parallel to language and multimodal models [6][12][60].

Group 1: Differences Between Physical and Virtual Worlds
- There is a fundamental difference between the physical world, characterized by randomness and continuous processes, and the virtual world, which is highly reproducible and low in randomness [2][10].
- Existing models based on language and visual modalities are inadequate for accurately representing the complexities and randomness of physical interactions [16][22].

Group 2: Need for a Separate Foundational Model
- A separate foundational model for embodied intelligence is necessary due to the unique characteristics of the physical world, which often leads to unpredictable outcomes even under identical conditions [10][11].
- The current architectures and training methods struggle to capture the high randomness present in physical events, necessitating a new approach to model design [12][20].

Group 3: Future of Multimodal Models
- Shifting the perspective to view embodied intelligence as an independent foundational model can lead to significant changes in model architecture and data utilization [9][23].
- The learning and perception processes in the physical world differ fundamentally from those in the virtual world, suggesting that future multimodal models should incorporate these differences [24][29].

Group 4: Scaling Laws and Data Utilization
- The article emphasizes the importance of scaling laws in the development of large models, particularly in the context of robotics, where data acquisition and utilization are critical [46][51].
- A phased approach to training, utilizing both pre-training and post-training data, is recommended to enhance model performance [48][52].
Group 5: Hardware and AI Integration
- The integration of AI in defining hardware is crucial for the development of embodied intelligence, advocating for a simultaneous evolution of both software and hardware [53][54].
- The potential for embodied intelligence to drive exponential growth in resources and capabilities is highlighted, suggesting a transformative impact on the future of artificial general intelligence (AGI) [59][60].
iResearch Observation: Language Models' Value Restructuring and Ecosystem Breakout
艾瑞咨询· 2025-12-18 00:05
Core Insights
- By 2025, the global focus of technological competition has shifted to language models, marking a transition from a "Spring and Autumn" period of a "war of a hundred models" to a "Warring States" era in which major companies prioritize "value realization" over mere parameter-scale competition [1]
- Language models are reshaping the underlying logic of the digital economy, with tech giants investing billions in R&D to transform these models from novelty tools into essential national-level utilities [1]

Industry Overview
- The AI industry is experiencing rapid expansion and deep technological iteration, driven by language models as the core engine [2]
- Key trends include multi-modal integration, embodied intelligence, and the practical application of intelligent agents, with language models serving as the indispensable "central nervous system" [2]

Language Model Sub-industry
- The language model sub-industry is generally positive but faces three core pain points in consumer applications: insufficient practicality, fragmented scenarios, and cost-ecosystem imbalance [3]
- The recent launch of Alibaba's Qianwen APP has seen significant success, with over 10 million downloads within a week of public testing and monthly active users exceeding 30 million within 23 days [4]

Qianwen APP's Strategic Approach
- Qianwen APP's rise is attributed to its strategic adjustment addressing industry pain points through a "technology + scenario + ecosystem" framework, validating Alibaba's "user-first, AI-driven" strategy [6]
- The app leverages Alibaba's Qwen series models, which are competitive with leading closed-source models, enhancing its capabilities in logical reasoning and long-text processing [6][8]

Future Development Trends
- The language model industry is expected to enter a new development cycle characterized by technological integration, ecological symbiosis, and value orientation [9]
- Future models will focus on deep multi-modal integration and vertical precision, with open-source models driving innovation and reducing costs for small and medium enterprises [9]

Conclusion
- The language model industry is at a critical juncture, transitioning from technological explosion to industrial prosperity, with Qianwen representing a significant breakthrough in both domestic and global markets [10]
Tencent Hunyuan 2.0 Goes Live
Di Yi Cai Jing· 2025-12-05 14:13
(Source: Di Yi Cai Jing) On December 5, Tencent Hunyuan announced the official release of its latest language models, Tencent HY 2.0 Think and Tencent HY 2.0 Instruct. HY 2.0 uses a Mixture-of-Experts (MoE) architecture with 406B total parameters and 32B activated parameters, and supports a 256K context window. Compared with the previous version (Hunyuan-T1-20250822), HY 2.0 Think substantially improves the pretraining data and the reinforcement learning strategy. ...
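The total-vs-activated parameter split above comes from MoE routing: a router sends each token to only a few experts, so only a slice of the 406B parameters does work on any one forward pass. A minimal sketch, with all layer sizes hypothetical (the article gives only the 406B/32B totals, not Hunyuan's expert count or routing width):

```python
# Illustrative MoE routing sketch (not Tencent's implementation): a router
# picks the top-k experts per token, so the activated parameter count is
# k/num_experts of the total.

def top_k_experts(scores, k):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical sizes chosen only to mirror a total-vs-activated split:
num_experts = 64          # experts per layer (assumed, not from the article)
params_per_expert = 6e9   # parameters per expert (assumed)
k = 4                     # experts routed per token (assumed)

total = num_experts * params_per_expert
active = k * params_per_expert
print(f"total {total/1e9:.0f}B, active {active/1e9:.0f}B per token")

router_scores = [0.1, 0.9, 0.3, 0.7]
print(top_k_experts(router_scores, 2))  # indices of the two best experts
```

The point of the sketch is only the bookkeeping: capacity scales with the number of experts, while per-token compute scales with k.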
Opinion: VLA Addresses Conceptual Cognition and Cannot Effectively Model the Real World's Four-Dimensional Space-Time?
自动驾驶之心· 2025-10-14 07:12
Core Viewpoint
- The article discusses the importance of world models in intelligent driving, emphasizing that true understanding of the environment requires a high-bandwidth cognitive system rather than merely extending language models [2][3][5].

Summary by Sections

World Model vs. Language Model
- The world model focuses on spatiotemporal cognition, while the language model addresses conceptual cognition. Language models have low bandwidth and sparsity, making them ineffective for modeling the real world's four-dimensional space-time [2][3].
- The world model aims to establish capabilities directly at the video level, rather than converting information into language first [3][4].

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of language models, adding new modalities but still rooted in language. In contrast, the world model seeks to create a comprehensive cognitive system [3][5].
- The ultimate goal of autonomous driving is to achieve open-set interactions, allowing users to express commands freely without being limited to a fixed set of instructions [3][4].

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporating physical laws such as gravity and inertia into the model [6].
  2. Understanding and predicting object movements in three-dimensional space over time [6].
  3. Absorbing vast amounts of data from the internet, which aids in training autonomous driving systems [7].

Integration of Models
- The combination of language models (conceptual cognition) and world models (spatiotemporal cognition) is essential for advancing towards Artificial General Intelligence (AGI) [8].

Industry Trends
- The autonomous driving industry is experiencing intense competition, with many professionals considering transitioning to embodied AI due to the saturation of current technologies [9].
- The ongoing debate between VLA and WA represents a larger industry transformation, highlighting the need for innovative solutions to break through current limitations [9].

Community and Resources
- A community platform has been established to facilitate knowledge sharing and collaboration among professionals in the autonomous driving field, featuring resources such as learning routes, technical discussions, and job opportunities [25][26].
Qwen3-Max-Preview Launches; Officially the Most Powerful Language Model in the Tongyi Qianwen Series
Sou Hu Cai Jing· 2025-09-06 10:03
Core Insights
- Alibaba's Tongyi Qianwen has launched the latest Qwen3-Max-Preview model, which is described as the most powerful language model in the Tongyi Qianwen series [1]
- The Qwen3-Max model offers significant improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version [1][3]
- The model supports over 100 languages and is optimized for retrieval-augmented generation (RAG) and tool invocation, although it does not include a dedicated "thinking" mode [1][3]

Pricing and Performance
- The input price for using the Qwen3-Max model is $1.20 per million tokens, while the output price is $6 per million tokens [2][5]
- The model can handle a context of up to 256,000 tokens, with a maximum output of 32,800 tokens [5]

Technical Enhancements
- Qwen3-Max provides higher accuracy in mathematical, coding, logic, and scientific tasks, and it reliably follows complex instructions in both Chinese and English [1][3]
- The model reduces hallucinations and generates higher-quality responses for open-ended questions, writing, and conversation [1][3]
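At the quoted rates, per-request cost is straightforward arithmetic; a small helper (hypothetical code, restating only the article's prices and limits) makes the worst case concrete:

```python
# Per-request cost at the article's quoted rates:
# $1.20 per million input tokens, $6 per million output tokens.

INPUT_PER_M = 1.20   # USD per 1M input tokens
OUTPUT_PER_M = 6.00  # USD per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# A maximal request: full 256K-token context with the 32,800-token output cap.
print(round(call_cost(256_000, 32_800), 4))  # -> 0.504
```

So even a request that saturates both the context window and the output cap costs about half a dollar, which is the kind of budgeting this pricing model is meant to make easy.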
Face to Face with the Greats! Stanford's 2025 CS336 Course Fully Public: Building a Large Model from Scratch
自动驾驶之心· 2025-06-24 11:47
Core Viewpoint
- The article discusses the launch of Stanford University's CS336 course "Language Models from Scratch," which aims to provide a comprehensive understanding of language models through practical development and implementation [5][7].

Course Overview
- The course focuses on the foundational aspects of language models, which are essential for modern natural language processing (NLP) applications. It emphasizes the importance of understanding language models for scientists and engineers in the fields of AI and ML [5][7].
- The course is structured into five major modules: Foundations, Systems, Extensions, Data, and Alignment & Reinforcement Learning [7].

Course Requirements
- Students are expected to have proficiency in Python, as most assignments will require extensive coding. The course will provide minimal scaffolding, resulting in a higher volume of code written by students compared to other AI courses [7].
- A background in deep learning and system optimization is necessary, particularly familiarity with PyTorch and basic system concepts like memory hierarchy [7].
- Foundational knowledge in calculus, linear algebra, probability, and statistics is required, along with a basic understanding of machine learning principles [7].

Assignments
- The course includes several assignments that cover various aspects of language model development, such as implementing a BPE tokenizer, training models on specific datasets, and optimizing performance on GPUs [8].
- Assignments are designed to simulate real-world challenges, including data processing and model alignment, with a focus on practical application and hands-on experience [8].

Course Schedule
- The course is structured with a detailed schedule that outlines topics, materials, and deadlines for assignments, ensuring a systematic approach to learning [9].
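The BPE tokenizer assignment mentioned above boils down to repeatedly merging the most frequent adjacent symbol pair in a corpus. A minimal sketch of one merge step (illustrative only, not the course's reference implementation; the toy corpus is invented):

```python
# One training step of byte-pair encoding (BPE): count adjacent symbol
# pairs, pick the most frequent, and merge it everywhere.
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus; return the commonest."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, each word pre-split into symbols.
corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("l", "o", "t"): 1}
pair = most_frequent_pair(corpus)   # ('l', 'o') occurs 5 + 2 + 1 = 8 times
print(pair)
print(merge_pair(corpus, pair))
```

A full tokenizer repeats this loop until a target vocabulary size is reached and records the merge order so the same merges can be replayed at encoding time.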
Hot Off the Press! Stanford's 2025 CS336 Course Fully Public: Building a Large Model from Scratch
机器之心· 2025-06-23 04:04
Core Viewpoint
- The article announces the launch of Stanford University's CS336 course "Language Models from Scratch" for Spring 2025, which aims to guide students through the entire process of developing their own language models [1][8].

Group 1: Course Overview
- CS336 is designed to help students gain a comprehensive understanding of language models by guiding them through various stages, including data collection, model construction, training, and evaluation [8].
- The course structure consists of 5 units and 19 lectures, with a focus on practical implementation and hands-on experience [10].

Group 2: Instructors
- Tatsunori Hashimoto, an assistant professor at Stanford, has a strong background in machine learning and has received over 30,000 citations for his research [2].
- Percy Liang, an associate professor and director of the Center for Research on Foundation Models (CRFM), has over 100,000 citations and extensive experience in AI research [6][7].

Group 3: Course Requirements
- Students are expected to have proficiency in Python, deep learning, and system optimization, as well as a solid understanding of calculus, linear algebra, and basic probability and statistics [11].
- The course emphasizes minimal scaffolding, requiring students to write significantly more code compared to other AI courses [11].
It's Not That Video Models "Learn" Slowly, It's That LLMs Take Shortcuts | Sergey Levine, Cited 180,000 Times
量子位· 2025-06-10 07:35
Core Viewpoint
- The article discusses the limitations of AI, particularly in the context of language models (LLMs) and video models, using the metaphor of "Plato's Cave" to illustrate the difference between human cognition and AI's understanding of the world [6][30][32].

Group 1: Language Models vs. Video Models
- Language models have achieved significant breakthroughs by using a simple algorithm of next-word prediction combined with reinforcement learning [10][19].
- Despite video data being richer than text data, video models have not developed the same level of complex reasoning capabilities as language models [14][19].
- Language models can leverage human knowledge and reasoning paths found in text, allowing them to answer complex questions that video models cannot [21][22][25].

Group 2: The "Cave" Metaphor
- The "Plato's Cave" metaphor is used to describe AI's current state, where it learns from human knowledge but does not truly understand the world [29][32].
- AI's capabilities are seen as a reverse engineering of human cognition rather than independent exploration [33].
- The article suggests that AI should aim to move beyond this "shadow dependency" and interact directly with the physical world for true understanding [34][35].

Group 3: Future Directions for AI
- The long-term goal for AI is to break free from reliance on human intermediaries, enabling direct interaction with the physical world [35].
- There is a suggestion that bridging different modalities (visual, language, action) could facilitate this exploration without needing to escape the "cave" [35].
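The "simple algorithm of next-word prediction" credited above can be stated in a few lines: the model outputs a probability distribution over the vocabulary, and training minimizes the negative log-probability of the token that actually follows. A toy sketch with an invented distribution:

```python
# Next-token prediction objective: cross-entropy at one position is the
# negative log-probability the model assigned to the actual next token.
import math

def next_token_loss(predicted_probs, target_token):
    """-log p(target) under the model's predicted distribution."""
    return -math.log(predicted_probs[target_token])

# Hypothetical model output over a toy 4-token vocabulary:
probs = {"the": 0.5, "cat": 0.3, "sat": 0.15, "mat": 0.05}
print(round(next_token_loss(probs, "cat"), 4))  # likely token -> low loss
print(round(next_token_loss(probs, "mat"), 4))  # surprising token -> high loss
```

Summed over every position in a huge text corpus, this single objective is what the article contrasts with video models, which lack an equally direct training signal tied to human reasoning traces.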
Full Version | Latest Interview with Google's Co-Founder: Why Has Gemini Suddenly Become So Powerful?
36Ke· 2025-05-26 00:49
Core Insights
- Sergey Brin discussed Google's recent advancements in AI during an interview, highlighting the excitement around new features like virtual try-ons in Google Search and the ongoing work required to implement these functionalities [2][3]
- The evolution of AI has shifted towards language models as the primary driving force, which was not as apparent 15 years ago, with significant improvements in model interpretability and safety [2][14]
- Brin expressed optimism about Google's position in AI innovation, noting the company's readiness for transformation due to its experience with large-scale data and machine learning technologies [3][20]

AI Development and Models
- The focus on extending reasoning capabilities in AI models aims to allow them to think for longer periods, addressing the challenge of long-context inputs [3][17]
- The architecture of different models shows surprising similarities, with a growing emphasis on post-training processes that enhance model capabilities through tool usage [3][16]
- Gemini 2.5 Pro and Gemini 2.5 Flash represent significant advancements, with the former leading in most benchmarks and the latter being recognized for its speed and performance [3][21]

Company Culture and Innovation
- Google is undergoing a self-reinvention process to adapt to significant technological shifts, particularly in AI, which aligns with the company's historical focus on large-scale data and machine learning [3][19]
- The company has experienced a notable acceleration in product development from 2024 to 2025, indicating a robust pipeline of innovations [3][20]
- Brin emphasized the importance of maintaining a startup-like culture within Google to foster continuous innovation and adaptation in the rapidly evolving AI landscape [3][19]