World Models
Turing Award Winner Yann LeCun: Large Models Are a "Dead End". Which Path Is He Betting On Next?
36Kr · 2025-11-28 01:43
Core Insights
- Yann LeCun, a Turing Award winner, announced that he is leaving Meta to found a new company focused on Advanced Machine Intelligence (AMI), a significant shift for both his career and the AI landscape [1][2]
- LeCun criticizes large language models (LLMs) as a "dead end" on the road to human-like intelligence, citing their lack of real-world understanding and their limits in reasoning and action [3][4]
Group 1: Critique of Large Language Models
- LeCun argues that although LLMs perform well on language tasks, they do not truly understand the world and lack common sense and causal reasoning [5][6]
- He notes that LLM performance is approaching saturation: increasing model size no longer translates into greater intelligence [6][7]
- Training data and computational costs are nearing their limits, producing diminishing returns in understanding [7][8]
- LLMs are described as unable to plan or act effectively, and LeCun gives examples showing that human-like intelligence involves more than language skills [12][13]
Group 2: The Concept of World Models
- LeCun proposes that the next generation of AI should build "world models" that let AI understand and interact with the physical world [14][15]
- He introduces the Joint Embedding Predictive Architecture (JEPA), a learning paradigm that contrasts with LLMs by letting AI learn from multi-modal inputs and develop an internal representation of the world [16][17]
- JEPA emphasizes action and planning, moving beyond language processing toward a more holistic understanding of the environment [18][19]
Group 3: Diverging Paths in AI Development
- Both LeCun and former OpenAI chief scientist Ilya Sutskever question the current trajectory of AI, but they propose different solutions: LeCun focuses on world models, while Sutskever emphasizes safety and control in AI systems [25][26]
- The industry is shifting toward new architectures and approaches, as shown by significant investment and development in embodied intelligence and robotics [34][35]
- The future of AI is seen as a marathon rather than a sprint, and both LeCun and Sutskever acknowledge that their proposed directions will take years to mature [38][40]
Group 4: Implications for Entrepreneurs and Developers
- LeCun's move signals that larger models do not necessarily mean better intelligence and highlights the need for architectural innovation [41]
- There are opportunities in vertical applications, particularly in fields that require physical interaction, such as robotics and autonomous driving [42]
- Open-source development remains important: LeCun's new company will continue to support it, allowing smaller teams to contribute to new paradigms [43]
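The contrast between predicting raw inputs and predicting in an internal representation can be made concrete with a toy example. The sketch below is only an illustration of the general joint-embedding idea, not LeCun's JEPA: the functions `encode`, `predict`, and `embedding_loss`, the fixed weight matrix `W`, and all numbers are hypothetical stand-ins for what would in practice be learned neural networks.

```python
# Toy illustration: the loss is measured between embeddings, not raw inputs.
# Every function and value here is a hypothetical stand-in, not JEPA itself.

def encode(x, weights):
    """Linear 'encoder': project an input vector to a smaller embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def predict(z_context, scale):
    """Trivial 'predictor': guess the target embedding from the context embedding."""
    return [scale * z for z in z_context]

def embedding_loss(z_pred, z_target):
    """Mean squared error computed in embedding space."""
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)

# The encoder keeps the first two input dimensions and discards the third,
# which plays the role of unpredictable detail.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

context = [1.0, 2.0, 5.0]    # e.g. the observed part of a scene
target = [2.0, 4.0, -7.0]    # e.g. the part to be predicted

z_context = encode(context, W)
z_target = encode(target, W)
loss = embedding_loss(predict(z_context, 2.0), z_target)
print(loss)
```

Because the third input dimension never reaches the embedding, the predictor is not penalized for failing to reproduce it, which is the intuition behind predicting in representation space.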
Li Auto Discloses Some New Technical Details
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article discusses Li Auto's progress and challenges in developing its autonomous driving technology, focusing on the end-to-end model and VLA (Vision-Language-Action) integration [2][5][9].
Group 1: Model Performance and Data Utilization
- End-to-end model performance improves more slowly once training data reaches a certain scale: beyond 10 million clips, the model's MPI (Miles Per Intervention) only doubled in five months [5].
- To improve performance, Li Auto adjusted the training data mix, adding more generated data (including corner cases) and applying manual rules for safety and compliance in special scenarios [5][9].
Group 2: VLA Integration and Decision-Making
- VLA is intended to strengthen the decision-making of the end-to-end model, addressing illogical behavior, shallow reasoning in decisions, and insufficient preventive judgment across scenarios [5][6].
- VLA combines spatial intelligence, linguistic intelligence, and an action policy, so the model can understand and communicate spatial information and generate smooth driving trajectories with diffusion models [6][9].
Group 3: Simulation and Testing Efficiency
- Li Auto upgraded its model evaluation by using a world model for closed-loop simulation and testing, cutting testing costs from RMB 18.4 to RMB 0.53 per kilometer [9][11].
- The closed-loop training framework AD-R1 was introduced for efficient data management and reinforcement learning, with high-value data passing through a series of processing steps back to the cloud platform [11][12].
Group 4: Computational Power and Resources
- Li Auto's total computing power is 13 EFLOPS, of which 3 EFLOPS is dedicated to inference and 10 EFLOPS to training, across 50,000 training and inference cards [13].
- Inference compute is crucial in the VLA era because it is needed to generate simulated training environments [13].
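The figures in Groups 3 and 4 can be put in proportion with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted above; the variable names are illustrative.

```python
# Per-kilometer testing cost before and after moving to world-model simulation,
# as quoted in the summary.
cost_before = 18.4
cost_after = 0.53

reduction = 1 - cost_after / cost_before   # fraction of cost removed
factor = cost_before / cost_after          # how many times cheaper

# Compute split quoted in the summary, in EFLOPS.
total_eflops = 13
inference_eflops = 3
training_eflops = 10
inference_share = inference_eflops / total_eflops

print(f"cost reduction: {reduction:.1%}")
print(f"cost factor: {factor:.1f}x")
print(f"inference share of compute: {inference_share:.1%}")
```

The split sums to the stated total, and the testing cost falls by roughly a factor of 35.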
From Game Factory to Spatial-Intelligence Simulation: Why Hunyuan 3D Is Tencent AI's "Flanking Breakout"
AI前线· 2025-11-27 04:02
Core Insights
- Tencent's "Hunyuan 3D" has accelerated its global rollout by launching an international version of its creative engine and passing 3 million downloads of its open-source model, a significant step in Tencent's AI strategy [2][3][21]
- Tencent's distinctive position as a technology company comes from combining massive 3D demand across sectors, the mature multi-modal capabilities of its Hunyuan model, and a broad distribution network through WeChat, QQ, and Tencent Cloud [3][4]
Group 1: Business and Technology Integration
- The traditional 3D industry suffers from high costs and long production times: art often accounts for 50%-80% of game development expenses, and 3D asset creation is the most resource-intensive part [6][7]
- Hunyuan 3D addresses these issues along two main technical lines: making 3D asset production more efficient and solving scene-level construction problems [8][9]
- Integrating Hunyuan 3D into Tencent's internal game projects has shown promising results, cutting the time needed to create 3D assets from days to hours [12][14]
Group 2: Market Applications and Expansion
- Hunyuan 3D's applications extend beyond gaming: more than 150 companies in industries including e-commerce, film, advertising, and 3D printing use its models to improve production efficiency [25][27]
- The technology has changed consumer 3D printing by letting users generate personalized models with minimal expertise, which expands the market [26]
- In advertising and content creation, Hunyuan 3D is poised to change how brands engage consumers, moving from static displays to interactive experiences [27][29]
Group 3: Strategic Vision and Competitive Edge
- Tencent's AI strategy focuses on building ecosystem barriers rather than merely scaling operations, with quality, controllability, and cost-effectiveness as foundational capabilities [31][32]
- The company's Hunyuan image model topped global rankings, indicating its leadership in multi-modal technology [31]
- Tencent's approach to 3D generation is marked by a commitment to understanding industry pain points and fostering an ecosystem that supports sustainable growth [39][40]
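No Body, No AGI! Hillbot's Su Hao in Conversation with Qianxun's Gao Yang: The Embodied Intelligence Bubble Is Big, but the Progress Is Real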
量子位· 2025-11-27 03:00
Core Viewpoints
- The discussion emphasizes that embodied intelligence is essential for achieving artificial general intelligence (AGI) [2][19]
- The path to AGI requires physical interaction with the environment, which embodied intelligence makes possible [21][23]
Group 1: Insights from Experts
- Su Hao asserts that without embodied intelligence there can be no general physical intelligence, and therefore no general intelligence [2][16]
- Gao Yang argues that scaling data is crucial to solving problems in embodied intelligence and that the essence of the challenge remains unchanged [3][10]
- Both experts agree that embodied intelligence is a key entry point for understanding AGI [3][4]
Group 2: Challenges and Opportunities
- The conversation covers the technical bottlenecks in the evolution of embodied intelligence and China's structural advantages in this field [7][24]
- The experts discuss the importance of real-world data for training models, noting that China has a significant advantage over the U.S. in data iteration efficiency [27][28]
- They note that integrated hardware and software design is critical to the success of embodied intelligence [26][30]
Group 3: Future Predictions
- The next significant breakthrough in embodied intelligence may come within 2-3 years, particularly the arrival of embodied models comparable to GPT-3.5 [41][39]
- The experts believe that achieving AGI will be a continuous process of multiple breakthroughs rather than a single event [38][40]
- The discussion concludes that embodied intelligence today shows both significant progress and notable hype [31][32]
First Batch of Speakers Announced for the 8th GAIR Global Artificial Intelligence and Robotics Conference
雷峰网· 2025-11-27 00:28
Core Viewpoint
- The article discusses the evolution of artificial intelligence (AI) from its early days to the present and presents the upcoming GAIR 2025 conference as a pivotal event for the future of AI and robotics, with a focus on large models and multi-modal fusion [2][4].
Group 1: Historical Context
- The first GAIR conference was held in 2016, initiated by prominent figures in the AI field, and marked a significant moment in AI history [2].
- Over the past nine years, GAIR has documented the high points of the global AI industry as it moved into a new era of large, complex models [3].
Group 2: Future Directions
- In 2025, AI is expected to move from "technological breakthroughs" to "value cultivation," with a focus on multi-modal integration and a restructuring of the rules of the computing power industry [4].
- The GAIR 2025 conference will feature discussions on large models, embodied intelligence, AI computing power, world models, and AI hardware, reflecting collaboration between academia and industry [4].
Group 3: Conference Details
- The GAIR 2025 conference will take place on December 12-13 at the Sheraton Hotel in Shenzhen, with three thematic forums and two closed-door meetings [4].
- The event is co-hosted by GAIR Research Institute and Lei Feng Network, with Academician Gao Wen and Professor Zhu Xiaorui leading the conference [4].
Group 4: Notable Participants
- The first batch of speakers includes leaders from various institutions, such as Tang Zhimin, Yang Qiang, and Guo Yike, who will contribute to discussions on the future of AI [5][8][10].
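Closed-Loop Training Finally Arrives! AD-R1: A New Framework for End-to-End Closed-Loop Reinforcement Learning with World Models (University of Macau, Li Auto, et al.)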
自动驾驶之心· 2025-11-27 00:04
Core Insights
- The article discusses the AD-R1 framework for autonomous driving, which uses an Impartial World Model to address the "optimistic bias" found in traditional world models [2][3][57]
- The framework enables closed-loop reinforcement learning, so autonomous vehicles can learn from imagined failures and improve safety and decision-making [9][57]
Group 1: Background and Challenges
- End-to-end autonomous driving has transformed the industry, but challenges remain, particularly failures on long-tail events caused by distribution shift [6]
- Traditional reinforcement learning methods rely on external simulators, which suffer from simulation-to-reality gaps and limited interactivity [6][9]
- The article emphasizes the need for a paradigm shift toward learning 3D/4D world models that act as high-fidelity generative simulators [6]
Group 2: Optimizing World Models
- The AD-R1 framework introduces a new approach to mitigating optimistic bias in world models, which often fail to predict negative outcomes [2][7]
- The Impartial World Model (IWM) is designed to reflect the consequences of both safe and unsafe behaviors accurately, making its predictions more reliable [3][10]
- A counterfactual synthesis pipeline generates a diverse training dataset that includes plausible collision and lane-departure scenarios [3][10]
Group 3: Experimental Results
- The IWM significantly outperforms traditional models on risk prediction tasks, demonstrating that it can foresee failures accurately [47][48]
- Applying the AD-R1 framework yields notable gains in safety and performance metrics across several baseline models, with absolute increases in the Predictive Driver Model Score (PDMS) of 1.7% and 1.1% [49]
- Ablation studies show that counterfactual synthesis and model-level optimizations are critical to improving causal fidelity and overall performance [51][52]
Group 4: Future Directions
- Future research may focus on generating counterfactual failure samples from unlabeled data to reduce reliance on high-precision annotations [57]
- Extending the framework to more complex multi-agent interaction scenarios could further improve the robustness of autonomous driving systems on long-tail events [57]
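Why optimistic bias matters for closed-loop training can be seen in a toy setting. The sketch below is not the AD-R1 method: the two world models, the reward function, the penalty of 100, and the candidate plans are all hypothetical, chosen only to show that a model which never predicts failure cannot steer a policy away from unsafe plans.

```python
# Toy comparison of a biased and an impartial world model.
# All functions and numbers are hypothetical illustrations.

def optimistic_model(gap_m):
    """Biased world model: never predicts a collision, whatever the gap."""
    return False

def impartial_model(gap_m):
    """Impartial world model: predicts a collision once the gap has closed."""
    return gap_m <= 0.0

def reward(predicted_collision, progress_m):
    """Progress made, minus a large penalty when a collision is predicted."""
    return progress_m - (100.0 if predicted_collision else 0.0)

# Candidate plans: (name, progress in meters, resulting gap to lead vehicle in meters).
plans = [("cautious", 20.0, 5.0), ("aggressive", 30.0, -1.0)]

def best_plan(world_model):
    """Pick the plan with the highest reward under the given world model."""
    return max(plans, key=lambda p: reward(world_model(p[2]), p[1]))[0]

print(best_plan(optimistic_model))
print(best_plan(impartial_model))
```

Under the optimistic model the aggressive plan wins on progress alone; under the impartial model its predicted collision makes the cautious plan preferable, which is the kind of signal reinforcement learning needs in order to learn from imagined failures.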
Beijing Humanoid Robot! WoW: An Omniscient World Model Trained on 2 Million Data Samples
具身智能之心· 2025-11-27 00:04
Core Insights
- The article argues that large-scale, causally rich interaction data is necessary for developing world models with true physical intuition, in contrast to current models that rely on passive observation [2][3]
Group 1: WoW Model Overview
- WoW is a generative world model with 14 billion parameters, trained on 2 million robot interaction trajectories [2]
- The model's understanding of physical laws is probabilistic, which leads to random instability and physical hallucinations [2]
- The SOPHIA framework evaluates the physical plausibility of generated results and guides the model toward physical reality through iterative language instructions [2]
Group 2: Evaluation and Performance
- The WoWBench benchmark was created to systematically assess the model's physical consistency and causal reasoning [3]
- WoW achieved leading performance in both manual and automated evaluations, excelling in adherence to physical laws (80.16%) and instruction comprehension (96.53%) [3]
- The research provides solid evidence that large-scale real-world interaction is essential for cultivating AI's physical intuition [3]
Group 3: Live Event and Discussion
- A live session will discuss the latest open-source embodied world model, WoW 1.0, covering trends in world model development and breakthroughs in causal and physical consistency [7]
- Key highlights include the architecture of agents that imagine, act, and reflect, as well as practical application scenarios [7]
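The evaluate-and-guide cycle attributed to SOPHIA can be pictured as a generate, critique, refine loop. The sketch below is a hypothetical stand-in rather than the actual framework: `generate`, `critique`, `refine`, the integer plausibility score, and the threshold of 80 are invented for illustration, and a real system would use a generative model and a learned evaluator instead.

```python
# Toy generate -> critique -> refine loop driven by language instructions.
# Every function and number here is a hypothetical stand-in.

def generate(prompt):
    """Stand-in generator: plausibility (0-100) rises with each added constraint."""
    return {"prompt": prompt, "plausibility": min(100, 40 + 20 * prompt.count(";"))}

def critique(sample, threshold=80):
    """Stand-in evaluator: accept the sample once it is plausible enough."""
    return sample["plausibility"] >= threshold

def refine(prompt, round_idx):
    """Stand-in refinement: append one more physical constraint to the instruction."""
    return f"{prompt}; constraint {round_idx}"

def generate_with_feedback(prompt, max_rounds=5):
    """Regenerate until the critique passes or the round budget is used up."""
    sample = generate(prompt)
    for round_idx in range(1, max_rounds + 1):
        if critique(sample):
            return sample, round_idx
        prompt = refine(prompt, round_idx)
        sample = generate(prompt)
    return sample, max_rounds

result, rounds = generate_with_feedback("push the cup off the table")
print(rounds, result["plausibility"])
```

The design point is that the correction signal travels through the instruction text, so the generator itself does not need to be retrained between rounds.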
"The Mainstream Path of AI Development Has Hit a Bottleneck"
第一财经· 2025-11-26 09:52
Core Insights
- Ilya Sutskever's main argument is that the current mainstream path of AI development has hit a bottleneck, marking the end of the scaling era and a return to a research-focused paradigm [4][5].
Group 1: AI Development Phases
- Sutskever identifies three phases in AI research: 2012 to 2020 was the research era, 2020 to 2025 was the scaling era, and the field is now returning to a research era because of diminishing returns from scaling [4].
- He emphasizes that although computing power has grown significantly, it no longer guarantees better performance, which blurs the line between scaling and wasted compute [4].
Group 2: Generalization and Model Limitations
- A fundamental issue in the pursuit of AGI is that large models generalize poorly compared with humans [5].
- Sutskever points out that current models score well on evaluations yet often make simple mistakes, suggesting that the training data may be too narrow and that evaluation performance is disconnected from real-world performance [6].
Group 3: Emotional Intelligence in AI
- Sutskever suggests that current AI may lack emotional intelligence, which could serve as a guiding value function essential for effective decision-making [7].
- He draws a parallel with people who have lost the ability to process emotions, indicating that emotions play a crucial role in decision-making and may be a missing element in AI development [7].
Group 4: Alternative Perspectives in AI
- Yann LeCun, a Turing Award winner, criticizes the limitations of large language models (LLMs), arguing that they cannot perform complex reasoning and are merely statistical models [8].
- LeCun advocates "world models" that learn from visual information, much as young animals learn, as a more promising direction for AI development [8][9].
- Fei-Fei Li also stresses the importance of building world models that understand spatial relationships and interactions, suggesting the need for a new AI paradigm with generative, multimodal, and interactive capabilities [9].
Group 5: Industry Consensus
- The AI industry lacks consensus on its future direction, but it is clear that the era of simply adding computing power is over, which calls for a reevaluation of the paradigms that will lead to AGI [9].
NIO
数说新能源· 2025-11-26 05:58
Core Viewpoint
- The company has shown significant growth in electric vehicle deliveries and financial performance, driven by new product launches and cost reduction, positioning it for continued expansion in the market [1][4][5].
Delivery and Sales Performance
- In Q3, the company delivered 87,071 smart electric vehicles, a year-on-year increase of 40.8% [1].
- October deliveries reached 40,397 units, up 92.6% year-on-year and the third consecutive monthly delivery record [1].
- Q4 delivery guidance is 120,000 to 125,000 units, a year-on-year increase of 65.1% to 72% [1].
Financial Performance
- Total revenue for Q3 was 21.8 billion RMB, a year-on-year increase of 16.7% [4].
- Vehicle sales revenue was 19.2 billion RMB, up 15% year-on-year, while other sales reached 2.6 billion RMB, up 31.2% [4].
- Vehicle gross margin improved to 14.7% from 13.1% a year earlier, attributed to lower material costs [4][5].
Cost Management and Efficiency
- The company's non-GAAP operating loss was 3.5 billion RMB, down 32.8% year-on-year [5].
- R&D expenses fell 28% year-on-year to 2.4 billion RMB, reflecting organizational optimization [4][5].
- The company reported positive operating cash flow and free cash flow for the quarter, supported by a 1.16 billion USD equity financing completed in September [5].
Product Development and Technology
- The company launched two new large three-row electric SUVs, the ONVO L90 and the new ES8, which received strong market recognition [1].
- The introduction of NWM (NIO World Model), which the company describes as the world's first world model of its kind, strengthens its smart driving capabilities [2].
- Upcoming software updates, including COCONUT 2.1.0, aim to improve the driving experience with advanced models [2].
Market Strategy and Expansion
- The company operates a sales and service network with 172 NIO centers and 3,641 battery swap stations globally [3].
- The company is expanding its presence in international markets, with plans to introduce new models at competitive price points [16].
- The strategy takes a phased approach to market entry, prioritizing the Firefly brand for overseas expansion [16].
Future Outlook
- The company targets a gross margin of 20% by 2026, driven by high-margin models and cost control [10].
- Management is confident of reaching quarterly breakeven in Q4 despite potential impacts from subsidy changes [6].
- The company plans to hold R&D spending at approximately 2 billion RMB per quarter while preserving long-term competitiveness [10].
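The delivery and revenue figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the summary; the helper `implied_prior` is an illustrative name, and the year-ago values it returns are implied by the stated growth rates, not figures reported in the article.

```python
def implied_prior(current, yoy_growth_pct):
    """Back out the year-ago figure implied by a current value and its YoY growth."""
    return current / (1 + yoy_growth_pct / 100)

# Year-ago deliveries implied by the quoted growth rates.
q3_prior = implied_prior(87_071, 40.8)
oct_prior = implied_prior(40_397, 92.6)

# Both ends of the Q4 guidance should imply roughly the same year-ago base.
q4_base_low = implied_prior(120_000, 65.1)
q4_base_high = implied_prior(125_000, 72.0)

# Revenue components (billion RMB) should add up to the stated total.
revenue_gap = 21.8 - (19.2 + 2.6)

print(round(q3_prior), round(oct_prior))
print(round(q4_base_low), round(q4_base_high))
print(abs(revenue_gap) < 0.05)
```

The two ends of the Q4 guidance imply nearly identical year-ago bases, and the revenue components sum to the reported total, so the quoted figures are internally consistent.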
In Embodied Intelligence, No Consensus Is the Best Consensus
36Kr · 2025-11-25 23:32
Core Insights
- Embodied intelligence is complex: it is shaped through many trials, conflicts, and reconciliations rather than by a single correct path [1][3]
- The industry's lack of consensus is seen as an opportunity for innovation and flexibility, letting diverse teams explore different technical routes without being constrained by established standards [3][4]
Industry Perspective
- The absence of consensus breaks the monopoly of any single technical route and keeps the industry out of "path dependency" traps [3]
- This "no consensus" state lets small and medium enterprises, startups, and cross-industry players enter the market without adhering to existing technical standards [3]
- Technology in the interdisciplinary field of embodied intelligence iterates rapidly, so premature consensus could hinder breakthroughs [3]
Signals for Future Development
- **Signal 1: World Models Are Not Yet Sufficient** Current world models are valuable for prediction but cannot serve as a universal solution for embodied intelligence, because they rely on human behavior data that does not transfer directly to robotic operation [4][5]
- **Signal 2: Need for Specialized Models** A growing number of companies agree on developing specialized models for embodied intelligence that focus on actions rather than language, to better adapt to the physical world [6][7]
- **Signal 3: Innovation from the Ground Up** The suitability of the Transformer architecture for embodied intelligence is being questioned, with suggestions to explore new architectures that prioritize direct interaction between vision and action [7][8]
- **Signal 4: Data as Fuel** Data is recognized as essential for embodied intelligence, but there is no unified view on which types of data to use, so teams integrate multiple sources according to task requirements [9][10]
- **Signal 5: Growing Demand for Data** As embodied intelligence moves into more complex scenarios, demand for data is rising in quantity, quality, and variety, which requires a more comprehensive approach to data collection [11][13][14]