World Models
Alex Wang "Isn't Qualified to Succeed Me": Yann LeCun Reveals the Truth About Meta AI's "Infighting" and Calls AGI "Utter Nonsense"
36Kr · 2025-12-17 02:45
Core Viewpoint
- Yann LeCun criticizes the current AI development path focused on scaling large language models, arguing that it leads to a dead end and that a different approach is needed to achieve true AI capabilities [1][2].

Group 1: AI Development Path
- LeCun believes the key hurdle in AI progress is not reaching "human-level intelligence" but achieving "dog-level intelligence," a view that challenges current evaluation systems centered on language capabilities [2].
- He advocates developing "world models" that can understand and predict the world, in contrast with mainstream models that focus on generating text or images [2][8].
- LeCun's new company, AMI, aims to pursue this alternative technical route, emphasizing cognitive and perceptual fundamentals rather than merely scaling existing models [2][7].

Group 2: Research and Open Science
- LeCun stresses the importance of open research, arguing that true research must be publicly shared and scrutinized to avoid the pitfalls of insular corporate environments [5][6].
- He believes that allowing researchers to publish their work improves research quality and motivation, a point many industrial labs overlook [6].

Group 3: World Models and Learning
- World models create abstract representations of the world and predict outcomes in that representation, rather than relying on pixel-level predictions, which are ineffective for high-dimensional data [8][10].
- LeCun emphasizes that effective learning requires filtering out unpredictable details and focusing on the relevant aspects of reality, which is crucial for developing intelligent systems [10][22].

Group 4: Data and Training
- LeCun highlights the vast difference between text data and video data, noting that video is richer and more valuable for learning because of its structural redundancy [18][19].
- He argues that relying solely on text data will never lead to human-level intelligence, as text lacks the complexity and richness found in real-world data [19][25].

Group 5: Future of AI and AGI
- LeCun expresses skepticism about the concept of "general intelligence," suggesting it is a flawed notion and that true progress will be gradual rather than sudden [30][32].
- He predicts that achieving "dog-level intelligence" will be the most challenging part of AI development, with significant advances expected in the next 5 to 10 years if no unforeseen obstacles arise [32][34].

Group 6: Industry Trends and Company Direction
- LeCun's departure from Meta and the establishment of AMI reflect a desire to pursue a different technological path while most companies focus on large language models [1][48].
- He notes that the competitive environment in Silicon Valley often produces a monoculture in which companies pursue similar technological routes, which can stifle innovation [48].
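The contrast drawn above between pixel-level prediction and prediction in an abstract representation can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions, not LeCun's or AMI's method: the `observe`, `encode`, and `step` functions, the two-number state, and the 64 noise dimensions are all hypothetical, and the encoder is hand-written rather than learned.

```python
import random

random.seed(0)

NOISE_DIMS = 64  # hypothetical count of unpredictable "detail" dimensions


def step(state):
    """Toy dynamics: position advances by velocity, velocity is constant."""
    pos, vel = state
    return (pos + vel, vel)


def observe(state):
    """Hypothetical sensor: the predictable state followed by fresh random detail."""
    noise = [random.gauss(0.0, 1.0) for _ in range(NOISE_DIMS)]
    return list(state) + noise


def encode(obs):
    """Hand-written stand-in for a learned encoder: keep only the predictable part."""
    return tuple(obs[:2])


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


state = (0.0, 1.0)
obs_now = observe(state)
obs_next = observe(step(state))

# Observation-space ("pixel-level") prediction: every dimension must be guessed,
# including the random detail, here by assuming it stays unchanged.
obs_pred = list(step(encode(obs_now))) + obs_now[2:]
obs_error = mse(obs_pred, obs_next)

# Representation-space prediction: only the abstract state is predicted.
latent_pred = step(encode(obs_now))
latent_error = mse(latent_pred, encode(obs_next))

print(f"observation-space error: {obs_error:.3f}")
print(f"representation-space error: {latent_error:.3f}")
```

In this toy setup the representation-space error is zero because the encoder discards exactly the dimensions that cannot be predicted, while the observation-space error is dominated by those dimensions. A real system would have to learn such an encoder from data.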
Digital Technology Industry Watch | Biweekly Highlights (2025.12.02-12.16)
Mei Ri Jing Ji Xin Wen· 2025-12-16 10:45
Government Initiatives
- The Ministry of Industry and Information Technology (MIIT) has revised the "Management Measures for Public Service Platforms for Industrial Technology," effective from December 5, 2025, focusing on key industries such as equipment, petrochemicals, steel, and artificial intelligence [1].
- The National Development and Reform Commission, along with other ministries, has issued opinions on strengthening the construction of data element disciplines and digital talent teams, aiming to support the development of a digital economy and society [1].
- The Ministry of Ecology and Environment has released guidelines for building a product carbon footprint factor database to support the establishment of a carbon footprint management system [1].
- MIIT is seeking public opinions on the "Comprehensive Standardization System Construction Guide for the Metaverse Industry (2026 Edition)," which aims to establish over 50 national and industry standards by 2030 [1].

Local Actions
- Shandong Province is promoting the metaverse as a new economic growth point, supporting cities such as Jinan and Qingdao in building future industry pilot zones [1].
- Jiangsu Province has established a Metaverse Standardization Technical Committee in Nanjing to fill the gap in the province's standardization system [1].

Industry Developments
- GPU leader Moore Threads has officially listed on the STAR Market, becoming the first domestic GPU stock, with a market capitalization of 305.5 billion yuan and an opening surge of 468.78% [3].
- Google has integrated AI simultaneous translation into all its headphones and launched an experimental browser named "Disco," aiming to redefine the web browsing experience [3].

Academic Insights
- Academician Zhang Yaqin predicts that the number of large models will ultimately not exceed ten, emphasizing the integration of information, physical, and biological intelligence [4].
- Academician Tan Jianrong stresses the importance of small models as the foundation for large models, advocating a shift towards precision small models and industry-specific intelligent agents [4].

Technology and Applications
- The Ministry of Industry and Information Technology has granted approval for China's first batch of L3-level conditional autonomous driving vehicles, marking a significant step towards commercialization [6].
- Mathematician Terence Tao and his team solved the 50-year-old Erdős 1026 problem in just 48 hours using AI tools, showcasing the potential of AI in solving complex mathematical challenges [6].
Early-Stage Investing Across Cycles: From Track Thinking to Cognitive Dividends | 甲子引力
Sou Hu Cai Jing· 2025-12-16 10:45
Core Insights
- The article discusses the shift from "track thinking" to "cognitive dividends" in early-stage investment, emphasizing the need for investors to develop a deep understanding of people, cycles, and non-consensus views in a crowded market [1][2].

Group 1: Investment Strategies
- Investors are moving away from simply betting on popular sectors and are building their own cognitive models and project radars to identify unique opportunities [1][2].
- Maintaining a "feel" for the market and establishing positive feedback loops during industry downturns are highlighted as key to capturing the next big opportunity [1][2].

Group 2: Key Investment Areas
- Major investment themes include AI applications, AI-driven consumer electronics, embodied intelligence, and energy systems related to AI [8][9].
- AI hardware and AI for Science are emphasized, along with the rapid evolution of sectors such as quantum technology and biomanufacturing [9][10].

Group 3: Cognitive Differentiation
- Investors are encouraged to develop unique cognitive perspectives that differentiate their investment decisions, even when consensus exists around certain sectors [12][21].
- Examples of successful investments based on unique cognitive insights include early support for companies that later gained significant market traction despite initial skepticism from the broader investment community [14][15].

Group 4: Project Sourcing and Influence
- Personal influence and brand visibility help attract quality projects, and public engagement can enhance investment opportunities [25][26].
- Continuous learning and sharing insights through platforms such as podcasts and articles are noted as ways to build a network of potential investment opportunities [27][28].

Group 5: Future Outlook
- The consensus among investors is to continue focusing heavily on AI-related investments, with specific attention to foundational AI technologies and applications [32][33].
Xu Huazhe: Hurry Up and Patiently Wait for the Embodied Future......
具身智能之心· 2025-12-16 00:02
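Author: Xu Huazhe | Editor: 具身智能之心

This article is published with Dr. Xu Huazhe's authorization and may not be reposted without permission.

Yesterday we saw a post by Xu Huazhe on social media about data, mass production, robot bodies, and scenarios. He voiced similar views at this year's IROS roundtable, where, reasoning from the first principles of intelligence, he divided the future development of embodied intelligence into three modules: desire, priors, and experience.

Desire. When building agents, whether physical or virtual, it always feels as though today's machine learning has no desire of its own to learn. Can we imagine giving a robot a desire of its own?

Experience. Experience is a means of finally closing the loop with the world. One day at home I watched a repairman fixing our gas stove. Standing on a ladder and tightening something, his whole body was contorted, yet he kept his center of gravity perfectly under control and still performed very fine operations with his hands.

This way of thinking also runs through his subsequent R&D and academic exploration.

A few years ago we were still debating when robots would be able to walk on all terrain; later the topic became "parkour," "dancing," and "basketball." That rate of change tells me this problem is essentially solved, and I would not be surprised if robots can rock-climb next year.

Yet this extremely fast rate of change also feels strangely out of step, because I have not seen humanoid robots truly serving people anywhere ...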
World Models and Autonomous Driving: Latest Algorithms & Hands-On Projects (Tesla, Video, OCC, and More)
自动驾驶之心· 2025-12-15 06:00
Core Viewpoint
- The article introduces a new course on world models in autonomous driving, highlighting its relevance and the collaboration with industry experts to provide comprehensive training in this emerging field [2][4].

Course Overview
- The course will cover various aspects of world models, including their historical development, current applications, and different methodologies such as pure simulation, simulation plus planning, and generative sensor input [7].
- It aims to equip participants with the skills and knowledge needed to understand and implement world models in autonomous driving [12].

Course Structure
- **Chapter 1: Introduction to World Models** This chapter will provide an overview of world models and their connection to end-to-end autonomous driving, discussing the various schools of approach and their applications in the industry [7].
- **Chapter 2: Background Knowledge of World Models** This chapter will cover foundational knowledge, including scene representation, Transformer technology, and BEV perception, which are crucial for understanding world models [8].
- **Chapter 3: Discussion on General World Models** Focused on popular models such as Marble and Genie 3, this chapter will explore their core technologies and design philosophies [9].
- **Chapter 4: Video Generation-Based World Models** This chapter will cover video generation algorithms, highlighting significant works and recent advances in the field [10].
- **Chapter 5: OCC-Based World Models** This chapter will focus on OCC generation methods, discussing their applications in trajectory planning and end-to-end systems [11].
- **Chapter 6: World Model Job Topics** This chapter will provide insights into industry applications, challenges, and interview preparation for roles related to world models [11].

Target Audience and Learning Outcomes
- The course is designed for people who want to advance their knowledge of end-to-end autonomous driving and world models, and it aims to bring them to a level equivalent to one year of experience in the field [15].
- Participants will gain a deep understanding of key technologies such as video generation, OCC generation, and BEV perception, enabling them to apply these concepts in real-world projects [15].
Mid-Tier Intelligent-Driving Vendors Are Rapidly Snapping Up End-to-End Talent......
自动驾驶之心· 2025-12-15 00:04
Core Viewpoint
- The article discusses the technological anxiety in intelligent driving, particularly among mid-tier manufacturers, and highlights the anticipated growth in demand for end-to-end (E2E) and VLA (Vision-Language-Action) technologies in the coming year [2].

Group 1: Industry Trends
- Mass production of cutting-edge technologies such as end-to-end systems is expected to begin next year, with L2 technologies becoming more standardized and moving into lower-tier markets [2].
- Total sales of passenger vehicles priced above 200,000 yuan are around 7 million, but the leading new EV makers (the "new forces") account for less than one-third of this, which indicates that end-to-end mass production models are being adopted slowly [2].
- The maturity of end-to-end technology is seen as a precursor to larger-scale production, and the advancement of L3 regulations makes technological upgrades urgent for mid-tier manufacturers [2].

Group 2: Recruitment and Training
- Demand is growing for positions related to end-to-end and VLA technologies, and many professionals are seeking to learn these advanced skills quickly [3].
- The article mentions the launch of specialized courses on practical applications of end-to-end and VLA technologies, designed for people already working in the field [3][6].
- The courses will cover various modules, including navigation information application, reinforcement learning optimization, and production experience with diffusion and autoregressive models [3][6].

Group 3: Course Details
- The end-to-end production course will focus on practical implementation, detailing key modules and offering seven practical exercises suitable for those looking to advance their careers [3][6].
- The VLA course will cover foundational algorithms and theories, including BEV perception and large language models, with practical applications based on diffusion models and VLA algorithms [6][11].
- The instructors for these courses are experienced professionals from top-tier companies and academic institutions [5][8][13].
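Jin Xin of Eastern Institute of Technology, Ningbo: How to Find a Unified "Spatial Language" for Autonomous Driving and Robotics | GAIR 2025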
雷峰网· 2025-12-14 06:27
Core Viewpoint
- The article discusses the emerging paradigm of "world models" in AI, emphasizing the importance of integrating physical rules and data-driven methods to enhance machine intelligence and its applications in industries such as manufacturing and autonomous driving [2][4][5].

Group 1: Researcher and Team Insights
- Researcher Jin Xin of Eastern Institute of Technology, Ningbo is focusing on "embodied world models" for decision-making, collaborating with institutions such as Shanghai Jiao Tong University and Tsinghua University [3].
- Jin's team is exploring a "hybrid" approach to building world models, combining explicit physical rules with data-driven methods to address complex phenomena [4].

Group 2: Applications and Industry Collaboration
- The team is applying its methods in industrial manufacturing, collaborating with leading companies in Ningbo to validate its "factory world model" [5].
- Advances in world models are seen as a significant technological leap, with applications in autonomous driving, robotics, AIGC, AR, and VR [9].

Group 3: Spatial Intelligence Framework
- The framework for spatial intelligence is divided into three parts: spatial perception; spatial interactivity; and spatial understanding, generalization, and generation [10][12][13][14].
- The process involves a "modeling-training" loop in which AI agents are trained in simulated environments, leading to continuous optimization [18].

Group 4: Specific Projects and Innovations
- The "UniScene" project focuses on generating driving scenarios, addressing the limitations of traditional data collection methods in the automotive industry [20][22].
- The "OmniNWM" project introduces a closed-loop mechanism for planning and generating future states based on trajectory inputs [42][44].
- The "InterVLA" dataset aims to provide first-person perspective data for robots, enhancing their interaction capabilities [46][57].

Group 5: Challenges and Future Directions
- The article highlights the challenges in creating realistic world models, particularly in embedding complex physical rules and ensuring data quality [98][104].
- The research emphasizes a mixed approach that combines knowledge-based constraints with data-driven learning to improve how AI models capture physical laws [106].
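The "hybrid" idea summarized above, explicit physical rules combined with data-driven learning, can be illustrated with a toy example. The sketch below is not the team's method: the falling-object setup, the function names, and the drag coefficient are all hypothetical, and a one-parameter least-squares fit stands in for a learned model. A known rule (gravity) supplies the prior, and the data supply a correction for an effect the rule omits (drag).

```python
G = 9.81         # gravitational acceleration in m/s^2 (the known physical rule)
DT = 0.1         # time step in seconds
TRUE_DRAG = 0.5  # hypothetical drag coefficient, used only to synthesize data


def true_step(v):
    """Ground-truth dynamics: gravity plus linear drag (drag is unknown to the model)."""
    return v - G * DT - TRUE_DRAG * v * DT


def physics_step(v):
    """Explicit rule only: gravity, no drag."""
    return v - G * DT


def fit_drag(velocities):
    """Least-squares fit of k in the residual model: residual = k * (-v * DT)."""
    xs = [-v * DT for v in velocities[:-1]]
    rs = [nxt - physics_step(v) for v, nxt in zip(velocities[:-1], velocities[1:])]
    return sum(x * r for x, r in zip(xs, rs)) / sum(x * x for x in xs)


# Synthesize a short trajectory of "measured" velocities.
velocities = [0.0]
for _ in range(50):
    velocities.append(true_step(velocities[-1]))

k_hat = fit_drag(velocities)


def hybrid_step(v):
    """Physics prior plus the data-driven correction fitted above."""
    return physics_step(v) - k_hat * v * DT


print(f"fitted drag coefficient: {k_hat:.3f}")
```

Because the synthetic data are noise-free, the fit recovers the hidden coefficient to within floating-point error, and `hybrid_step` tracks the trajectory where `physics_step` alone drifts. With real measurements the correction would be noisier and would usually need a richer model than a single coefficient.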
GAIR 2025 "Data & One Brain, Many Forms" Sub-Forum: A Heated Debate on AI's Evolution Path
雷峰网· 2025-12-14 06:27
Core Insights
- The article emphasizes the transition of AI from "specialized" to "generalized" language understanding over the past decade, with the next key battle being the expansion of this generality from the realm of language to the physical world [1].

Group 1: Data Paradigm Shift
- Data is evolving from a traditional "resource" role to a more fundamental "cognitive foundation" and "value carrier" [3].
- High-quality, structured, and logically coherent data is becoming essential for defining the cognitive boundaries of models and aligning their values [3][4].
- The forum discussed building a more interpretable, credible, and evolvable knowledge system amid the data deluge, highlighting data as a core link that drives intelligent evolution and harmonious coexistence with society [4].

Group 2: One Brain, Many Forms
- The "One Brain, Many Forms" paradigm is redefining how intelligence is constructed, moving beyond single models for specific tasks to a unified cognitive core that can dynamically generate various forms for different scenarios [5].
- This approach aims for a leap from "specialized intelligence" to "unified intelligence," allowing the same "brain" to understand language, interpret visuals, and manipulate physical objects while sharing knowledge across forms [5].

Group 3: Embodied Intelligence and Data Collection
- Dr. Dai Ruoli, founder of Noitom Robotics, highlighted the high demand for quality data in humanoid robots and embodied intelligence, emphasizing the relationship between data scale, data quality, and model capability [10].
- Dr. Dai identified three structural challenges in remote operation as a data acquisition method, which are pushing the industry to explore more universal and scalable data acquisition paradigms [11][12].
- The concept of a "data pyramid" was introduced, stressing the importance of understanding the core value of data at different levels in order to create sustainable engineering and business paths [12].

Group 4: Future of Embodied Data
- Tong Xianqiao, CEO of Jishudai Iteration, predicted explosive growth in embodied data volume in the coming years, positioning "embodied data services" as a significant opportunity in the robotics sector [15].
- Current data collection methods were grouped into two paths, the real-machine side and the simulation side, each with its own acquisition techniques [16].
- A platform design approach was proposed to improve data collection efficiency and optimize deployment, introducing AI agents for automatic annotation and resource management [17].

Group 5: One Brain, Many Forms Discussions
- The forum on "One Brain, Many Forms" featured discussions on the development of embodied intelligence and the integration of world models, with participants emphasizing that the industry is still in an exploration phase [45][46].
- The challenges of achieving a universal controller were discussed, with insights on how performance differs with hardware capabilities and algorithmic approaches [47].
- The panel concluded with reflections on the future of embodied intelligence, highlighting the gap between innovative ideas and practical applications in the industry [48].
The "World Model" Race Heats Up: Runway Launches GWM-1, With Real-Time Interaction Lasting Several Minutes
硬AI· 2025-12-13 12:45
Core Insights
- Runway aims to evolve from a "special effects supplier" in the film industry into an "AI architect" of the physical world [2][20].
- The company has launched its first General World Model (GWM-1), entering a "world simulation" arena dominated by giants such as Google and Nvidia [2][20].
- GWM-1 is designed to understand physical laws, geometric structures, and environmental dynamics, focusing on "coherence" and "interactivity" [2][5].

GWM-1 Breakdown
- A world model allows AI to simulate the mechanisms of the real world without traversing all possible scenarios, enabling reasoning, planning, and action [5].
- GWM-1 consists of three autoregressive models tailored for different domains: GWM-Worlds, GWM-Robotics, and GWM-Avatars, all built on the latest Gen-4.5 base model [5][6].

GWM-Worlds
- GWM-Worlds provides an interface for exploring interactive digital environments, allowing users to intervene in real time and predict subsequent events [8].
- The model generates environments at 24fps and 720p resolution, maintaining coherence over long sequences of motion [8].

GWM-Robotics
- GWM-Robotics addresses the difficulty of acquiring real data for extreme weather and unexpected obstacles by generating high-quality synthetic data [10][11].
- This approach significantly reduces training costs and helps predict compliance risks before robots are deployed in the real world [11].

GWM-Avatars
- GWM-Avatars integrates video generation with voice, enabling digital avatars to hold long, continuous conversations without quality loss [14][15].
- If successful, this technology could disrupt the customer service and online education sectors [15].

Base Model Evolution and Computational Power
- Runway has upgraded its Gen-4.5 model with native audio and multi-camera editing capabilities, supporting video generation of up to one minute [18].
- The company has partnered with CoreWeave to use Nvidia's cloud infrastructure for model training and inference, addressing the computational demands of world simulation [18].

Strategic Expansion
- Runway's strategy is rapidly expanding from creative tools for film to robotics simulation, but it faces stiff competition from established players such as Google and Nvidia [19][20].
- Whether Runway can use GWM-1 to prove it is more than a special effects supplier will be crucial for the company's valuation growth [20].
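To give a rough sense of what real-time generation at 24fps and 720p implies, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch, not a description of how GWM-1 works: the 1280x720 frame size for "720p" and the three-minute session length standing in for "several minutes" are assumptions, and the figures say nothing about the model's actual compute cost.

```python
FPS = 24                    # frame rate stated in the summary
WIDTH, HEIGHT = 1280, 720   # assumption: the common 16:9 frame size for "720p"
SESSION_MINUTES = 3         # assumption: one reading of "several minutes"


def frame_budget_ms(fps):
    """Maximum time available to produce each frame in real time."""
    return 1000 / fps


def total_frames(fps, minutes):
    """Number of frames in a session of the given length."""
    return fps * 60 * minutes


def pixels_per_second(width, height, fps):
    """Raw pixel throughput required for real-time output."""
    return width * height * fps


budget = frame_budget_ms(FPS)
frames = total_frames(FPS, SESSION_MINUTES)
throughput = pixels_per_second(WIDTH, HEIGHT, FPS)

print(f"per-frame budget: {budget:.1f} ms")
print(f"frames in session: {frames}")
print(f"pixels per second: {throughput}")
```

Under these assumptions each frame must be produced in about 41.7 ms, and a three-minute session spans 4,320 consecutive frames that all have to stay mutually coherent, which is why coherence over long sequences is singled out as the hard part.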
Experts Say Large-Scale Deployment of Embodied Intelligence Is Still in Its Early Stages
Zhong Guo Xin Wen Wang· 2025-12-13 12:33
Core Insights
- Embodied intelligence has achieved breakthroughs in both cognitive and physical intelligence, but large-scale deployment is still in its early stages [1][2].
- The future direction of embodied intelligence is marked by ongoing competition among approaches and rapid evolution [1].

Group 1: Key Issues in the Industry
- The first core issue is the debate over model pathways, specifically whether the large-model paradigm applies to robotics. Large models have succeeded in the language, image, and video domains, but it remains unproven whether the same paradigm can be transferred directly to robot control [1].
- The second core issue is the contention over data training paradigms. Data remains a critical bottleneck limiting leaps in robotic capability, and approaches such as mixed data, multimodal data, and data generated by world models are being explored [1].
- The third core issue is the debate over robot form factor, namely whether humanoid robots represent a "true demand." Companies such as Tesla and Figure AI are pursuing a fully humanoid approach, while several Chinese companies have introduced "wheel-arm composite robots" this year, emphasizing "engineering feasibility" for scalable commercial applications in the short term [1].

Group 2: Future Development Paths
- The industry agrees that using large models to enhance robots' generalization capabilities is essential, but there are still multiple technical pathways for applying large models effectively in robotic systems [2].
- Looking ahead, introducing world models on top of vision-language-action (VLA) models is expected to significantly enhance the capabilities of large models in robotics by drawing on their ability to understand, predict, and reason about the physical world [2].