World Models
LeCun's world model gets a second generation! Robot training done in 62 hours, ushering in a new era of physical reasoning
量子位· 2025-06-12 08:17
Wen Le, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)

Physics is coming to artificial intelligence: Meta has open-sourced the V-JEPA 2 world model, an AI model that understands the physical world the way humans do. Turing Award winner and Meta Chief AI Scientist Yann LeCun appeared in the announcement himself, saying: "We believe world models will usher in a new era for robotics, enabling AI agents in the real world to help with household chores and physical tasks without needing enormous amounts of robot training data."

So what is a world model? Simply put, it is an AI model that can respond to the real physical world. It should have the following capabilities:

- Understanding: a world model should be able to make sense of observations of the world, including recognizing objects, actions, and motion in video.
- Prediction: a world model should be able to predict how the world will evolve, and how it would change if an agent took a given action.
- Planning: building on its predictive ability, a world model should be usable for planning sequences of actions that achieve a given goal (see the sketch below).

V-JEPA 2 (Meta Video Joint Embedding Predictive Architecture 2) is the first world model trained on video (video being a rich and easily obtained source of information about the world). It improves action prediction and physical-world modeling, and can be used for zero-shot planning and robot control in new environments. ...
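To make the understand/predict/plan triad concrete, here is a minimal sketch of how a latent-space world model can drive action selection. It is not Meta's code: the encoder and predictor are toy random linear maps standing in for trained networks, and every name (`understand`, `predict`, `plan`) is illustrative only.

```python
# Toy sketch of the three world-model capabilities: understand (encode an
# observation into a latent), predict (roll the latent forward under an
# action), and plan (pick the action whose predicted latent lands closest
# to a goal latent). All weights are random stand-ins for learned networks.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM, OBS_DIM = 16, 4, 64

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))     # observation -> latent
W_act = rng.normal(size=(LATENT_DIM, ACTION_DIM))  # action's effect on latent
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1  # latent dynamics

def understand(observation: np.ndarray) -> np.ndarray:
    """Encode a raw observation (e.g. video-frame features) into a latent state."""
    return np.tanh(W_enc @ observation)

def predict(latent: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Predict the next latent state if the agent takes `action`."""
    return np.tanh(W_dyn @ latent + W_act @ action)

def plan(latent: np.ndarray, goal_latent: np.ndarray, n_candidates: int = 256) -> np.ndarray:
    """Choose the candidate action whose predicted outcome is closest to the goal."""
    candidates = rng.normal(size=(n_candidates, ACTION_DIM))
    scores = [np.linalg.norm(predict(latent, a) - goal_latent) for a in candidates]
    return candidates[int(np.argmin(scores))]

obs, goal_obs = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
best_action = plan(understand(obs), understand(goal_obs))
print("chosen action:", best_action)
```

The design point this illustrates is that prediction happens in a compact latent space rather than in pixels, which is what makes running the planner in a loop cheap.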
After two months of being "cyberbullied," Yann LeCun storms back with his latest world model! Zuckerberg dangles tens of millions of dollars to poach talent as an internal power struggle at Meta AI begins
AI前线· 2025-06-12 06:07
Core Viewpoint
- Meta has launched its new "world model" V-JEPA 2, aimed at enhancing AI's physical reasoning capabilities so it can better understand and predict the physical world [1][3][11]

Group 1: V-JEPA 2 Overview
- V-JEPA 2 is described as a "realistic abstract digital twin" that enables AI to predict the consequences of its actions and plan accordingly [1][3]
- The model is 30 times faster than Nvidia's Cosmos model and has been open-sourced for developers to access and integrate into various applications [1][6][5]
- V-JEPA 2 builds on Meta's earlier V-JEPA model, further improving understanding and prediction capabilities [4]

Group 2: AI Capabilities
- The model gives AI three core abilities: understanding, predicting, and planning, allowing it to build realistic internal simulations [3][17]
- V-JEPA 2 can reason without labeled video segments, distinguishing it from generative AI systems such as ChatGPT [3][4]

Group 3: Applications and Impact
- The model is designed for real-time spatial understanding in AI-driven technologies such as autonomous vehicles, warehouse robots, and drone delivery systems [3][5]
- Meta anticipates that V-JEPA 2 will pave the way for AI to operate autonomously in unfamiliar environments, potentially impacting sectors like healthcare, agriculture, and disaster response [18][19]

Group 4: Competitive Landscape
- The release of V-JEPA 2 is seen as a critical milestone in Meta's long-term AI roadmap, especially amid intensifying competition with OpenAI, Microsoft, and Google [11][13]
- World models are growing in importance across AI research, with other companies such as Google DeepMind exploring similar projects [19]

Group 5: Leadership and Strategy
- Yann LeCun, Meta's Chief AI Scientist, emphasizes that AI needs to build models of how the world operates rather than merely mimic human text [8][9]
- Meta CEO Mark Zuckerberg is reportedly taking a more hands-on approach to AI development, including significant investments in AI training data and the formation of new teams focused on achieving "superintelligence" [13][14][15]
Just now: LeCun appears on camera as Meta unveils a new world model!
机器之心· 2025-06-12 00:53
Core Insights
- Meta is actively pursuing advancements in artificial intelligence, particularly through the establishment of a "superintelligence team" and the introduction of the V-JEPA 2 model, which uses video-based training for world modeling and prediction [2][3][4].

Group 1: Meta's AI Developments
- Meta is forming a "superintelligence team" led by Mark Zuckerberg, offering nine-figure compensation to attract talent for the development of general artificial intelligence [3].
- The newly launched V-JEPA 2 model is designed to enhance environmental understanding and prediction, enabling zero-shot planning and robot control in unfamiliar environments [4][5].
- Yann LeCun, Meta's Chief AI Scientist, emphasizes that world models let AI understand and predict physical interactions without extensive trial and error, with significant implications for applications such as assistive technologies and personalized education [6].

Group 2: V-JEPA 2 Model Specifications
- V-JEPA 2 has 1.2 billion parameters and is built on the Joint Embedding Predictive Architecture (JEPA), which has shown strong performance on images and 3D point clouds [8].
- The model improves upon its predecessor, V-JEPA, by strengthening action prediction and world modeling, allowing robots to interact with unfamiliar objects and environments [9].
- V-JEPA 2 performs strongly across tasks, reportedly achieving 100% success on certain planning and robot-control tasks and markedly improving on action-anticipation and understanding benchmarks compared with previous models [12].

Group 3: Training and Performance
- V-JEPA 2 is trained in two phases: a pre-training phase on over 1 million hours of video and 1 million images, followed by action-conditioned training on a small amount of robot data [21][25].
- The model's ability to predict world states and plan actions shows in tasks such as grasping and placing objects, where it achieves success rates of 65% to 80% in new environments (a toy planning loop of this kind is sketched after this summary) [26].
- Meta has introduced new benchmarks for evaluating models' understanding of physical interactions; while V-JEPA 2 ranks first in physical reasoning, a significant gap to human performance remains [28][34].

Group 4: Future Directions
- Meta plans to explore hierarchical JEPA models capable of learning and planning across multiple time and space scales, as well as multi-modal models that integrate various sensory inputs for enhanced prediction [36].
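The goal-reaching planning summarized above (rolling a learned action-conditioned predictor forward to pick actions) is commonly implemented as sampling-based model-predictive control, for example with the cross-entropy method (CEM). The sketch below shows that generic loop over toy linear dynamics; it is an assumed illustration of the technique, not the V-JEPA 2 implementation, and all constants and names are invented.

```python
# Hypothetical CEM-style planner over a learned latent predictor. The
# dynamics here are a toy linear system standing in for a trained
# action-conditioned world model.
import numpy as np

rng = np.random.default_rng(1)
HORIZON, N_SAMPLES, N_ELITES, N_ITERS = 5, 128, 16, 4
LATENT_DIM, ACTION_DIM = 8, 2

A = np.eye(LATENT_DIM) * 0.95                       # toy latent dynamics
B = rng.normal(size=(LATENT_DIM, ACTION_DIM)) * 0.3  # toy action effect

def rollout_cost(z0, goal, actions):
    """Roll an action sequence through the predictor; cost = final distance to goal."""
    z = z0
    for a in actions:
        z = A @ z + B @ a
    return np.linalg.norm(z - goal)

def cem_plan(z0, goal):
    """Refine a Gaussian over action sequences toward low-cost rollouts."""
    mu = np.zeros((HORIZON, ACTION_DIM))
    sigma = np.ones((HORIZON, ACTION_DIM))
    for _ in range(N_ITERS):
        samples = mu + sigma * rng.normal(size=(N_SAMPLES, HORIZON, ACTION_DIM))
        costs = np.array([rollout_cost(z0, goal, s) for s in samples])
        elites = samples[np.argsort(costs)[:N_ELITES]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]  # execute only the first action, then replan (MPC)

z0, goal = rng.normal(size=LATENT_DIM), rng.normal(size=LATENT_DIM)
print("first planned action:", cem_plan(z0, goal))
```

Executing only the first action and replanning every step is what makes this model-predictive control: errors in the learned predictor get corrected by fresh observations instead of compounding over the whole horizon.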
Lai Jie of Stardust Intelligence: how do embodied-intelligence "super assistants" enter the real world? | Deep Talk
锦秋集· 2025-06-11 12:22
Core Viewpoint
- The article presents the vision of Stardust Intelligence, led by founder Lai Jie, to create embodied intelligence that enhances human creativity and intelligence through advanced robotics, rather than merely replacing human jobs [2][4].

Group 1: Company Vision and Philosophy
- Lai Jie emphasizes creating a new "incremental market" for embodied intelligence, positioning robots as "super assistants" that amplify human capabilities [2][4].
- The company aims to redefine intelligence not as the absence of mistakes but as the ability to adapt and learn from failures, akin to human problem-solving [4][5].

Group 2: Technical Innovations
- Stardust Intelligence adopts a unique "rope drive" mechanism for its robots, which mimics biological tendons and allows better force perception and control than traditional methods [4][30].
- The company uses a "fast-slow brain" model architecture: the fast system handles immediate reactions while the slow system manages higher-level planning, ensuring robust decision-making in real-world scenarios (see the sketch after this summary) [5][26].

Group 3: Data Strategy and Learning
- Stardust's approach to data collection emphasizes efficiency, aiming to cut the data needed to train a task from 1,000 samples to just 20 by strengthening the model's transfer-learning capabilities [5][45].
- The company stresses "imitation learning" and "random adaptability," allowing robots to learn from fewer examples and adapt to new tasks through trial and error [42][46].

Group 4: Market Positioning and Future Directions
- Lai Jie envisions Stardust Intelligence revolutionizing the market by making robots affordable and practical for everyday use, particularly in domestic settings [22][24].
- The company is actively pursuing partnerships, such as with a nursing home, to deploy robots in real-life scenarios, underscoring its commitment to enhancing human life rather than replacing it [63][66].

Group 5: Long-term Vision
- The ultimate goal is robots that can perform complex tasks, unlocking new levels of human creativity and productivity, much as personal computers transformed access to information [18][66].
- Embodied intelligence and world models are seen as symbiotic: advances in one enhance the other, leading to a fuller understanding of both digital and physical realities [67][68].
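Architecturally, a "fast-slow brain" reduces to two nested control loops running at different rates: a slow deliberative planner that occasionally refreshes a subgoal, and a fast reactive controller that tracks the current subgoal on every tick. The sketch below illustrates that generic pattern; the proportional controller, replanning period, and all names are invented for the example, and this is not Stardust's stack.

```python
# Generic fast/slow control-loop sketch (illustrative only).
# The slow "brain" replans a subgoal every SLOW_PERIOD ticks; the fast
# "brain" issues a corrective command toward the subgoal on every tick.
import numpy as np

SLOW_PERIOD = 10   # slow planner runs once per 10 fast ticks (invented rate)
GAIN = 0.3         # proportional gain of the fast reactive controller

def slow_plan(state: np.ndarray, goal: np.ndarray) -> np.ndarray:
    """Deliberative step: pick an intermediate waypoint toward the goal."""
    return state + 0.5 * (goal - state)

def fast_control(state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
    """Reactive step: simple proportional command toward the subgoal."""
    return GAIN * (subgoal - state)

state, goal = np.zeros(3), np.array([1.0, -0.5, 2.0])
subgoal = slow_plan(state, goal)
for tick in range(50):
    if tick % SLOW_PERIOD == 0:
        subgoal = slow_plan(state, goal)          # slow loop: replan
    state = state + fast_control(state, subgoal)  # fast loop: act
print("final distance to goal:", np.linalg.norm(goal - state))
```

The split lets an expensive deliberative model run at a fraction of the control rate while the cheap reactive loop keeps the robot stable between replans.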
Z Potentials | Interview with Chen Yubei: Aizip breaks the efficiency bottleneck, brings AI into real products, and pushes the coming on-device AI revolution
Z Potentials· 2025-06-11 02:21
Core Viewpoint
- The article discusses the rapid evolution of AI technology and its applications, highlighting the challenges of energy consumption, model size, and learning mechanisms. Aizip, a company focused on on-device AI models, aims to overcome these efficiency bottlenecks and drive the integration of AI into everyday life [1].

Group 1: AI Efficiency and Innovation
- Aizip's mission is to raise energy efficiency, model efficiency, and learning efficiency in AI systems, moving AI from "usable" to "efficiently usable" [3][10].
- The company emphasizes building the "smallest and most efficient" AI systems, in contrast to the mainstream focus on artificial general intelligence (AGI) [3][14].
- Aizip supports businesses that need AI capabilities but lack full-stack AI expertise, letting them focus on application development [3][32].

Group 2: Founder's Background and Vision
- Founder Chen Yubei has a strong academic background in AI and shifted from theoretical research to practical applications, driven by a desire to see AI embodied in real-world products [4][16].
- The founding of Aizip was catalyzed by the COVID-19 pandemic, which disrupted initial plans for postdoctoral research and prompted discussions about entrepreneurship [6][16].
- Aizip's team comprises experienced people from diverse backgrounds, emphasizing a culture of collaboration and long-term value over short-term gains [17][18].

Group 3: On-Device AI Revolution
- The article predicts that over 50% of AI inference will run on-device in the near future, driven by advances in hardware and user demand for low-latency, privacy-preserving AI products [30][31].
- Aizip's product line includes multi-modal perception models and language models, focusing on seamless integration into devices so the AI improves the user experience without calling attention to itself [22][23].
- The company aims to build a comprehensive AI model ecosystem compatible with mainstream hardware, making integration easier for clients [34][36].

Group 4: Market Position and Future Outlook
- Aizip positions itself as foundational support for companies that lack the resources to build their own on-device AI teams, anticipating a growing market for such capabilities [32][34].
- The company has established partnerships with leading hardware manufacturers and has won recognition for its innovative AI products [38].
- Aizip's strategy favors gradual commercialization, prioritizing technology validation and model stability before scaling operations [35][36].
A single Markdown file racks up 400+ stars: this survey analyzes 3D scene generation across four paradigms
机器之心· 2025-06-10 08:41
Core Insights
- The article discusses the advancements in 3D scene generation, highlighting a comprehensive survey that categorizes existing methods into four main paradigms: procedural methods, neural network-based 3D representation generation, image-driven generation, and video-driven generation [2][4][7].

Summary by Sections

Overview of 3D Scene Generation
- A survey titled "3D Scene Generation: A Survey" reviews over 300 representative papers and outlines the rapid growth in the field since 2021, driven by the rise of generative models and new 3D representations [2][4][5].

Four Main Paradigms
- The four paradigms provide a clear technical roadmap for 3D scene generation, with performance compared across dimensions such as realism, diversity, viewpoint consistency, semantic consistency, efficiency, controllability, and physical realism [7].

Procedural Generation
- Procedural generation methods automatically construct complex 3D environments using predefined rules and constraints, and are widely applied in gaming and graphics engines. This category can be further divided into neural network-based generation, rule-based generation, constraint optimization, and large language model-assisted generation (a toy rule-plus-constraint example follows this summary) [8].

Image-based and Video-based Generation
- Image-based generation leverages 2D image models to reconstruct 3D structures, while video-based generation treats 3D scenes as sequences of images, integrating spatial modeling with temporal consistency [9].

Challenges in 3D Scene Generation
- Despite significant progress, challenges remain in achieving controllable, high-fidelity, and physically realistic 3D modeling. Key issues include uneven generation capabilities, the need for improved 3D representations, high-quality data limitations, and a lack of unified evaluation standards [10][16].

Future Directions
- Future advancements should focus on higher-fidelity generation, parameter control, holistic scene generation, and integrating physical constraints to ensure structural and semantic consistency. Supporting interactive scene generation and unifying perception and generation capabilities are also crucial for the next generation of 3D modeling systems [12][18].
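As a concrete toy instance of the procedural paradigm referenced above, the snippet below places objects in a room from predefined size/count rules and enforces a non-overlap constraint via rejection sampling. All rules and numbers are invented for illustration; production systems use far richer grammars and constraint solvers.

```python
# Toy rule-and-constraint procedural scene generator (illustrative only).
# Rules: each object type has a footprint size and a count; constraint:
# axis-aligned footprints must not overlap inside a 10x10 room.
import random

random.seed(0)
ROOM = 10.0
RULES = {"bed":   {"size": (2.0, 3.0), "count": 1},
         "desk":  {"size": (1.5, 0.8), "count": 1},
         "chair": {"size": (0.5, 0.5), "count": 2}}

def overlaps(a, b):
    """Axis-aligned rectangle intersection test on (x, y, w, h) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def generate_scene(max_tries=1000):
    placed = []  # list of (name, (x, y, w, h))
    for name, rule in RULES.items():
        w, h = rule["size"]
        for _ in range(rule["count"]):
            for _ in range(max_tries):  # rejection sampling under the constraint
                box = (random.uniform(0, ROOM - w), random.uniform(0, ROOM - h), w, h)
                if not any(overlaps(box, other) for _, other in placed):
                    placed.append((name, box))
                    break
    return placed

for name, (x, y, w, h) in generate_scene():
    print(f"{name}: x={x:.2f}, y={y:.2f}, w={w}, h={h}")
```

Real procedural systems swap the rejection loop for constraint solvers or LLM-written rules, but the rule-plus-constraint skeleton is the same.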
Make your company think, connect, and grow like a brain
36Kr· 2025-06-09 11:51
Core Viewpoint
- Companies should operate like a brain, focusing on prediction and adaptation to minimize unexpected outcomes and enhance performance [2][3][4]

Group 1: Importance of Predictive Operations
- The brain functions as a "prediction machine," constantly adjusting its judgments to align reality with expectations [3]
- Companies that succeed are not necessarily the smartest but those with the most accurate "world model" that can quickly adapt to changes [2][8]

Group 2: Training the Organizational "Brain"
- Leaders must train the organization to reduce surprises, respond quickly, and evolve continuously [4]
- There are two approaches to training: a rigid method relying on control measures, and a flexible method that embraces change and real-time learning [5]

Group 3: Shared Understanding and Decision-Making
- A unified "world model" is essential for all departments to avoid misalignment and wasted effort [6][7]
- Companies should collaboratively define their understanding of customers, competition, and internal challenges to ensure coherent decision-making [7]

Group 4: Redesigning the Organization
- Companies should adopt a neural network-like structure to enhance flexibility, intelligence, and error reduction [9]
- Key practices include breaking down departmental silos, establishing rapid feedback mechanisms, decentralizing decision-making, treating failures as learning opportunities, and implementing flexible processes for growth [10][11][12][13][14]
Li Fei-Fei recounts her founding journey in detail: an eye injury five years ago cemented her resolve to build world models
量子位· 2025-06-09 09:27
Core Viewpoint
- The article emphasizes the importance of developing world models in AI, highlighting that spatial intelligence is a critical yet missing component in current AI systems. The establishment of World Labs aims to address this gap by creating AI models that truly understand the physical world [4][15][22].

Group 1: Importance of Spatial Intelligence
- Li Fei-Fei's experience of temporarily losing her stereoscopic vision reinforced her belief in the necessity of spatial understanding for AI, akin to how language models require context to process text [3][4].
- Current AI models, driven by large datasets, exhibit emergent behaviors that surpass initial expectations, yet still lack true spatial comprehension [9][10].
- The ability to reconstruct complete three-dimensional scenes from single images is identified as a key technological breakthrough that could revolutionize interactions with the physical world [25][39].

Group 2: World Labs and Its Mission
- World Labs was founded not as a trend-following venture but as a continuation of the exploration of intelligence's essence, focusing on building AI that comprehends physical space [10][11].
- Its mission is to create AI models that can genuinely understand the physical world, which is essential for tasks like robotics, material design, and virtual-universe exploration [15][24].
- The collaboration between Li Fei-Fei and Martin Casado rests on their shared view that world models are what AI currently lacks [17][19].

Group 3: Technological and Team Advantages
- World Labs aims to leverage existing advancements in computer vision, such as Neural Radiance Fields (NeRF) and Gaussian Splatting, to push the boundaries of three-dimensional AI research (the core NeRF rendering rule is recalled after this summary) [31][32].
- The company is assembling a top-tier interdisciplinary team that combines expertise in AI, computer graphics, and optimization algorithms to tackle the challenges of spatial intelligence [34][35].
- This unified strategy contrasts with the fragmented efforts seen in the early development of large language models, and the article suggests such unity is essential for success [36][37].
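For reference, NeRF (mentioned above as one of the building blocks World Labs can draw on) renders a pixel by integrating a learned density sigma and color c along the camera ray r(t) = o + t d. The standard continuous rule and its quadrature approximation are:

```latex
% Continuous volume rendering along a ray, and its quadrature approximation
% (standard NeRF formulation; \delta_i = t_{i+1} - t_i are sample spacings).
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)

\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i,
\qquad T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```

Gaussian Splatting, by contrast, replaces the ray integral with rasterized, alpha-composited 3D Gaussians, trading the neural field for an explicit representation that renders much faster.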
A conversation with Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (BAAI): AI is accelerating from the digital world into the physical world
Core Insights
- The rapid advancement of AI technology is shifting from digital to physical applications, with humanoid robots positioned as practical tools rather than mere mascots [1][2]
- The development trajectory of large models is moving toward multi-modal world models, which aim to enhance AI's understanding of, and interaction with, the physical world [2][3]

AI Technology Development
- The performance of large language models is reaching a bottleneck, necessitating improvements through reinforcement learning, high-quality synthetic data, and activation of underutilized multi-modal data [1][2]
- The introduction of the "Wujie" series of large models, including the Emu3 multi-modal world model, signifies a strategic shift toward understanding physical causal relationships [2][3]

Embodied Intelligence
- Humanoid robots are recognized for their long-term value due to their design compatibility with human environments and the availability of extensive human behavior data for model training [3][4]
- Current limitations in data volume hinder the training of models that integrate both "big brain" and "small brain" functionalities, indicating a need for further development [4][6]

Industry Trends
- Embodied intelligence is expected to find its first applications in controlled environments, such as logistics and repetitive tasks, where safety and efficiency are paramount [3][4]
- The integration of the "big brain" and "small brain" is acknowledged as a potential future trend, but current data limitations prevent immediate implementation [4][5]

AGI Development
- The emergence of Agents in AI signifies a new phase in which foundational models can support the development of various applications, akin to mobile apps in the internet era [5][6]
- The industry is still in the early stages of embodied-intelligence development, facing challenges similar to those encountered in the early days of large AI models [5][6]