World Models
One Sentence Is All It Takes to Create a 3D World You Can Roam Freely!
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article covers the launch of Marble, a world model developed by World Labs that lets users create immersive 3D environments from a single image or text prompt [2][3][7].

Group 1: Product Features
- Marble generates persistent, downloadable 3D environments, which distinguishes it from real-time world models [28].
- Users can upload 2D images or 3D models (for a fee) to generate worlds with realism comparable to AAA video games [14][16].
- The platform includes AI-native editing tools and a hybrid 3D editor, so users can lay out a spatial framework first and then fill in the visual details [31].

Group 2: User Experience
- Initial testing produced impressive results, including interactive 3D scenes created from a single image [32].
- Users can supply multiple images or short videos to obtain more accurate 3D worlds [48].
- Editing is iterative: generated worlds can be modified extensively, from minor adjustments to major structural changes [49][50].

Group 3: Pricing and Accessibility
- Marble offers three pricing tiers; the highest costs $95 per month for up to 75 generated worlds, while the free version allows 4 worlds [83][84].
- The Pro version is available for just $1 in the first month, with standard pricing at $20 per month [85].

Group 4: Future Implications
- The article argues that Marble is a significant step toward spatial intelligence in AI, which is expected to unlock new applications in simulation and robotics [70][71].
- Adding interactive capabilities to future world models is highlighted as a key opportunity for improving user engagement and broadening applications [69].
Understanding World Models Without Jargon: From Everyday Prediction to Autonomous Driving
自动驾驶之心· 2025-11-14 00:04
Group 1
- The article defines a "world model" as a system that predicts future scenarios from past sensory data, much as people anticipate events in daily life [2][3][30].
- A world model takes inputs such as images, sounds, and sensor data and outputs predictions of future states; its job is to recognize patterns and forecast what comes next [4][30].
- The article distinguishes world models from neural networks: neural networks are tools for recognition and imitation, while the world model is the core that enables prediction and understanding [5][10][30].

Group 2
- A "universal" world model is hard to build because rules and requirements differ so widely across scenarios, which makes specialized models necessary [11][12][30].
- The article introduces several specialized world models, including video generation, music generation, game, and industrial production models, each focused on one domain to achieve precise predictions [12][14][18][30].
- The autonomous driving world model is described as the most demanding type: its predictions directly affect safety, so it needs fast response times and high accuracy [18][22][30].

Group 3
- The VLA (vision-language-action) model is presented as an enhanced autonomous driving world model that adds language logic, improving action prediction based on user commands and traffic rules [23][26][30].
- The article concludes that world models will become more specialized rather than universal, with the focus on prediction accuracy and speed in specific scenarios [29][30].
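To make the "past observations in, predicted future states out" idea concrete, here is a deliberately tiny sketch. It is not taken from the article: the class name `ConstantVelocityWorldModel`, the `dt` parameter, and the constant-velocity assumption are all hypothetical choices made for illustration, and a real driving world model would learn far richer dynamics from camera and sensor data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres


@dataclass
class ConstantVelocityWorldModel:
    """Toy 'world model' (hypothetical): extrapolates an object's motion
    from its last two observed positions, assuming constant velocity."""

    dt: float = 0.1  # seconds between consecutive observations (assumed)

    def predict(self, history: List[Point], steps: int) -> List[Point]:
        """Return the next `steps` predicted positions, one per time step."""
        if len(history) < 2:
            raise ValueError("need at least two past observations")
        (x0, y0), (x1, y1) = history[-2], history[-1]
        # Estimate velocity from the two most recent observations.
        vx, vy = (x1 - x0) / self.dt, (y1 - y0) / self.dt
        # Roll the estimated velocity forward step by step.
        return [
            (x1 + vx * self.dt * k, y1 + vy * self.dt * k)
            for k in range(1, steps + 1)
        ]


model = ConstantVelocityWorldModel(dt=0.5)
past = [(0.0, 0.0), (1.0, 0.5)]  # e.g. two observed pedestrian positions
future = model.predict(past, steps=2)
print(future)
```

The point of the sketch is only the interface: a history of observations goes in, a sequence of predicted future states comes out, and a specialized model would replace the hand-written velocity rule with learned dynamics for its domain.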
LeCun's Last Paper at Meta
量子位· 2025-11-13 11:52
Core Insights
- The article covers LeJEPA, a self-supervised learning method from Yann LeCun that serves as his farewell work at Meta [2][3][4].
- LeJEPA targets the representation collapse problem of earlier JEPA frameworks by constraining embeddings to an isotropic Gaussian distribution and introducing SIGReg regularization to improve generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is built on isotropic Gaussian embeddings, which mitigate representation collapse and significantly improve generalization [5].
- Traditional JEPA training often collapses, with the model mapping all inputs to a single point and therefore failing to capture semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzes how the embedding distribution affects bias and variance using ordinary least squares regression, and finds that an isotropic Gaussian distribution minimizes both [8][9].
- Compared with non-isotropic distributions, the isotropic Gaussian gives lower bias and variance, which improves stability and accuracy on downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) performs distribution matching by recasting it as a hypothesis-testing problem [15][17].
- It combines univariate directional tests with the Epps-Pulley test to assess how closely the embedding distribution matches the target isotropic Gaussian [16][17].

Group 4: High-Dimensional Challenges
- LeJEPA handles the computational challenges of high-dimensional spaces by combining the SIGReg loss with a predictive loss, which keeps mini-batch training efficient and stable [19][21].
- The total LeJEPA loss is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ balancing the two [22].

Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, show that LeJEPA outperforms existing methods while remaining simple and robust to train [20][23].
- On domain-specific datasets such as Galaxy10 and Food101, LeJEPA pre-trained directly on the target data surpassed DINOv2-based transfer learning [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over the three years since LeCun introduced it, with a focus on improving model expressiveness and reasoning through joint prediction [31][28].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions for y [32].

Group 7: Future Directions
- LeJEPA closes out LeCun's research at Meta but not JEPA's development: LeCun is reportedly raising funds for a startup focused on world models [72][71].
- LeCun's departure from Meta, while not entirely graceful, caps a period of significant achievement in AI research [74][79].
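The summary names SIGReg's ingredients (directional univariate tests, the Epps-Pulley test, and a λ-weighted total loss) without giving an implementation. The following is a rough standard-library sketch of how those pieces could fit together, not the paper's code. Everything specific here is an assumption: the function names `epps_pulley`, `sigreg`, and `total_loss`, the random unit directions, the Gaussian weighting, the 17-point trapezoid grid on [-5, 5], and the convex-combination form of the loss (the summary only says "weighted sum").

```python
import cmath
import math
import random
from typing import List, Sequence


def epps_pulley(samples: Sequence[float], t_grid: Sequence[float]) -> float:
    """Epps-Pulley-style statistic for 1-D samples against N(0, 1):
    a weighted squared distance between the empirical characteristic
    function and exp(-t^2 / 2), scaled by the sample size."""
    n = len(samples)
    integrand = []
    for t in t_grid:
        ecf = sum(cmath.exp(1j * t * x) for x in samples) / n
        target_cf = math.exp(-0.5 * t * t)  # characteristic function of N(0, 1)
        weight = math.exp(-0.5 * t * t)     # assumed Gaussian weighting
        integrand.append(abs(ecf - target_cf) ** 2 * weight)
    step = t_grid[1] - t_grid[0]
    # Trapezoid rule over the uniformly spaced grid.
    integral = sum(0.5 * (a + b) for a, b in zip(integrand, integrand[1:])) * step
    return n * integral


def sigreg(embeddings: List[List[float]], num_directions: int = 8, seed: int = 0) -> float:
    """Average the 1-D statistic over random unit directions ("sketching")."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    t_grid = [-5.0 + 10.0 * k / 16 for k in range(17)]  # assumed grid
    total = 0.0
    for _ in range(num_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        # Project every embedding onto the unit-normalized direction.
        projected = [sum(e * c for e, c in zip(row, direction)) / norm
                     for row in embeddings]
        total += epps_pulley(projected, t_grid)
    return total / num_directions


def total_loss(pred_loss: float, sigreg_loss: float, lam: float = 0.05) -> float:
    """Assumed weighting: (1 - lam) * predictive loss + lam * SIGReg loss."""
    return (1.0 - lam) * pred_loss + lam * sigreg_loss


data_rng = random.Random(1)
gaussian = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
collapsed = [[0.0] * 8 for _ in range(256)]  # every input mapped to one point
print(sigreg(gaussian), sigreg(collapsed))
```

On this toy data the collapsed batch, where every embedding is the same point, scores far higher than the Gaussian batch, which is the behavior a collapse-preventing regularizer needs. Projecting onto a handful of random directions keeps the cost linear in the embedding dimension, which is one plausible reading of how the method sidesteps high-dimensional distribution testing.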
Fei-Fei Li's World Model Is Here: Generate a 3D World from One Sentence, and AI Really Begins to Understand Reality
36Kr · 2025-11-13 11:42
Core Insights
- World Labs' launch of Marble is the first public product in the world model space and showcases significant advances in spatial intelligence and AI capabilities [1][2][3].

Group 1: Marble's Core Capabilities
- Marble has three main capabilities: multimodal generation, AI-native world editing, and a practical production workflow [1].
- It can reconstruct a complete 3D world from text, images, or video, supporting a seamless creative process [4][7].
- Users can edit generated worlds much like real scenes, continuously refining and expanding their 3D environments [13][14].

Group 2: Applications and Integration
- Generated worlds can be exported in formats compatible with industry-standard tools such as Unreal, Unity, and Blender, which eases integration into game and film production workflows [15][17].
- The platform supports high-quality rendering and video generation, making the created worlds more usable in real-world applications [18][19].

Group 3: Theoretical Foundations and Future Implications
- Marble is rooted in the concept of spatial intelligence, which AI needs in order to interact with the physical world [20][21].
- A mature world model must be generative, multimodal, and interactive; these capabilities underpin future advances in robotics and scientific research [22][23][24].
- Marble's release is a step toward comprehensive spatial intelligence and paves the way for applications in automation and simulation [27].
Shockwave in the AI World! Turing Award Winner Yann LeCun to Leave Meta for a "World Model" Startup
机器人圈· 2025-11-13 10:40
Core Viewpoint
- Yann LeCun's departure from Meta signals a major shift in the AI landscape, exposing internal strategic disagreements and a pivot in Meta's approach to AI development [2][3][4].

Group 1: Departure and Strategic Shift
- Yann LeCun, a prominent figure in AI and Meta's Chief AI Scientist, is leaving the company after 12 years, a formal split with CEO Mark Zuckerberg over AI strategy [2][3].
- The decision was foreshadowed by growing disagreements with Meta's management over the AI development roadmap and company strategy [3][4].
- Meta's internal restructuring has shifted the focus from the long-term foundational research led by LeCun's FAIR lab to more agile product development driven by immediate market needs [4][7].

Group 2: Internal Changes and Leadership Dynamics
- Meta has made significant changes, including a $100 million compensation package to attract young talent from competitors and a new "superintelligence" team led by 28-year-old Alexandr Wang [4].
- LeCun's reporting line changed so that he reports to Wang rather than to the Chief Product Officer, which marginalized the FAIR lab and its research initiatives [4][7].

Group 3: Technological Disagreements
- LeCun has publicly criticized the current large language model (LLM) trend, arguing that LLMs cannot achieve true reasoning and planning, a position at odds with Zuckerberg's focus on immediate monetization [7][8].
- The "world models" that LeCun advocates contrast sharply with the short-term goals set by Meta's leadership, which led to his decision to leave [7][8].

Group 4: Future Aspirations
- After Meta, LeCun plans to commit fully to developing "world models," which he believes will redefine AI by letting machines learn from observing the physical world, much as human cognition develops [8].
- He predicts that "world models" will become the mainstream AI architecture within 3-5 years, challenging the current dominance of LLMs [8].

Group 5: Legacy and Impact
- LeCun's career has been pivotal to the evolution of AI: he co-developed convolutional neural networks (CNNs) and led the FAIR lab to prominence [9].
- His departure is seen as a significant loss for Meta and may signal a shift in both the AI research landscape and the company's future direction [9].
Digital Economy Biweekly (2025, Issue 20): Tech Giants Join Forces as a Global AI Computing Alliance Takes Shape - 20251113
China Galaxy Securities · 2025-11-13 09:07
Core Insights
- The collaboration between OpenAI and Amazon opens a new phase in the global AI computing landscape, a shift to a "multi-cloud collaboration" model [1][5][6].
- The AWS partnership completes OpenAI's North American supply chain, bringing the major cloud providers Microsoft, Oracle, Google, and Amazon into its computing ecosystem [1][6].
- The AI industry is returning to a "compute is king" paradigm in which computational power and capital investment are the key competitive advantages [11].

Section Summaries

1. Focus of the Report: Expansion of the Computing Alliance with OpenAI and Amazon
- OpenAI's collaboration with AWS is seen as the final piece of its computing ecosystem and points to a comprehensive multi-cloud strategy [5][6].
- The partnership is expected to strengthen OpenAI's capabilities, with significant investment in GPU resources and infrastructure planned for the coming years [5][6].

2. China Dynamics: Accelerated "Artificial Intelligence +" Initiatives
- Chinese government policy increasingly focuses on integrating AI into manufacturing, transportation, and healthcare, with a systematic approach to deployment [12][13].
- Local policies and industry funds are fostering cross-regional collaboration and creating new industrial hubs [14][15].
- Financial tools and capital markets are aligning behind AI initiatives, indicating a robust investment environment [15].

3. U.S. Dynamics: Parallel Expansion of AI Computing and Regulatory Restructuring
- NVIDIA continues to strengthen its dominance of the AI ecosystem; its market capitalization has surpassed $5 trillion, raising concerns about systemic risk [18].
- Companies such as AMD and Google are pursuing both chips and energy, coupling computing power with energy supply [19].

4. European Dynamics: Technology Sovereignty and AI Governance
- The EU is reshaping its technological sovereignty through initiatives that combine funding and infrastructure development with AI governance [3][4].

5. Technological Frontiers: Rise of World Models and Acceleration of Physical Intelligence
- Progress in world models and embodied intelligence is pushing AI toward a new era of "physical intelligence" [3][6].

6. Think Tank Insights: IDC's Three Forces Reshaping Future IT Landscape
- IDC identifies three major forces transforming the IT landscape, with AI becoming a core engine of enterprise leadership [3][6].
Turing Award Winner Yann LeCun Leaves to Start a Company; Meta Stock Sheds 140 Billion
TMTPost App · 2025-11-13 08:38
Core Viewpoint
- The departure of Yann LeCun, Turing Award winner and chief scientist at Meta, has shaken the AI industry, coinciding with a 1.5% drop in Meta's stock price and a market value loss of 140 billion yuan [1][2].

Group 1: Background and Context
- Yann LeCun is a foundational figure in deep learning, credited with developing the Convolutional Neural Network (CNN) architecture that has been pivotal to modern AI [1].
- His departure is more than a personal career change: it reflects a broader ideological conflict over the future direction of AI, between his vision of "world models" and Meta's focus on Large Language Models (LLMs) [2][3].

Group 2: Internal Dynamics at Meta
- Meta has struggled in the AI race: competitors such as DeepSeek have made breakthroughs with the Mixture of Experts (MoE) architecture, while Meta's own Llama 4 model series drew a lukewarm market response [4].
- The company's financial commitment to AI has grown, with AI capital expenditures reaching 70 billion yuan, and a reorganization established a "Super Intelligence Lab" under new leadership that sidelined LeCun [6][7].
- LeCun's role shifted from strategic leader to symbolic figure: he now reports to a younger executive and faces restrictions on publishing his team's research [6][7].

Group 3: Ideological Conflict
- The rift between LeCun and Meta's leadership became apparent when ChatGPT emerged; Meta was slow to engage with LLM technology, which caused internal dissatisfaction and frustration [8][9].
- LeCun's insistence that LLMs are a "dead end" has been a point of contention; he believes they lack the necessary understanding of the physical world and cannot achieve true AGI [14][16].
- He advocates a "world model" approach that emphasizes learning through interaction with the environment rather than from text alone, and proposes a modular AI architecture in contrast to the monolithic nature of LLMs [17].
Betting on Spatial Intelligence! "Godmother of AI" Fei-Fei Li Releases Her First Commercial World Model
Wallstreetcn · 2025-11-13 06:21
Core Insights
- World Labs, co-founded by Stanford professor Fei-Fei Li, has launched its first commercial product, Marble, a significant step in commercializing AI for spatial intelligence [1][12].
- Marble uses a multimodal world model to generate editable, downloadable, interactive 3D environments, giving it a competitive edge over tech giants such as Google [1][6].

Product Features
- The official release expands on the limited preview: it supports larger-scale multimodal inputs and introduces Marble Labs as a creative hub [4].
- Marble aims to solve the creative-control problem in AI-generated content, letting users keep creative ownership while offering flexibility in input and editing [8][9].
- Users can create expansive environments and combine multiple independent worlds, which increases creative freedom [9].

Business Model
- Marble uses a freemium, subscription-based model with four tiers: a free version with four generations per month, a standard version at $20/month, a professional version at $35/month, and a flagship version at $95/month that unlocks all features [11].
- The target market covers three sectors: game development, visual effects (VFX), and virtual reality (VR), with a focus on giving creators new asset generation tools [4][11].

Competitive Landscape
- Marble is the first commercially available product in the emerging world model space, while competitors such as Google's Genie model remain in limited research preview [6].
- Its ability to generate persistent, downloadable 3D environments sets it apart from real-time models and reduces scene distortion and inconsistency [6].

Vision and Future Goals
- Fei-Fei Li's goal is "spatial intelligence," meaning machines that can understand and interact with the physical world, which she sees as essential for true artificial general intelligence [12][15].
- World Labs has raised approximately $230 million since its founding in 2024 and is valued at more than $1 billion, backed by major investors including a16z, Nvidia Ventures, AMD Ventures, and Intel Capital [15].
Has XPeng (Xiaopeng) Become "the Chinese Company Most Like Tesla"?
Yicai · 2025-11-13 04:22
Core Insights
- Xiaopeng Motors wants to be seen as more than an automotive company and aims to lead in "physical AI," technology that integrates the digital and physical worlds [2][3].
- At its recent technology day the company unveiled its second-generation VLA model and introduced products including a Robotaxi, humanoid robots, and flying cars, signaling broader technological ambitions [2][3].

Company Strategy
- The company's new slogan marks its transition from an AI automotive company to a "physical AI" company, reflecting its ambition to lead across several technology sectors [2].
- The second-generation VLA model is designed to strengthen the company's autonomous driving capabilities and is backed by significant investment in computing power and training data [5][6].

Market Position
- Xiaopeng Motors briefly surpassed Li Auto in market capitalization to become China's most valuable new energy vehicle company, at approximately $21.4 billion [3].
- The company is regarded as the Chinese automaker most similar to Tesla, whose market cap stands at $1.4 trillion, a gap that illustrates the competitive landscape [3].

Product Development
- The second-generation VLA model aims to make autonomous driving more efficient by reducing information loss during data processing, although it still incorporates elements of the previous model [5][6].
- Xiaopeng plans to launch three Robotaxi models by 2026, entering a Robotaxi market that no other Chinese new energy vehicle company has yet tested [12][14].

Technological Innovation
- The second-generation VLA is expected to outperform its predecessor in complex driving scenarios, with a reported 13-fold improvement in average takeover mileage on complicated roads [11].
- Xiaopeng's humanoid robot, IRON, shows progress in locomotion but still struggles with manipulation, which is crucial for broader applications [18][20].

Future Outlook
- 2026 is identified as a critical milestone for Xiaopeng Motors, with mass production planned for its new technologies, including the second-generation VLA and humanoid robots [4][11].
- The company is deliberately avoiding the complexity of industrial applications for its robots and is focusing on service-oriented roles in the first phase of commercialization [20].
Post-95 AI Talent Officially Joins Xiaomi! Lei Jun Poached Her with a Ten-Million-Yuan Salary
Sohu Finance · 2025-11-13 04:20
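The post-95 AI talent whom Lei Jun tried to poach last year with a ten-million-yuan annual salary has finally made it official: she has joined Xiaomi.

In late December 2024, multiple media outlets reported that Xiaomi founder Lei Jun had personally stepped in, offering a ten-million-yuan annual salary to recruit Luo Fuli, who had published eight papers at top international conferences and was one of the key developers of DeepSeek-V2, to lead Xiaomi's large AI model team.

After that, however, neither side issued any official word. Luo Fuli, not wanting to be disturbed too much, gradually faded from public view: "I am not a girl genius. I just want to quietly do things that are hard but right."

Whether she had actually joined Xiaomi became a mystery.

It was not until November 12 that Luo Fuli posted on her WeChat Moments, formally confirming that she had joined Xiaomi.

The dust settles: joining the Xiaomi MiMo team

Although this was Luo Fuli's formal announcement, she already had a number of ties to Xiaomi.

In February of this year, a family member of Luo Fuli disclosed that she had already started a new job, but her name did not appear in Xiaomi's employee system at the time, which added another layer of suspense about where she had gone.

In September, Luo Fuli commented on a Zhihu post about Xiaomi open-sourcing its large speech model, saying bluntly: "Xiaomi has open-sourced a large speech model, and it is very strong! I suggest deploying it right away."

In October, her name appeared on a Xiaomi paper, listed last among the authors as corresponding author.

The paper was jointly credited to the "State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University" and the "Xiaomi LLM Core Team", but did not indicate which team Luo Fuli belonged to.

Outside observers therefore speculated that she might have been a collaborating researcher, ...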