World Models
GigaVision Raises a New Hundred-Million-Yuan A1 Round; CEO: the "Physical World ChatGPT Moment" Will Arrive Within 2 to 3 Years
AI前线· 2025-11-05 05:09
Core Viewpoint
- The article discusses GigaVision's latest financing round, highlighting its focus on physical AI and on world models that drive general intelligence in the physical world. The company has completed three rounds of financing within two months, signaling strong investor interest and confidence in its technology and market potential [2][4].

Financing and Company Background
- GigaVision has completed a new financing round worth hundreds of millions of yuan, co-led by Huawei Hubble and Huakong Fund. This follows two rounds in August, also totaling hundreds of millions [2].
- Founded in 2023, GigaVision focuses on physical AI and offers a range of products including the GigaWorld platform, the GigaBrain model, and Maker, a general-purpose embodied robot body [2][4].

Team and Expertise
- The core team is closely tied to Tsinghua University's Department of Automation and includes top researchers from prestigious institutions as well as executives from leading companies such as Baidu and Microsoft. The team has published over 200 papers at top AI venues and won numerous global AI competition awards [4].

World Model Technology
- GigaVision emphasizes the immediate value of world-model technology, which addresses the scarcity of high-dimensional, high-quality data and the Sim2Real gap of traditional simulators. The technology lets AI model physical environments digitally, improving decision-making and reducing trial-and-error in unfamiliar settings [6][9].
- Major tech companies such as NVIDIA, Google DeepMind, and Tesla are also investing in world-model applications, underscoring the technology's significance in the industry [6][7].

Future Predictions and Goals
- GigaVision's CEO predicts that a "Physical World ChatGPT moment" will occur within 2 to 3 years, driven by advances in world models, VLA, and reinforcement learning, targeting a 95% success rate on 90% of common tasks [8][14].
- The company aims to build a highly available world-model system that can learn from limited real data, generate high-fidelity synthetic data, and enhance the realism of generated data through multi-modal feedback [9][10].

Collaborations and Market Strategy
- GigaVision has established deep collaborations with humanoid-robot innovation centers, research institutions, and cloud-computing companies to build a leading data factory and physical-AI platform [13].
- The company plans to continue advancing physical-AI model development and commercial applications, focusing on a three-pronged approach of "intelligence, embodied hardware, and scenarios" to accelerate the realization of its vision [14].
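The Sim2Real gap and synthetic-data generation mentioned above are commonly attacked with domain randomization: each synthetic rollout samples its physics parameters from a range, so a policy trained on the data cannot overfit one simulator setting. The sketch below is a minimal toy illustration of that idea; the dynamics, parameter names, and ranges are all assumptions for demonstration, not GigaVision's actual pipeline.

```python
import numpy as np

def generate_synthetic_rollouts(n_rollouts=100, steps=20, seed=0):
    """Sample physics parameters per rollout, then roll out toy dynamics."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_rollouts):
        friction = rng.uniform(0.5, 1.5)   # randomized sim parameter
        mass = rng.uniform(0.8, 1.2)       # randomized sim parameter
        vel, traj = 1.0, []
        for _ in range(steps):
            vel -= 0.05 * friction / mass  # toy velocity-decay dynamics
            traj.append(vel)
        data.append({"friction": friction, "mass": mass, "trajectory": traj})
    return data

rollouts = generate_synthetic_rollouts()
```

Because every rollout carries different physics, a downstream policy must learn behavior that works across the whole parameter range, which is the mechanism by which randomized synthetic data narrows the Sim2Real gap.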
Google's Dreamer Star Researcher Departs, Reveals He Missed Out on the Transformer
36Ke· 2025-11-05 02:20
Just now, "Dreamer" star researcher Danijar Hafner announced his departure from Google, where he had worked for nearly ten years. Before leaving, Danijar was a Staff Research Scientist at Google DeepMind's San Francisco office. His stated research goal is to "build general agents that can understand and interact with the world." As one of Google's leading world-model researchers, Danijar led or co-led development of the Dreamer series (Dreamer, DreamerV3, Dreamer4, and others). In his post he wrote: "Today is my last day at DeepMind." Reflecting on nearly 10 years at Google and DeepMind, Danijar called it "an important chapter coming to an end." In his early years at Google he worked mostly as a researcher across Google Research, DeepMind, and the Brain Team, and his education history clearly traces this career trajectory.

| Researcher | Google (google.com) | 2023 - Present |
| --- | --- | --- |
| ...
Li Auto's Lang Xianpeng: VLA Plus Reinforcement Learning Will Be Automakers' True Moat
晚点LatePost· 2025-11-04 08:03
Core Viewpoint
- The article discusses the evolution of Li Auto's autonomous driving technology, focusing on the development and implementation of the VLA (Vision-Language-Action) model, which aims to enhance the driving experience by integrating multi-modal AI capabilities. It covers the challenges the team faced, the strategic decisions made, and the competitive landscape in the autonomous driving sector [5][6][18].

Team Development and Structure
- Li Auto's autonomous driving team has undergone significant changes since its inception in 2018, spanning three generations of core personnel. The recent restructuring created a flatter organization with 11 new departments, enhancing communication and decision-making efficiency [8][9][51].
- The team has shifted from a centralized, closed development model to a more open and collaborative approach, reflecting the agility AI development demands [10][11].

Strategic Decisions
- The decision to pursue the VLA model was driven by the recognition that simply following existing paths, such as those taken by competitors like Huawei and Tesla, would not suffice. The team aimed to create a new competitive edge through innovative technology [6][14][18].
- The VLA model is positioned as a significant advance over previous methods, with the goal of achieving L4-level autonomous driving capabilities. The model emphasizes human-like reasoning and decision-making in driving [21][29].

Challenges and Criticism
- The VLA model has faced skepticism from industry experts, with concerns about its feasibility and the technical challenges of multi-modal AI integration. Critics argue the approach may be overly simplistic or a "trick" compared with other methods [22][24].
- Despite the criticism, the team believes the difficulty of the VLA path is indicative of its potential correctness and innovation [24][25].
Future Outlook
- The company aims to establish a robust reinforcement-learning loop to enhance the VLA model's capabilities, expecting significant improvements in user experience by the end of 2025 and into 2026 [28][39].
- The long-term vision includes achieving L4 autonomous driving by 2027, with a focus on building a comprehensive data-driven ecosystem that supports continuous learning and adaptation [41][44].
In Conversation with Lang Xianpeng: the VLA Technology Debate, Team Turnover, and Proving Yourself When Nobody Believes in You
晚点Auto· 2025-11-04 03:58
Core Viewpoint
- The article discusses the evolution of Li Auto's autonomous driving technology, focusing on the VLA (Vision-Language-Action) model, which aims to let the system think like a human rather than merely mimic driving behavior [2][3][4].

Development of Li Auto's Autonomous Driving Team
- The autonomous driving team at Li Auto was established in 2018 and has undergone three generations of key personnel changes, reflecting the challenges and growth within the organization [4][7][46].
- The team initially lacked resources and had to adapt by retrofitting existing vehicles with lidar for technology research [3][4].

Shift to VLA Model
- Li Auto transitioned to the VLA model to differentiate itself from competitors like Huawei and Tesla, emphasizing the need for next-generation technology rather than merely following existing paths [3][4][17].
- The VLA model uses multi-modal AI to improve the driving experience, aiming for a more human-like decision-making process [3][4][21].

Internal and External Challenges
- The development of VLA has faced internal team restructuring and external skepticism, with industry leaders questioning its feasibility and effectiveness [3][4][21][22].
- Despite criticism, the company believes that the challenges posed by competitors validate the direction of the VLA model [4][21].

Organizational Changes
- In September 2025, Li Auto restructured its autonomous driving department into 11 sub-departments to build a more efficient, AI-focused organization [6][7].
- The new structure aims to enhance communication and decision-making efficiency, moving away from a centralized development model [8][9].

Future Goals and Expectations
- Li Auto aims to achieve L4-level autonomous driving by 2027, with 2021 and 2023 marking key milestones along the way [37][39].
- The company anticipates that the VLA model will enable self-iteration and improvement, potentially surpassing competitors in the Chinese market [39][40].

Technical Considerations
- The VLA model is designed to run on existing autonomous-driving chips, although these chips were not originally optimized for large models [33][34].
- Li Auto is investing in cloud computing capabilities, with a current training capacity of 10 EFLOPS and plans for further expansion [32][33].

Market Positioning
- The company is focused on establishing a strong market presence in China before expanding internationally, recognizing the unique challenges of commercializing autonomous driving technology [41][42].
Starting from DriveVLA-W0: How World Models Amplify VLA Scaling Laws (Chinese Academy of Sciences)
自动驾驶之心· 2025-11-04 00:03
In autonomous driving, scaling Vision-Language-Action (VLA) models with large-scale data offers a promising path toward more general driving intelligence. However, VLA models have long faced a "supervision deficit": their vast model capacity is supervised only by sparse, low-dimensional action signals, leaving most of their representational potential underused. To address this, a team from the Chinese Academy of Sciences and Huawei's Yinwang proposed DriveVLA-W0, a training paradigm that uses a world model to predict future images. To validate DriveVLA-W0's generality, the work covers two mainstream VLA architectures: an autoregressive world model for VLA models that use discrete visual tokens, and a diffusion world model for VLA models built on continuous visual features. On top of the rich representations learned through world modeling, the authors further introduce a lightweight action expert to address inference latency in real-time deployment.

Livestream: "DriveVLA-W0: Amplifying VLA Scaling Laws with World Models." Time: Nov 4, 19:30-20:30. Livestream overview: VLA models are a hopeful path toward general autonomous driving, yet are limited by a "supervision deficit": ...
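The supervision-deficit remedy described above, adding a dense future-prediction objective next to the sparse action loss, can be sketched as a two-headed network sharing one backbone. Everything below (layer sizes, weights, the MSE objectives, the loss weight) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-headed model: a shared backbone feeds an action head (the sparse
# signal a plain VLA trains on) and a world-model head that predicts the next
# observation (the dense auxiliary signal). Sizes and weights are toy values.
obs_dim, act_dim, hidden = 16, 2, 32
W_enc = rng.normal(0.0, 0.1, (obs_dim, hidden))
W_act = rng.normal(0.0, 0.1, (hidden, act_dim))
W_world = rng.normal(0.0, 0.1, (hidden, obs_dim))

def combined_loss(obs, next_obs, expert_action, w_world=0.5):
    """Action imitation loss plus a weighted future-prediction loss."""
    h = np.maximum(obs @ W_enc, 0.0)                      # shared features (ReLU)
    action_loss = np.mean((h @ W_act - expert_action) ** 2)
    world_loss = np.mean((h @ W_world - next_obs) ** 2)   # dense supervision
    return action_loss + w_world * world_loss

batch_obs = rng.normal(size=(8, obs_dim))
loss = combined_loss(batch_obs,
                     rng.normal(size=(8, obs_dim)),   # next observations
                     rng.normal(size=(8, act_dim)))   # expert actions
```

The design point is that the world-model head supervises every dimension of the next observation, giving the shared backbone far more gradient signal per sample than the low-dimensional action target alone.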
GigaVision Completes a New Hundred-Million-Yuan A1 Round, Co-led by Huawei Hubble and Huakong Fund
Zheng Quan Shi Bao Wang· 2025-11-03 11:36
Based on the current pace of technical progress, Huang Guan predicts the "physical world ChatGPT moment" will arrive within 2 to 3 years. On November 3, reporters learned that embodied-intelligence foundation-model company GigaVision recently completed a new hundred-million-yuan A1 financing round, co-invested by Huawei Hubble and Huakong Fund. In late August, GigaVision had announced back-to-back Pre-A and Pre-A+ rounds totaling several hundred million yuan. Founded in 2023, GigaVision focuses on physical AI and on "world-model-driven general intelligence in the physical world." Its products form a full physical-AI stack of software and hardware: the world-model platform GigaWorld (driving and embodied), the embodied foundation model GigaBrain, and the general-purpose embodied robot Maker. GigaVision argues that world-model technology need not wait until next year to pay off; its value is already showing at the current stage: it improves on the two problems of scarce high-dimensional, high-quality data and the Sim2Real gap of traditional simulators, and it also boosts the effectiveness of reinforcement-learning training. Founder and CEO Huang Guan said: "Whether judged by real business and technical needs or by the consensus across industry and academia, world models have become a key and popular direction for embodied intelligence. Huawei also ranks world models first among its ten technology trends for the intelligent world of 2035, which is the underlying logic of investing in GigaVision." Beyond investment, Huawei is also pursuing strategic cooperation with GigaVision across multiple business lines. Specifically, world models will mainly address generalization, while VLA handles ...
Zhan Kun to Concurrently Head Li Auto's US Silicon Valley R&D Center, Will Discuss World Models and VLA in a Livestream
理想TOP2· 2025-11-03 07:33
Core Viewpoint
- The article discusses the advancements in Tesla's FSD v14 and explores the potential of VLA (Vision-Language-Action) in defining the next generation of autonomous driving solutions, comparing it with WA (world-model architecture) [1].

Group 1: Technology Discussion
- The article highlights the exploration of world models and the future development direction of VLA, questioning whether a unified approach is possible [3].
- It emphasizes that heavy demands on data and computing power are making it increasingly difficult for academia to participate in the intelligent-driving sector, while also considering what opportunities remain for academic involvement [3].

Group 2: Expert Insights
- The article features insights from experts in the field, including a senior director from Li Auto's VLA team, a senior algorithm scientist from Bosch, and a parking-team leader from Changan Automobile, indicating a diverse range of perspectives on the topic [4].
- The discussion is moderated by a professor from Shanghai Jiao Tong University, showcasing the academic interest in advances in autonomous driving technologies [6].
Huawei Hubble and Huakong Fund Co-lead GigaVision's A1 Round, Charting the Endgame Route for Physical AI
36Ke· 2025-11-03 05:12
Breaking through three major technical bottlenecks, GigaVision has launched a highly available world-model system. As AI capabilities begin to connect with the physical world, "embodied intelligence" has become the vehicle for real-world deployment. Since the start of this year, "world models" have heated up rapidly in the embodied field, with Google, OpenAI, Tesla, and NVIDIA all making concentrated moves. Several industry insiders judge that world models will ease embodied intelligence's bottlenecks in data scarcity and generalization, and may well become the core technology trend of 2026, following VLA. Against this backdrop, GigaVision, a physical-AI company focused on world models, has completed three financing rounds in the past two months and announced its latest progress. According to 智能涌现, GigaVision recently completed a new hundred-million-yuan A1 round co-invested by Huawei Hubble and Huakong Fund; in late August it had announced back-to-back Pre-A and Pre-A+ rounds totaling several hundred million yuan. Put simply, a world model builds the physical world and its rules inside a digital world: before acting, the AI constructs a simplified physical sandbox "in its head," predicts what will happen in the next moment, and chooses its action accordingly, so it makes fewer trial-and-error mistakes and behaves more robustly in unfamiliar environments. Three rounds in two months reflect the capital market's recognition of GigaVision's team, technical route, and business execution, and also the investors' judgment that "general intelligence in the physical world" (physical AI) is at a key turning point. Founded in 2023, GigaVision focuses on physical AI and on "world-model-driven general intelligence in the physical world." Its products ...
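The "sandbox in the head" idea above, predicting the next moment inside a model and then picking the action whose imagined outcome looks best, is in its simplest form a random-shooting planner over a learned dynamics model. The sketch below substitutes a hand-written toy dynamics function for the learned model; the task, cost, and all numbers are illustrative assumptions.

```python
import numpy as np

def world_model_step(state, action):
    """Predict next (position, velocity) for a 1-D cart given an acceleration.
    Stands in for a learned predictive model trained on interaction data."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

def plan_action(state, goal_pos, horizon=10, n_candidates=256, seed=0):
    """Random shooting: imagine rollouts inside the model, score each
    candidate action sequence, and return the best first action."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    best_action, best_cost = None, np.inf
    for seq in candidates:
        s = np.array(state, dtype=float)
        for a in seq:                      # rollout entirely in imagination
            s = world_model_step(s, a)
        cost = abs(s[0] - goal_pos) + abs(s[1])  # reach the goal and stop
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

action = plan_action(state=(0.0, 0.0), goal_pos=1.0)
```

Because every candidate is evaluated inside the model rather than on the real system, the agent "tries and errs" in imagination, which is exactly the sample-efficiency argument made for world models above.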
Meituan's New Standalone App: You Can't Order Food, Only "Order" AI
猿大侠· 2025-11-03 04:11
Core Viewpoint
- Meituan has launched the LongCat-Flash-Omni model, which supports multi-modal capabilities and has achieved state-of-the-art (SOTA) performance in open-source benchmarks, surpassing competitors like Qwen3-Omni and Gemini-2.5-Flash [2][4][8].

Group 1: Model Performance
- LongCat-Flash-Omni is capable of handling text, image, audio, and video inputs effectively, maintaining high performance across all modalities [3][27].
- The model features a total of 560 billion parameters, with only 27 billion activated, allowing for high inference efficiency while retaining a large knowledge base [4][40].
- It is the first open-source model to achieve real-time interaction across all modalities at current flagship-model performance levels [8][42].

Group 2: User Experience
- Users can experience the LongCat model through the LongCat APP and web, which support various input methods including text, voice, and image uploads [9][10].
- The model demonstrates quick response times and smooth interactions, even in complex scenarios, enhancing user experience [27][28][30].

Group 3: Development Strategy
- Meituan's iterative model-development strategy focuses on speed, specialization, and comprehensive capabilities, aiming to create a robust "world model" that integrates the digital and physical worlds [31][45].
- The company has invested in both software and hardware to achieve deep connections between the digital and physical realms, emphasizing the importance of hardware in extending software's impact [46][47].

Group 4: Future Outlook
- Meituan's long-term vision includes advancing embodied intelligence and creating a comprehensive robotics framework that connects various service scenarios [57][62].
- The company aims to leverage AI and robotics to transform the retail industry, enhancing efficiency and user experience across its services [60][63].
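The "560 billion total, 27 billion active" design mentioned above is a mixture-of-experts (MoE) pattern: a router sends each token to only a few experts, so most parameters sit idle on any given forward pass. Below is a minimal sketch with toy sizes; the real LongCat router and expert shapes are not described in the source, so everything here is an illustrative assumption.

```python
import numpy as np

def moe_layer(x, experts, router, k=2):
    """Top-k MoE: score experts, softmax over the k best, run only those."""
    logits = x @ router                                    # (n_experts,)
    top = np.argsort(logits)[-k:]                          # top-k expert indices
    gates = np.exp(logits[top] - logits[top].max())
    gates = gates / gates.sum()                            # softmax over top-k
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out, top

rng = np.random.default_rng(0)
n_experts, d = 8, 4                              # 2 of 8 experts run per token,
experts = rng.normal(size=(n_experts, d, d))     # analogous to activating only
router = rng.normal(size=(d, n_experts))         # a slice of total parameters
x = rng.normal(size=d)
y, active = moe_layer(x, experts, router, k=2)
```

Only the `k` selected expert matrices participate in the matmuls, which is why a model can hold a very large total parameter count while keeping per-token compute close to that of a much smaller dense model.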
Meituan's New Standalone App: You Can't Order Food, Only "Order" AI
量子位· 2025-11-03 03:12
Core Viewpoint
- Meituan is leveraging its expertise in delivery services to develop advanced AI models, the latest being LongCat-Flash-Omni, which supports multimodal capabilities and achieves state-of-the-art performance in open-source benchmarks [2][8].

Group 1: Model Performance and Features
- LongCat-Flash-Omni has surpassed other models like Qwen3-Omni and Gemini-2.5-Flash in comprehensive multimodal benchmarks, achieving open-source state-of-the-art status [2].
- The model maintains high performance across individual modalities such as text, image, audio, and video, demonstrating robust capabilities without sacrificing intelligence [3].
- With a total of 560 billion parameters and only 27 billion active parameters, the model utilizes a "large total parameters, small active" MoE architecture, ensuring high inference efficiency while retaining extensive knowledge [4].

Group 2: User Experience and Accessibility
- LongCat-Flash-Omni is the first open-source model capable of real-time multimodal interaction, enhancing user experience significantly [8].
- The model is available for free on Meituan's LongCat APP and web platform, supporting various input methods including text, voice, and image uploads [9][10].
- Users have reported a smooth interaction experience, with quick response times and effective handling of complex multimodal tasks [25][26].

Group 3: Development Strategy
- Meituan's iterative model-development strategy focuses on speed, specialization, and comprehensive capabilities, aiming to create an AI that can understand and interact with complex real-world scenarios [29][31].
- The company has a clear path for expanding its AI capabilities, moving from basic chatbots to advanced multimodal models, thereby laying the groundwork for a "world model" that deeply understands reality [47][62].
- Meituan's investments in embodied intelligence and robotics are part of a broader strategy to connect the digital and physical worlds, enhancing service efficiency and user experience [42][56].

Group 4: Challenges and Innovations
- The development of multimodal models presents challenges such as high integration difficulty, real-time interaction performance, and training efficiency [33][36].
- LongCat-Flash-Omni addresses these challenges through innovative architectural designs, including a unified end-to-end architecture and progressive training methods that enhance multimodal capabilities [38][39].
- The model's design allows for low-latency real-time interactions, setting it apart from existing models that struggle with responsiveness [36][39].