Jensen Huang Visits the UK with Trump: A $2.6 Billion Bet on British AI, with Autonomous Driving Firm Wayve Possibly Getting an Extra $500 Million
Sou Hu Cai Jing· 2025-09-20 09:57
Core Insights
- NVIDIA's CEO Jensen Huang announced a £2 billion (approximately $2.6 billion) investment in the UK to catalyze the AI startup ecosystem and accelerate the creation of new companies and jobs in the AI sector [1]
- Wayve, a UK-based autonomous driving startup, is expected to secure one-fifth of this investment, with NVIDIA evaluating a $500 million investment in its upcoming funding round [1][2]
- Wayve's upcoming Gen 3 hardware platform will be built on NVIDIA's DRIVE AGX Thor in-vehicle computing platform [1]

Company Overview
- Wayve was founded in 2017 with the mission to reimagine autonomous mobility using embodied AI [3]
- The company has developed a unique technology path focused on embodied AI and end-to-end deep learning models, distinguishing itself from mainstream autonomous driving companies [3][8]
- Wayve is the first company in the world to deploy an end-to-end deep learning driving system on public roads [3]

Technology and Innovation
- Embodied AI allows an AI system to learn tasks through direct interaction with the physical environment, contrasting with traditional systems that rely on manually coded rules [8]
- Wayve's end-to-end model, referred to as AV2.0, integrates deep neural networks with reinforcement learning, processing raw sensor data to output vehicle control commands [8][10]
- To address the challenges of explainability in end-to-end models, Wayve developed the LINGO-2 model, which uses visual and language inputs to predict driving behavior and explain actions [10][12]

Data and Training
- Wayve has created the GAIA-2 world model, a video generation model designed for autonomous driving, which generates realistic driving scenarios based on structured inputs [14][15]
- GAIA-2 is trained on a large dataset covering various geographical and driving conditions, allowing for effective training without extensive real-world driving data [16][17]
- The model's ability to simulate edge cases enhances training efficiency and scalability [18]

Strategic Partnerships
- Wayve's technology does not rely on high-definition maps and is hardware-agnostic, allowing compatibility with various sensor suites and vehicle platforms [20]
- The company has established partnerships with Nissan and Uber to test its autonomous driving technology [20]

Leadership and Team
- Wayve's leadership team includes experienced professionals from leading companies in the autonomous driving sector, enhancing its strategic direction and technological capabilities [25][26]
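The end-to-end idea summarized above (raw sensor data in, vehicle control commands out, with no hand-coded rules in between) can be pictured as a single learned function. Below is a minimal, purely illustrative sketch of that shape; the network dimensions, names, and random weights are assumptions for illustration, not Wayve's actual AV2.0 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class EndToEndPolicy:
    """Toy end-to-end driving policy: pixels in, controls out.

    A single learned mapping replaces the classic perception ->
    planning -> control pipeline; the random weights here are
    stand-ins for what training on driving data would produce.
    """

    def __init__(self, image_dim: int, hidden: int = 32):
        self.w1 = rng.normal(0.0, 0.1, (image_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2))  # [steering, throttle]

    def __call__(self, image: np.ndarray) -> np.ndarray:
        h = np.tanh(image.flatten() @ self.w1)  # learned features
        return np.tanh(h @ self.w2)             # controls bounded to [-1, 1]

policy = EndToEndPolicy(image_dim=8 * 8)
frame = rng.random((8, 8))          # stand-in for a camera frame
steering, throttle = policy(frame)
```

The point of the sketch is the interface, not the internals: everything between pixels and controls is learned, which is exactly why explainability tools like LINGO-2 become necessary.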
Ren Shaoqing Joins USTC...
自动驾驶之心· 2025-09-20 05:35
Core Viewpoint
- Ren Shaoqing, a prominent figure in AI and autonomous driving, has returned to his alma mater, the University of Science and Technology of China, to start a new academic program focusing on advanced AI topics [4][6].

Group 1: Background of Ren Shaoqing
- Ren Shaoqing is a co-founder of Momenta and former Vice President of NIO, with a strong academic background including a PhD from the University of Science and Technology of China [4].
- He is recognized for his contributions to AI, particularly as an author of ResNet and Faster R-CNN, with over 440,000 citations, making him the most cited Chinese scholar globally [4].

Group 2: Academic Program Details
- The new program will focus on areas such as AGI (Artificial General Intelligence), world models, embodied intelligence, and AI for Science [6].
- The program is open for recruitment of master's and doctoral students, with urgent interviews scheduled for students with recommendation qualifications starting next Monday [6].
Ren Shaoqing Is Recruiting at USTC! Master's and PhD Applicants Welcome; Urgent Interviews Next Monday for Students with Recommendation Qualifications
量子位· 2025-09-20 05:12
Core Viewpoint
- Ren Shaoqing, a prominent figure in AI and computer vision, is starting a recruitment program at his alma mater, the University of Science and Technology of China, focusing on advanced topics in AI such as AGI, world models, embodied intelligence, and AI for Science [1][2].

Group 1: Recruitment Details
- The recruitment is open for both master's and doctoral students, with urgent interviews starting on the upcoming Monday for students with recommendation qualifications [3].
- Interested students can send their resumes to Ren Shaoqing's email for inquiries regarding the application process and interview details [16].

Group 2: Background of Ren Shaoqing
- Ren Shaoqing is an expert in computer vision and autonomous driving, having graduated from the University of Science and Technology of China and obtained a joint PhD with Microsoft Research Asia [4][5].
- He has been recognized as one of the most influential scholars in AI, ranking 10th in the AI 2000 list, and received the Future Science Prize in Mathematics and Computer Science in 2023 [6].

Group 3: Contributions to AI
- Ren is a co-author of ResNet, a groundbreaking work in deep learning that addresses the vanishing gradient problem, significantly impacting fields requiring high perception capabilities like computer vision and autonomous driving [7].
- ResNet has received over 290,000 citations and won the Best Paper Award at CVPR 2016 [8].
- He also contributed to Faster R-CNN, an efficient two-stage object detection algorithm that balances speed and accuracy [10].

Group 4: Role in NIO
- After completing his PhD, Ren co-founded Momenta and later joined NIO, where he played a key role in developing autonomous driving algorithms and leading the smart driving R&D team [13].
- At NIO, he developed the NIO World Model (NWM), which integrates spatiotemporal cognition and generative capabilities, allowing for high-fidelity scene reconstruction and long-term scenario simulation [14][15].
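The mechanism that lets ResNet sidestep the vanishing-gradient problem is the residual (skip) connection: each block computes y = x + F(x), so the identity path carries signal even when F's gradients shrink. A minimal numpy sketch of this idea (shapes and weights are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(42)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = x + F(x): the skip connection lets the block default to the
    identity, so stacking many blocks does not choke gradient flow."""
    f = np.maximum(0.0, x @ w1) @ w2  # F(x): two layers with a ReLU
    return x + f                      # identity shortcut

dim = 16
x = rng.normal(size=dim)
w1 = np.zeros((dim, dim))  # degenerate case: F(x) == 0
w2 = np.zeros((dim, dim))
y = residual_block(x, w1, w2)
# With F == 0 the block is exactly the identity -- the "easy default"
# that makes very deep networks trainable.
```

This "add the input back" trick is why depth stopped being a liability after 2015, and it underpins most of the perception stacks mentioned throughout this digest.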
These Directions in Embodied AI Together Make Up the So-Called "Big Brain and Small Brain" Algorithms
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the evolution and current trends in embodied intelligence technology, emphasizing the integration of various models and techniques to enhance robotic capabilities in real-world environments [3][10].

Group 1: Technology Development Stages
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but faced challenges in generalization and error accumulation [7].
- The third stage, marked by the introduction of diffusion policy methods, improved stability and generalization by modeling action sequences [8].
- The fourth stage, beginning in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance predictive capabilities and multi-modal perception [9][10].

Group 2: Key Technologies and Techniques
- Key technologies in embodied intelligence include VLA, diffusion policy, and reinforcement learning, which collectively enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- The integration of tactile sensing with VLA models expands the sensory capabilities of robots, allowing for more precise operations in unstructured environments [10].

Group 3: Industry Implications and Opportunities
- The advancements in embodied intelligence are leading to increased demand for engineering and system capabilities, transitioning from theoretical research to practical deployment [10][14].
- There is a growing interest in training and deploying various models, including diffusion policy and VLA, on platforms like Mujoco and IsaacGym [14].
- The industry is witnessing a surge in job opportunities and research interest, prompting many professionals to shift focus towards embodied intelligence [10].
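The third-stage idea above, modeling a whole action sequence by iterative denoising rather than predicting one step at a time, can be shown in toy form. Everything here is a hedged stand-in: a real diffusion policy uses a trained noise-prediction network, whereas this sketch hand-writes a "denoiser" that drifts toward a known target trajectory purely to illustrate the refinement loop:

```python
import numpy as np

rng = np.random.default_rng(7)

# A toy "expert" action sequence the policy should reproduce.
target = np.linspace(0.0, 1.0, 8)

def denoise_step(actions: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """One reverse-diffusion step. A trained model would predict the
    noise to strip away; here we simply move a fixed fraction toward
    the target to keep the sketch self-contained."""
    return actions + alpha * (target - actions)

# Start from pure noise and refine the entire sequence jointly --
# this joint refinement over the whole horizon is what gives
# diffusion policies better temporal consistency than step-by-step
# behavior cloning.
actions = rng.normal(size=8)
for _ in range(20):
    actions = denoise_step(actions)
```

The practical payoff named in the summary (stability and generalization) comes from the same structure: the policy commits to a coherent multi-step plan instead of compounding per-step errors.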
From ChatGPT to Marble: Is 3D World Generation the Next Breakout Li Feifei Is Betting On?
锦秋集· 2025-09-18 07:33
Core Viewpoint
- The article discusses the launch of World Labs' latest spatial intelligence model, Marble, which allows users to generate persistent and navigable 3D worlds from images or text prompts, marking a significant advancement in spatial intelligence technology [1][2].

Summary by Sections

Marble's Features and Comparison
- Marble shows significant improvements over similar products in geometric consistency, style diversity, world scale, and cross-device support, allowing users to truly "walk into" AI-generated spaces [2].

Li Feifei's Vision and World Model Narrative
- Li Feifei's approach emphasizes a transition from language understanding to world understanding, culminating in spatial intelligence as a pathway to AGI (Artificial General Intelligence) [3][6].

Limitations of LLMs
- While acknowledging the achievements of large language models (LLMs), Li Feifei highlights their limitations in understanding the three-dimensional world, asserting that true intelligence requires spatial awareness [5][7].

The Necessity of Spatial Intelligence for AGI
- Spatial intelligence is deemed essential for AGI, as the real world is inherently three-dimensional, and understanding it requires more than just two-dimensional observations [16].

Evolution of AI Learning Paradigms
- The article outlines three phases of AI learning evolution: supervised learning, generative modeling, and the current focus on three-dimensional world models, emphasizing the importance of data, computation, and algorithms [21][24].

Data Strategy for World Models
- A mixed approach to data collection is necessary for training world models, combining real data acquisition, reconstruction, and simulation to overcome the scarcity of high-quality three-dimensional data [26].

Practical Applications and Development Path
- The initial focus for Marble's application is on content production, transitioning to robotics and AR/VR, with an emphasis on creating interactive 3D worlds for various industries [29][30].
From MIT's Strongest AI Lab: OpenAI's Talented Chinese Researcher Has Finished His PhD
36Ke· 2025-09-17 07:05
Core Insights
- The article highlights the achievements of Boyuan Chen, a Chinese researcher at OpenAI, who recently completed his PhD at MIT in under four years, focusing on world models, embodied AI, and reinforcement learning [1][5][7].

Group 1: Academic Background and Achievements
- Boyuan Chen holds a PhD in Electrical Engineering and Computer Science from MIT, with a minor in philosophy [7][24].
- He has been involved in significant projects at OpenAI, including the development of GPT image generation technology and the Sora video generation team [5][1].
- During his time at Google DeepMind, he contributed to the training of multimodal large language models (MLLM) using large-scale synthetic data [7][10].

Group 2: Research Focus and Future Aspirations
- Chen emphasizes the importance of visual world models for embodied intelligence, believing that integrating these fields will enhance AI's understanding of and interaction with the physical world [4][7].
- He expresses optimism about the future of embodied intelligence, predicting it will be a key technology for the next century, and hopes to witness the emergence of general-purpose robots [17][20].
- OpenAI is reportedly increasing its efforts in robotics technology, aiming to develop algorithms for controlling robots and hiring experts in humanoid robotics [20].
DeepMind's Hassabis: All His Latest Thinking in One Place
量子位· 2025-09-15 05:57
Core Insights
- The discussion emphasizes the potential of achieving Artificial General Intelligence (AGI) within the next decade, which could usher in a new scientific renaissance and significant advancements across various fields such as energy and health [2][7][51]
- Current AI systems, while advanced, lack true creativity and the ability to generate new hypotheses, which are essential characteristics of AGI [5][34]

Group 1: AGI Development
- Demis Hassabis predicts that AGI could be realized around 2030, but current AI systems are not yet at a "PhD-level intelligence" due to their limited capabilities in various domains [4][35]
- The construction of AGI requires a comprehensive understanding of the physical world, not just abstract concepts like language or mathematics [6][22]
- Hassabis believes that the arrival of AGI will lead to a "scientific golden age," providing immense benefits to humanity [7][51]

Group 2: DeepMind's Role
- DeepMind is viewed as a central engine within Alphabet, integrating various AI teams to develop models like Gemini, which are now embedded in Google's ecosystem [15]
- The team at DeepMind consists of approximately 5,000 members, primarily engineers and researchers, focusing on advancing AI technologies [16]

Group 3: Innovations in AI Models
- The Genie 3 model represents a breakthrough in creating interactive virtual environments based on textual descriptions, showcasing the ability to generate realistic physical interactions [17][20]
- The development of mixed models, which combine learning components with established solutions, is seen as crucial for advancing AGI [45][47]

Group 4: Future of Robotics
- Hassabis envisions a future where robots can understand and interact with the physical world through language commands, enhancing their utility in everyday tasks [23][25]
- The design of humanoid robots is considered beneficial for navigating human environments, while specialized robots will still have their unique applications [26][27]

Group 5: AI in Drug Development
- DeepMind is working on transforming drug development processes, aiming to reduce the timeline from years to weeks or days, leveraging breakthroughs like AlphaFold [41][43]
- Collaborations with pharmaceutical companies are underway to advance research in areas such as cancer and immunology [44]

Group 6: Energy Efficiency and AI
- The conversation highlights the importance of energy efficiency in AI systems, with advancements in model architecture and hardware optimization potentially mitigating energy demands [49][50]
- Hassabis believes that the contributions of AI to energy efficiency and climate change will outweigh its energy consumption in the long run [50]

Group 7: Creative Tools and User Experience
- The future of creative tools like Nano Banana is characterized by their ability to allow users to interact intuitively, enabling rapid iterations and creative processes [38][39]
- These tools are designed to democratize creativity, making advanced capabilities accessible to a broader audience while enhancing the productivity of professional creators [39][40]
Li Auto Pushes OTA 8.0; Li Xiang Says Its Driver Assistance Is Now "Fully Leading". Is VLA Better Than World Models?
Mei Ri Jing Ji Xin Wen· 2025-09-12 10:06
Core Viewpoint
- Li Auto's advanced driver assistance and smart cockpit have transitioned from "partially leading" to "fully leading" following the OTA 8.0 update of their vehicle system [1]

Group 1: OTA 8.0 Update
- The OTA 8.0 version has officially launched, enhancing driver assistance, smart cockpit, and smart electric features [3]
- The new VLA (Vision-Language-Action Model) driver model is being fully pushed to Li MEGA and L series AD Max models [3]
- Li Auto's chairman, Li Xiang, described VLA as the third generation of their driver assistance technology, emphasizing its ability to understand road conditions, comprehend human commands, and remember user habits [3]

Group 2: VLA Model Features
- The current version of VLA is referred to as a "crippled version" due to the temporary absence of a highly praised feature [4]
- Li Auto has acknowledged the need for a cautious approach in rolling out new features, especially after the suspension of the VLA remote summon function [4]
- The VLA model enhances the accuracy of route selection in complex scenarios and remembers user speed preferences for specific roads [6]

Group 3: Industry Competition and Technology
- Other companies like Yuanrong Qixing and XPeng Motors are also developing VLA models, indicating a competitive landscape in this technology [7]
- The VLA model is seen as an "intelligent enhanced version" of end-to-end models, addressing challenges in handling unseen scenarios [8]
- The VLA model integrates perception, action execution, and language processing, enhancing its ability to understand and make decisions in complex environments [8]

Group 4: Differing Approaches
- Huawei's approach focuses on the World Action model, which bypasses the language processing step, emphasizing direct control through vision [12]
- The debate between VLA and world models highlights differing strategies in achieving advanced autonomous driving capabilities [12][13]
- Experts suggest that both VLA and world models can coexist and complement each other, with different companies choosing paths based on their specific goals [13]
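The VLA structure described above, perception, language processing, and action generation fused in one model, can be sketched as a three-part pipeline. The encoders, fusion scheme, and output head below are illustrative assumptions, not Li Auto's (or any vendor's) architecture:

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 16

def encode_vision(frame: np.ndarray) -> np.ndarray:
    """Stand-in vision encoder: flatten and project the camera frame."""
    w = rng.normal(0.0, 0.1, (frame.size, DIM))
    return np.tanh(frame.flatten() @ w)

def encode_language(command: str) -> np.ndarray:
    """Stand-in language encoder: hash tokens into a fixed-size vector
    (a real VLA would use a pretrained language model here)."""
    vec = np.zeros(DIM)
    for token in command.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def act(frame: np.ndarray, command: str) -> np.ndarray:
    """Fuse both modalities and emit [steering, speed] in [-1, 1]."""
    fused = np.concatenate([encode_vision(frame), encode_language(command)])
    w_out = rng.normal(0.0, 0.1, (2 * DIM, 2))
    return np.tanh(fused @ w_out)

controls = act(rng.random((8, 8)), "slow down and keep right")
```

Note the contrast with Huawei's World Action stance in Group 4: there, the `encode_language` branch is dropped entirely and control is driven from vision alone.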
Chengdu Develops China's First World-Model-Based Robot Task Execution System, Enabling Humanoid Robots to "Think Like Humans"
Si Chuan Ri Bao· 2025-09-12 06:23
Reportedly, in operation, once the robot is given a picture of the desired goal, it automatically assesses the current state, plans the task autonomously, and executes it until the result matches the goal image. The system shows strong adaptability and high task completion in unfamiliar environments, tackling at the root the problem of humanoid robots not being "smart" enough, and marks an important step toward practical, commercial deployment.

A representative of the Chengdu Humanoid Robot Innovation Center explained that a world model is a system framework that genuinely approximates how the human brain thinks: by learning the physical and causal laws of the real world, it acquires a "reflex-like physical intuition," can simulate environmental changes internally, extrapolate future states from the current one, and evaluate the consequences of actions. By analogy, when people see dark clouds gathering they naturally predict "it is about to rain," because the brain has already simulated the coming weather.

In the demo video, when the robot was given a picture of a glass bottle with a straw in it as the goal, it observed the scene and saw a glass bottle without a straw. The R-WMES system then planned a complete action sequence: grasp a straw first, then insert it into the bottle, so that the final result matched the preset "bottle with a straw" goal image exactly.

This is China's first world-model-based robot task execution system (R-WMES), marking Chengdu's position in core artificial intelligence and humanoid robotics, where the "world mo ...
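The loop attributed to R-WMES, simulate future states internally, score candidate action sequences against a goal, then execute the best one, is the classic world-model planning recipe. A toy sketch follows; the linear "dynamics" and random-shooting planner are stand-ins for a learned world model and a real planner, chosen only to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Stand-in learned dynamics: predicts the next state. Here the
    'physics' is simply state + action."""
    return state + action

def plan(state: np.ndarray, goal: np.ndarray, horizon: int = 4,
         candidates: int = 256) -> np.ndarray:
    """Random-shooting planning: imagine many action sequences inside
    the world model and return the first action of the best one."""
    best_cost, best_first = np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, state.size))
        s = state
        for a in seq:                    # roll out *imagined* futures
            s = world_model(s, a)
        cost = np.linalg.norm(s - goal)  # distance to the goal state
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

state, goal = np.zeros(2), np.array([1.5, -0.5])
action = plan(state, goal)
```

The "goal image" in the straw demo plays the role of `goal` here: the robot never needs the answer hard-coded, only a model good enough to predict which action sequence ends in a matching state.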
Tesla, Huawei, and the New EV Makers' Showdown: The World Model War
36Ke· 2025-09-12 02:45
Core Viewpoint
- The emergence of "World Models" has complicated the high-end intelligent driving landscape, leading to debates over the authenticity and effectiveness of various models like VLA, WEWA, and others [3][5].

Group 1: Company Perspectives
- Xiaopeng Motors claims to be the only company in China that has truly developed VLA, criticizing competitors for creating modified versions [3][7].
- Huawei's CEO of Intelligent Automotive Solutions stated that the company will not pursue the VLA path, emphasizing a focus on World Action (WA) instead of language processing [3][5].
- Li Auto is developing a foundational model to support its MindVLA algorithm, which is positioned as a marketing strategy rather than a true VLA implementation [7][8].

Group 2: Technical Insights
- VLA (Vision-Language-Action) is seen as an evolution of the end-to-end plus VLM (Vision-Language Model) approach, addressing some limitations of the previous models [5][7].
- Xiaopeng is developing a large-scale driving model with 720 billion parameters, utilizing cloud distillation to deploy smaller models to vehicles [8][15].
- The concept of World Models, initially proposed by Tesla, aims to create a virtual environment for autonomous driving learning and validation [9][11].

Group 3: Industry Trends
- The industry is witnessing a shift from perception-driven to cognition-driven approaches, with various companies exploring different architectures for intelligent driving [12][13].
- The debate over the effectiveness of VLA versus World Models reflects a broader struggle within the industry to define the best methodologies for achieving autonomous driving capabilities [17].
- The integration of cloud and vehicle-based models is seen as essential for optimizing perception and decision-making in autonomous systems [17].