World Models
Li Auto's VLA "Long March"
Jing Ji Guan Cha Wang· 2025-08-12 10:04
Core Insights
- Li Auto CEO Li Xiang advocates a long-term approach to success, stressing patience and resilience in the face of industry challenges [1]
- The Li Auto i8 launch event introduced the VLA driver model, which reflects the company's commitment to long-term innovation over short-term gains [1][3]

Group 1: VLA Driver Model
- Unlike traditional end-to-end architectures, the VLA driver model uses reinforcement learning to improve the machine's understanding of driving decisions [4][11]
- The safety goal for VLA is one accident per 600 million kilometers, compared with the current figure of 350-400 million kilometers for Li Auto's assisted driving [4][8]
- VLA adapts to individual driving styles through continuous learning, a key feature that enables a personalized driving experience [4][8]

Group 2: Testing and Efficiency
- Li Auto has opted for simulation testing over extensive real-world testing, logging more than 40 million kilometers of simulated driving by mid-2025, with daily peaks of 300,000 kilometers [5][9]
- The company has built a robust simulation environment because real-world testing cannot fully replicate extreme driving scenarios [9][10]
- Testing efficiency is a critical factor in VLA's development, and the company has put strong emphasis on transforming its research and development workflows [5][9]

Group 3: Technical Challenges
- Developing the VLA model requires overcoming significant challenges in data, algorithms, computing power, and engineering capabilities [19]
- The company has accumulated 4.3 billion kilometers of assisted driving data and 1.2 billion kilometers of valid feedback data, which are essential for refining the VLA model [9]
- The VLA architecture is designed to provide logical reasoning capabilities, addressing the shortcomings of traditional end-to-end models [11][12]

Group 4: Market Response and Future Goals
- The market response has been positive, with a 72.4% trial rate and a 92% satisfaction rate reported for Li Auto's intelligent driving features [8]
- Li Auto aims to raise its MPI (the average distance driven between takeovers) to 400-500 kilometers by the end of 2025, with aspirations to reach 1,000 kilometers in the near future [8]
- The company's long-term orientation shows in its strategic decisions, which prioritize safety and effective computing power over immediate performance metrics [25][26]
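The mileage figures in this summary are simple ratios, so a short worked example may help. The sketch below is illustrative only: it assumes MPI means the average distance driven per human takeover (the summary does not define the term), and the function names and the sample drive log are hypothetical. Only the 40 million km and 300,000 km/day figures come from the summary.

```python
def km_per_intervention(total_km: float, takeovers: int) -> float:
    """Average distance between human takeovers (assumed meaning of MPI)."""
    if takeovers <= 0:
        raise ValueError("need at least one takeover to compute the ratio")
    return total_km / takeovers


def days_at_peak(total_sim_km: float, peak_km_per_day: float) -> float:
    """Days needed to cover total_sim_km if every day ran at the peak rate."""
    return total_sim_km / peak_km_per_day


# Figures quoted in the summary above.
SIM_TOTAL_KM = 40_000_000
SIM_PEAK_KM_PER_DAY = 300_000

# Lower bound on simulation days: real throughput is below the daily peak.
print(round(days_at_peak(SIM_TOTAL_KM, SIM_PEAK_KM_PER_DAY), 1))  # 133.3

# Hypothetical drive log: 9,000 km with 20 takeovers.
print(km_per_intervention(9_000, 20))  # 450.0
```

Under that assumed definition, the hypothetical log lands inside the 400-500 km band the company is targeting for the end of 2025.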
A Conversation with Star Motion Era's Chen Jianyu: The Open Road and the Long Journey for Humanoid Robots
Huan Qiu Wang Zi Xun· 2025-08-12 10:01
Core Insights
- The robotics industry is converging on the "end-to-end" VLA (Vision-Language-Action) paradigm, which is becoming the foundational technology for embodied intelligence [1][2].

VLA Paradigm
- The VLA paradigm is defined as a complete closed loop of perception (Vision), understanding (Language), and action (Action) that allows robots to perform tasks in the physical world [2].
- The recent focus on "world models" is seen as an important evolution within the VLA framework, aimed at enhancing robots' precision, generalization, and cognitive abilities [2].

Efficiency and Collaboration
- Humanoid robots still lag behind human efficiency, but there is optimism: some industrial applications have reached over 70% of human efficiency, with expectations of reaching 90% next year [3].
- The end-to-end architecture enables real-time feedback and control, removing the traditional phase delays between recognition, planning, and execution, which is crucial for efficiency improvements [3].
- Deep collaboration between software and hardware is emphasized, with a focus on self-developed dexterous hands that have achieved stable mass production and significant cost reductions [3].

Application Pathway
- The path to killer applications for humanoid robots starts with business (B-end) deployments before moving to households, with industrial scenarios serving as a necessary phase for technology validation and data accumulation [4].
- The next five years are predicted to be a critical window for the takeoff of household robots, with simple forms expected to become widespread and high-net-worth families potentially being the first to adopt general-purpose humanoid robots [4].

Ecosystem Development
- The company advocates a "software defines hardware" approach, in which models can adapt to different hardware but hardware sets the upper limit of model capabilities [5].
- Open-source initiatives are highlighted as a strategic choice, with the company's humanoid robot reinforcement learning framework "Humanoid Gym" and generative large model "VPP" gaining significant attention in the community [5].
- The company believes in ecosystem co-prosperity: improvements that others make to its work will ultimately benefit the company as well [5].

Future Aspirations
- The company continues to strive for world-class achievements, with the founder expressing humility about not yet reaching the standards he has set [6].
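SenseTime's Wang Xiaogang: World Models Will Speed AI's Move from Digital Space into the Physical World, and "Wuneng" Aims to Be the Bridge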
机器之心· 2025-08-12 07:34
Core Viewpoint
- The article discusses the emergence of embodied intelligence and the significance of the "world model" as a core component in advancing AI towards human-like intelligence, and describes the competitive landscape as the AI industry moves towards embodied intelligence [1][2].

Industry Developments
- Major companies such as Google, Huawei, and ByteDance are launching embodied intelligence platforms and models, indicating rapid evolution in this field [3].
- SenseTime, drawing on its expertise in computer vision and multi-modal large models, aims to empower the industry through its "Wuneng" embodied intelligence platform, which integrates years of technological accumulation [3][5].

Technical Challenges
- The industry faces challenges such as data scarcity, difficulty in large-scale production, and the need for generalization in embodied intelligence applications [5][13].
- Computer vision expertise is seen as a potential way to improve the learning of world models and the capabilities of embodied intelligence [14].

World Model Significance
- The world model is recognized as a crucial element for prediction and planning in autonomous systems, enabling robots to interact intelligently with their environments [12][17].
- SenseTime's "Kaiwu" world model is designed to provide extensive data and support simulation-based learning, significantly reducing data collection costs [17][20].

Platform Features
- The "Wuneng" platform combines first-person and third-person perspectives for robot learning, improving the understanding of robot behavior [27][29].
- The platform aims to address the industry's data challenges by providing synthetic data and supporting the development of various robotic applications [26][31].

Future Implications
- As embodied intelligence matures, it is expected to transform human-robot interaction and create new social networks involving robots, expanding their roles in daily life [36][37].
- The integration of embodied intelligence into common environments such as homes and workplaces is expected to unlock significant value and functionality [39].
WRC 2025 Focus (2): Humanoid Robots Approach a "ChatGPT Moment" as Model Architecture Becomes the Key Breakthrough
Xin Lang Cai Jing· 2025-08-12 06:33
Core Insights
- The humanoid robot industry is on the brink of a "ChatGPT moment," with significant breakthroughs expected within 1-2 years, driven by policy and demand [1]
- The average growth rate for domestic humanoid robot manufacturers and component suppliers is projected to be between 50% and 100% in the first half of 2025 [1]
- The main challenge in the industry is not hardware but the architecture of embodied AI models, and the VLA model has inherent limitations [1][4]

Short-term Outlook (1-2 years)
- The domestic market is expected to maintain rapid growth thanks to policy subsidies and expanding application scenarios, with high order visibility for complete machines and core components [2]
- Key players such as Tesla and Figure AI could accelerate the division of labor and standardization in the global supply chain once they achieve mass production [2]

Mid-term Outlook (2-5 years)
- The combination of end-to-end embodied intelligence models with world models and an RL scaling law could become the mainstream architecture, enabling the transition from prototype to large-scale commercialization [2]
- Distributed computing is expected to become critical supporting infrastructure, working together with 5G/6G and edge computing providers [2]
- Investment opportunities include hardware manufacturers entering the mass production phase, AI companies with video generation world model capabilities, and distributed computing centers and edge cloud service providers [2]

Long-term Outlook (5+ years)
- If end-to-end embodied intelligence and low-latency distributed computing are realized, the market for household and industrial humanoid robots could expand rapidly, potentially reaching annual shipments in the millions [2]
- The focus of competition is expected to shift from technological breakthroughs to cost control and ecosystem development [2]

Hardware Status
- Current humanoid robot hardware can meet most application needs, although mass production and engineering still require optimization [3]

AI Model Challenges
- The VLA model is considered a "foolproof architecture" but struggles with real-world interaction because of insufficient data, and its effectiveness remains limited even after reinforcement learning training [4]
- The video generation / world model approach is seen as more promising: tasks can be simulated before real-world execution, which may lead to faster convergence [4]

RL Scaling Law
- Current reinforcement learning training lacks transferability, so each new task must be trained from scratch, which is inefficient [5]
- Achieving a scaling law similar to that of language models could significantly speed up the learning of new skills [5]

Distributed Computing Trends
- Humanoid robots are limited by size and power consumption, with onboard computing equivalent to a few smartphones [6]
- Future development will rely on localized distributed servers to reduce latency, ensure safety, and lower the cost of individual computing units [6]
Kunlun Wanwei Officially Releases and Open-Sources the "Matrix-Game 2.0" Model
Zheng Quan Shi Bao Wang· 2025-08-12 03:52
Core Insights
- Kunlun Wanwei has launched "Matrix-Game 2.0," an upgraded version of its self-developed Matrix series of world models and the first open-source solution for real-time long-sequence interactive generation in general scenarios [1]
- The new version emphasizes low-latency, high-frame-rate performance in long-sequence interaction, generating stable, continuous video at 25 FPS across a variety of complex scenes, with generation duration extendable to minutes [1]
- "Matrix-Game 2.0" breaks down the barrier between content generation and interaction, opening new possibilities for virtual humans, game engines, and embodied intelligence, and providing a strong technical foundation for building a universal virtual world [1]

Industry Impact
- The world model is considered the next frontier on the way to embodied intelligence and advanced spatial reasoning [2]
- "Matrix-Game 2.0" is expected to have a transformative impact on training and data generation for embodied intelligence, rapid construction of virtual game worlds, and content production for film and the metaverse [2]
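To put the 25 FPS figure in perspective, the sketch below works out the per-frame time budget and the number of frames in a multi-minute clip, then shows a toy autoregressive loop in which each new frame depends on a bounded window of recent frames plus the latest user action. This is a generic illustration under stated assumptions, not Matrix-Game 2.0's architecture: `toy_next_frame`, the integer "frames", and the window size are all hypothetical stand-ins.

```python
from collections import deque


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame to sustain the given frame rate."""
    return 1000.0 / fps


def frames_needed(duration_s: float, fps: float) -> int:
    """Number of frames in a clip of the given length."""
    return round(duration_s * fps)


def toy_next_frame(context, action):
    """Stand-in for a learned generator: a 'frame' is just an integer position."""
    return context[-1] + action


def rollout(actions, context_len=4, start=0):
    """Generate one frame per user action, keeping only a bounded history."""
    context = deque([start], maxlen=context_len)  # old frames fall out of the window
    frames = []
    for action in actions:
        frame = toy_next_frame(context, action)
        context.append(frame)
        frames.append(frame)
    return frames


print(frame_budget_ms(25))         # 40.0
print(frames_needed(120, 25))      # 3000
print(rollout([1, 1, -1, 0]))      # [1, 2, 1, 1]
```

At 25 FPS each frame must be produced within 40 ms, and a two-minute interactive session means 3,000 consecutive frames, which is why long-sequence stability is singled out in the announcement.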
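A Conversation with Star Motion Era's Chen Jianyu: World Models Are One Path Within VLA, and Household Robots Will Take Off in the Next Five Years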
Tai Mei Ti A P P· 2025-08-12 02:00
Core Insights
- The future trend in AI technology is the development of general-purpose humanoid robots, which will significantly enhance productivity and social service capabilities [2][4]
- VLA is the broader concept, covering the many ways visual perception, language, and action are combined in robotics, and the world model is one path within that framework [3][4]

Company Overview
- Star Motion Era was established in August 2023 as a project incubated by Tsinghua University's Institute for Interdisciplinary Information Sciences, focusing on creating general intelligent agents in the physical world [5]
- The company has completed three rounds of financing within two years, raising nearly 500 million yuan in Series A funding led by Dinghui VGC and Haier Capital [5]

Product Development
- Star Motion Era is developing embodied intelligent robots that integrate a general-purpose "brain" with the robot body; its VLA model ERA-42 unifies vision, understanding, prediction, and action in a single end-to-end model [5][6]
- The company has introduced the Star Motion L7, a full-size bipedal humanoid robot, and the Star Motion Q5, designed for service industries, demonstrating capabilities in logistics and daily tasks [6]

Market Potential
- The next five years are expected to be a breakthrough period for household robots, with simpler forms entering homes and high-net-worth individuals potentially using more advanced humanoid robots [4][9]
- Households are expected to be the ultimate application for humanoid robots, although initial deployments will focus on B2B scenarios to refine the technology and accumulate data [9][10]

Industry Insights
- Current intelligent robots achieve about 70% of human efficiency, with projections of reaching 90% in the coming year, indicating significant advances in software and hardware [8]
- The industry has not yet reached a "bubble" phase, since valuations have not matched those of sectors like smart vehicles; a capital explosion is possible once leading companies achieve scalable commercial applications [8]
Kunlun Wanwei Releases and Open-Sources the Matrix-Game 2.0 Model
Xin Lang Cai Jing· 2025-08-12 01:22
Core Insights
- On August 12, Kunlun Wanwei released "Matrix-Game 2.0," an upgraded version of its self-developed Matrix series of world models [1]
- Matrix-Game 2.0 can generate long-duration videos across different scenes while maintaining temporal consistency in actions and visuals [1]
- The model supports continuous user input during interaction [1]
New from CMU: Cross-Embodiment World Models Enable Few-Shot Robot Learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article discusses a novel approach to training visuomotor policies for robots by leveraging existing low-cost data sources, which significantly reduces the need for expensive real-world data collection [2][11].

Group 1: Methodology
- The proposed method is based on two key insights:
  1. Embodiment-agnostic world model pretraining, which uses optical flow as the action representation so that the world model can be trained on cross-embodiment datasets and then fine-tuned with a small amount of data from the target embodiment [3][12].
  2. Latent Policy Steering (LPS), which improves policy outputs by searching for better action sequences in the latent space of the world model [3][12].

Group 2: Experimental Results
- In real-world experiments, combining the policy with a world model pretrained on existing datasets led to significant performance gains: over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations [3][9].

Group 3: Challenges and Solutions
- The article highlights the challenges that embodiment gaps pose for pretraining models across different robots, and argues that world models are better suited to cross-embodiment pretraining followed by fine-tuning for new embodiments [11][12].
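The idea behind steering a policy with a world model can be illustrated with a toy search. The sketch below is not the CMU method itself: it assumes a one-dimensional latent state, a hand-written `world_model_step`, a distance-based `value` function, and a random `policy_samples` stand-in for a learned stochastic policy, all of which are hypothetical. It only shows the general pattern of sampling candidate action sequences, rolling each out inside the world model, and keeping the best-scoring one.

```python
import random


def world_model_step(z: float, action: float) -> float:
    """Toy latent dynamics (assumption): the action shifts the latent state."""
    return z + action


def value(z: float, goal: float) -> float:
    """Toy score (assumption): closer to the goal latent is better."""
    return -abs(z - goal)


def policy_samples(horizon: int, n: int, rng: random.Random):
    """Stand-in for drawing n action sequences from a stochastic policy."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(n)]


def steer(z0: float, goal: float, horizon: int = 5, n: int = 64, seed: int = 0):
    """Return the sampled plan whose imagined final latent scores highest."""
    rng = random.Random(seed)
    best_plan, best_score = None, float("-inf")
    for plan in policy_samples(horizon, n, rng):
        z = z0
        for action in plan:  # imagined rollout; no real robot step is taken
            z = world_model_step(z, action)
        score = value(z, goal)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score


plan, score = steer(z0=0.0, goal=2.0)
baseline_plan, baseline_score = steer(z0=0.0, goal=2.0, n=1)
print(score >= baseline_score)  # True: the search never does worse than its first sample
```

With the same seed, the single-sample call returns exactly the first candidate of the larger search, so the comparison isolates the benefit of searching over more candidates inside the model.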
I Had Decided to Move into Embodied AI, but Now I'm Hesitating...
自动驾驶之心· 2025-08-11 12:17
Core Insights
- Embodied intelligence is a hot topic this year: it moved from silence in earlier years to a frenzy last year, and is now gradually cooling as the industry realizes that embodied robots are still far from being productive [1]

Group 1: Industry Trends
- Demand for multi-sensor fusion and positioning in robotics is significant, with a focus on SLAM and ROS technologies [3]
- Many robotics companies are developing rapidly and have secured considerable funding, indicating a promising future for the sector [3]
- Traditional robotics remains the main product line, despite the excitement around embodied intelligence [3]

Group 2: Community and Resources
- The community has established a closed loop across industry, academia, and job seeking, aiming to create a valuable exchange platform [4][6]
- The community offers access to over 40 technical routes and invites industry leaders for discussions, enhancing learning and networking opportunities [6][20]
- Members can freely ask questions about job choices or research directions and receive guidance from experienced professionals [83]

Group 3: Educational Content
- Comprehensive resources for beginners and advanced learners are available, including technical stacks and learning roadmaps for autonomous driving and robotics [13][16]
- The community has compiled a list of notable domestic and international research labs and companies in the autonomous driving and robotics sectors, helping members with their academic and career pursuits [27][29]
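OpenAI Releases GPT-5, Its Most Powerful AI Model; Intel CEO Sends All-Staff Letter Responding to Resignation Demand; WeChat Employee Responds to "Changing the Phone Date Can Restore Expired Files" | Q News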
Sou Hu Cai Jing· 2025-08-10 02:43
Group 1: OpenAI and AI Models
- OpenAI has officially released its latest AI model, GPT-5, which features intelligent model version switching, lower hallucination rates, enhanced coding capabilities, and personalized settings [1][3]
- GPT-5 achieved state-of-the-art scores on key coding benchmarks, with 74.9% on SWE-bench Verified and 88% on Aider polyglot, positioning it as a strong coding collaborator [3]
- The model excels at front-end coding tasks, outperforming previous versions in 70% of internal tests [3]

Group 2: Intel and CEO Response
- Intel CEO Lip-Bu Tan addressed employees in a letter, clarifying misconceptions and indicating that he will not resign, and emphasized his commitment to the company's future goals and investments [4][5]
- Intel has a 56-year history of semiconductor production in the U.S. and plans to invest billions in semiconductor R&D and manufacturing, including a new fab in Arizona [4]

Group 3: Microsoft Layoffs
- Microsoft has begun a new round of layoffs in Washington state, cutting approximately 40 positions and bringing total layoffs in the state to 3,160 this year [6]
- The layoffs are part of a broader plan to cut over 15,000 jobs globally, and the latest round is relatively small compared with previous months [6]

Group 4: ByteDance Recruitment
- ByteDance has launched its 2026 campus recruitment, offering over 5,000 positions, a significant increase from the previous year's 4,000+ offers [10]
- The recruitment covers a range of roles, with a 23% increase in R&D positions, particularly in algorithms and front-end development [10]

Group 5: Gaming and Service Outages
- Multiple NetEase games experienced login issues in an outage that lasted over 2 hours and was attributed to internal server problems [8][9]
- The outage affected several popular titles, causing widespread player frustration and highlighting how hard it is to troubleshoot large-scale service disruptions [8][9]

Group 6: AI Developments
- OpenAI released two open-weight AI models, gpt-oss-120b and gpt-oss-20b, which can mimic human reasoning and perform complex tasks, although they are not fully open-source [13]
- Google DeepMind introduced Genie 3, a general-purpose world model capable of generating interactive 3D environments in real time, marking a significant advance in world modeling technology [14][15]