VLA
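After the Spring Festival Gala with the highest robot density yet, how far is embodied intelligence from entering every household?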
AI前线· 2026-03-18 08:33
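By | QCon Global Software Development Conference   Planning | Kitty   Editor | 宇琪

Embodied intelligence, the core leap that takes AI from the digital world into physical reality, is a key path to AGI. Yet it is still held back by deep-seated problems such as insufficient model generalization, difficult data collection, and loops that are hard to close, and real industrial deployment remains a struggle. So where exactly is embodied intelligence stuck?

Recently, the InfoQ 《极客有约》 X QCon livestream invited Dr. Sui Wei, Vice President of Algorithms at 地瓜机器人 (D-Robotics), to host a discussion with Dr. He Yonghao, head of embodied intelligence at 地瓜机器人; Li Yuanqing, CTO of 乐享科技; and Dr. Peng Junran, associate professor at the University of Science and Technology Beijing. Ahead of the 2026 QCon Global Software Development Conference (Beijing), they discussed the sticking points in deploying embodied intelligence in practice.

Some highlights follow:

At the QCon Global Software Development Conference (Beijing), to be held in Beijing on April 16-18, we have set up a dedicated track, "Embodied Intelligence and Interaction with the Physical World". The track will break down the embodied intelligence technology chain in depth, examine the current state of models along with the core challenges and opportunities, and help accelerate the conversion of embodied intelligence R&D into large-scale industrial deployment. See the conference schedule for more: https://qcon.infoq.cn/2026/beijing/schedule

Industrial scenarios do not need to pursue generality; if a certain ...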
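A conversation with Chen Jianyu of 星动纪元 (Robotera): a Tsinghua doctoral supervisor at 28, aiming to build an embodied-intelligence company with a market capitalization in the trillions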
晚点LatePost· 2026-03-13 06:06
Core Viewpoint
- The company aims to reach a market capitalization in the trillions by focusing on embodied intelligence and strategic partnerships. It stresses that financing is not only about capital but also about acquiring resources and building competitive advantages [5][8][36].

Financing and Investment
- Within three months the company secured 20 billion RMB in financing across two rounds of 10 billion RMB each, a sign of strong interest in the embodied intelligence sector [4][8].
- The CEO noted that many investors now recognize the potential of embodied intelligence, which has intensified competition for funding [9].
- The company uses financing to bring in strategic partners who can strengthen its competitive position [8][9].

Technology and Research
- The company is developing full-sized humanoid robots with dexterous hands and legs, aiming to improve movement control and resource allocation for embodied intelligence models [7][21].
- The CEO believes the industry is increasingly prioritizing practical application, moving from demonstrations to real-world tasks [11].
- The company is exploring the integration of VLA (Vision-Language-Action) models with world models, which has been shown to improve performance by approximately 40% [14][16].

Market Position and Strategy
- The company positions itself as a leader in embodied intelligence, focusing on high-value segments such as logistics and manufacturing, where the cost-saving potential is significant [25][27].
- The CEO said the company is not currently focused on consumer household robots, because the technology is not yet mature enough for widespread deployment [27][28].
- The company aims for sustainable high-value output rather than short-term sales volume, which is often driven by performance demonstrations [26][27].

Future Outlook
- The CEO expects the company to reach a market capitalization in the trillions within the next ten years, contingent on advances in technology and market acceptance [36][38].
- The company is committed to solving hardware durability problems, particularly in dexterous hands, to improve the longevity and reliability of its products [25][37].
A new stage of emergent AI intelligence: VLA versus world models in intelligent driving
2026-03-04 14:17
Summary of Conference Call Records

Industry Overview
- The call discusses the evolution of intelligent driving paradigms from "rules + maps" to "VLA (Vision-Language-Action) + world models", with significant advances expected after 2025, particularly following the arrival of cost-effective reasoning models such as DeepSeek [1][3][4].

Key Points and Arguments

Technological Advancements
- Model parameter counts are growing: vehicle-side models have reached tens of billions of parameters and cloud-side models are approaching hundreds of billions. XPeng's second-generation VLA achieved a 33% reduction in prediction error through a 32-fold ultra-dense visual reasoning chain [1][12].
- The training paradigm is shifting from imitation learning to a combination of "pre-training + SFT (Supervised Fine-Tuning) + reinforcement learning", which strengthens reasoning and addresses risk asymmetry in emergency scenarios [1][8].

Industry Dynamics
- Technical paths are diverging: Huawei and NIO focus on "cloud-based world engines + vehicle-side action models", while XPeng and Li Auto emphasize the VLA route, integrating LLMs (Large Language Models) into their algorithms to improve generalization in long-tail scenarios [1][2][12].
- Mandatory L2 standards are anticipated in Q2 2026. External catalysts such as mass production of Tesla's Cybercab and the entry of FSD (Full Self-Driving) into China suggest that a commercial breakthrough for L3/L4 is approaching [1][13].

Model Development and Training
- The evolution of general AI models since 2017 has been marked by milestones such as the introduction of the Transformer architecture and the integration of multimodal capabilities, leading to stronger reasoning abilities [4][5].
- The scaling law highlights the critical role of model size, data, and compute in improving capabilities, and it also applies to intelligent driving models [4][6].

Future Projections
- By 2026, key players are expected to focus on VLA-type large models, with significant progress in integrating vision, language, and action components within a unified framework [9][10][12].
- The role of the world model is to simulate and predict future states of the physical environment, improving the vehicle's ability to anticipate and respond to complex scenarios [11][12].

Additional Important Insights
- The transition from traditional end-to-end systems to VLA and world models is driven by the need for a better understanding of physical laws and better decision-making in complex environments [7][10].
- The industry is moving toward more integrated models that combine perception, reasoning, and action generation, with a focus on making outputs more interpretable and robust [10][11].
- Key players are diversifying their strategies: XPeng is focused on improving the driving experience through its second-generation VLA, while Huawei and NIO lean toward world model approaches [12][13].

Investment Focus
- Investment opportunities are concentrated in LiDAR technology (e.g., Hesai), localization of high-level autonomous driving chips (e.g., Horizon Robotics), and the commercialization of Robotaxi services (e.g., Pony.ai, WeRide) [2][13].
With Yin Qi at the helm, is 阶跃星辰 (StepFun) set to become the third large-model stock?
21 Shi Ji Jing Ji Bao Dao (21st Century Business Herald) · 2026-02-27 12:25
Core Viewpoint
- AI model company 阶跃星辰 (StepFun) is considering an IPO on the Hong Kong Stock Exchange to raise approximately $500 million, shortly after completing a record-breaking Series B+ financing of over 5 billion yuan [1].

Group 1: Company Developments
- The company's new chairman, Yin Qi, has a strong background in technology and strategy and is focusing on organizational change and commercialization [1].
- The company has developed the Step series of models, which cover language, multimodal, and reasoning capabilities, and has open-sourced several leading multimodal models [1][2].
- The company plans to focus on AI agents for smart terminal devices, particularly in automotive, mobile, and IoT applications, with API call volume expected to grow nearly 170% over three consecutive quarters by the end of 2025 [3].

Group 2: Market Context
- Competition in the domestic AI model market has shifted from parameter scale and general capabilities alone to application depth and industry integration [3].
- The Hong Kong stock market is seeing an unprecedented wave of AI listings, with several companies going public successfully and reaching significant market valuations [4][5].
- The company's business model combines "device + cloud" revenue streams, charging license fees on the device side and usage-based fees on the cloud side, which is seen as sustainable in the current market [6].
Why were the Spring Festival Gala robots no longer "stiff"? Embodied intelligence is undergoing an evolution of the brain
机器人大讲堂· 2026-02-19 00:00
Core Viewpoint
- Humanoid robots are moving from performing in controlled environments to practical use in real-world scenarios, which requires robots to understand and predict physical interactions [5][6][26].

Group 1: Humanoid Robot Performance
- Humanoid robot performances at the Spring Festival Gala have advanced markedly; previous years featured coordinated movements and complex formations [1][2].
- This year's robots showed a level of agility and responsiveness that suggests a breakthrough in control algorithms and hardware integration [5].

Group 2: Challenges in Real-World Applications
- Despite this progress, the transition from staged performances to real-world applications remains difficult, because robots must cope with unpredictable environments such as factories and homes [5][6].
- Current humanoid robots lack an understanding of physical laws, which limits their effectiveness in dynamic settings [13][22].

Group 3: VLA Paradigm and Industry Anxiety
- The dominant paradigm for embodied intelligence is the Vision-Language-Action (VLA) model, a field that is currently highly competitive [7].
- Companies such as Ant Group and Horizon are developing advanced VLA models that improve spatial awareness and adaptability across different robot configurations [8][10].

Group 4: Transition to World Models
- The industry recognizes the need to evolve from VLA to embodied world models that let robots simulate and predict physical interactions [14][15].
- Ant Group's LingBot-World is a notable example, providing a high-fidelity simulation environment in which robots can learn and adapt without real-world consequences [16].

Group 5: Impact on Industry Scalability
- The shift from action mapping to physical pre-simulation is expected to cut the data needed to train a new skill significantly, from thousands of examples to just 30-50 [23].
- Robots equipped with predictive capabilities have shown high success rates in complex tasks, achieving over 91% in multi-task scenarios [24].

Group 6: Conclusion and Future Directions
- Humanoid robots are moving from demonstrations to practical applications, with a focus on understanding physical laws and improving operational capabilities in real-world environments [26][28].
- The debate over the best approach to robotic intelligence continues, with various strategies being explored to improve performance in unpredictable settings [27].
Are world models the ultimate answer for autonomous driving?
36Kr · 2026-02-05 04:30
Core Insights
- "World model" has become a trendy term in the intelligent driving sector, with companies such as XPeng, NIO, and Huawei adopting different terminology for similar technologies [2][3][4].
- World models are seen as a crucial component of "physical world AI", enabling artificial intelligence to understand and replicate real-world dynamics [3][4].
- World models in the intelligent driving industry are currently used primarily in the cloud, with no direct implementation in vehicles yet [6].

Group 1: Industry Trends
- The shift from rule-based systems to AI-driven models in intelligent driving has led to a unified approach in which perception, prediction, and planning are integrated into a single network [7].
- Despite these advances, the transition to end-to-end models has exposed shortcomings in traditional simulation tools, making more sophisticated simulation environments necessary [10][11].
- World models aim to address the limitations of existing simulators by providing a more comprehensive and realistic virtual environment for testing and validation [10][11].

Group 2: Technical Challenges
- The effectiveness of AI-driven models is hindered by the "black box" nature of end-to-end systems, which makes it difficult to diagnose errors and ensure reliability [9][10].
- Current world models in the industry are still at an early stage and are limited in their ability to generate realistic and diverse scenarios for training [16][18].
- The challenge is to ensure that generated scenarios accurately reflect real-world conditions, since inaccuracies can lead to poor model performance in practice [17][18].

Group 3: Future Directions
- Companies are exploring various ways to improve world models, with some opting for more controllable methods such as 3D Gaussian reconstruction [14][15].
- The ultimate goal is to develop world models that can support decision-making in vehicles, moving beyond their current use as training and validation tools [19].
- Achieving high accuracy and reliability in world models is essential for deploying them in real-world driving, and this remains a significant hurdle for the industry [19].
见谈 | Lü Peng of 地平线 (Horizon Robotics): end-to-end is the cornerstone; if you cannot do end-to-end well, you cannot do VLA well
21 Shi Ji Jing Ji Bao Dao (21st Century Business Herald) · 2026-02-03 08:04
Core Viewpoint
- The interview highlights the current divergence among smart driving technology routes and argues that there is no need for anxiety over terminology, because approaches such as end-to-end, VLA, WA, and VA are fundamentally aligned in their technical architecture [1].

Group 1: Technology Perspectives
- The market should not worry about the different terms used in smart driving technology, since they are all fundamentally based on an end-to-end architecture [1].
- End-to-end is considered the cornerstone for integrating new modalities and improving product performance, which makes it critical to the development of smart driving technologies [1].
- If end-to-end is not executed well, VLA will not be effective either, which points to a strong interdependence among these approaches [1].
五一视界 (6651.HK) and the "left hand versus right hand" sparring of physical AI: the closed-loop co-evolution of world models and VLA
Zhong Jin Zai Xian· 2026-01-28 02:39
Core Insights
- AI technology is experiencing three major breakthroughs: the evolution from chatbots to intelligent agents, the lowering of entry barriers through open-source models, and the understanding of the physical world through physical AI [1].
- Physical AI is recognized as the next wave of AI development, showing its potential in understanding complex scientific principles [1].

Group 1: VLA and World Models
- The VLA (Vision-Language-Action) model and world models are emerging as a dual-model paradigm for addressing data scarcity and safety issues in physical AI [2][3].
- World models can generate unlimited simulation data at low cost, allowing VLA to learn from varied scenarios without the risks of real-world data collection [3].
- Integrating VLA with world models is seen as the optimal way to strengthen embodied intelligence in physical AI [3].

Group 2: Development Stages
- The development of VLA and world models can be structured into four stages: cold start, interface alignment, training in simulated environments, and real-world transfer and calibration [4][5].
- In the cold start phase, a basic VLA model is trained on existing robot datasets while the world model is pre-trained on large amounts of video data [4].
- The interface alignment phase maps the VLA's action outputs to the world model's input conditions so that the resulting scenarios can be simulated [4].
- In the training phase, VLA operates inside the simulated environments generated by the world model, allowing extensive reinforcement learning without physical wear on robotic components [4].

Group 3: Addressing Challenges
- Generative models often produce inconsistent outputs, which leads to incorrect physical assumptions; introducing 3D geometry and material constraints can mitigate this [6].
- A reward model can be used to evaluate whether tasks succeed in generated scenarios, providing feedback to the VLA [6].
- The speed of world model predictions is crucial for training efficiency; techniques such as latent consistency models can speed up prediction by focusing on feature changes rather than pixel-level details [6].

Group 4: Data Sharing and Best Practices
- The architecture of world models is evolving, but the need for both real and synthetic data remains constant [7].
- Sharing visual encoders between VLA and world models can reduce memory usage and keep their understanding of the environment synchronized [7].
- Generating counterfactual data lets VLA learn from hypothetical failure scenarios, improving robustness and reducing real-world testing costs [7].

Group 5: Towards General Artificial Intelligence
- The future of world models involves generating interactive 4D environments, enabling VLA to train in dynamic rather than static settings [8].
- Integrating fast and slow systems within AI, where VLA handles real-time responses and world models handle long-term planning, is a key goal for autonomous systems [8].
- Ultimately, VLA and world models may converge into a unified model that predicts both actions and future states, in line with the vision of AI that understands physical laws [9][10].
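The training loop described above (the policy acts, the world model imagines the outcome, a reward model scores it) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the one-dimensional state, the linear `world_model`, the distance-based `reward_model`, and the hill-climbing update that stands in for reinforcement learning. None of it comes from the article or from any real VLA system.

```python
import random


def world_model(state: float, action: float) -> float:
    """Hypothetical learned dynamics: predict the next state from (state, action)."""
    return state + 0.5 * action


def reward_model(state: float, goal: float) -> float:
    """Hypothetical learned reward: higher when the imagined state is near the goal."""
    return -abs(goal - state)


def rollout(gain: float, start: float, goal: float, steps: int = 20) -> float:
    """Run the policy inside the world model only; no real robot is involved."""
    state, total = start, 0.0
    for _ in range(steps):
        action = gain * (goal - state)      # stand-in for the VLA action head
        state = world_model(state, action)  # imagined next state
        total += reward_model(state, goal)  # feedback from the reward model
    return total


def train_in_imagination(iterations: int = 200, seed: int = 0) -> float:
    """Hill-climb the policy gain on imagined rollouts (a stand-in for RL)."""
    rng = random.Random(seed)
    gain = 0.1
    best = rollout(gain, start=0.0, goal=1.0)
    for _ in range(iterations):
        candidate = gain + rng.gauss(0.0, 0.1)
        score = rollout(candidate, start=0.0, goal=1.0)
        if score > best:                    # keep only strict improvements
            gain, best = candidate, score
    return gain
```

Calling `train_in_imagination()` returns a policy gain whose imagined return is better than that of the initial gain of 0.1. In a real system the final stage, real-world transfer and calibration, would still be needed, because the policy is only as good as the world model it trained in.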
From DeepMind to embodied intelligence, Wang Jianan: algorithms must ultimately serve the real world | 万有引力
AI科技大本营· 2026-01-23 10:09
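The following article is from CSDN, by 万有引力.

Interview | Tang Xiaoyin   Guest | Wang Jianan   Editor | Meng Yidan   Produced by | CSDN (ID: CSDNnews)

Is the end point of the road to AGI code, or a body? For Wang Jianan, the answer points clearly to embodied intelligence.

[Photo: Wang Jianan (left) and Tang Xiaoyin (right)]

At the 2025 Global Machine Learning Technology Conference, Tang Xiaoyin, executive editor-in-chief of CSDN and 《新程序员》, sat down for an in-depth conversation with Wang Jianan, vice president of 星尘智能 (Astribot) and a former DeepMind researcher. From the ultimate vision of AGI to the practical bottlenecks of embodied intelligence, and from the engineering logic of fast and slow systems to the timeline for general-purpose robots and the convictions developers should hold, she gave an answer that is both sober and firmly long-termist. The key points Wang Jianan made in the interview include:

She completed her studies at the University of Oxford and then joined DeepMind, where she worked on reinforcement learning and continual learning and witnessed the birth of landmark projects such as AlphaStar. When generative AI in China was still at an early stage, she took part in exploring a unified generative framework, working at the research frontier before the AIGC boom. Whether at the peak of "pure algorithms" or at the starting point of generative models, she was inside the wave. In 2024, she joined 星尘智能, choosing to face ...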
A summary of 2025 interviews with several autonomous driving companies
自动驾驶之心· 2026-01-22 09:07
Core Algorithm
- The industry has shifted toward end-to-end solutions and away from modular approaches, at least in public discourse [1].
- World models are now widely adopted: some companies use them to generate training data, while others build them into end-to-end models to improve performance [1][8].
- Opinions differ on whether language models (as in VLA) are necessary for autonomous driving, with some companies arguing that language is not essential to the driving task [1][11].

Simulation and Infrastructure
- Closed-loop systems have evolved from data-driven loops to simulation-based testing and training loops [2].
- 3DGS is highlighted as a crucial technology for building simulation environments, as Tesla emphasized at CVPR 2025 [5].
- Infrastructure is critical, with companies such as Xiaomi and Li Auto noting its benefits for development efficiency [3][14].

Organizational Capability
- Organizational capability is vital, because large autonomous driving teams face significant management challenges [4].
- Team culture and collaboration are emphasized as essential for overcoming complex technical and management problems [5].

Technical Choices Comparison
- A comparison of the companies' technical choices shows differing approaches to core technologies and to the role of world models and simulation tools [9].
- Li Auto advocates a training loop that evolves from imitation to self-learning, while NVIDIA emphasizes interpretability and reasoning in AI [9].

Key Non-Core Factors
- R&D infrastructure and engineering efficiency are crucial to the success of autonomous driving technologies [14].
- Simulation and synthetic data are becoming essential for covering corner cases that real-world data cannot [14].
- The scale of computing power and chip adaptation are critical, since autonomous driving is a hardware challenge as well as a software one [15].

User Experience and Safety
- User experience and safety are paramount, with companies such as Xiaomi stressing the importance of balancing advanced technology against user concerns [17].
- The need for a dual-stack safety mechanism is highlighted, so that even aggressive end-to-end models can fall back to traditional rule-based systems for safety [19].
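The dual-stack idea in the last point can be sketched as a small arbiter that accepts the learned planner's output only when it passes hard checks and otherwise hands control to a rule-based planner. This is a minimal sketch under stated assumptions: the `Plan` type, the acceleration limits, the 1.5-second time-gap threshold, and all function names are hypothetical and are not taken from any company mentioned in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    accel_mps2: float  # commanded longitudinal acceleration
    source: str        # which stack produced the plan


# Illustrative limits only; real systems derive these from validated requirements.
MAX_ACCEL = 2.0
MAX_DECEL = -6.0
MIN_TIME_GAP_S = 1.5


def time_gap(speed_mps: float, gap_m: float) -> float:
    """Seconds until the ego vehicle covers the gap to the lead vehicle."""
    return gap_m / speed_mps if speed_mps > 0 else float("inf")


def rule_based_plan(speed_mps: float, gap_m: float) -> Plan:
    """Conservative fallback: brake if following too closely, else hold speed."""
    accel = -3.0 if time_gap(speed_mps, gap_m) < MIN_TIME_GAP_S else 0.0
    return Plan(accel, "rule_based")


def is_safe(plan: Plan, speed_mps: float, gap_m: float) -> bool:
    """Check the learned plan against hard limits before it reaches the actuators."""
    if not (MAX_DECEL <= plan.accel_mps2 <= MAX_ACCEL):
        return False
    # Reject accelerating while already following too closely.
    return not (time_gap(speed_mps, gap_m) < MIN_TIME_GAP_S and plan.accel_mps2 > 0)


def arbitrate(learned: Plan, speed_mps: float, gap_m: float) -> Plan:
    """Use the end-to-end plan when it is safe, otherwise the rule-based one."""
    if is_safe(learned, speed_mps, gap_m):
        return learned
    return rule_based_plan(speed_mps, gap_m)
```

For example, `arbitrate(Plan(1.0, "e2e"), speed_mps=20.0, gap_m=20.0)` rejects the learned plan, because the time gap is only one second, and returns the rule-based braking plan instead. Keeping the checks independent of the learned model is the point of the design: the fallback must not share the end-to-end model's failure modes.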