VLA
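With Yin Qi at the Helm, Is StepFun Aiming to Become the Third Listed Large-Model Stock?
21st Century Business Herald · 2026-02-27 12:25
By Dong Jingyi, reporter, 21st Century Business Herald. Recent reports say the AI large-model company StepFun (阶跃星辰) is considering an IPO on the Hong Kong Stock Exchange and plans to raise about US$500 million. The 21st Century Business Herald reporter asked StepFun to verify the report and had received no reply as of press time. Only one month earlier, the company completed a B+ round of more than 5 billion yuan, setting a new funding record for the industry. Also one month earlier, Yin Qi, co-founder of Megvii and chairman of Qianli Technology (千里科技), formally became chairman of StepFun, forming a new core management team with CEO Jiang Daxin, chief scientist Zhang Xiangyu, and CTO Zhu Yibo. Yin Qi describes his own role as setting strategic and technical direction, driving organizational change and technical breakthroughs, and handling the part he is better at: commercialization on terminal devices. The funding round, Yin Qi's appointment, and the IPO rumor all fell within this single month. A company seen as one of the more low-key of the large-model "Six Little Tigers" has suddenly hit the accelerator. StepFun's technical path has always carried a distinct founder imprint. In 2025 the company narrowed its business direction, concentrating its deployment effort on building AI agents for smart terminal devices, with cars, phones, and IoT devices as the key application scenarios. Data show that, as of the end of 2025, API call volume for StepFun's terminal agents had grown by nearly 170% for three consecutive quarters. CEO Jiang Daxin came from Microsoft and is a typical technologist who believes that "multimodality is the necessary path to AGI." Only two years after its founding, the company had built ...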
Why Are the Spring Festival Gala Robots No Longer "Stiff"? Embodied Intelligence Is Undergoing a Brain Evolution
机器人大讲堂· 2026-02-19 00:00
Core Viewpoint - Humanoid robots are moving from performances in controlled environments to practical use in real-world scenarios, which requires them to understand and predict physical interactions [5][6][26].
Group 1: Humanoid Robot Performance
- Humanoid robots at the Spring Festival Gala have advanced markedly; in previous years they performed coordinated movements and complex formations [1][2].
- This year's robots showed a level of agility and responsiveness that suggests a breakthrough in control algorithms and hardware integration [5].
Group 2: Challenges in Real-World Applications
- Despite these advances, moving from staged performances to real-world use remains hard, because robots must cope with unpredictable environments such as factories and homes [5][6].
- Current humanoid robots cannot understand physical laws, which limits their effectiveness in dynamic settings [13][22].
Group 3: VLA Paradigm and Industry Anxiety
- The dominant paradigm for embodied intelligence is the Vision-Language-Action (VLA) model, a field that is currently highly competitive [7].
- Companies such as Ant Group and Horizon are developing advanced VLA models that improve spatial awareness and adaptability across different robot configurations [8][10].
Group 4: Transition to World Models
- The industry recognizes the need to evolve from VLA to embodied world models that let robots simulate and predict physical interactions [14][15].
- Ant Group's LingBot-World is a notable example: it provides a high-fidelity simulation environment in which robots can learn and adapt without real-world consequences [16].
Group 5: Impact on Industry Scalability
- The shift from action mapping to physical pre-simulation is expected to cut the data needed to train a new skill from thousands of examples to just 30-50 [23].
- Robots equipped with predictive capabilities have achieved a success rate of over 91% in multi-task scenarios [24].
Group 6: Conclusion and Future Directions
- Humanoid robots are moving from demonstrations to practical applications, with a focus on understanding physical laws and improving operation in real-world environments [26][28].
- The debate over the best approach to robotic intelligence continues, and several strategies are being explored to improve performance in unpredictable settings [27].
Are World Models the Ultimate Answer for Autonomous Driving?
36Kr · 2026-02-05 04:30
Core Insights
- "World model" has become a fashionable term in the intelligent driving sector, with companies such as Xpeng, NIO, and Huawei using different names for similar technologies [2][3][4].
- World models are seen as a crucial component of "physical world AI," enabling artificial intelligence to understand and replicate real-world dynamics [3][4].
- In the intelligent driving industry, world models are currently used mainly in the cloud and have not yet been deployed directly in vehicles [6].
Group 1: Industry Trends
- The shift from rule-based systems to AI-driven models has led to a unified approach in which perception, prediction, and planning are integrated into a single network [7].
- The transition to end-to-end models has exposed the shortcomings of traditional simulation tools and created a need for more sophisticated simulation environments [10][11].
- World models aim to address the limits of existing simulators by providing a more comprehensive and realistic virtual environment for testing and validation [10][11].
Group 2: Technical Challenges
- The "black box" nature of end-to-end systems makes errors hard to diagnose and reliability hard to ensure, which limits the effectiveness of AI-driven models [9][10].
- World models in the industry are still at an early stage and are limited in their ability to generate realistic, diverse training scenarios [16][18].
- Generated scenarios must accurately reflect real-world conditions, because inaccuracies can lead to poor model performance in practice [17][18].
Group 3: Future Directions
- Companies are exploring different ways to improve world models, and some are choosing more controllable methods such as 3D Gaussian reconstruction [14][15].
- The ultimate goal is a world model that supports in-vehicle decision-making, beyond its current use as a training and validation tool [19].
- High accuracy and reliability are prerequisites for deploying world models in real driving scenarios, and reaching them remains a significant hurdle for the industry [19].
Interview | Horizon's Lü Peng: End-to-End Is the Cornerstone; If You Can't Do End-to-End Well, You Can't Do VLA Well
21st Century Business Herald · 2026-02-03 08:04
Core Viewpoint - The interview addresses the current divergence in smart driving technology routes and argues that there is no need for terminology anxiety, because approaches such as end-to-end, VLA, WA, and VA are fundamentally aligned in their technical architecture [1].
Group 1: Technology Perspectives
- The market need not worry about the different terms used in smart driving, since all of them rest on an end-to-end architecture [1].
- End-to-end is the cornerstone for integrating new modalities and improving product performance, which makes it critical to the development of smart driving technologies [1].
- If end-to-end is not executed well, VLA will not work well either, which shows how strongly these approaches depend on one another [1].
51WORLD (6651.HK) and Physical AI's "Left Hand vs. Right Hand" Sparring: The Closed-Loop Co-Evolution of World Models and VLA
Zhongjin Zaixian · 2026-01-28 02:39
Core Insights
- AI technology is seeing three major breakthroughs: the evolution from chatbots to intelligent agents, lower entry barriers through open-source models, and an understanding of the physical world through physical AI [1].
- Physical AI is recognized as the next wave of AI development and has shown potential in understanding complex scientific principles [1].
Group 1: VLA and World Models
- The VLA (Vision-Language-Action) model and the world model are emerging as a dual-model paradigm for addressing data scarcity and safety issues in physical AI [2][3].
- World models can generate unlimited simulation data at low cost, so VLA can learn from varied scenarios without the risks of real-world data collection [3].
- Integrating VLA with world models is seen as the optimal way to improve embodied intelligence in physical AI [3].
Group 2: Development Stages
- Development of VLA and world models can be organized into four stages: cold start, interface alignment, training in simulated environments, and real-world transfer and calibration [4][5].
- In the cold start stage, a basic VLA model is trained on existing robot datasets while the world model is pre-trained on large amounts of video data [4].
- In the interface alignment stage, VLA's action outputs are mapped to the world model's input conditions so that the resulting scenarios can be simulated [4].
- In the training stage, VLA operates inside the simulated environments generated by the world model, which allows extensive reinforcement learning without physical wear on robot components [4].
Group 3: Addressing Challenges
- Generative models often produce inconsistent outputs that lead to incorrect physical assumptions; adding 3D geometry and material constraints can mitigate this [6].
- A reward model can evaluate whether a task succeeded in a generated scenario and feed that result back to the VLA [6].
- World model prediction speed is crucial for training efficiency; techniques such as latent consistency models can speed up prediction by focusing on feature changes rather than pixel-level details [6].
Group 4: Data Sharing and Best Practices
- World model architectures are evolving, but the need for both real and synthetic data remains constant [7].
- Sharing a visual encoder between VLA and the world model can reduce memory usage and keep their understanding of the environment synchronized [7].
- Generating counterfactual data lets VLA learn from hypothetical failure scenarios, which improves robustness and reduces real-world testing costs [7].
Group 5: Towards General Artificial Intelligence
- Future world models will generate interactive 4D environments, so VLA can train in dynamic rather than static settings [8].
- Combining fast and slow systems, with VLA handling real-time responses and the world model handling long-term planning, is a key goal for autonomous systems [8].
- Ultimately, VLA and world models may converge into a single model that predicts both actions and future states, in line with the vision of AI that understands physical laws [9][10].
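The staged loop described above (a cold-started policy, actions fed to a world model as conditioning input, training on imagined rollouts scored by a reward model) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the method from the article: `ToyWorldModel`, `ToyPolicy`, `reward_model`, and `train_in_imagination` are hypothetical names, the "world" is a one-dimensional position, the "VLA" is a single proportional gain, and the update rule is a plain hill-climb standing in for reinforcement learning. Real-world transfer and calibration (stage four) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned world model (1-D position only)."""
    step_size: float = 1.0

    def predict(self, state: float, action: float) -> float:
        # Interface alignment: the policy's action is the conditioning input.
        return state + self.step_size * action


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for a VLA policy: one proportional gain."""
    gain: float

    def act(self, state: float, goal: float) -> float:
        return self.gain * (goal - state)


def reward_model(final_state: float, goal: float) -> float:
    """Hypothetical reward model: negative distance to the goal."""
    return -abs(final_state - goal)


def imagined_rollout(policy, world, start, goal, horizon=5):
    """Roll the policy forward inside the world model, never on hardware."""
    state = start
    for _ in range(horizon):
        state = world.predict(state, policy.act(state, goal))
    return state


def mean_reward(policy, world, starts, goal):
    rewards = [reward_model(imagined_rollout(policy, world, s, goal), goal)
               for s in starts]
    return sum(rewards) / len(rewards)


def success_rate(policy, world, starts, goal, tol=0.1):
    """Fraction of imagined rollouts that end within `tol` of the goal."""
    hits = [reward_model(imagined_rollout(policy, world, s, goal), goal) >= -tol
            for s in starts]
    return sum(hits) / len(hits)


def train_in_imagination(policy, world, starts, goal, iters=20, delta=0.1):
    """Hill-climb the gain using only reward-model feedback on imagined data."""
    best = mean_reward(policy, world, starts, goal)
    for _ in range(iters):
        for gain in (policy.gain + delta, policy.gain - delta):
            candidate = ToyPolicy(gain)
            score = mean_reward(candidate, world, starts, goal)
            if score > best:
                policy, best = candidate, score
    return policy


if __name__ == "__main__":
    world = ToyWorldModel()
    starts = [-2.0, -1.0, 1.0, 2.0]
    cold_start = ToyPolicy(gain=0.0)  # stand-in for a weak pre-trained policy
    trained = train_in_imagination(cold_start, world, starts, goal=0.0)
    print(success_rate(cold_start, world, starts, goal=0.0))
    print(success_rate(trained, world, starts, goal=0.0))
```

The point of the design is that the policy only ever touches `world.predict`, so every training step is free of hardware wear; the quality of what it learns is bounded by how faithful the world model and reward model are.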
From DeepMind to Embodied Intelligence, Wang Jianan: Algorithms Must Ultimately Serve the Real World | 万有引力
AI科技大本营· 2026-01-23 10:09
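The following article is from CSDN, by 万有引力. Interviewer: Tang Xiaoyin. Guest: Wang Jianan. Editor: Meng Yidan. Produced by CSDN (ID: CSDNnews). Does the road to AGI end in code, or in a body? For Wang Jianan, the answer points clearly to embodied intelligence. At the 2025 Global Machine Learning Technology Conference, Tang Xiaoyin, executive editor-in-chief of CSDN and New Programmer (《新程序员》), held an in-depth conversation with Wang Jianan, vice president of Astribot (星尘智能) and a former DeepMind researcher. From the ultimate vision of AGI to the practical bottlenecks of embodied intelligence, and from the engineering logic of fast and slow systems to the timeline for general-purpose robots and the convictions developers should hold, she gave an answer that is both sober and strongly long-termist. The article goes on to list the core points she made in the interview. Wang completed her studies at the University of Oxford and joined DeepMind, where she worked on reinforcement learning and continual learning and saw landmark projects such as AlphaStar come into being. When generative AI in China was still at an early stage, she also took part in exploring unified generative frameworks, working at the research frontier before the AIGC boom. Whether at the peak of "pure algorithms" or at the starting point of generative models, she was inside the wave. In 2024 she joined Astribot, choosing to confront ...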
A Summary of 2025 Interviews with Several Autonomous Driving Companies
自动驾驶之心· 2026-01-22 09:07
Core Algorithm
- The industry has shifted from modular approaches to end-to-end solutions, at least in public discourse [1].
- World models are now common: some companies use them to generate training data, while others build them into end-to-end models to improve performance [1][8].
- Opinions diverge on whether a language component (VLA) is necessary in autonomous driving, and some companies argue that language is not essential for driving tasks [1][11].
Simulation and Infrastructure
- Closed-loop systems have evolved from data-driven loops to simulation testing and training loops [2].
- 3D Gaussian Splatting (3DGS) is highlighted as a crucial technology for building simulation environments, as Tesla emphasized at CVPR 2025 [5].
- Infrastructure is critical, and companies such as Xiaomi and Li Auto note its benefits for development efficiency [3][14].
Organizational Capability
- Organizational ability is vital, because large autonomous driving teams face significant management challenges [4].
- Team culture and collaboration are emphasized as essential for overcoming complex technical and management issues [5].
Technical Choices Comparison
- A comparison of the companies' technical choices shows differing approaches to core technologies and to the role of world models and simulation tools [9].
- Li Auto advocates a training loop that evolves from imitation to self-learning, while NVIDIA emphasizes interpretability and reasoning in AI [9].
Key Non-Core Factors
- R&D infrastructure and engineering efficiency are crucial to the success of autonomous driving technologies [14].
- Simulation and synthetic data are becoming essential for corner cases that real-world data cannot cover [14].
- The scale of computing power and chip adaptation are critical, because autonomous driving is a hardware challenge as well as a software one [15].
User Experience and Safety
- User experience and safety are paramount, and companies such as Xiaomi stress the importance of balancing advanced technology against user concerns [17].
- A dual-stack safety mechanism is needed so that even an aggressive end-to-end model can fall back to a traditional rule-based system for safety [19].
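The dual-stack idea in the last point, where an end-to-end model proposes and a rule-based stack can override, can be sketched as below. This is a minimal illustration under loud assumptions, not any company's implementation: the `Plan` type, the function names, the single longitudinal-acceleration output, and every threshold (`min_gap_m`, `max_accel_mps2`, the `-3.0` braking value) are hypothetical and chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Plan:
    accel_mps2: float  # commanded longitudinal acceleration
    source: str        # which stack produced the plan


def rule_based_plan(gap_m: float, min_gap_m: float = 10.0) -> Plan:
    """Hypothetical conservative fallback: brake when the gap is too small."""
    accel = -3.0 if gap_m < min_gap_m else 0.0
    return Plan(accel, "rule_based")


def violates_rules(plan: Plan, gap_m: float,
                   min_gap_m: float = 10.0,
                   max_accel_mps2: float = 2.0) -> bool:
    """Hypothetical safety envelope checked against every learned proposal."""
    if plan.accel_mps2 > max_accel_mps2:
        return True  # harsher acceleration than the envelope allows
    if gap_m < min_gap_m and plan.accel_mps2 > 0.0:
        return True  # accelerating toward a vehicle that is already too close
    return False


def dual_stack_plan(learned_planner: Callable[[float], float],
                    gap_m: float) -> Plan:
    """Use the end-to-end proposal unless it leaves the rule-based envelope."""
    proposal = Plan(learned_planner(gap_m), "end_to_end")
    if violates_rules(proposal, gap_m):
        return rule_based_plan(gap_m)
    return proposal


if __name__ == "__main__":
    def aggressive(gap_m: float) -> float:
        # Stand-in for a learned planner that always wants to speed up.
        return 1.5

    print(dual_stack_plan(aggressive, gap_m=50.0))
    print(dual_stack_plan(aggressive, gap_m=5.0))
```

Keeping the envelope check separate from both planners means the learned model can be retrained or swapped without touching the safety logic.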
The Cost of VLA Tasks Has Been Driven Down to Rock-Bottom Prices......
具身智能之心· 2026-01-20 09:30
Core Viewpoint - The cost of robotic arms has fallen sharply, to below 5,000 yuan, which makes them accessible for a range of VLA tasks [1][2].
Group 1: Cost Trends
- Two years ago a single robotic arm for VLA tasks cost over 30,000 yuan; last year the price was around 15,000 yuan, and it is now below 5,000 yuan [2].
- Lower costs make it easier to run VLA tasks with models such as pi0 and pi0.5 [2].
Group 2: Challenges for Beginners
- Many beginners struggle to replicate VLA tasks because of high costs and the lack of an effective data collection method [3][4].
- Beginners lose a great deal of time troubleshooting problems in data collection and model training [4].
Group 3: Educational Initiatives
- The company has developed a comprehensive course for beginners in the VLA field, covering hardware, data collection, algorithms, and practical experiments [9][14].
- The course includes a free SO-100 robotic arm for participants, which supports hands-on learning [19].
Group 4: Target Audience and Requirements
- The course is designed for people seeking practical VLA experience, including students and professionals moving over from other fields [26].
- Participants are expected to have a foundation in Python and PyTorch, as well as experience with debugging and data collection on real machines [26].
2026: China's Intelligent Driving Enters the Finals
36Kr · 2026-01-15 03:46
Core Insights
- Tesla's Full Self-Driving (FSD) technology completed a 4,397 km journey across the U.S. without human intervention, demonstrating its stability in complex driving conditions [1].
- Competition in autonomous driving is intensifying, particularly in China, where several companies face significant challenges and the field is consolidating [1].
- The industry consensus is that by 2026 only two to three companies will emerge as leaders in autonomous driving [1].
Group 1: Tesla's Technological Advancements
- FSD V12 and V14 are critical turning points: V12 proved that a model-driven end-to-end approach is feasible and prompted the industry to shift toward it [2].
- FSD V14 addresses the limits of earlier versions by adding a reasoning capability, which may unify the L2 and L4 development paradigms [2].
Group 2: Competitive Landscape in China
- Horizon Robotics, Zhaojun Technology, and WeRide are emerging as strong competitors; Horizon has completed a major switch of its technology architecture and launched its HSD model [3][4].
- WeRide has shifted its focus from L4 Robotaxi to L2+ solutions and achieved rapid development and production timelines [3].
- Zhaojun Technology has taken an aggressive route, completely overhauling its previous technology framework to focus on end-to-end solutions [4].
Group 3: Industry Trends and Challenges
- The industry is shifting from rule-based to model-driven approaches, and VLA (Vision-Language-Action) models are gaining traction among manufacturers such as Xpeng and Li Auto [5][6].
- Huawei is taking a different path, rejecting VLA in favor of WA (World Action) models and emphasizing a more streamlined process [6].
- Competition is expected to intensify as companies work to secure enough data and funding to support their autonomous driving technologies [10][11].
Group 4: Future Outlook
- The sector is entering a phase of stricter regulation and tougher competition, with L2+ and urban navigation assistance (NOA) as the immediate priorities for many companies [12].
- By 2026 the market is expected to narrow to a few key players, with Huawei currently in the lead, followed by Horizon, Momenta, and WeRide [12][13].
The "Too Expensive" Problem of Learning VLA Is Being Solved......
具身智能之心· 2026-01-14 09:00
Core Viewpoint - The article discusses the obstacles beginners face in VLA (Vision-Language-Action) tasks, namely high costs and the complexity of data collection and model training, and introduces a comprehensive course that addresses these issues and teaches practical skills to people entering the field [3][5][9].
Group 1: Challenges in VLA Tasks
- Many beginners are frustrated by the cost of mechanical arms and sensors, which can exceed 15,000 yuan and makes VLA tasks hard to attempt for self-learners or anyone without equipment [3].
- Open-source, low-cost robotic arms exist, but many beginners still fail to get good results because data collection and model training are difficult [4].
- Beginners lose a great deal of time troubleshooting data collection, model training, and deployment, particularly with complex models such as π0 and π0.5 [5].
Group 2: Course Offerings
- The "Embodied Intelligence Heart" platform has developed a course that reproduces methods such as ACT, GR00T, π0, and π0.5, aimed at people who lack access to expensive equipment and do not know where to start [8].
- The course includes practical tutorials and is also meant to help students who have access to real machines but are unsure how to use them [9].
- The curriculum covers robotic arm hardware, data collection, VLA algorithms, evaluation, simulation, deployment of mainstream VLA models, and a range of real-machine experiments [14].
Group 3: Course Details and Target Audience
- The course is the most comprehensive offering from "Embodied Intelligence Heart" and combines software and hardware [15].
- It targets people seeking practical experience and projects in the VLA field, including those moving over from traditional computer vision, robotics, or autonomous driving [25].
- Participants receive an SO-100 robotic arm as part of the course, including both the teaching arm and the execution arm, for hands-on practice [18].