VLA (Vision-Language-Action) Models

晚点Auto· 2026-01-29 14:51

Core Viewpoint - The article profiles It Stone, a new embodied-intelligence company founded by Chen Yilun, and its distinctive approach to data collection and model development, which diverges from mainstream methods such as VLA (Vision-Language-Action) models [4][38].
Group 1: Company Overview
- It Stone has raised a record $1.2 billion in angel funding, a milestone for China's embodied intelligence sector [4].
- The company is developing its own model, AWE (AI World Engine), which emphasizes representing physical quantities and world information rather than relying on vision and language models [4][38].
Group 2: Data Collection Strategy
- It Stone has developed wearable data-collection devices that let workers record real-world task data without the high costs of teleoperation-based collection [5][24].
- The company has already collected approximately 100,000 hours of data and plans to increase this volume significantly in the coming year [31].
Group 3: Technical Insights
- Chen Yilun argues that the current bottleneck in embodied intelligence is data acquisition, which is difficult and expensive compared with the vast amounts of data available for language models [15].
- The company's collection approach is designed to be more efficient and scalable, targeting a foundational scale of at least 10 million hours of data for effective training [27][28].
Group 4: Market Position and Future Outlook
- It Stone is targeting complex tasks in industrial manufacturing, such as wire harness assembly, that traditional robots struggle to perform [41].
- The company believes the embodied intelligence sector is on the verge of major advances, with scaling and performance gains expected in the coming years [40].

晚点Auto· 2026-01-22 16:15

Core Viewpoint - Li Auto is restarting humanoid robot development, signaling a strategic shift toward integrating robotics into its product lineup. Its focus is on "embodied intelligence" products that redefine vehicles as robots with sensory and cognitive capabilities [3][4][6].
Group 1: Humanoid Robot Development
- Li Auto has opened several humanoid robot R&D positions, signaling a renewed commitment after it previously paused the effort because of technology and supply chain challenges [3].
- The company aims to develop a wheeled humanoid robot for specific industrial tasks, such as tightening screws in manufacturing [3].
- Recruitment for the project spans everything from core components to system integration, including embedded software engineers and mechanical design engineers [3][7].
Group 2: Strategic Positioning
- Li Auto founder Li Xiang has said that the future of vehicles lies in their evolution into robots, which requires them to have sensory and cognitive functions [4][6].
- The company is positioning itself against competitors such as Tesla and XPeng, which are also pursuing advanced humanoid robots, premised on the idea that a robot must closely resemble a human to make effective use of human tools and skills [5].
- The article stresses the complexity of humanoid robots: developing them is harder than developing electric vehicles, so Li Auto's approach requires significant investment in talent and technology [4][5].
Group 3: Market Dynamics and Financial Considerations
- Li Auto holds larger cash reserves than other domestic automakers, which allows it to invest heavily in emerging businesses such as robotics [5][6].
- Competition for talent is intense, with many skilled professionals leaving to found their own embodied-intelligence ventures, which makes it harder for Li Auto to build its R&D team [6].
- The market has responded positively to Li Auto's renewed focus on robotics, with its stock price reflecting investor optimism about the potential of embodied intelligence [8].

具身智能之心· 2026-01-19 09:30

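Lately, VLA (Vision-Language-Action) models have been the center of attention in the embodied-intelligence community. Between the surge of academic papers and urgent hiring across industry, VLA has been pushed to the top of the hype cycle. But reality is harsh: the performance ceiling of a VLA model is often set by the quality of the data you collect. When reproducing π0, GR00T, or ACT, the most common complaint is: "Data is just too hard to collect!" The essence of embodied intelligence is interaction through the robot's own body. Without high-quality teleoperation data, even the strongest VLA algorithm is a castle in the air. To help learners avoid common pitfalls, 具身智能之心 is launching what it bills as China's first "Full-Stack Course on Data Collection and Teleoperation Algorithms for Embodied Intelligence" (《具身数采与遥操算法全栈课程》). The course does not stop at theory; it focuses on "feel" and hands-on practice. We will guide you through building teleoperation hardware yourself from scratch and connecting every stage of the data-collection pipeline.
Common pain points:
- Simulated data is unrealistic: the simulation-to-real (Sim2Real) gap is large, so a model that runs smoothly in simulation breaks as soon as it touches a real robot.
- Poor teleoperation feel: motions are stiff and latency is high, so recorded trajectories are full of noise and the model cannot learn from them.
- High hardware barrier: professional teleoperation rigs often cost tens of thousands, out of reach for most students and early-stage teams.
- Broken end-to-end pipeline: people know how to control a robot arm but not how to align the data with the LeRobot or RT-X format.
★ Course outline (for more details, contact the course assistant): | Teleoperation overview and fundamentals | ...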
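The last pain point, aligning recorded data with the LeRobot or RT-X format, is at heart a schema-conversion step. The following is a minimal sketch, assuming a hypothetical raw log in which each step stores a timestamp, the follower arm's measured joint positions, and the joint targets sent by the teleoperation device. It flattens episodes into per-frame records whose keys (`observation.state`, `action`, `timestamp`, `frame_index`, `episode_index`, `index`) are modeled on LeRobot's dataset columns; `RawStep` and `to_frames` are illustrative names, not part of any library, the exact schema should be checked against the LeRobot version in use, and no LeRobot API is called. RT-X data uses a different episode format, so it would need its own converter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RawStep:
    """One step of a hypothetical raw teleoperation log (assumed format)."""
    t: float              # wall-clock time in seconds
    joints: List[float]   # measured follower joint positions (rad)
    command: List[float]  # joint targets sent by the teleop device (rad)


def to_frames(episodes: List[List[RawStep]], fps: float) -> List[Dict]:
    """Flatten episodes into per-frame records with LeRobot-style keys.

    Timestamps are re-based to 0 at the start of each episode. A frame whose
    timestamp is more than half a period off the fixed 1/fps grid is rejected,
    because fixed-rate loaders assume evenly spaced frames.
    """
    frames: List[Dict] = []
    period = 1.0 / fps
    for ep_idx, steps in enumerate(episodes):
        if not steps:
            continue  # skip empty episodes
        t0 = steps[0].t
        for i, step in enumerate(steps):
            ts = step.t - t0
            if abs(ts - i * period) > period / 2:
                raise ValueError(
                    f"episode {ep_idx}, frame {i}: timestamp {ts:.3f}s is off the {fps} Hz grid"
                )
            frames.append({
                "observation.state": list(step.joints),
                "action": list(step.command),
                "timestamp": ts,
                "frame_index": i,          # position within the episode
                "episode_index": ep_idx,
                "index": len(frames),      # global position across all episodes
            })
    return frames


if __name__ == "__main__":
    demo = [[RawStep(0.0, [0.10], [0.12]), RawStep(0.1, [0.12], [0.15])]]
    for frame in to_frames(demo, fps=10.0):
        print(frame)
```

Rejecting off-grid timestamps up front is a deliberate choice in this sketch: dropped or delayed teleoperation packets would otherwise shift observations relative to actions without any visible error.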

Reject junk data: how do you collect embodied-intelligence data efficiently and at high quality?

具身智能之心· 2026-01-10 01:03

Core Insights
- The VLA (Vision-Language-Action) model is currently a focal point in embodied intelligence, attracting significant attention in both academia and industry [1][2].
- The performance of VLA models depends heavily on the quality of the collected data, and many practitioners struggle with data acquisition [2][3].
Course Overview
- The course, titled "Full-Stack Course on Data Collection and Teleoperation Algorithms for Embodied Intelligence," teaches practical skills in building DIY teleoperation hardware and collecting data [3].
- The curriculum emphasizes hands-on practice and real applications rather than theory alone [3][8].
Challenges in Teleoperation
- The large gap between simulation and the real world (Sim2Real) means models trained in simulation often perform poorly on real robots [5].
- Teleoperation often suffers from poor operating feel, high latency, and noisy trajectory data, which makes it difficult for models to learn effectively [5].
- Professional teleoperation equipment is expensive, which is a barrier for students and startups [5].
Course Highlights
- The course combines simulation and real-robot practice, covering data collection in the MuJoCo simulation environment as well as hands-on operation [7][8].
- It introduces the Ringo hardware solution for handheld teleoperation, which addresses problems of viewpoint and control alignment [9].
- It covers a range of scenarios, from single-arm setups to full-body motion capture, including dual-arm collaboration and force-feedback data collection [10][12].
Detailed Curriculum
- Modules include teleoperation basics, data collection methods, and advanced topics such as TCP mapping and joint-isomorphic teleoperation [6][14][16].
- The course also covers the principles of motion capture systems, including sensor layout and coordinate remapping [17].
Target Audience
- The course is designed for job seekers in the embodied intelligence field, researchers working on VLA or robotics, developers moving over from other tech fields, and hardware enthusiasts interested in DIY solutions [26].
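The curriculum's contrast between joint-isomorphic teleoperation and TCP (tool center point) mapping, and the coordinate remapping used in motion capture, can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's implementation. In the joint-isomorphic case, the leader device is assumed to share the robot's kinematic structure, so each leader joint drives the matching follower joint through a per-joint sign and offset calibration and is clamped to joint limits. In the TCP case, a hand position measured in a tracker frame is re-expressed in the robot base frame as p_base = s·R·p + t, after which an inverse-kinematics solver (not shown) would compute joint commands. The function names and calibration values are hypothetical, and orientation remapping is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def joint_isomorphic_map(leader: Sequence[float],
                         sign: Sequence[int],
                         offset: Sequence[float],
                         limits: Sequence[Tuple[float, float]]) -> List[float]:
    """Map leader joint angles 1:1 onto follower joints (radians).

    q_follower[i] = sign[i] * q_leader[i] + offset[i], clamped to limits[i].
    sign/offset are a hypothetical per-joint calibration.
    """
    if not (len(leader) == len(sign) == len(offset) == len(limits)):
        raise ValueError("leader, sign, offset and limits must have equal length")
    out = []
    for q, s, o, (lo, hi) in zip(leader, sign, offset, limits):
        out.append(min(max(s * q + o, lo), hi))
    return out


def rot_z(theta: float) -> Mat3:
    """Rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def remap_position(p_tracker: Vec3, R: Mat3, t: Vec3, scale: float = 1.0) -> Vec3:
    """Express a tracker-frame point in the robot base frame: scale * R @ p + t."""
    return tuple(
        scale * sum(R[r][k] * p_tracker[k] for k in range(3)) + t[r]
        for r in range(3)
    )


if __name__ == "__main__":
    # Hypothetical calibration for a 2-joint leader/follower pair.
    print(joint_isomorphic_map([0.5, -2.0], sign=[1, -1], offset=[0.0, 0.1],
                               limits=[(-1.0, 1.0), (-1.5, 1.5)]))
    # Tracker frame assumed rotated 90 degrees about z relative to the robot base.
    print(remap_position((1.0, 0.0, 0.0), rot_z(math.pi / 2), (0.5, 0.0, 0.2)))
```

The trade-off the sketch makes visible: joint-isomorphic mapping needs no IK, which typically keeps latency low, but requires leader hardware shaped like the robot; TCP mapping works with any tracker but depends on an IK solver and its singularities and limits.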