VLA Models
New from the University of California! Do What? Teaching VLA Models to Reject Impossible Tasks
具身智能之心· 2025-08-25 06:00
Core Viewpoint - The article discusses how VLA models handle robotic tasks, focusing on their ability to detect and respond to false-premise instructions through the proposed IVA framework, which makes the model more robust in real-world applications [4][10].
Group 1: Problem Identification and Solution
- VLA models excel at a variety of robotic tasks by relying on multimodal inputs, but they struggle with false-premise instructions, that is, commands that reference non-existent objects or conditions [6][10].
- The IVA framework is introduced to address this issue. It enables the model to detect unexecutable commands, clarify or correct them through language, and ground reasonable alternatives in perception and action [4][10].
Group 2: Research Gaps and Contributions
- Current research primarily focuses on the success rate of executing correct commands and neglects the handling of ambiguous or unexecutable instructions [6][10].
- The core contributions of this work are the IVA framework itself, a large-scale training dataset, and validation across eight robotic tasks, which shows significant improvements in detecting false premises while still executing valid commands [10][25].
Group 3: Experimental Results
- The IVA framework achieved a false-premise detection accuracy of 97.56% and a 50.78% increase in successful responses under false-premise scenarios compared to baseline models [5][25].
- Across tasks, IVA outperformed the LLARVA model in overall success rate and false-premise detection rate, with only minor reductions in success rate for true-premise commands [25][28].
Group 4: Limitations and Future Directions
- The training dataset is limited to a simulated environment, which may not fully represent real-world human-robot interaction, and its distribution of false premises may not match how often they actually occur [26][27].
- The IVA framework currently cannot handle complex, multi-turn clarifications and may struggle with longer, more ambiguous human instructions [26][27].
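The detect, clarify, and propose-an-alternative behavior described for IVA can be illustrated with a toy sketch. This is not the paper's implementation: it assumes the scene has already been reduced to a set of object names (in IVA this understanding comes from the model's own perception, not a lookup), and `Response`, `respond_to_instruction`, and the `similar` table are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    accepted: bool          # True if the instruction can be executed as given
    message: str            # language reply returned to the user
    target: Optional[str]   # object to act on (or the suggested alternative), if any


def respond_to_instruction(verb, requested_object, scene_objects, similar=None):
    """Toy false-premise check (hypothetical helper, not the IVA model).

    scene_objects: set of object names assumed to be visible in the scene.
    similar: optional dict mapping an object name to candidate substitutes.
    """
    similar = similar or {}
    # True premise: the referenced object exists, so accept the command.
    if requested_object in scene_objects:
        return Response(True, f"OK, I will {verb} the {requested_object}.", requested_object)
    # False premise: look for a plausible alternative that is actually present.
    for alt in similar.get(requested_object, []):
        if alt in scene_objects:
            msg = f"I cannot see a {requested_object}. Did you mean the {alt}?"
            return Response(False, msg, alt)
    # No alternative available: refuse and explain why.
    msg = f"I cannot {verb} the {requested_object} because it is not in the scene."
    return Response(False, msg, None)
```

A call such as `respond_to_instruction("pick up", "blue block", {"red block"}, {"blue block": ["red block"]})` refuses the original command and proposes the red block instead of silently acting on the wrong object.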
The 具身智能之心 Humanoid Robot Discussion Group Is Now Open
具身智能之心· 2025-08-24 13:22
Group 1
- The article introduces a new community for humanoid robot enthusiasts, focusing on areas such as humanoid control, VLA models, data collection, and hardware [1]
- The community aims to connect professionals and students working in related fields to foster collaboration and knowledge sharing [1]
Group 2
- Interested individuals are encouraged to add a designated assistant on WeChat with specific instructions for joining the group [2]
- The requirement for a nickname and specific keywords for group entry emphasizes the community's organized approach to membership [2]
Cocos: Faster Convergence and Higher Success Rates for Your VLA Model
具身智能之心· 2025-08-22 00:04
Core Viewpoint - The article discusses advances in embodied intelligence, focusing on diffusion policies and a new method called Cocos, which addresses loss collapse when training diffusion policies and thereby improves training efficiency and performance [3][11][25].
Summary by Sections
Introduction - Embodied intelligence is a cutting-edge field in AI research that requires robots to understand and execute complex tasks effectively. Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models, although training efficiency remains a challenge [3].
Loss Collapse and Cocos - The article identifies loss collapse as a significant challenge in training diffusion policies: the neural network struggles to distinguish between generation conditions, which degrades the training objective. Cocos makes the source distribution depend on the generation condition, which addresses this issue [6][9][25].
Flow Matching Method - Flow matching is a core method in diffusion models that transforms a simple source distribution into a complex target distribution through optimization. The article outlines the optimization objective for conditional-distribution flow matching, which is crucial for VLA models [5][6].
Experimental Results - The article presents quantitative results showing that Cocos significantly improves training efficiency and policy performance across benchmarks including LIBERO and MetaWorld, as well as on real-world robotic tasks [14][16][19][24].
Case Studies - Case studies in simulation tasks show that Cocos improves the robot's ability to distinguish between different camera perspectives and complete tasks successfully [18][21].
Source Distribution Design - The article discusses experiments on source distribution design, comparing different standard deviations and training methods. It concludes that a standard deviation of 0.2 is optimal and that training the source distribution with a VAE yields comparable results [22][24].
Conclusion - Cocos provides a general improvement for diffusion policy training by solving the loss collapse problem, laying a foundation for future research and applications in embodied intelligence [25].
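The idea of a condition-dependent source distribution in flow matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the source is a Gaussian centered on a condition embedding with the same dimension as the action, uses the 0.2 standard deviation reported above as the default, and assumes the common linear interpolation path. The function names `sample_source`, `flow_matching_pair`, and `training_example` are hypothetical.

```python
import random


def sample_source(cond_embedding, beta=0.2, rng=random):
    """Draw x0 ~ N(cond_embedding, beta^2 I), a condition-dependent source.

    Assumption: the mean is a condition embedding already projected to the
    action dimension. A standard source would use a zero mean for every condition.
    """
    return [mu + beta * rng.gauss(0.0, 1.0) for mu in cond_embedding]


def flow_matching_pair(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its target velocity x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def training_example(cond_embedding, action, beta=0.2, rng=random):
    """Build one regression example for a velocity network v(x_t, t, cond)."""
    x0 = sample_source(cond_embedding, beta, rng)   # source sample depends on the condition
    t = rng.random()                                # time drawn uniformly from [0, 1)
    x_t, v = flow_matching_pair(x0, action, t)
    return {"x_t": x_t, "t": t, "cond": cond_embedding, "target_velocity": v}
```

A network would then be trained to regress `target_velocity` from `x_t`, `t`, and `cond`. Because `x0` already carries information about the condition, two different conditions no longer start from the same source distribution, which is the intuition behind avoiding the collapse described above.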
Li Auto's Zhang Xiao: These Issues Will Definitely Be Resolved on the i6
理想TOP2· 2025-08-21 08:10
Core Viewpoint - The company is facing internal and external challenges but is confident that it can resolve them with the upcoming i6 model, emphasizing user experience and service quality [1][19][21].
Group 1: Product Delivery and Performance
- Initial deliveries of the i8 are taking place in more than twenty cities, starting with approximately 200 units, and the company aims to ramp up delivery speed quickly [4].
- The company is targeting more than 8,000 deliveries by the end of September, with a goal of reaching 10,000, while ensuring supply chain stability and quality [6].
- The i8 will use the VLA architecture, with a voice control feature expected to roll out in mid-September [2][11].
Group 2: Technical Specifications and Innovations
- The i8's 21-inch sport wheels deliver a range similar to the 20-inch wheels, because the impact on range comes primarily from rolling resistance and aerodynamic drag, both of which have been specifically calibrated [2][28][29].
- The company is building out new charging infrastructure, with all highway stations rated at 5C and urban stations at 4C, and plans to prioritize charging for its own vehicle owners [2][32].
Group 3: Company Philosophy and Challenges
- The company believes that any issues stem from its own mistakes, asserting that no external factor can defeat it unless it falters itself [2][24].
- There is a focus on learning from past experience and iterating on products and services to improve user value and experience [19][25].
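Industry Deep Dive | Large Models Reshape the Battlefield: The Commercialization Singularity of Intelligent Driving Has Arrived [Minsheng Auto, Cui Yan's Team]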
汽车琰究· 2025-08-21 01:55
Core Viewpoint - Intelligent driving has evolved from a technical highlight into a crucial factor for product differentiation among automakers and for the commercialization of mobility services. Depth of technology, iteration speed, and scale of deployment will significantly influence the future competitive landscape and determine how automakers build sustainable competitive advantages in the "software-defined vehicle" arena [2][7].
Group 1: Intelligent Driving Development
- Intelligent driving capabilities are becoming a battleground for automakers to build brand premium, win user choice, and capture market share. Differences in how quickly intelligent driving systems are deployed and how widely they penetrate create a technology gap among automakers, which affects the commercialization process [7].
- Commercialization is accelerating, with more regional pilots and favorable policies driving the rollout of L3 intelligent driving. The 100,000 to 200,000 yuan price range is expected to dominate sales, yet only 5% of models in this range were equipped with advanced intelligent driving features in 2024 [3][4].
- The "intelligent driving equity" trend, in which advanced driving features reach mass-market price bands, is expected to convert intelligent driving advantages into sales growth. The Robotaxi market is projected to reach hundreds of billions by 2030, showing significant potential [11].
Group 2: Technological Paradigms and Competition
- The VLA (Vision-Language-Action) model is at the core of current intelligent driving solutions, integrating perception, cognition, and action. It requires breakthroughs in world model construction and reinforcement learning to enhance its capabilities [8][9].
- Demand for computing power is surging: the transition from L2 to L3 autonomous driving requires a leap from 100+ TOPS to 500-1,000+ TOPS. Competition is shifting from single-vehicle computing power to the combined capabilities of vehicle chips and cloud supercomputing centers [9][52].
- Tesla has established a significant generational advantage through its fully self-developed closed-loop technology system, while domestic automakers are accelerating their catch-up efforts. Integrating VLA models is becoming a key focus for companies such as Li Auto and XPeng [10][12].
Group 3: Investment Recommendations
- A clear liability framework established by top-level policy and the maturation of intelligent driving technology toward L3 standards are both promising. The "intelligent driving equity" trend is expected to create a structural sales inflection point for intelligent driving vehicles [4].
- Companies with full-stack in-house development capabilities, such as Li Auto, XPeng, and Xiaomi Group, are recommended for investment, along with those that combine in-house development with third-party cooperation, such as BYD and Geely [4].
36 New Q&As on Li Auto's VLA Driver Large Model
自动驾驶之心· 2025-08-16 16:04
Core Viewpoint - The article discusses the challenges and advances in deploying Vision-Language-Action (VLA) models in autonomous driving, emphasizing the integration of 3D spatial understanding with global semantic comprehension.
Group 1: Challenges in VLA Deployment
- The difficulties in deploying VLA models include multimodal alignment, data training, and single-chip deployment, but new chip technologies may ease these challenges [2][3][5].
- The alignment issue between Vision-Language Models (VLM) and VLA is gradually being resolved with the release of advanced models like GPT-5, indicating that alignment is not insurmountable [2][3].
Group 2: Technical Innovations
- The VLA model uses an architecture that combines 3D local spatial understanding with 2D global comprehension, improving its ability to interpret complex environments [3][7].
- Integrating diffusion models into VLA is a significant innovation that improves trajectory generation and decision-making [5][6].
Group 3: Comparison with Competitors
- The gradual transition from Level 2 (L2) to Level 4 (L4) autonomous driving is highlighted as a strategic approach, in contrast with competitors who may focus solely on L4 from the outset [9][10].
- The article draws parallels between the strategies of different companies in the autonomous driving space, particularly comparing the approaches of Tesla and Waymo [9][10].
Group 4: Future Developments
- Future iterations of the VLA model are expected to scale in size and performance, with parameters potentially increasing from 4 billion to 10 billion while deployment remains efficient [16][18].
- The company is focused on strengthening the model's reasoning capabilities through reinforcement learning, which will play a crucial role in its development [13][51].
Group 5: User Experience and Functionality
- The article emphasizes user experience, particularly features like voice control and memory functions, which are essential for seamless interaction between users and autonomous vehicles [18][25].
- A robust understanding of varied driving scenarios, including complex urban environments and highway conditions, is crucial for the model's success [22][23].
Group 6: Data and Training
- The transition from VLM to VLA requires a complete overhaul of data labeling processes, because the requirements for training data have changed significantly [32][34].
- The use of synthetic data is acknowledged, but the majority of the training data comes from real-world scenarios to ensure the model's effectiveness [54].
Group 7: Regulatory Considerations
- The company is actively engaging with regulatory bodies to ensure that its capabilities align with legal requirements, indicating a proactive approach to compliance [35][36].
- The relationship between technological advances and regulatory frameworks is highlighted as a critical factor in deploying autonomous driving technologies [35][36].
Leading Companies Vie for the Right to Define Standards as the Robotics "Shadow War" Escalates
第一财经· 2025-08-14 05:04
Core Viewpoint - The article discusses the advancements and challenges in the field of robotics, particularly focusing on the development of embodied intelligence models that can learn from failures and adapt their actions accordingly [3][5][7].
Group 1: Robotics Development
- Recent observations of robots reveal instances of failure during tasks, such as bed-making, highlighting the need for continuous improvement and learning in robotic systems [4][5].
- The ability of robots to recognize their failures and attempt new solutions is a significant advancement in embodied intelligence, showcasing a shift from traditional programmed responses to data-driven learning [5][7].
- The industry is currently divided on the approach to model architecture, with some advocating for unified models while others prefer layered designs, leading to debates over computational efficiency and application scenarios [3][9].
Group 2: Market Dynamics
- Companies like Xinghai Map and Ziyuan are competing to establish their dominance in the robotics market, focusing on data, computational power, and algorithms as key drivers of innovation [9][12].
- The mainstream direction for large models in the industry is the Vision-Language-Action (VLA) model, which integrates visual, linguistic, and action processing capabilities [9][10].
- The competition is not only about technology but also about who can define the performance evaluation standards for these models, which will shape the future competitive landscape [13][14].
Group 3: Benchmarking and Standards
- The concept of a benchmark for evaluating the performance of embodied intelligence models is gaining traction, with companies like Xinghai Map releasing open datasets to facilitate comparison and improvement [14][15].
- Establishing a common standard for model evaluation could enhance the industry's ability to measure advancements and foster collaboration among developers [14][15].
- The ambition behind creating these benchmarks is to attract more participants to the ecosystem, positioning companies like Xinghai Map as platform-oriented entities [15][16].
Group 4: Future Outlook
- The ongoing evolution of robotics technology is seen as a critical juncture, where maintaining speed and efficiency over time will be essential for success in the market [16][17].
- Companies are increasingly focusing on comprehensive ecosystems that encompass data, core components, and robotic models, moving beyond single-point capabilities [16][17].
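[TMTPost Morning Briefing] Two Departments Seek Comments on Intelligent Connected New Energy Vehicles; Tencent Holdings: Q2 Revenue of 184.50 Billion Yuan, Up 15% Year on Year; Overview of the Central Bank's Key July Financial Data: This Year's M1-M2 "Scissors...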
TMTPost APP · 2025-08-13 23:40
Group 1: Regulatory Developments in Intelligent Connected Vehicles
- The State Administration for Market Regulation and the Ministry of Industry and Information Technology have drafted a notice to strengthen recall and supervision management for intelligent connected new energy vehicles [2][3]
- The draft emphasizes that companies must prominently display safety warnings and usage instructions for combined driving assistance systems in vehicle apps and manuals to prevent misuse [2][3]
- Companies are required to develop and implement driver monitoring and warning systems to keep drivers engaged and reduce safety risks [2][3]
Group 2: Production Consistency and Information Transparency
- The draft calls for stronger supervision of production consistency for intelligent connected new energy vehicles, requiring accurate reporting of key information in the vehicle qualification certificate system [3]
- Companies must strictly manage over-the-air (OTA) software upgrades, ensuring that only thoroughly tested versions are pushed to users and that defects are not concealed [3]
- When providing information about driving automation levels and system capabilities, companies must ensure that the information is truthful and not misleading [3]
Group 3: Incident Reporting and Investigation
- The draft mandates that companies promptly report safety incidents and collisions involving combined driving assistance systems, in accordance with existing regulations [4]
Group 4: Market Performance and Financial Updates
- Tencent Holdings reported second-quarter revenue of 184.5 billion yuan, a 15% year-on-year increase, with net profit of 55.63 billion yuan, up 17% [6]
- Nissan's sales in China reached 57,359 units in July, with Dongfeng Nissan's sales increasing by 19.4% year-on-year [8]
- The People's Bank of China reported that M2 increased by 8.8% year-on-year, indicating improved liquidity and market confidence [9]
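WRC Observations: Operating Errors Are Nothing New, More Vendors Pursue Hardware-Software Integration, and Consumer-Grade Robot Dogs Take a Seat at the Table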
Cai Jing Wang· 2025-08-13 16:29
Core Insights
- The WRC event showcased significant advancements in humanoid robots, with over 200 companies and more than 1,500 exhibits, highlighting the industry's growth and innovation [1][2]
- Discussions around embodied intelligence models have intensified, with industry leaders questioning the current VLA model's effectiveness and calling for a restructured approach [2][3]
- The evolution of robot designs is evident, with many companies moving from demo stages to functional robots in various roles, although challenges in execution remain [2][6]
Industry Trends
- The humanoid robot market is experiencing a shift towards consumer-oriented products, with companies like Vbot and Magic Atom targeting family and personal use [13][14]
- The integration of advanced sensory capabilities, including vision, hearing, and even olfactory systems, is becoming a focus for companies like Hanwang Technology [12]
- The competition among robot manufacturers is intensifying, with firms striving to differentiate their products through unique designs and functionalities [8][15]
Company Developments
- Yushun Technology's collaboration with Reborn AGI aims to enhance robot training and optimization, indicating a trend towards community-driven development in robotics [3]
- Self-variable Robotics is positioning itself as a comprehensive hardware and software provider, showcasing new models that emphasize practical applications [4][5]
- Digital Huaxia is leveraging a PaaS platform to facilitate rapid customization of robotic applications, reflecting a shift towards more adaptable solutions in the market [11]
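Red Hot! China Has Nearly 1 Million Robotics Companies and Over 24 Billion Yuan in Funding, but Three "Non-Consensus" Debates on Embodied Intelligence Remain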
TMTPost APP · 2025-08-12 23:25
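UBTECH Walker robot on display
China's robotics industry is truly red hot.
"There are just too many people." At this year's World Robot Conference, that was the first thing almost everyone said on meeting. In temperatures above 30 degrees, many adults brought their children to see the exhibition area, which shows that attention in China to the robotics sector, especially humanoid robots and embodied intelligence, has risen markedly.
First, the number of robotics companies has grown quickly. The author learned from Qichacha that as of August 12 this year, China had 958,000 existing robotics-related companies, close to 1 million. Of these, 193,200 were registered in 2024, up 4.59% year on year, while 152,800 were registered in the first seven months of 2025, up 43.81% year on year, far exceeding the growth rate of new registrations for all of last year.
By region, East China accounts for 39.64% of the country's robotics-related companies. Along the supply chain, China has more than 160 complete humanoid robot platforms, over 50% of the global total, and more than 600 core component suppliers.
Second, financing is booming. From January to July this year, there were more than 200 investment deals in embodied intelligence and robotics, with total financing exceeding 24 billion yuan, far above the total for all of 2024. For full-year 2025, China's humanoid robot market is expected to exceed 8.2 billion yuan, more than 50% of the global market.
Finally, the market outlook is broad, and China is gradually becoming the focus of the global humanoid robot market. According to Citi's forecast, by 2050 ...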