VLA Models

Latest from the University of California: What Should a VLA Model Do? Teaching It to Refuse Impossible Tasks
具身智能之心· 2025-08-26 00:03
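Editor: 具身智能之心. Motivation and problem addressed: VLA models have performed well across a range of robotic tasks. They rely on multimodal input, and the language instruction is used not only to predict actions but also to robustly interpret the user's intent, even when the instruction cannot be carried out. This work focuses on how a VLA model can recognize, interpret, and respond to false-premise instructions (natural-language commands that refer to objects or conditions absent from the environment). It proposes a unified framework, IVA (Instruct-Verify-and-Act), with three core capabilities:
1. detecting cases where an instruction cannot be executed because of a false premise;
2. clarifying or correcting through language;
3. grounding a plausible alternative in perception and action.
To achieve this, the authors built a large-scale instruction-tuning dataset (with structured language prompts) and trained the VLA model to handle both valid and erroneous instructions. The dataset consists of context-augmented, semi-synthetic data containing paired "true-premise" and "false-premise" instructions. This supports robust false-premise detection and natural-language correction. Experiments show that, compared with baseline models, IVA: ... Domain background and challenges: 1) VLA models ...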
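The pairing idea and the verify-before-act flow can be illustrated with a toy example. In this sketch, a hypothetical set of detected object names stands in for perception, and a hand-written substitution table (`ALTERNATIVES`) stands in for what the VLA model would learn. IVA itself trains the model end to end, so `Scene`, `make_pair`, and `verify_and_act` are illustrative names only, not the authors' code.

```python
from dataclasses import dataclass

# Toy illustration of an Instruct-Verify-and-Act control flow. The real IVA
# framework learns this behaviour inside a VLA model; here a hypothetical
# detector output (a set of visible object names) and a hand-written lookup
# table stand in for what the model learns.


@dataclass(frozen=True)
class Scene:
    objects: frozenset  # object names reported by a (hypothetical) detector


@dataclass(frozen=True)
class Instruction:
    verb: str
    target: str

    def text(self) -> str:
        return f"{self.verb} the {self.target}"


def make_pair(scene: Scene, verb: str, present: str, absent: str):
    """Return a (true-premise, false-premise) instruction pair for one scene."""
    if present not in scene.objects or absent in scene.objects:
        raise ValueError("present must be in the scene and absent must not be")
    return Instruction(verb, present), Instruction(verb, absent)


def verify_and_act(scene: Scene, instr: Instruction, alternatives: dict):
    """Verify the instruction's premise, then act, clarify, or refuse."""
    if instr.target in scene.objects:
        return ("act", instr.text())
    # False premise: look for a plausible substitute that is actually visible.
    for alt in alternatives.get(instr.target, []):
        if alt in scene.objects:
            return ("clarify",
                    f"There is no {instr.target} here. "
                    f"Should I {instr.verb} the {alt} instead?")
    return ("refuse", f"There is no {instr.target} here, so I cannot {instr.text()}.")


# Usage with a made-up scene and substitution table.
scene = Scene(frozenset({"red cup", "sponge", "bowl"}))
ALTERNATIVES = {"blue cup": ["red cup"], "banana": ["apple"]}
valid, invalid = make_pair(scene, "pick up", "red cup", "banana")
print(verify_and_act(scene, valid, ALTERNATIVES))
print(verify_and_act(scene, Instruction("pick up", "blue cup"), ALTERNATIVES))
print(verify_and_act(scene, invalid, ALTERNATIVES))
```

The three calls cover the three behaviours listed above: executing a valid instruction, proposing a grounded alternative, and declining when no substitute is visible.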
具身智能之心 Humanoid Robot Discussion Group Is Now Open!
具身智能之心· 2025-08-24 13:22
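The 具身智能之心 humanoid robot discussion group is here! Anyone working on humanoid motion control, VLA models, data collection, hardware, or related areas is welcome to join. Add the assistant on WeChat (AIDriver005) with the note "nickname + humanoid + join group". Note: requests without this note will not be approved. ...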
The Cocos System: Faster Convergence and Higher Success Rates for Your VLA Model
具身智能之心· 2025-08-22 00:04
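Editor: 具身智能之心. Preface: Embodied intelligence has become a frontier of artificial-intelligence research. As robotics advances rapidly, enabling robots to better understand and carry out complex tasks has become an important research direction. Diffusion Policy exploits the ability of diffusion models to fit complex distributions and has become the mainstream paradigm for building vision-language-action (VLA) models. However, existing diffusion policies are still inefficient to train. This work identifies a key cause of that inefficiency. When the diffusion network struggles to distinguish the generation conditions, i.e., the visual input and the language instruction, the training objective degenerates into modeling the marginal action distribution. The authors call this phenomenon loss collapse. To address loss collapse, the source distribution of action generation can simply be replaced by a distribution that depends on the generation condition (Condition-conditioned source di ...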
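A small numerical sketch shows why making the source distribution depend on the condition can counter loss collapse. It assumes a flow-matching-style linear path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0. It also assumes a fixed linear map `W` from a condition embedding to the source mean. The path, `W`, the embeddings, and all numbers are illustrative assumptions, not the Cocos authors' formulation or settings. The sketch compares how far apart the noisy inputs for two different conditions are under each choice of source.

```python
import numpy as np

# Sketch of why a condition-dependent source distribution can help a flow- or
# diffusion-style policy. The linear interpolation path, the projection W and
# all numbers are illustrative assumptions, not the Cocos authors' settings.


def standard_source(c, rng, n, dim):
    """Condition-independent source: x0 ~ N(0, I); c is ignored."""
    return rng.standard_normal((n, dim))


def conditioned_source(c, rng, n, dim, W, sigma=1.0):
    """Condition-conditioned source: x0 ~ N(W @ c, sigma^2 I)."""
    return W @ c + sigma * rng.standard_normal((n, dim))


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0."""
    return (1.0 - t) * x0 + t * x1, x1 - x0


def condition_gap(source_fn, t=0.1, n=2000, seed=0, **source_kwargs):
    """Distance between the mean noisy inputs x_t seen for two conditions."""
    rng = np.random.default_rng(seed)
    conds = {"a": np.array([1.0, 0.0]), "b": np.array([-1.0, 0.0])}
    actions = {"a": np.array([0.5, 0.5]), "b": np.array([0.5, -0.5])}
    means = {}
    for key, c in conds.items():
        x0 = source_fn(c, rng, n, 2, **source_kwargs)
        xt, _ = interpolate(x0, actions[key], t)
        means[key] = xt.mean(axis=0)
    return float(np.linalg.norm(means["a"] - means["b"]))


gap_standard = condition_gap(standard_source)
gap_conditioned = condition_gap(conditioned_source, W=np.eye(2))
print(f"standard source gap:    {gap_standard:.2f}")
print(f"conditioned source gap: {gap_conditioned:.2f}")
```

With the standard source, the noisy inputs at small t look almost the same for both conditions. The network must therefore recover the condition from the conditioning signal alone, and if it cannot, the best it can do is regress toward the marginal velocity. With the conditioned source, the inputs themselves already separate the conditions.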
Li Auto's Zhang Xiao: These Issues Will Definitely Be Resolved on the i6
理想TOP2· 2025-08-21 08:10
Core Viewpoint - The company faces internal and external challenges but is confident it can resolve these issues with the upcoming i6, emphasizing the importance of user experience and service quality [1][19][21].
Group 1: Product Delivery and Performance
- Initial deliveries of the i8 are taking place in more than twenty cities, starting with approximately 200 units, and the company aims to ramp up delivery speed quickly [4].
- The company targets more than 8,000 deliveries by the end of September, aiming for 10,000, while ensuring supply-chain stability and quality [6].
- The i8 will use the VLA architecture, with a voice-control feature expected to roll out in mid-September [2][11].
Group 2: Technical Specifications and Innovations
- The i8 with 21-inch sport wheels achieves a range similar to the 20-inch version. Range is affected mainly by rolling resistance and aerodynamic drag, and both were specifically tuned [2][28][29].
- The company is rolling out new charging infrastructure. All highway stations support 5C charging and urban stations support 4C, and the company plans to give its own vehicle owners priority access [2][32].
Group 3: Company Philosophy and Challenges
- The company attributes any problems to its own mistakes, asserting that no external factor can defeat it unless it falters itself [2][24].
- It emphasizes learning from past experience and iterating on products and services to enhance user value and experience [19][25].
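The range and charging claims above rest on two standard relationships. The first is road load, i.e., rolling resistance plus aerodynamic drag at constant speed. The second is C-rate, i.e., charging power relative to pack capacity, where 1C would fill the pack in one hour. The sketch below illustrates both with made-up vehicle parameters (mass, rolling-resistance coefficient `c_rr`, drag area `cd_a_m2`, a 90 kWh pack). None of these numbers are Li Auto's published specifications. Real charging also tapers well before 100%, so the 1/C time is only an idealized lower bound.

```python
G = 9.81        # gravitational acceleration, m/s^2
RHO_AIR = 1.2   # approximate air density near sea level, kg/m^3


def road_load_wh_per_km(mass_kg, c_rr, cd_a_m2, speed_kmh):
    """Energy to overcome rolling resistance plus aero drag over 1 km at constant speed.

    Ignores drivetrain losses, auxiliaries, grade, and acceleration.
    """
    v = speed_kmh / 3.6                          # m/s
    f_roll = c_rr * mass_kg * G                  # rolling resistance, N
    f_aero = 0.5 * RHO_AIR * cd_a_m2 * v ** 2    # aerodynamic drag, N
    return (f_roll + f_aero) * 1000.0 / 3600.0   # N * 1000 m = J, then J -> Wh


def ideal_charge_minutes(c_rate):
    """Time for a full 0-100% charge at a constant C-rate (no taper)."""
    return 60.0 / c_rate


def peak_charge_power_kw(capacity_kwh, c_rate):
    """Charging power implied by a C-rate for a given pack capacity."""
    return capacity_kwh * c_rate


# Hypothetical vehicle: 2,500 kg, c_rr 0.008, CdA 0.6 m^2, 90 kWh pack.
base = road_load_wh_per_km(2500, 0.008, 0.6, 120)
higher_crr = road_load_wh_per_km(2500, 0.0088, 0.6, 120)  # +10% rolling resistance
print(f"{base:.1f} Wh/km vs {higher_crr:.1f} Wh/km")
print(peak_charge_power_kw(90, 5), ideal_charge_minutes(5))  # 5C
print(peak_charge_power_kw(90, 4), ideal_charge_minutes(4))  # 4C
```

The split into two force terms explains how larger wheels can keep range roughly unchanged: any increase in rolling resistance or drag has to be offset by tuning elsewhere.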
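Industry Deep Dive | Large Models Reshape the Competitive Landscape; the Commercialization Inflection Point for Intelligent Driving Has Arrived [Minsheng Auto, Cui Yan's Team]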
汽车琰究· 2025-08-21 01:55
Core Viewpoint - Intelligent driving has evolved from a technical highlight into a key factor in automakers' product differentiation and in the commercialization of mobility services. Technical depth, iteration speed, and deployment scale will significantly shape the future competitive landscape. They will also determine how automakers build sustainable advantages in the "software-defined vehicle" arena [2][7].
Group 1: Intelligent Driving Development
- Intelligent-driving capability has become a battleground on which automakers build brand premium, win over users, and capture market share. Differences in deployment speed and penetration rate are opening a technology gap among automakers that affects their commercialization progress [7].
- Commercialization is accelerating as more regional pilots and favorable policies drive the rollout of L3 intelligent driving. Vehicles priced at 100,000 to 200,000 yuan are expected to dominate sales, yet as of 2024 only 5% of models in this price range offered advanced intelligent-driving features [3][4].
- The "intelligent driving equity" trend (bringing advanced driving features to mass-market vehicles) is expected to turn intelligent-driving advantages into sales growth. The Robotaxi market is projected to reach hundreds of billions by 2030, showing significant potential [11].
Group 2: Technological Paradigms and Competition
- The VLA (Vision-Language-Action) model is at the core of current intelligent-driving solutions, integrating perception, cognition, and action. Further progress requires breakthroughs in world-model construction and reinforcement learning [8][9].
- Demand for computing power is surging: moving from L2 to L3 autonomous driving requires a leap from 100+ TOPS to 500-1,000+ TOPS. Competition is shifting from single-vehicle computing power toward combined capabilities in vehicle chips and cloud supercomputing centers [9][52].
- Tesla has built a significant generational lead with its fully self-developed, closed-loop technology system, while domestic automakers are accelerating their catch-up. Integrating VLA models has become a key focus for companies such as Li Auto and Xiaopeng [10][12].
Group 3: Investment Recommendations
- Top-level policies that establish a clear liability framework are promising signs, as is intelligent-driving technology maturing toward L3 standards. The "intelligent driving equity" trend is expected to create a structural sales inflection point for intelligent-driving vehicles [4].
- The report recommends companies with full-stack in-house R&D, such as Li Auto, Xiaopeng, and Xiaomi Group. It also recommends companies that combine in-house R&D with third-party cooperation, such as BYD and Geely [4].
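One way to see why the jump from 100+ TOPS to 500-1,000+ TOPS matters is a throughput budget. A model's per-inference operation count, multiplied by the rate at which it must run, has to fit into the chip's usable throughput. The sketch below assumes a hypothetical model needing 5,000 giga-operations per forward pass and 40% sustained utilization of peak TOPS. Both figures are illustrative, not taken from the report.

```python
def max_inference_hz(chip_tops, gops_per_inference, utilization):
    """Upper bound on model invocations per second for a given compute budget.

    chip_tops: peak tera-operations per second (TOPS) advertised for the chip.
    gops_per_inference: giga-operations one forward pass needs (hypothetical).
    utilization: fraction of peak throughput actually achieved (0-1).
    """
    usable_ops_per_s = chip_tops * 1e12 * utilization
    return usable_ops_per_s / (gops_per_inference * 1e9)


# Hypothetical model needing 5,000 GOPs per forward pass at 40% utilization.
for tops in (100, 500, 1000):
    print(f"{tops:>5} TOPS -> {max_inference_hz(tops, 5000, 0.4):.0f} Hz")
```

Under these assumptions, a 100 TOPS chip could run such a model only about 8 times per second, while 500-1,000 TOPS would allow roughly 40-80 Hz. Larger models shift the whole table down proportionally.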
36 New Q&As on Li Auto's VLA Driver Model
自动驾驶之心· 2025-08-16 16:04
Core Viewpoint - The article discusses the challenges and progress in deploying Vision-Language-Action (VLA) models in autonomous driving, emphasizing the integration of 3D spatial understanding with global semantic comprehension.
Group 1: Challenges in VLA Deployment
- Deploying VLA models is difficult because of multimodal alignment, data and training, and single-chip deployment, although new chip technologies may ease these challenges [2][3][5].
- Alignment between vision-language models (VLM) and VLA is gradually being solved. The release of advanced models such as GPT-5 suggests that alignment is not an insurmountable problem [2][3].
Group 2: Technical Innovations
- The VLA model uses an architecture that combines 3D local spatial understanding with 2D global comprehension, improving its ability to interpret complex environments [3][7].
- Integrating diffusion models into VLA is a significant innovation that improves trajectory generation and decision-making [5][6].
Group 3: Comparison with Competitors
- The article presents a gradual progression from Level 2 (L2) to Level 4 (L4) autonomous driving as a deliberate strategy, in contrast to competitors that focus solely on L4 from the outset [9][10].
- It draws parallels between the strategies of different autonomous-driving companies, in particular comparing Tesla's approach with Waymo's [9][10].
Group 4: Future Developments
- Future iterations of the VLA model are expected to grow in size and performance, potentially from 4 billion to 10 billion parameters, while remaining efficient to deploy [16][18].
- The company is focused on strengthening the model's reasoning through reinforcement learning, which will play a central role in its development [13][51].
Group 5: User Experience and Functionality
- The article stresses user experience, particularly features such as voice control and memory, which are essential for seamless interaction between users and autonomous vehicles [18][25].
- A robust understanding of diverse driving scenarios, including complex urban environments and highways, is crucial to the model's success [22][23].
Group 6: Data and Training
- Moving from VLM to VLA requires a complete overhaul of the data-labeling process, because the requirements for training data have changed significantly [32][34].
- Synthetic data is used, but most training data comes from real-world scenarios to ensure the model's effectiveness [54].
Group 7: Regulatory Considerations
- The company is actively engaging with regulators to keep its capabilities aligned with legal requirements, indicating a proactive approach to compliance [35][36].
- The relationship between technological advances and regulatory frameworks is highlighted as a critical factor in deploying autonomous-driving technology [35][36].
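The planned growth from 4 billion to 10 billion parameters interacts directly with the single-chip deployment challenge. The memory needed just to hold the weights scales with parameter count times bits per parameter. The sketch below computes weights-only storage at a few common precisions. It excludes activations, caches, and runtime overhead. The precisions listed are illustrative and do not state what Li Auto actually uses.

```python
BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gb(params_billion, bits_per_param):
    """Storage for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


for params in (4, 10):
    row = ", ".join(f"{name}: {weight_memory_gb(params, b):.1f} GB"
                    for name, b in BITS.items())
    print(f"{params}B params -> {row}")
```

A 4B model needs about 8 GB of weights at FP16 and 2 GB at INT4. A 10B model needs about 20 GB and 5 GB respectively, which is why quantization is a common lever for keeping larger models deployable on a single vehicle chip.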
Leading Companies Race to Define the Standards as the Robotics "Shadow War" Escalates
第一财经· 2025-08-14 05:04
Core Viewpoint - The article discusses advances and challenges in robotics, focusing on embodied-intelligence models that can learn from failures and adapt their actions accordingly [3][5][7].
Group 1: Robotics Development
- Recent demonstrations show robots failing at tasks such as making a bed, highlighting the need for continuous improvement and learning in robotic systems [4][5].
- A robot's ability to recognize its own failures and try new solutions is a significant advance in embodied intelligence. It marks a shift from pre-programmed responses to data-driven learning [5][7].
- The industry is divided on model architecture: some advocate unified models, others prefer layered designs, prompting debate over computational efficiency and application scenarios [3][9].
Group 2: Market Dynamics
- Companies such as Xinghai Map and Ziyuan are competing for leadership in the robotics market, treating data, computing power, and algorithms as the key drivers of innovation [9][12].
- The industry's mainstream direction for large models is the Vision-Language-Action (VLA) model, which integrates visual, linguistic, and action processing [9][10].
- The competition is not only about technology but also about who defines the performance evaluation standards for these models, which will shape the future competitive landscape [13][14].
Group 3: Benchmarking and Standards
- Benchmarks for evaluating embodied-intelligence models are gaining traction, with companies such as Xinghai Map releasing open datasets that make comparison and improvement easier [14][15].
- A common evaluation standard could improve the industry's ability to measure progress and foster collaboration among developers [14][15].
- The ambition behind these benchmarks is to attract more participants to the ecosystem and to position companies such as Xinghai Map as platform providers [15][16].
Group 4: Future Outlook
- Robotics is at a critical juncture, and sustaining speed and efficiency over time will be essential to market success [16][17].
- Companies are increasingly building comprehensive ecosystems that span data, core components, and robot models, moving beyond single-point capabilities [16][17].
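A shared benchmark is only useful if scores are reported in a way that survives small sample sizes. Physical robot evaluations often involve few trials per task, so a raw success rate such as 8/10 can be misleading without an uncertainty estimate. The sketch below reports per-task success rates with a Wilson score interval, plus a macro-averaged success rate across tasks. The task names and counts are hypothetical, and this is one reasonable scoring choice, not the protocol of any benchmark named in the article.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (default ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(results):
    """results maps task -> (successes, trials); returns per-task rows and macro mean."""
    rows = {}
    for task, (s, n) in results.items():
        lo, hi = wilson_interval(s, n)
        rows[task] = (s / n, lo, hi)
    macro = sum(rate for rate, _, _ in rows.values()) / len(rows)
    return rows, macro


# Hypothetical benchmark run: 10 trials per task.
rows, macro = summarize({"make_bed": (8, 10), "fold_towel": (5, 10)})
for task, (rate, lo, hi) in rows.items():
    print(f"{task}: {rate:.0%} (95% CI {lo:.0%}-{hi:.0%})")
print(f"macro-average success: {macro:.0%}")
```

With only 10 trials, an 80% success rate is compatible with a true rate anywhere from roughly 49% to 94%. This is the kind of spread a common standard would need to account for before ranking models.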
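[TMTPost Morning Brief] Two Departments Seek Public Comment on Intelligent Connected New Energy Vehicles; Tencent Holdings: Q2 Revenue of 184.50 Billion Yuan, Up 15% Year on Year; PBOC's Key Financial Data for July: This Year's M1-M2 "Scissors...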
Tai Mei Ti APP· 2025-08-13 23:40
Group 1: Regulatory Developments in Smart Connected Vehicles
- The State Administration for Market Regulation and the Ministry of Industry and Information Technology have drafted a notice to strengthen recall and supervision management for smart connected new energy vehicles [2][3].
- The draft requires companies to display safety warnings and usage instructions for combined driver-assistance systems prominently in vehicle apps and owner's manuals to prevent misuse [2][3].
- Companies are required to develop and deploy driver monitoring and warning systems that keep the driver engaged and reduce safety risks [2][3].
Group 2: Production Consistency and Information Transparency
- The draft calls for stricter supervision of production consistency, requiring key information to be reported accurately in the vehicle certificate-of-conformity system [3].
- Companies must strictly manage over-the-air (OTA) software upgrades, pushing only thoroughly tested versions to users and never concealing defects [3].
- Information that companies provide about driving-automation levels and system capabilities must be truthful and not misleading [3].
Group 3: Incident Reporting and Investigation
- The draft requires companies to report safety incidents and collisions involving combined driver-assistance systems promptly, in accordance with existing regulations [4].
Group 4: Market Performance and Financial Updates
- Tencent Holdings reported second-quarter revenue of 184.5 billion yuan, up 15% year on year, and net profit of 55.63 billion yuan, up 17% [6].
- Nissan sold 57,359 vehicles in China in July, with Dongfeng Nissan's sales up 19.4% year on year [8].
- The People's Bank of China reported that M2 grew 8.8% year on year, indicating improved liquidity and market confidence [9].
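The driver-monitoring requirement is usually implemented as an escalation policy: the longer the driver looks away or keeps hands off the wheel, the more intrusive the warning. The sketch below shows one such policy. The signals, the severity levels, and every threshold in `THRESHOLDS_S` are hypothetical. The draft notice requires monitoring and warnings, but the source does not specify these values.

```python
# Hypothetical escalation thresholds in seconds; the draft notice requires
# monitoring and warnings, but these specific numbers are illustrative only.
SEVERITY = ("none", "visual", "audible", "takeover_request")
THRESHOLDS_S = {
    "eyes_off": (2.0, 4.0, 8.0),
    "hands_off": (5.0, 10.0, 15.0),
}


def _level(duration_s, thresholds):
    """Number of escalation thresholds the duration has reached."""
    return sum(duration_s >= t for t in thresholds)


def warning_level(eyes_off_s, hands_off_s):
    """Return the most severe warning triggered by either signal."""
    idx = max(_level(eyes_off_s, THRESHOLDS_S["eyes_off"]),
              _level(hands_off_s, THRESHOLDS_S["hands_off"]))
    return SEVERITY[idx]


for eyes, hands in [(0, 0), (3, 0), (0, 12), (9, 0)]:
    print(eyes, hands, "->", warning_level(eyes, hands))
```

Taking the maximum across signals means that a single inattention cue is enough to escalate. Eyes-off time is given tighter thresholds here because it is generally treated as the more direct indicator of disengagement.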
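WRC Observations: Operating Errors Are Nothing New, More Vendors Pursue Hardware-Software Integration, and Consumer Robot Dogs Take a Seat at the Table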
Cai Jing Wang· 2025-08-13 16:29
Core Insights - The WRC showcased significant advances in humanoid robots, with more than 200 companies and 1,500+ exhibits, highlighting the industry's growth and innovation [1][2].
- Debate over embodied-intelligence models has intensified, with industry leaders questioning the effectiveness of current VLA models and calling for a restructured approach [2][3].
- Robot designs are evolving: many companies have moved from demos to functional robots in various roles, although challenges in execution remain [2][6].
Industry Trends
- The humanoid robot market is shifting toward consumer products, with companies such as Vbot and Magic Atom targeting household and personal use [13][14].
- Companies such as Hanwang Technology are focusing on integrating advanced sensing capabilities, including vision, hearing, and even smell [12].
- Competition among robot makers is intensifying as firms try to differentiate their products through distinctive designs and functions [8][15].
Company Developments
- Yushun Technology's collaboration with Reborn AGI aims to improve robot training and optimization, reflecting a trend toward community-driven development in robotics [3].
- Self-variable Robotics is positioning itself as a full-stack hardware and software provider, showcasing new models that emphasize practical applications [4][5].
- Digital Huaxia is using a PaaS platform to enable rapid customization of robotic applications, reflecting a shift toward more adaptable solutions in the market [11].