VLA Models
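At a Glance: Breakthrough Directions in Fusing VLA and RL from Several 2025 Papers
具身智能之心 · 2025-08-25 00:04
Editor: 具身智能之心. In 2025, a fusion revolution of "multimodality and autonomous learning" is quietly taking off in robotics and embodied intelligence. Top venues including ICLR, RSS, ICRA, and CVPR have accepted eight notable papers that all focus on fusing vision-language-action (VLA) models with reinforcement learning (RL), taking aim at the core pain point of deploying robots: how to decide more intelligently and execute tasks more precisely in real-world scenes. What innovations do these papers propose, and where do they differ in technical detail and application scenario? Below, we go through the core highlights of the eight papers one by one to show how VLA+RL is reshaping robot intelligence. GRAPE: Generalizing Robot Policy via Preference Alignment (ICLR 2025) Paper link: https://openreview.net/pdf?id=XnwyFD ...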
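Doubting VLA Models, and AI That Falls "Completely Short"? An Industry Insider Responds from Afar to Unitree's Wang Xingxing
第一财经 · 2025-08-11 14:51
Author: Liu Jia, 第一财经. At the World Robot Conference, Unitree CEO Wang Xingxing voiced a string of "non-consensus" views. He is skeptical of VLA (Vision-Language-Action) models, calling them a "relatively dumbed-down architecture"; he also said that the robotics industry pays a bit too much attention to data, that hardware including dexterous hands is not good enough but is adequate, and that the industry's biggest problem is that the AI for embodied intelligence falls completely short. Wang's views have continued to stir discussion in the industry. At the World Robot Conference today, the reporter noticed that Jiang Lei, chief scientist of the National and Local Co-built Humanoid Robot Innovation Center, mentioned Wang Xingxing three times in a speech of nearly 20 minutes. On Wang's view that "hardware is adequate, large models are not," Jiang shared what he took from exchanges with companies such as Alibaba and Huawei: "we cannot find a really good body." He admitted that the industry today indeed cannot yet use full-parameter models, and that a robot's cerebrum, cerebellum, and limbs need deep coordination. Where Wang questions VLA and is trying to drive robot tasks with video generation, Jiang acknowledged that "the perception-cognition-decision-execution loop is not yet closed" and called for restructuring VLA models and seeking a new paradigm. Wang also said that a scaling law for RL (reinforcement learning) in robotics is a direction well worth pursuing; Jiang agreed, saying that reinforcement learning and imitation learning ...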
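Doubting VLA Models, and AI That Falls "Completely Short"? An Industry Insider Responds from Afar to Unitree's Wang Xingxing
Di Yi Cai Jing · 2025-08-11 11:33
Traditional humanoid robots face three core challenges: perception limits, decision-making gaps, and generalization bottlenecks. At the World Robot Conference, Unitree CEO Wang Xingxing voiced a string of "non-consensus" views. He is skeptical of VLA (Vision-Language-Action) models, calling them a "relatively dumbed-down architecture"; he also said that the robotics industry pays a bit too much attention to data, that hardware including dexterous hands is not good enough but is adequate, and that the industry's biggest problem is that the AI for embodied intelligence falls completely short. Wang's views have continued to stir discussion in the industry. At the World Robot Conference today, the reporter noticed that Jiang Lei, chief scientist of the National and Local Co-built Humanoid Robot Innovation Center, mentioned Wang Xingxing three times in a speech of nearly 20 minutes. On Wang's view that "hardware is adequate, large models are not," Jiang shared what he took from exchanges with companies such as Alibaba and Huawei: "we cannot find a really good body." He admitted that the industry today indeed cannot yet use full-parameter models, and that a robot's cerebrum, cerebellum, and limbs need deep coordination. Where Wang questions VLA and is trying to drive robot tasks with video generation, Jiang acknowledged that "the perception-cognition-decision-execution loop is not yet closed" and called for restructuring VLA models and seeking a new paradigm. Wang also said that a scaling law for RL (reinforcement learning) in robotics is a direction well worth pursuing; Jiang agreed, saying that reinforcement learning and imitation learning both need to enter the Scalin ...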
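The Intelligent Driving Industry Under a "Safety Charter": Leaving Unchecked Growth Behind and Moving Toward Regulated Development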
Zheng Quan Shi Bao· 2025-08-06 18:35
Core Viewpoint
- The introduction of mandatory national standards for L2 assisted driving is seen as a significant step toward improving safety in the intelligent driving sector, prompting industry players to reassess their safety design logic and to establish comprehensive responsibility systems that run from technology development to user education [1][7].

Industry Actions
- The Ministry of Industry and Information Technology (MIIT) has proposed a mandatory national standard for intelligent connected vehicles that focuses on safety requirements for L2 assisted driving systems; it is viewed as a "safety charter" for the industry [2].
- In light of the new safety requirements, leading companies are building multi-dimensional safety assurance systems that cover technology research and development, scenario adaptation, and emergency response [2][4].
- The VLA model introduced by Yuanrong Qixing (DeepRoute.AI) improves the system's handling of edge cases and gives users an understandable account of its decisions, improving the safety experience of assisted driving [3].

Regulatory Framework
- Establishing safety standards is seen as urgent, especially after a recent test in which none of the 36 popular models equipped with assisted driving systems passed every high-risk accident scenario, with an average pass rate of only 35.73% [6].
- A multi-departmental regulatory framework is taking shape, with the MIIT, the Ministry of Public Security, and the Ministry of Science and Technology collaborating to ensure the safety and reliability of assisted driving systems [6][8].

Industry Transformation
- The mandatory standards are expected to drive a quality revolution in the industry, pushing companies to optimize their technology development and to focus on safety and reliability rather than on functionality alone [4][7].
- Clear safety standards will help dispel consumer distrust of assisted driving technology, accelerating market adoption and fostering a competitive environment in which resources concentrate on companies with genuine technological capabilities [7][9].

Market Dynamics
- The penetration rate of assisted driving functions is rising, with projections indicating that by 2024, 59.7% of new passenger cars in China will meet L2 standards, up from 52.1% in 2023 [8].
- The industry faces challenges such as high R&D costs and cash flow pressure, which may lead to a "Matthew effect" in which weaker companies are eliminated from the market [4][8].

Future Outlook
- The upcoming national standards are viewed as an opportunity for high-quality growth, allowing the industry to move from "barbaric growth" to a more regulated and healthy phase of development [9].
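The two L2 penetration figures cited in this summary can be compared either in percentage points or as relative growth, and the two readings are easy to confuse. A minimal Python sketch follows; the helper name `penetration_change` is hypothetical, and only the 52.1% and 59.7% inputs come from the summary.

```python
def penetration_change(previous_pct: float, current_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = current_pct - previous_pct      # absolute difference in share
    relative = points / previous_pct * 100   # growth relative to the earlier share
    return points, relative


# Figures quoted in the summary: 52.1% in 2023, 59.7% in 2024.
points, relative = penetration_change(52.1, 59.7)
print(f"{points:.1f} percentage points, {relative:.1f}% relative growth")
# prints "7.6 percentage points, 14.6% relative growth"
```

Reporting the change as 7.6 percentage points keeps it distinct from the roughly 14.6% relative increase over the 2023 share.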
The AI Evolution of Assisted Driving: At the Historic Turning Point of a Generational Leap in Capability
2025-08-05 03:15
Summary of Key Points from the Conference Call

Industry Overview
- The autonomous driving industry is at a pivotal point in the transition from L2 to L3 commercialization, with manufacturers that develop the full stack in-house and third-party suppliers gaining a competitive edge [1][4].
- Major players in the autonomous driving sector include Tesla, Xpeng, Li Auto, NIO, and third-party suppliers such as Momenta and Yuanrong Qixing [1][5].

Core Insights and Arguments
- The build-out of cloud-based intelligent computing centers and the mass production of high-performance chips are crucial drivers for the industry [1].
- Companies are investing heavily in R&D, with Tesla's HW5.0 featuring 4D millimeter-wave radar and Li Auto's L series equipped with LiDAR [6][10].
- Regulatory policy significantly affects the industry, with L2 standardization under way and multiple regions opening L4 commercialization pilot projects [8].

Technological Developments
- Xpeng is shifting to a pure vision solution to strengthen visual perception and reduce hardware costs, while Huawei's ADS 4.0 supports L3 commercialization at highway speeds [3][12].
- The VLA model integrates vision, language, and action modules to optimize vehicle decision-making [3].
- The industry is shifting toward data-driven development, with companies showcasing their cloud-based world models and parameter scales [29].

Competitive Landscape
- Leading companies in autonomous driving include Tesla, Xpeng, Li Auto, NIO, and Xiaomi, with significant contributions from domestic suppliers such as SUTENG, Hesai Technology, and others [5][26].
- Traditional manufacturers are increasingly opting for third-party solutions to shorten product cycles and reduce time costs [17].

R&D and Investment Trends
- Companies such as NIO have invested over 10 billion yuan in R&D for three consecutive years but face challenges in achieving commercial breakthroughs [14].
- Xiaomi's growth in the autonomous driving sector is driven by its potential rather than its current capabilities, with expectations that its models will feature LiDAR [16].

Consumer Perception and Market Trends
- Intelligent driving technology is advancing in features such as highway NOA and parking functions [32].
- Safety features are evolving, with the introduction of proactive avoidance systems to improve the driving experience [33].

Investment Opportunities
- Investors should focus on leading autonomous driving solution providers and on manufacturers with full-stack in-house development, especially as regulatory frameworks evolve [36].
The Rise of VLA Models: The Auto Industry Sees Twin Breakthroughs in Intelligent Driving and Intelligent Manufacturing
Core Viewpoint
- The emergence of Vision-Language-Action (VLA) models is set to revolutionize the intelligent assisted driving industry, moving from traditional modular systems to a more integrated end-to-end architecture, enhancing driving experience and capabilities [1][2][3].

Industry Trends
- The intelligent assisted driving sector is witnessing a shift from "usable" to "user-friendly" experiences, driven by the increasing adoption of new energy vehicles and the demand for improved driving assistance [3].
- VLA models are expected to dominate the market, with projections indicating that by 2030, VLA-driven end-to-end solutions could capture 60% of the L4 market share, leading to a reevaluation of the value chain for traditional Tier 1 suppliers [4].

Technological Advancements
- The VLA model integrates visual, language understanding, and action decision-making, significantly enhancing scene reasoning and generalization capabilities compared to previous models [2][3].
- The VLA architecture is seen as a more comprehensive evolution of the end-to-end and VLM (Vision-Language Model) combination, addressing limitations in complex driving scenarios [3].

Competitive Landscape
- Tesla is positioned as a potential beneficiary of this transformation, with its FSD Beta V12 showing a 76% reduction in intervention frequency compared to the previous version [4].
- Domestic automakers are also actively exploring VLA technologies, with companies like Li Auto emphasizing the importance of VLA in their future models [4].

Manufacturing Innovations
- AI is driving a paradigm shift in automotive manufacturing, moving from traditional assembly line methods to more efficient, data-driven "smart island" models [2][5].
- The integration of AI in manufacturing processes is seen as essential for overcoming challenges such as long changeover times and quality fluctuations [6][7].

Future Outlook
- The VLA technology is expected to redefine the competitive landscape of the intelligent assisted driving market, leading to a layered market structure rather than a single dominant technology [6].
- The acceptance of AI for process optimization in manufacturing is growing, with companies recognizing the need for comprehensive AI integration to enhance operational efficiency [8].
The Food Delivery War Ends Awkwardly, but the Giants Are Still Competing on AI from a "Predicament"
Hu Xiu· 2025-08-01 14:09
Group 1
- The fierce competition in the food delivery market has ended, with major players like Meituan, Taobao, and JD.com expressing a commitment to resist malicious competition and focus on cooperation [1].
- JD.com has made significant investments in three robotics companies, totaling over 1.6 billion yuan, to enhance its logistics capabilities and integrate AI technology into its supply chain [1][7].
- Meituan has also been actively investing in the field of embodied intelligence, with recent investments in multiple projects, indicating a strategic shift towards this technology [1][22].

Group 2
- The investment trend among major companies suggests a shift from internal innovation to collaborative innovation through external investments, particularly in the field of embodied intelligence [3][4].
- The competition among giants in the robotics sector is characterized by a lack of exclusive competitive barriers, as they are targeting similar investment opportunities [4][5].
- JD.com has a robust financial position with 209.5 billion yuan in cash and equivalents, allowing it to invest heavily in technology without the need for significant capital expenditures [7][8].

Group 3
- The logistics network of JD.com includes over 3,600 self-operated warehouses with a total management area of 32 million square meters, enhancing its operational efficiency [8][9].
- The automation upgrades in JD.com's warehouses are driven by the need for efficiency, with technologies like AI scheduling and sorting robots being integrated into their operations [9][10].
- The investment in robotics companies allows JD.com to cover the entire technology chain from AI to hardware, creating a closed-loop system for logistics solutions [12][13].

Group 4
- The competition in the food delivery market has led to a decline in valuations for major players, with JD.com, Meituan, and Alibaba experiencing stock price drops amid a saturated market [38][40].
- The growth of the food delivery market is slowing, with user penetration expected to reach 22.6% by 2025, necessitating new growth stories for these companies [39][40].
- The development of embodied intelligence technology could potentially transform the industry, shifting the focus from subsidy wars to technological advancements [41][42].
Li Xiang's Response to the Thor-U Chip's 500 TOPS of Compute
理想TOP2· 2025-07-27 15:27
Core Viewpoint
- The article discusses the performance and capabilities of NVIDIA's Thor chip, particularly in relation to its TOPS (Tera Operations Per Second) output under different precision formats, highlighting discrepancies between advertised and actual performance metrics [1][5].

Group 1: Chip Performance
- NVIDIA's Thor chip was initially advertised to deliver 700 TOPS, but the actual performance is closer to 500 TOPS after several adjustments [1].
- The performance of the Thor-U chip varies significantly based on the precision format used; it can achieve 700 TOPS in INT8 and FP8 mixed precision, while it can reach 1400 TOPS in FP4 precision [5][4].
- The Thor-X chip has a similar performance variation, achieving 1000 TOPS in FP8, 500 TOPS in FP16, and 2000 TOPS in FP4 [5].

Group 2: Implications for Automotive Industry
- Li Auto's VLA model utilizes a mixed precision approach (INT8 and FP8) to optimize performance, aiming to leverage the Thor-U chip's capabilities for faster response times [2][3].
- The automotive industry is increasingly focusing on low precision inference models to enhance processing speed, which requires advanced engineering capabilities [2].
- The shift towards FP4 precision in future models indicates a trend towards maximizing chip performance in the automotive sector [5].
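The precision-dependent figures quoted in this summary are consistent with a simple rule of thumb: each halving of operand width doubles nominal throughput. The Python sketch below is illustrative only. The `estimated_tops` helper and the inverse-linear scaling are assumptions inferred from the numbers in the summary, not a published NVIDIA formula, and any value it produces that the summary does not quote (such as Thor-U at FP16) is an extrapolation.

```python
# Nominal 8-bit throughput quoted in the summary (TOPS at INT8/FP8).
BASE_8BIT_TOPS = {"Thor-U": 700, "Thor-X": 1000}

# Operand width in bits for each precision format.
BITS = {"FP4": 4, "FP8": 8, "INT8": 8, "FP16": 16}


def estimated_tops(chip: str, precision: str) -> float:
    """Scale the 8-bit figure by 8 / bit-width (assumed inverse-linear rule)."""
    return BASE_8BIT_TOPS[chip] * 8 / BITS[precision]


if __name__ == "__main__":
    for chip in BASE_8BIT_TOPS:
        for precision in ("FP4", "FP8", "FP16"):
            print(f"{chip} {precision}: {estimated_tops(chip, precision):.0f} TOPS")
```

Under this assumption the helper reproduces the quoted Thor-X figures (2000, 1000, and 500 TOPS at FP4, FP8, and FP16) and the quoted 1400 TOPS for Thor-U at FP4, which is why a headline TOPS number means little unless the precision format is stated alongside it.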
Great Wall Motors Plans a Further Investment in DeepRoute.AI (元戎启行) of RMB 800 Million to 1 Billion
Jing Ji Guan Cha Bao· 2025-07-23 13:50
Group 1
- Great Wall Motors plans to invest 800 million to 1 billion RMB in DeepRoute.AI, primarily for purchasing computing power cards [1].
- DeepRoute.AI, founded in February 2019, focuses on advanced intelligent driving technology and has over 20,000 units of its driving solutions in production [1].
- In November 2024, DeepRoute.AI secured a $100 million Series C funding round led exclusively by Great Wall Motors [1].

Group 2
- DeepRoute.AI's VLA model is expected to enter mass production in Q3 2025 and is designed for commercial application, enhancing the driving capabilities of Great Wall's vehicles [2].
- The VLA model, which supports both LiDAR and pure vision solutions, boasts strong reasoning capabilities and improved interpretability of driving decisions [2].
- If Great Wall Motors successfully integrates the VLA model, it will significantly enhance the competitive edge of its vehicles in the intelligent driving market [2].

Group 3
- The VLA model's production has faced delays due to reliance on the NVIDIA Thor chip, which has a computing power of 2000 TFLOPS, eight times that of Orin [3].
- Industry observers note that after several rounds of eliminations, few intelligent driving solution providers remain, prompting Great Wall Motors to deepen its collaboration with DeepRoute.AI [3].
- Great Wall Motors previously spun off its intelligent driving division, Haomo Zhixing, which faced strategic and technical challenges, impacting its high-level driving plans [3].

Group 4
- Great Wall Motors is increasingly inclined to adopt external intelligent driving solutions, with DeepRoute.AI launching the first domestic solution that does not rely on high-precision maps [4].
- The intelligent driving algorithms for Great Wall's upcoming models, such as the Weipai Lanshan and Weipai Gaoshan, will be provided by DeepRoute.AI [4].
- In addition to DeepRoute.AI, Great Wall Motors is collaborating with Zhuoyue Technology to develop intelligent driving solutions for its budget models [4].
Hierarchical VLA Models or Fully End-to-End VLA: Which Direction Is Better for Publishing Papers?
自动驾驶之心· 2025-07-23 07:32
Core Viewpoint
- The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to the exploration of Vision-Language-Action (VLA) models, suggesting that there are still many opportunities for research in this area [1][2].

Group 1: VLA Research Topics
- The VLA model represents a new paradigm in autonomous driving, integrating vision, language, and action to enhance decision-making capabilities [2][3].
- The evolution of autonomous driving technology can be categorized into three phases: traditional modular architecture, pure visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models aim to improve interpretability and reliability by allowing the model to explain its decisions in natural language, thus increasing transparency and trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge in VLA and develop practical skills in model design and implementation [6][7].
- Participants will engage in a 12-week online group research followed by 2 weeks of paper guidance, culminating in a 10-week maintenance period for their research papers [6].
- The course will provide insights into classic and cutting-edge papers, coding implementations, and writing methodologies, ultimately assisting participants in producing a research paper draft [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and basic programming skills [5][9].
- Participants are expected to have access to high-performance computing resources, ideally with multiple high-end GPUs, to facilitate their research [13][14].
- A preliminary assessment will be conducted to tailor the course content to the individual needs of participants, ensuring a focused learning experience [15].

Group 4: Course Highlights and Outcomes
- The course features a "2+1" teaching model, providing comprehensive support from experienced instructors and research mentors [15].
- Participants will gain a thorough understanding of the research process, writing techniques, and submission strategies, enhancing their academic and professional profiles [15][20].
- The expected outcomes include a research paper draft, project completion certificates, and potential recommendation letters based on performance [15].