VLA

3x faster, with CoT reasoning boosting VLA! ECoT-Lite: several mechanisms by which embodied robot reasoning improves policies
具身智能之心· 2025-08-27 00:04
Authors: William Chen et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact the publisher for removal in case of infringement.
Title: Training Strategies for Efficient Embodied Reasoning
Related links:
Homepage: https://ecot-lite.github.io/
Paper: https://arxiv.org/pdf/2505.08243
ECoT explainer: https://zhuanlan.zhihu.com/p/1941997417248583914
Libero-90 ECoT dataset: https://huggingface.co/datasets/Embodied-CoT/embodied_features_and_demos_libero
LIBERO dataset: https://github.com/Lifelong-Robot-Learning/LIBERO ...
Three sets of keywords that sum up all the recent views of Li Auto bulls on Li Auto
理想TOP2· 2025-08-22 13:29
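Three sets of keywords cover all the recent views on Li Auto held by people who are bullish on it: viewing Li Auto with a VC or a PE mindset; physical AI; and doubts about and criticism of its organizational capability.
Every individual actually holds both the VC mindset and the PE mindset at once; only the mix differs from moment to moment.
Viewing Li Auto with a VC or a PE mindset
VC mindset:
1. Views Li Auto over a longer horizon (for example, 3 years, 5 years, or more).
3. Can accept that Li Auto cannot clearly explain how it will charge for AI in the long run; the judgment rests more on how profoundly one believes the technology will change the world at a fundamental level and how much value it could create.
4. High tolerance: can accept Li Auto making repeated mistakes, repeated misjudgments, and repeatedly failing to deliver on what it said while pursuing its long-term goals.
PE mindset:
1. Views Li Auto over a markedly shorter horizon than the VC mindset (for example, within a year or within a few months).
2. Analyzes Li Auto's value in pragmatic terms (for example, cars sold, revenue, profit per vehicle, total profit).
3. If Li Auto cannot clearly explain how it will charge for AI, simply chooses not to believe it.
4. Low tolerance: cannot readily accept repeated short-term misjudgments by Li Auto.
The two mindsets see the same thing differently. We know that Tencent started with QQ and went on to build WeChat, creating a huge moat and earning substantial advertising revenue. ByteDance started with Toutiao and went on to build Douyin/TikTok, also earning substantial advertising revenue. Apple in the Jobs 2.0 era started with the iPod and went on to build the iPhone, and Google is willing to pay Apple roughly USD 20 billion a year to be the default search engine.
The VC mindset is more willing, starting from QQ, to believe that Tencent ...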
VLA: When Will It Reach Large-Scale Deployment?
Zhong Guo Qi Che Bao Wang· 2025-08-13 01:33
Core Viewpoint - The discussion around VLA (Vision-Language-Action model) is intensifying, with contrasting opinions on its short-term feasibility and potential impact on the automotive industry [2][12].
Group 1: VLA Technology and Development
- The Li Auto i8 is the first vehicle to feature the VLA driver model, which is positioned as a key selling point [2].
- Bosch's president for intelligent driving in China, Wu Yongqiao, expressed skepticism about the short-term implementation of VLA, citing challenges in multi-modal data acquisition and training [2][12].
- VLA is seen as an "intelligent enhanced version" of end-to-end systems, aiming for a more human-like driving experience [2][5].
Group 2: Comparison of Driving Technologies
- There are two main types of end-to-end technology: modular end-to-end and one-stage end-to-end, with the latter being more advanced and efficient [3][4].
- The one-stage end-to-end model simplifies the process by directly mapping sensor data to control commands, reducing information loss between modules [3][4].
- VLA is expected to outperform traditional end-to-end models by integrating multi-modal capabilities and enhancing decision-making in complex scenarios [5][6].
Group 3: Challenges and Requirements for VLA
- The successful implementation of VLA relies on breakthroughs in three key areas: cross-modal feature alignment, world model construction, and dynamic knowledge base integration [7][8].
- Current automotive chips are not designed for large AI models, leading to performance limitations in real-time decision-making [9][11].
- The industry is experiencing a "chip power battle," with companies like Tesla and Li Auto developing their own high-performance AI chips to meet VLA's requirements [11][12].
Group 4: Future Outlook and Timeline
- Some industry experts believe 2025 could be a pivotal year for VLA technology, while others suggest it may take 3-5 years for widespread adoption [12][13].
- Initial applications of VLA are expected to be in controlled environments, with broader capabilities emerging as chip technology advances [14].
- Long-term projections indicate that advancements in AI chip technology and multi-modal alignment could lead to significant breakthroughs in VLA deployment by 2030 [14][15].
Tencent Research Institute AI Express 20250808
Tencent Research Institute · 2025-08-07 16:01
Group 1: GPT-5 and MiniMax Voice Model
- OpenAI has disclosed four versions of GPT-5: standard, mini, nano, and chat, with varying capabilities for different user tiers [1]
- Community testing shows GPT-5 achieves 90% accuracy in SimpleBench reasoning tests, with improvements in programming and visual performance [1]
- MiniMax has launched a new voice generation model, Speech 2.5, supporting 40 languages and enabling natural switching between languages while preserving voice characteristics [2]
Group 2: Xiaohongshu and MiniCPM Models
- Xiaohongshu has open-sourced its first multimodal large model, dots.vlm1, which closely rivals leading closed-source models in visual understanding and reasoning [3]
- The MiniCPM-V 4.0 model has been released with only 4 billion parameters, achieving state-of-the-art results while being optimized for mobile use [4]
- MiniCPM-V 4.0 shows significant throughput advantages under increased concurrent user loads, reaching 13,856 tokens per second [4]
Group 3: Qwen Models and Chess Competition
- Qwen has introduced two smaller models, Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507, both suitable for edge deployment and achieving high performance in reasoning tasks [6]
- The first round of the inaugural large model chess competition saw OpenAI's o3 achieve a perfect score against o4-mini, while Grok 4 advanced after a tie with Gemini 2.5 Pro [7]
Group 4: Gemini's Guided Learning and Skild AI
- Google has launched a "Guided Learning" tool for Gemini, designed to help users build deep understanding through interactive learning [8]
- Skild AI has developed an end-to-end visual perception control strategy that allows robots to navigate complex environments with unprecedented adaptability [9]
Group 5: Li Auto and a16z Insights
- Li Auto has introduced the VLA model, which integrates visual, language, and action components to enhance vehicle decision-making [10]
- a16z analysts predict that the AI application generation platform market will move towards specialization rather than a winner-takes-all scenario, with over 70% of users active on a single platform [12]
All Competition Has Become "AI Competition"
36Kr · 2025-08-01 11:13
Group 1
- The core viewpoint of the articles emphasizes the increasing importance of AI in the automotive industry, particularly in high-end electric vehicles, with companies like Li Auto and Geely leading the charge with innovative AI technologies [1][2][4][5][6][8][17]
- Li Auto's new model, the i8, features VLA (Vision-Language-Action model) AI technology, which integrates the smart driving and smart cockpit systems into a single cohesive unit, marking a significant evolution in automotive AI capabilities [2][4][5]
- Geely's Agent OS represents a comprehensive approach to AI in vehicles, treating cars as intelligent robots that can proactively interact with users, showcasing a shift from passive assistance to active engagement [6][8]
Group 2
- The articles highlight that while AI is becoming a core selling point in high-end vehicles, traditional factors such as space and driving experience remain crucial for consumers, especially in the mid-range market [9][10][13][17]
- The focus on spacious interiors is evident in new models like the Nio L90 and Geely Galaxy M9, which are designed to cater to family needs, emphasizing comfort and versatility [10][12][13]
- A return to prioritizing driving dynamics is also noted, with companies like Geely and Leapmotor investing in chassis tuning to enhance driving pleasure, indicating that despite advancements in AI, the fundamental driving experience remains a key competitive factor [14][16][17]
22 Q&As on Li Auto's VLA
理想TOP2· 2025-07-30 00:02
Core Viewpoint - The VLA architecture has significant technical potential and is seen as a long-term framework for autonomous driving, evolving from end-to-end systems to a more robust model that can support urban driving scenarios [1][4].
Group 1: VLA Architecture and Technical Potential
- The VLA architecture is derived from robotics and embodied intelligence, emphasizing the need for visual and action capabilities, and is expected to evolve alongside advancements in robotics [1].
- VLA's ability to generalize is not solely dependent on data input but is enhanced through reinforcement learning, allowing it to autonomously address new challenges [5].
- The VLA model is designed to support various platforms without differentiation, ensuring consistent performance across different hardware [2][3].
Group 2: Performance Metrics and Future Enhancements
- VLA currently runs at 10 Hz on the Thor-U chip, with potential upgrades to 20 Hz and 30 Hz through optimizations in data and algorithm architecture [2].
- The VLA model's upgrade cycle includes both pre-training and post-training updates, allowing for continuous improvement in capabilities such as spatial understanding and language processing [6].
- The VLA architecture aims to achieve L4 autonomous driving capabilities within a year, with a focus on rapid iteration and simulation-based testing [12].
Group 3: User Experience and Interaction
- Language understanding is deemed essential for future autonomous driving, enhancing the model's ability to handle complex scenarios and improving the overall driving experience [4].
- The VLA system is designed to adapt to user preferences, allowing for different driving styles based on individual needs and enhancing user trust in the technology [19].
- Features such as remote vehicle summoning and real-time monitoring of the vehicle's surroundings are being developed to improve user interaction and experience [13].
Group 4: Competitive Landscape and Strategic Decisions
- The company is currently utilizing NVIDIA chips for model deployment, focusing on maintaining versatility and avoiding being locked into specific architectures [3].
- The company is closely monitoring competitors like Tesla, aiming to learn from their advancements while prioritizing a gradual and comprehensive approach to achieving full autonomous driving capabilities [12].
- The VLA architecture is positioned as a differentiating factor in the market, leveraging reinforcement learning to enhance driving logic and user experience [20].
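To make the 10 Hz, 20 Hz, and 30 Hz figures in Group 2 concrete, here is a minimal Python sketch that converts an update rate into the time available per cycle. It assumes the quoted rates are model inference cycles per second, which the summary does not state explicitly; `frame_budget_ms` is a hypothetical helper name, not part of any Li Auto or NVIDIA API.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Return the time available per cycle, in milliseconds, at a given rate.

    Assumption: one cycle covers the full sensing-to-action pass.
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


if __name__ == "__main__":
    # Rates quoted in the summary: the current 10 Hz and the 20/30 Hz targets.
    for rate in (10, 20, 30):
        print(f"{rate} Hz -> {frame_budget_ms(rate):.1f} ms per cycle")
```

Under that assumption, moving from 10 Hz to 30 Hz cuts the per-cycle budget from 100 ms to roughly 33 ms, which is why the summary ties the upgrade to optimizations in data and algorithm architecture.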
Worth Noting: How 10 Industry Insiders View VLA
理想TOP2· 2025-07-21 14:36
Core Viewpoints
- The current development of cutting-edge technologies in autonomous driving is not yet fully mature for mass production, with significant challenges remaining to be addressed [1][27][31]
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as potential key directions for future exploration in autonomous driving [6][7][28]
- The choice between deepening expertise in autonomous driving or transitioning to embodied intelligence depends on individual circumstances and market dynamics [19][34]
Group 1: Current Technology Maturity
- The BEV (Bird's Eye View) perception model has reached a level of maturity suitable for mass production, while other models like E2E (End-to-End) are still in the experimental phase [16][31]
- There is a consensus that the existing models struggle with corner cases, particularly in complex driving scenarios, indicating that while basic functionalities are in place, advanced capabilities are still lacking [16][24][31]
- The industry is witnessing a shift towards utilizing larger models and advanced techniques to enhance scene understanding and decision-making processes in autonomous vehicles [26][28]
Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for the next generation of autonomous driving, with the potential to improve reasoning capabilities and safety [2][28]
- The application of reinforcement learning is recognized as having significant potential, particularly when combined with effective simulation environments [6][32]
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could be beneficial in uncertain driving conditions [7][26]
Group 3: Future Directions
- Future advancements in autonomous driving technology are expected to focus on enhancing safety, improving passenger experience, and achieving comprehensive scene coverage [20][28]
- The integration of closed-loop simulations and data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30]
- The industry is moving towards a data-driven model where the efficiency of data collection, cleaning, labeling, training, and validation will determine competitive advantage [20][22]
Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should consider personal interests, market trends, and the maturity of each field [19][34]
- The autonomous driving sector is perceived as having more immediate opportunities for impactful work compared to the still-developing field of embodied intelligence [19][34]
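A senior labmate published his own paper on large models for autonomous driving and got into a TOP2 university for his PhD...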
自动驾驶之心· 2025-07-09 12:56
Core Viewpoint - The article discusses the advancements in large models (LLMs) for autonomous driving, highlighting the need for optimization in efficiency, knowledge expansion, and reasoning capabilities as the technology matures [2][3].
Group 1: Development of Large Models
- Companies like Li Auto and Huawei are implementing their own VLA and VLM solutions, indicating a trend towards the practical application of large models in autonomous driving [2].
- The focus for the next generation of large models includes lightweight design, hardware adaptation, knowledge distillation, quantization acceleration, and efficient fine-tuning [2][3].
Group 2: Course Introduction
- A course is being offered to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [3].
- The course aims to address core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms like Chain-of-Thought (CoT) and reinforcement learning [3][4].
Group 3: Enrollment and Requirements
- The course will accept a maximum of 8 students per session, targeting individuals with a background in deep learning or machine learning who are familiar with Python and PyTorch [5][10].
- Participants will gain a systematic understanding of large model optimization, practical coding skills, and insights into academic writing and publication processes [8][10].
Group 4: Course Outcomes
- Students will learn to combine theoretical knowledge with practical coding, develop their own research ideas, and produce a draft of a research paper [8][9].
- The course includes a structured timeline with specific topics each week, covering model pruning, quantization, efficient fine-tuning, and advanced reasoning techniques [20].
What Are the Later-Stage Deployment and Research Directions for Large Models in Autonomous Driving?
自动驾驶之心· 2025-07-07 23:31
Core Insights
- The article discusses the evolving landscape of large models in autonomous driving, highlighting the focus on lightweight solutions, hardware compatibility, knowledge distillation, and efficient fine-tuning of large models [1]
- It emphasizes the importance of advanced reasoning paradigms such as Chain-of-Thought (CoT) and VLA combined with reinforcement learning in enhancing spatial perception capabilities [1]
Group 1: Course Overview
- The course aims to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2]
- Key challenges in model optimization include parameter compression through pruning and quantization, dynamic knowledge injection techniques, and advanced reasoning paradigms [2][3]
Group 2: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and machine learning [4][8]
- Participants are expected to have basic programming skills in Python and familiarity with PyTorch, along with a genuine interest in research [8]
Group 3: Course Outcomes
- The course aims to provide a systematic understanding of large model optimization, helping participants develop their own research ideas and enhance their coding skills [6][7]
- Participants will receive guidance on writing and submitting academic papers, including methodologies for drafting and revising manuscripts [6][7]
Group 4: Course Structure
- The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, covering topics such as model pruning, quantization, and dynamic knowledge expansion [7][18]
- Each week focuses on specific themes, including advanced reasoning techniques and collaborative multi-agent systems [18][20]
Group 5: Additional Information
- The course will utilize publicly available datasets and baseline codes tailored to specific applications, ensuring practical relevance [15][16]
- Participants will engage in discussions and hands-on experiments using mainstream large models like LLaMA and GPT [2][18]
Correctly Predicted 10 Days Early That Li Auto Would Miss Its 25Q2 Delivery Guidance; Five More Predictions Today
理想TOP2· 2025-06-27 10:17
Core Viewpoint - The company is unlikely to meet its Q2 2025 delivery guidance due to insufficient sales momentum and historical performance trends [1][2].
Group 1: Delivery Guidance and Sales Performance
- The company's Q2 2025 delivery guidance is set between 123,000 and 128,000 units. April and May deliveries were 33,939 and 40,856 units respectively, so June deliveries would need to be between 48,205 and 53,205 units [1].
- Historical data shows that the company's weekly insurance registrations have exceeded 13,000 units only six times, with the highest two-week total being 28,020 units in December 2024, suggesting that achieving the required sales volume in June is unlikely [2].
- If the company does not update its delivery guidance, it will mark the third time in its history that it fails to meet a quarterly delivery target [2].
Group 2: Strategic Changes and Market Position
- Recent reforms aim to refocus sales on value delivery, but current strategies are not effectively achieving this goal, which may impact sales in the short term [4].
- The company is expected to eventually align its strategies with its values, but this process may take time, leading to potential short-term sales declines [4].
- The recent success of a competing model, the YU7, has prompted reflection on the company's unique strengths and areas for improvement, suggesting that the company should learn from the competitor's successful strategies [5].
Group 3: Market Dynamics and Consumer Perception
- The company's core product lines face challenges in maintaining consumer confidence, as repeated cycles of performance fluctuations can erode consumer trust [6].
- The strong leadership of the company's founder is seen as a potential long-term advantage in competing with established players like Xiaomi, although the timeline for realizing this advantage remains uncertain [6].
- The current market perception of the YU7 as a high-value vehicle is driven by consumer belief in its resale value, indicating that the company may need to enhance its product appeal to compete effectively [6][7].
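The June delivery range quoted in Group 1 follows from simple subtraction, and the short Python sketch below reproduces it from the guidance and the April and May figures given in the summary. The function name `required_remaining` is a hypothetical helper introduced here for illustration only.

```python
def required_remaining(guidance_low: int, guidance_high: int,
                       delivered: list[int]) -> tuple[int, int]:
    """Return the (low, high) deliveries still needed to land inside guidance."""
    total = sum(delivered)
    return guidance_low - total, guidance_high - total


if __name__ == "__main__":
    # Figures from the summary: Q2 guidance and April/May deliveries.
    low, high = required_remaining(123_000, 128_000, [33_939, 40_856])
    print(f"June deliveries needed: {low:,} to {high:,}")
```

April and May together come to 74,795 units, so the remaining gap matches the 48,205 to 53,205 range stated in the summary.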