VLM
Li Auto VLM vs. VLA: Differences in Blind-Zone Deceleration
理想TOP2· 2025-10-18 08:44
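Writing up a difference between VLM and VLA in a concrete scenario, using the simplest example: blind-zone deceleration. The original author is Weibo user 大懒货. Original link: https://weibo.com/2062985282/Q95d6BJkn

Original content:

With E2E+VLM, what you feel is the end-to-end model slowing down after it receives a deceleration instruction from the VLM, so the behavior feels disjointed and rule-like [it always slows to 8-12 km/h, regardless of how intersection scenarios differ], etc. VLA follows a different logic.

VLA works by using the in-house base model to understand the scene, so it directly builds [scene understanding for blind-zone situations]. The workflow is: video is encoded into the LLM; the LLM jointly judges the road scene, width, traffic flow, etc. …; then it directly outputs an Action.

So what you notice is that VLA has more blind-zone deceleration levels [close to non-discrete]. In particular, the G-values of blind-zone deceleration differ widely from road to road and better match the traffic flow of the scene, unlike the earlier feel of E2E obeying the VLM. This is a kind of [native] deceleration Action, not the instruction-driven feel of a dual-system setup.

How does the E2E+VLM strategy work? First, the VLM is a vision-language model, so the developers collect a large number [actually not that many; this is just an LLM characteristic] of scene videos and images of T-junctions, so that the base model, Qwen, gains the ability to understand T-junction scenes. Then the VLM's working logic is: it perceives no ...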
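The contrast the author describes (one fixed speed band for every flagged blind zone versus a deceleration that varies with the scene) can be illustrated with a toy sketch. Everything here is a hypothetical illustration, not Li Auto's implementation: the `Scene` fields, the 12 m reference width, and the weighting formula are assumptions, and the only number taken from the text is the 8-12 km/h band.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical scene summary; field names are assumptions for illustration."""
    road_width_m: float     # estimated drivable width
    traffic_density: float  # 0.0 (empty) to 1.0 (congested)
    occlusion: float        # 0.0 (clear view) to 1.0 (fully blocked)


def rule_based_target_speed(blind_zone_flagged: bool, current_kmh: float) -> float:
    """Dual-system feel: a single flag maps every scene to one speed band."""
    if not blind_zone_flagged:
        return current_kmh
    return min(current_kmh, 10.0)  # midpoint of the 8-12 km/h band quoted in the text


def scene_conditioned_target_speed(scene: Scene, current_kmh: float) -> float:
    """Toy stand-in for a continuous output: the target varies smoothly with the scene."""
    width_factor = min(scene.road_width_m / 12.0, 1.0)  # assumed 12 m reference width
    risk = scene.occlusion * (0.5 + 0.5 * scene.traffic_density) * (1.0 - 0.5 * width_factor)
    return current_kmh * (1.0 - 0.8 * risk)  # risk stays within [0, 1]


narrow = Scene(road_width_m=4.0, traffic_density=0.8, occlusion=0.9)
wide = Scene(road_width_m=12.0, traffic_density=0.2, occlusion=0.3)

narrow_kmh = scene_conditioned_target_speed(narrow, 40.0)
wide_kmh = scene_conditioned_target_speed(wide, 40.0)
```

In this sketch the rule-based function returns the same speed for both scenes, while the scene-conditioned one slows more in the narrow, occluded, busy scene, which is the "more levels, nearly continuous" feel described above.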
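[Automotive Intelligence: October Investment Strategy] First-Mover Advantages Hold, Latecomers Push to Catch Up; Still Bullish on the Intelligence Theme!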
东吴汽车黄细里团队· 2025-10-17 09:20
Core Viewpoint
- The market is expected to refocus on investment opportunities in smart technology in Q4, driven by the ongoing AI trend and advancements in autonomous driving capabilities, particularly in Robotaxi applications [2][8].

Group 1: Q4 Smart Technology Outlook
- The Q4 market will see a renewed emphasis on smart technology investment opportunities, as AI applications in the physical world are anticipated to exceed expectations in the next 3-5 years [2][8].
- Key catalysts for smart technology in Q4 include the release of Tesla's V14 version, Xiaopeng's upcoming technology day, and the introduction of new autonomous vehicles by various companies [2][8].

Group 2: Comparison with Last Year
- As in last year's Q4, AI applications are expanding, but this year the emphasis is on the evolution of AI logic rather than on the resonance between automotive and AI logic [3][9].
- The focus has shifted from hardware opportunities and consumer sales to software opportunities and breakthroughs in B2B applications [3][9].

Group 3: Investment Strategy
- The preferred investment strategy favors Hong Kong stocks over A-shares, prioritizing software over hardware and B2B over B2C applications, with recommended stocks including Xiaopeng Motors, Horizon Robotics, and Cao Cao Mobility [4][9].
- Key investment targets include integrated models for Robotaxi, technology providers, and the transformation of ride-hailing services [4][9].

Group 4: Smart Technology Market Dynamics
- The price war among passenger car manufacturers is more intense than expected, which could significantly impact profitability across the supply chain [5].
- The recovery of terminal demand is below expectations, which may affect sales growth for car manufacturers [5].

Group 5: Smart Technology Development Review
- In August, the penetration rate of smart technology reached 23.3%, with significant advancements in autonomous driving capabilities among leading players [10].
- By October, the focus will be on the iterative development of next-generation driving architectures and the sales performance of key smart vehicles [10].

Group 6: Consumer Willingness to Pay
- Consumer willingness to pay for smart technology is expected to evolve in two phases: the first focuses on helping car manufacturers sell vehicles, and the second aims at software monetization [20][18].

Group 7: Future Projections
- From 2025 to 2027, the core task of automotive smart technology will be to achieve a penetration rate of 50%-80% for new energy vehicles, while the period from 2028 to 2030 is expected to see the large-scale commercialization of Robotaxi services [20][19].

Group 8: Smart Technology Supply Chain Tracking
- The supply chain for smart technology is being closely monitored, with various companies contributing to different aspects of the technology, including perception, decision-making, and execution [14][13].

Group 9: Key Metrics and Trends
- The penetration rates for smart driving capabilities among different brands show significant variation, with Xiaopeng at 76.1% and Wey at 95.6% [25][26].
- The overall market dynamics indicate a competitive landscape with rapid advancements in technology and varying consumer adoption rates [24][23].
Waymo's Latest Autonomous Driving Exploration: World Models, the Long-Tail Problem, and What Matters Most
自动驾驶之心· 2025-10-10 23:32
Core Insights
- Waymo has developed a large-scale AI model called the Waymo Foundation Model, which supports vehicle perception, behavior prediction, scene simulation, and driving decision-making [5][11]
- The model integrates data from multiple sensors to understand the environment, similar to how large language models operate [5][11]
- The focus on data quality and selection is crucial for ensuring that the model addresses the right problems effectively [25][30]

Group 1: World Model Development
- Waymo's world model encodes all sensor data and incorporates world knowledge, enabling it to decode driving-related tasks [11]
- The model allows for real-time perception and decision-making on the vehicle while simulating real driving environments in the cloud for testing [7][11]
- The long-tail problem in autonomous driving, which includes complex scenarios like adverse weather and construction, remains a significant challenge [11][12]

Group 2: Addressing Long-Tail Problems
- Weather conditions such as rain and snow present unique challenges for autonomous driving, requiring high precision in judgment [12][14]
- Low visibility scenarios necessitate the use of multi-modal sensors to detect objects effectively [15]
- Occlusion reasoning is critical for understanding hidden objects and ensuring driving safety [18][21]

Group 3: Complex Scene Understanding
- Understanding complex scenes like construction zones and dynamic environments requires advanced reasoning capabilities [24]
- Real-time responses to dynamic signals, such as traffic officer gestures, are essential for safe navigation [24]
- The use of large language models is being explored to enhance scene understanding and decision-making [24]

Group 4: Importance of Data, Algorithms, and Computing Power
- The three critical components for successful autonomous driving are data, algorithms, and computing power, with a strong emphasis on data quality [25][30]
- Efficient data mining from vast video datasets is vital for understanding driving events [30]
- Quick decision-making is essential for safety and smooth operation, with a focus on reducing response times across the algorithmic chain [30][31]

Group 5: Operational Infrastructure
- Waymo's operational facilities, including depots and modification workshops, are crucial for the efficient deployment of Level 4 autonomous vehicles [33]
- Vehicles can autonomously navigate to charging stations and begin operations after sensor installation [33]
- The engineering challenges of scaling autonomous driving technology require collaboration with traditional automotive engineers [34]

Group 6: Sensor and Algorithm Response
- The responsiveness of sensors, such as camera frame rates, is critical for effective autonomous driving [36]
- Algorithms must process data at high frequencies to ensure timely execution of driving commands [36]
- The evolution of vehicle control systems is moving towards higher frequency responses, particularly in electric and electronically controlled systems [36]
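To make the point about response times across the chain concrete, here is a small worked calculation. It is a sketch under assumed numbers: the stage names and every latency value are illustrative placeholders, not Waymo figures. It shows how a camera frame rate translates into a per-frame delay, and how far a vehicle travels while the sensor-to-control chain is still responding.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / fps


def chain_latency_ms(stage_latencies_ms: dict[str, float]) -> float:
    """Total response time if the stages run one after another."""
    return sum(stage_latencies_ms.values())


def distance_during_latency_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance covered at constant speed before the system has reacted."""
    return (speed_kmh / 3.6) * (latency_ms / 1000.0)


# Illustrative numbers only; none of these come from the article.
stages = {
    "sensor": frame_interval_ms(10.0),  # worst case: wait one full frame at 10 fps
    "perception": 50.0,
    "planning": 30.0,
    "control": 20.0,
}
total_ms = chain_latency_ms(stages)
blind_distance_m = distance_during_latency_m(60.0, total_ms)
```

With these placeholder values the sensor stage is half of the total delay, which is why frame rate and control frequency are discussed alongside algorithm speed.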
Li Xiang Is Currently Far More Interested in AI than in Polishing Automotive Hardware Product Details
理想TOP2· 2025-09-01 07:50
Core Viewpoints
- Li Xiang's personal interest in AI currently outweighs his focus on the incremental details of automotive hardware products [1][4]
- In the short term, Li Xiang's preference for AI over hardware may pose a risk to sales, as many consumers prefer hardware-defined products [1]
- The foundational anchor for both short-term and long-term commercial value is the product's utility, supported by varying levels of emotional value; in the AI era, models are products [1]
- Within a three-month timeframe, AI-related product utility is unlikely to reach early mainstream adoption, remaining in the early adopter phase, with low emotional value among the general public [1]

Detailed Analysis
- The head of the first product line, Lao Tang, actively shares the product development process online, while the heads of the second and third product lines, Zhang Xiao and Li Xinyang, are less inclined to do so [3]
- The MEGA Home was developed based on user feedback regarding accessibility for the elderly, with differing opinions between Li Xiang and Lao Tang on design solutions [3]
- Li Xiang has been the primary decision-maker for many product details in the Li ONE, while there is speculation that the i8 may shift to a configuration with fewer options, likely influenced by Li Xiang [3]
- There is no evidence from public information that Li Xiang has strongly insisted on hardware dimension enhancements for the new product lines [3]
- Li Xiang's strong insistence on running VLA on dual Orin chips led to significant technical challenges being overcome, showcasing his first-principles thinking [5]
- All vehicles equipped with the Thor chip are expected to be able to switch to Li Auto's own autonomous driving chip in the future, although it is uncertain if the Orin chip will also be replaceable [5]
He Xiaopeng Responds: Is a 50x Market-Cap Gap with Tesla Reasonable? Was Urging Lei Jun to Build Cars "Harming" Him?
36Kr· 2025-08-28 09:43
Core Insights
- He Xiaopeng emphasized that urging Lei Jun to enter the automotive industry was not intended to harm him, highlighting the challenges of the sector [1]
- The company claims to be the only one in China to have truly developed a Vision-Language-Action (VLA) system, showcasing confidence in its technological advancements [1][18]
- He Xiaopeng expressed optimism that the company's valuation will change significantly within six months, as Robotaxi services are expected to launch [1][26]

Product and Market Strategy
- The new P7 model sold 10,000 units in just 7 minutes, indicating strong market demand [3]
- The P7 is considered a crucial product for the company, with a focus on simplicity and cutting-edge technology [3][4]
- The company aims to achieve top-three production capacity in its class for the P7, leveraging modular manufacturing improvements [4]

Financial Performance and Cost Structure
- He Xiaopeng discussed the challenges of profitability in the automotive sector, noting that traditional cost structures do not apply to electric vehicles, where battery costs account for 40%-50% of total expenses [7]
- The company anticipates recovering previous losses within one to two years, thanks to its unique integration of software and hardware in smart vehicles [10]
- The company plans to invest approximately 5 billion yuan annually in VLA development, emphasizing the need for substantial investment to achieve meaningful advancements [16]

Technological Development
- The company is transitioning from single-unit intelligence to group intelligence in its AI systems, with a long-term vision extending to 2027-2028 [10]
- He Xiaopeng highlighted the importance of integrating VLA with a VLM (Vision-Language Model), suggesting that the latter will enhance task execution capabilities [20][22]
- The company is utilizing both NVIDIA and domestic GPUs, with over 30,000 units deployed, to enhance AI capabilities [24]

Competitive Landscape
- He Xiaopeng acknowledged the significant valuation gap between the company and Tesla, attributing it to broader market trends and to company potential that has yet to be fully realized [26]
- The company believes that the automotive industry will see a differentiation in capabilities among competitors in the near future, driven by substantial investments in AI [22]

Industry Insights
- He Xiaopeng shared insights on the challenges of the automotive industry, likening it to a marathon where resilience is crucial for success [30][32]
- The company recognizes the complexities of merging hardware and software effectively, which poses a greater challenge compared to purely digital companies [27]
Worth Noting: How 10 Industry Insiders View VLA
理想TOP2· 2025-07-21 14:36
Core Viewpoints
- The current development of cutting-edge technologies in autonomous driving is not yet fully mature for mass production, with significant challenges remaining to be addressed [1][27][31]
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as potential key directions for future exploration in autonomous driving [6][7][28]
- The choice between deepening expertise in autonomous driving or transitioning to embodied intelligence depends on individual circumstances and market dynamics [19][34]

Group 1: Current Technology Maturity
- The BEV (Bird's Eye View) perception model has reached a level of maturity suitable for mass production, while other models like E2E (End-to-End) are still in the experimental phase [16][31]
- There is a consensus that the existing models struggle with corner cases, particularly in complex driving scenarios, indicating that while basic functionalities are in place, advanced capabilities are still lacking [16][24][31]
- The industry is witnessing a shift towards utilizing larger models and advanced techniques to enhance scene understanding and decision-making processes in autonomous vehicles [26][28]

Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for the next generation of autonomous driving, with the potential to improve reasoning capabilities and safety [2][28]
- The application of reinforcement learning is recognized as having significant potential, particularly when combined with effective simulation environments [6][32]
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could be beneficial in uncertain driving conditions [7][26]

Group 3: Future Directions
- Future advancements in autonomous driving technology are expected to focus on enhancing safety, improving passenger experience, and achieving comprehensive scene coverage [20][28]
- The integration of closed-loop simulations and data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30]
- The industry is moving towards a data-driven model where the efficiency of data collection, cleaning, labeling, training, and validation will determine competitive advantage [20][22]

Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should consider personal interests, market trends, and the maturity of each field [19][34]
- The autonomous driving sector is perceived as having more immediate opportunities for impactful work compared to the still-developing field of embodied intelligence [19][34]
A Senior Labmate Published His Own Paper on Large Models for Autonomous Driving and Got Into a TOP2 PhD Program...
自动驾驶之心· 2025-07-09 12:56
Core Viewpoint
- The article discusses the advancements in large models (LLMs) for autonomous driving, highlighting the need for optimization in efficiency, knowledge expansion, and reasoning capabilities as the technology matures [2][3].

Group 1: Development of Large Models
- Companies like Li Auto and Huawei are implementing their own VLA and VLM solutions, indicating a trend towards the practical application of large models in autonomous driving [2].
- The focus for the next generation of large models includes lightweight design, hardware adaptation, knowledge distillation, quantization acceleration, and efficient fine-tuning [2][3].

Group 2: Course Introduction
- A course is being offered to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [3].
- The course aims to address core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms like Chain-of-Thought (CoT) and reinforcement learning [3][4].

Group 3: Enrollment and Requirements
- The course will accept a maximum of 8 students per session, targeting individuals with a background in deep learning or machine learning who are familiar with Python and PyTorch [5][10].
- Participants will gain a systematic understanding of large model optimization, practical coding skills, and insights into academic writing and publication processes [8][10].

Group 4: Course Outcomes
- Students will learn to combine theoretical knowledge with practical coding, develop their own research ideas, and produce a draft of a research paper [8][9].
- The course includes a structured timeline with specific topics each week, covering model pruning, quantization, efficient fine-tuning, and advanced reasoning techniques [20].
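As a rough illustration of the "quantization acceleration" mentioned above, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is one common textbook scheme chosen here as an assumption; the summary does not say which quantization method the course or any vendor uses, and the function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one byte plus a shared scale, at the cost of a reconstruction error of at most half a step, which is the basic trade-off behind lower-precision inference.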
What Are the Deployment and Research Directions for Large Models in the Later Stage of Autonomous Driving?
自动驾驶之心· 2025-07-07 23:31
Core Insights
- The article discusses the evolving landscape of large models in autonomous driving, highlighting the focus on lightweight solutions, hardware compatibility, knowledge distillation, and efficient fine-tuning of large models [1]
- It emphasizes the importance of advanced reasoning paradigms such as Chain-of-Thought (CoT) and VLA combined with reinforcement learning in enhancing spatial perception capabilities [1]

Group 1: Course Overview
- The course aims to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2]
- Key challenges in model optimization include parameter compression through pruning and quantization, dynamic knowledge injection techniques, and advanced reasoning paradigms [2][3]

Group 2: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and machine learning [4][8]
- Participants are expected to have basic programming skills in Python and familiarity with PyTorch, along with a genuine interest in research [8]

Group 3: Course Outcomes
- The course aims to provide a systematic understanding of large model optimization, helping participants develop their own research ideas and enhance their coding skills [6][7]
- Participants will receive guidance on writing and submitting academic papers, including methodologies for drafting and revising manuscripts [6][7]

Group 4: Course Structure
- The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, covering topics such as model pruning, quantization, and dynamic knowledge expansion [7][18]
- Each week focuses on specific themes, including advanced reasoning techniques and collaborative multi-agent systems [18][20]

Group 5: Additional Information
- The course will utilize publicly available datasets and baseline codes tailored to specific applications, ensuring practical relevance [15][16]
- Participants will engage in discussions and hands-on experiments using mainstream large models like LLaMA and GPT [2][18]
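The "parameter compression through pruning" listed above can be illustrated with unstructured magnitude pruning, one of the simplest pruning schemes. This is a minimal sketch under that assumption; the summary does not specify which pruning method the course covers, and `magnitude_prune` is a hypothetical helper name.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_drop = int(len(weights) * sparsity)
    if n_drop == 0:
        return list(weights)
    # Rank positions by magnitude and mark the smallest ones for removal.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
```

In practice pruning is usually followed by some fine-tuning to recover accuracy, and the zeros only save compute when the runtime or hardware can exploit the sparsity.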
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking
- Open-weight models are catching up to frontier labs in capabilities, making possible many AI engineering applications that were not feasible before [1]
- Open-source engines like vLLM, SGLang, and TensorRT-LLM are readily available, reducing the need for custom model implementations [1]
- Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3]

Performance Analysis
- Throughput is significantly higher when processing longer input contexts (prefill) than when generating longer output sequences (decode), with up to a 4x improvement observed [15][16]
- The time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" by prioritizing context over reasoning [19]
- Gemma 7B models show roughly the same throughput as Qwen 3 models, despite being 10x smaller in model weights, indicating optimization differences [12]

Optimization & Infrastructure
- Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23]
- The benchmarking methodology involves sending a thousand requests to determine maximum throughput and sending single requests to determine the fastest possible server run time [24][25]
- BF16 runs slower on tensor cores than FP8 or FP4, suggesting potential for even greater performance gains with lower-precision formats on newer hardware like Blackwell [16][17]
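The two-part methodology described above (single requests for best-case latency, a burst of a thousand requests for peak throughput) can be sketched with `asyncio`. This is an assumption-laden illustration, not Modal's benchmark code: `fake_generate` is a hypothetical stand-in that only sleeps, and a real harness would replace it with an HTTP call to the inference server being measured.

```python
import asyncio
import time


async def fake_generate(prompt_tokens: int, output_tokens: int) -> int:
    """Hypothetical stand-in for one request; pretends decode time dominates."""
    await asyncio.sleep(0.001 * output_tokens)
    return output_tokens


async def single_request_latency(generate, prompt_tokens: int, output_tokens: int) -> float:
    """Seconds for one request sent on its own (best-case server run time)."""
    start = time.perf_counter()
    await generate(prompt_tokens, output_tokens)
    return time.perf_counter() - start


async def max_throughput(generate, n_requests: int, prompt_tokens: int, output_tokens: int) -> float:
    """Output tokens per second when all requests are sent concurrently."""
    start = time.perf_counter()
    produced = await asyncio.gather(
        *(generate(prompt_tokens, output_tokens) for _ in range(n_requests))
    )
    return sum(produced) / (time.perf_counter() - start)


async def main() -> tuple[float, float]:
    latency = await single_request_latency(fake_generate, 128, 20)
    throughput = await max_throughput(fake_generate, 1000, 128, 20)
    return latency, throughput


latency_s, tokens_per_s = asyncio.run(main())
```

Measuring the two quantities separately matters because a configuration tuned for peak throughput (large batches) generally gives worse single-request latency, and the reverse.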