Mass-Producing "Danger" in Real-World Scenes: VLM + Diffusion Models Build Extreme Tests
具身智能之心· 2025-08-26 00:03
Recently, Dongchedi's "懂车智炼场" program ran safety-critical scenario tests on the NOA driver-assistance functions of production autonomous driving systems. The results showed that in high-risk scenarios such as a construction site at night, an accident involving a vehicle ahead on a highway, and a vehicle suddenly pulling out from behind an obstacle, no system under test managed to avoid accidents entirely. Such safety-critical scenarios are rare on real roads, but when they do occur they can lead to casualties or serious traffic accidents. To improve the reliability of autonomous driving systems in these situations, extensive testing across diverse, high-risk safety-critical scenarios is essential. Yet such extreme scenarios are extremely hard to collect in the real world: they occur rarely, carry high risk, and are difficult to gather at scale. In simulation, similar scenarios can be manufactured in bulk, but existing simulators still fall short of real-world visual fidelity, making them hard to use directly for stress-testing end-to-end systems in the real domain. To address this, a research team from Zhejiang University and Harbin Institute of Technology (Shenzhen) proposed SafeMVDrive, the first multi-view safety-critical driving video generation framework targeting the real domain. It combines a VLM-based critical-vehicle selector with two-stage trajectory generation ...
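The pipeline described above (VLM vehicle selection, then trajectory generation, then video synthesis) can be sketched in miniature. Everything below is illustrative: the function names, the nearest-vehicle stand-in for the VLM selector, and the linear gap-closing trajectory are assumptions for exposition, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    vehicle_id: str
    distance_m: float  # current gap to the ego vehicle

def select_critical_vehicle(vehicles):
    """Stand-in for the VLM-based selector: pick the nearest vehicle as
    the one most likely to create a safety-critical interaction."""
    return min(vehicles, key=lambda v: v.distance_m)

def generate_adversarial_trajectory(vehicle, horizon=5):
    """Stand-in for two-stage trajectory generation: close the gap to
    the ego vehicle by 2 m per step (a crude cut-in sketch)."""
    return [max(vehicle.distance_m - 2.0 * t, 0.0) for t in range(horizon)]

def render_scenario(trajectory):
    """Stand-in for the diffusion video generator: emit one 'frame'
    descriptor per trajectory waypoint."""
    return [f"frame_{i}: gap={d:.1f}m" for i, d in enumerate(trajectory)]

vehicles = [Vehicle("truck", 30.0), Vehicle("sedan", 12.0), Vehicle("bus", 45.0)]
critical = select_critical_vehicle(vehicles)
frames = render_scenario(generate_adversarial_trajectory(critical))
print(critical.vehicle_id, len(frames))  # → sedan 5
```

The point of the sketch is the division of labor: a semantic selector decides *which* agent to perturb, a trajectory stage decides *how* it moves, and only the final stage touches pixels.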
Junpu Intelligent's Diversification Advances as Its Embodied Intelligent Robot Business Achieves Breakthrough Progress
Zheng Quan Ri Bao Wang· 2025-08-23 04:13
Core Insights
- Junpu Intelligent achieved a revenue of 1.032 billion yuan in the first half of 2025, with a backlog of orders amounting to 3.464 billion yuan, indicating stable business development [1]
- The company secured new orders worth 1.112 billion yuan, representing a year-on-year growth of 20.22%, with non-automotive orders in the medical and high-end consumer goods sectors reaching 445 million yuan, accounting for approximately 40% of total new orders [1]

Group 1: Medical Sector Developments
- In the medical health sector, Junpu Intelligent successfully won a project for the production line of continuous glucose monitoring (CGM) sensors for an internationally leading diagnostic equipment manufacturer, with an annual design capacity of 15 million units [1]
- The company established a strategic partnership with a leading domestic medical enterprise to jointly develop key platform cam technology for insulin injection pens [1]
- The acquisition of the first fully automated production line project for insulin injection pens and automatic injectors signifies market recognition of Junpu Intelligent's technological strength in intelligent manufacturing of high-value medical consumables [1]

Group 2: High-End Consumer Goods Innovations
- In the high-end consumer goods sector, Junpu Intelligent's innovations include the successful application of its self-developed "multi-blade intelligent assembly process" to an international brand's razor blade assembly order [1]
- The company received an order for a flexible assembly line for high-end electric toothbrush drive units, which received high praise from the client [1]

Group 3: Robotics Advancements
- Junpu Intelligent's humanoid robot "Jarvis 2.0" successfully completed a multimodal upgrade, integrating AI models such as large language models (LLM) and visual language models (VLM), enabling multilingual dialogue, voice command control, and visual guidance for object handling [2]
- The "Jarvis Lightweight 1.0" version has been officially delivered to Tsinghua University and other institutions for research and teaching purposes [2]
- The joint venture between Junpu Intelligent's Ningbo Junpu Artificial Intelligence and Humanoid Robot Research Institute and Zhiyuan Robotics has officially commenced operations, with the first mass-production pilot line now in production [2]
- By the end of June, the joint venture had received over 28 million yuan in orders for humanoid robot production and sales, with three models of embodied intelligent robots currently in production [2]
Helped Yet Another Student Land a VLA Algorithm Position......
具身智能之心· 2025-08-22 16:03
Yesterday afternoon a younger student with a decent foundation, about to enter the third year of a master's at a C9 university, came to Feng to vent during autumn recruitment: a labmate had landed a VLA algorithm position (at a particularly well-funded embodied-AI company), and he feared it was too late to switch. They had both started out in traditional robotics, doing SLAM-related work; he had no idea what projects the labmate had taken on to progress so fast and pass several interviews. Only in the last couple of days had the labmate recommended our community to him: the curriculum is very complete, but he worried it might already be a bit late. Throughout August, students have kept coming to Feng, either holding verbal offers or worried that it is too late to switch into embodied AI. Autumn recruitment is near, but the same saying holds: "it is never too late." The priority is to fill in a complete embodied-AI roadmap as soon as possible, especially data collection, algorithms, and simulation. If you lack strong skills in independent learning and tracking down answers, come to our embodied community, currently the largest and most complete embodied-AI learning platform in China: the "具身智能之心" Knowledge Planet. It combines video, articles, learning roadmaps, Q&A, and job-hunting exchange into one comprehensive embodied community of nearly 2,000 members, with the goal of reaching close to 10,000 within two years: a hub for discussion and technical sharing that many beginners and advancing students frequent. The community also regularly answers practical questions: How do you use the hardware? How do you collect data effectively? How do you deploy VA and VLA models? Is the capture background too complex, or is the data rather dirt ...
Still Don't Know How to Start on a VLA Paper? Some Students Already Have CCF-A Publications......
自动驾驶之心· 2025-08-22 12:00
The Li Auto VLA driver model is already in production vehicles! Judging from the launch event, the gains in VLA capability center on three points: better semantic understanding (multimodal input), stronger reasoning (chain of thought), and behavior closer to human driving intuition (trajectory planning). The event showcased four core abilities: spatial understanding, thinking, communication and memory, and behavior.

1. The VLA Research Paper Tutoring Program Is Here ⭐

Of these, the thinking and communication-and-memory abilities come from the language model, with the memory ability also using RAG. Below is a demo of the Li Auto VLA driver model's chain-of-thought output, combining dynamic objects, static elements, the navigation map, spatial understanding, and more. Without question, VLA is now the direction attracting the most attention in both autonomous driving academia and industry. VLA evolved from VLM+E2E and spans multiple frontier technology stacks: end-to-end driving, trajectory prediction, visual language models, and reinforcement learning. Meanwhile, traditional work on BEV perception, lane detection, occupancy, and the like appears less often at top venues. Recently many students have come to ask Zhu: can traditional perception and planning still yield papers? The work there feels mostly done; will reviewers score it highly? For traditional perception and planning tasks, industry is still optimizing its solutions! But academia has largely shifted toward large models and VLA, where many subfields still have plenty of work to do... We already ran the first session of the VLA paper tutoring class, and the response was very good ...
在复杂真实场景中评估 π0 这类通用 policy 的性能和边界
自动驾驶之心· 2025-08-17 03:23
Core Viewpoint
- The article evaluates the PI0-FAST-DROID model in real-world scenarios, highlighting its potential and limitations in robotic operations, particularly in handling new objects and tasks without extensive prior training [4][10][77]

Evaluation Method
- The evaluation used the π₀-FAST-DROID model, fine-tuned for the DROID robot platform, which includes a Franka Panda robot equipped with cameras [5][10]
- The assessment involved over 300 trials across various tasks, focusing on the model's ability to perform in diverse environments, particularly a kitchen setting [10][11]

Findings
- The model demonstrated a strong prior toward reasonable behavior, often producing intelligent actions, but these were not always sufficient to complete tasks [11]
- Prompt engineering was crucial: variations in task descriptions significantly affected success rates, indicating the need for clear, structured prompts [12][59]
- The model exhibited impressive visual-language understanding and could mimic continuous actions across different scenarios [13][28]

Performance in Complex Scenarios
- The model showed robust performance in recognizing and manipulating transparent objects, a significant challenge for traditional methods [20][27]
- It maintained focus on tasks despite human movement in the background, suggesting effective prioritization of relevant visual inputs [25]

Limitations
- The model struggled with semantic ambiguity and often froze during tasks, particularly when it encountered unfamiliar commands or objects [39][42]
- It lacks memory, which hindered its ability to perform multi-step tasks effectively, leading to premature task completion or freezing [43][32]
- It struggled with precise spatial reasoning, particularly estimating distances and heights, which caused failures in object manipulation tasks [48][50]

Task-Specific Performance
- Performance varied across task categories, with notable success on simple tasks but significant difficulty in complex operations such as pouring liquids and interacting with household appliances [89][91][100]
- For instance, the model achieved a 73.3% progress rate when pouring toy items but only 20% when dealing with real liquids, indicating limitations in physical capability [90]

Conclusion
- While the PI0 model shows promise as a generalist policy for robotic applications, it still requires significant improvement in instruction adherence, fine manipulation, and handling partial observability [77][88]
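Progress rates like the 73.3% and 20% figures above are typically averages of per-trial partial-progress scores within a task category. A minimal sketch, with made-up trial data chosen to reproduce those two numbers (the task names and scores are illustrative, not the evaluation's raw data):

```python
from collections import defaultdict

def progress_rates(trials):
    """Average partial-progress score (0.0-1.0) per task category,
    reported as a percentage rounded to one decimal place."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for task, progress in trials:
        sums[task] += progress
        counts[task] += 1
    return {task: round(100.0 * sums[task] / counts[task], 1) for task in sums}

# Illustrative (task, progress) pairs, one tuple per trial.
trials = [
    ("pour_toy", 1.0), ("pour_toy", 0.5), ("pour_toy", 0.7),
    ("pour_liquid", 0.2), ("pour_liquid", 0.2),
]
print(progress_rates(trials))  # → {'pour_toy': 73.3, 'pour_liquid': 20.0}
```

Reporting partial progress rather than binary success is what lets an evaluation distinguish "grasped but spilled" from "never moved", which matters for a policy that often gets partway through a task.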
Session Two of VLA and Autonomous Driving Research Paper Tutoring Is Here~
自动驾驶之心· 2025-08-16 12:00
Core Insights
- The article discusses recent advancements in the Li Auto VLA driver model, highlighting its improved capabilities in understanding semantics, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3]

Group 1: VLA Model Capabilities
- The VLA model's enhancements focus on four core abilities: spatial understanding, reasoning, communication and memory, and behavioral capabilities [1]
- The reasoning and communication abilities are derived from language models, with memory capabilities utilizing RAG [3]

Group 2: Research and Development Trends
- The VLA model has evolved from VLM+E2E, incorporating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5]
- While traditional perception and planning tasks are still being optimized in industry, the academic community is increasingly shifting toward large models and VLA, leaving a wealth of subfields still open for research [5]

Group 3: VLA Research Guidance Program
- A VLA research paper guidance program has been initiated, aimed at helping participants systematically grasp key theoretical knowledge and develop their own research ideas [6]
- The program includes a structured 12-week online group research course, followed by 2 weeks of paper guidance and a 10-week maintenance period for paper development [14][34]

Group 4: Course Structure and Content
- The course covers various topics over 14 weeks, including traditional end-to-end autonomous driving, VLA end-to-end models, and methodologies for writing research papers [9][11][35]
- Participants will gain insight into classic and cutting-edge papers, coding skills, and methods for writing and submitting research papers [20][34]

Group 5: Enrollment and Requirements
- The program is limited to 6-8 participants per session, targeting individuals with a background in deep learning and basic knowledge of autonomous driving algorithms [12][15]
- Participants are expected to have a foundational understanding of Python and PyTorch; access to high-performance computing resources is recommended [21]
Global Industrial Robot Market Cools as China's Counter-Trend Growth Becomes the Biggest Bright Spot
第一财经· 2025-08-10 01:23
Core Viewpoint
- The industrial robot market faces challenges in 2024, with a global decline in new installations and significant regional disparities, highlighting China's growth amid a global downturn [3][4]

Group 1: Global Market Trends
- In 2023, global industrial robot installations decreased by 3% to approximately 523,000 units, with Asia down 2%, Europe down 6%, and the Americas down 9% [3]
- The automotive industry saw a significant decline, while the electronics sector experienced slight growth; other industries such as metals, machinery, plastics, chemicals, and food are in a growth phase [3]

Group 2: China's Market Performance
- China is projected to install around 290,000 industrial robots in 2024, a 5% increase, raising its global market share from 51% in 2023 to 54% [3]
- The installation mix has shifted: general industrial applications rose from 38% five years ago to 53%, while the electronics sector's share dropped from 45% to 28% [3][4]
- China has remained the world's largest industrial robot market for 12 consecutive years, with sales expected to reach 302,000 units in 2024 [4]

Group 3: Regional Comparisons
- Japan's industrial robot installations fell by 7% to 43,000 units, with only the automotive sector growing, at 11% [6]
- The U.S. market shrank by 9%, with the automotive sector contributing nearly 40% of installations [6]
- Europe declined by 6% but still achieved its second-highest installation level in history at 86,000 units, with the plastics, chemicals, and food industries emerging as new growth areas [6]

Group 4: Industry Innovations and Future Trends
- The integration of artificial intelligence and advances in digital twin technology are expected to enhance human-robot interaction and reshape production processes [6]
- Logistics and material handling are anticipated to be early adopters of humanoid robots, with construction, laboratory automation, and warehousing also accelerating robot penetration [6]
Global Industrial Robot Market Cools as China's Counter-Trend Growth Becomes the Biggest Bright Spot
Di Yi Cai Jing· 2025-08-09 07:17
Core Insights
- 2024 is expected to be a challenging year for the industrial robotics sector, following a 3% global decline in new installations to approximately 523,000 units in 2023 [1]
- Major markets in Asia, Europe, and the Americas all experienced downturns: Asia down 2%, Europe down 6%, and the Americas down 9% [1]
- China stands out as the only bright spot, with expected growth of 5% in new installations, reaching around 290,000 units in 2024 and increasing its global market share from 51% in 2023 to 54% [1][2]

Market Performance
- The electronics and automotive sectors have been the leading industries for industrial robots since 2020, with the electronics sector showing slight growth while the automotive sector faced significant declines [1]
- China's industrial robot market is projected to reach 302,000 units in 2024, maintaining its position as the largest industrial robot market globally for 12 consecutive years [2]
- Japan's industrial robot installations fell by 7% to 43,000 units, while the U.S. market shrank by 9%, with the automotive sector contributing nearly 40% of installations [4]

Regional Analysis
- China is the world's largest producer of industrial robots, with production increasing from 33,000 units in 2015 to 556,000 units in 2024, and service robot production reaching 10.5 million units, a 34.3% year-on-year increase [2]
- China's robot density is 470 units per 10,000 workers, surpassing Japan and Germany; South Korea and Singapore lead at 1,012 and 770 units respectively [4]
- Despite geopolitical tensions, the outlook for Asia remains positive, with single-digit growth in industry orders forecast for Q1 2025 and a mild recovery in the electronics sector [4]

Industry Trends
- The robotics industry is increasingly focused on integrating artificial intelligence, with advances in digital twin technology and enhanced human-machine interaction capabilities [4]
- Logistics and material handling are key areas for early adoption of robotics, with construction, laboratory automation, and warehousing also seeing accelerated penetration [4]
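Robot density figures like those cited above are a simple ratio: installed industrial robots per 10,000 manufacturing workers. A minimal sketch, where the robot and workforce numbers are illustrative assumptions chosen to yield a 470 density, not official statistics:

```python
def robot_density(installed_robots, workers):
    """Industrial robots per 10,000 manufacturing workers,
    rounded to the nearest whole unit."""
    return round(installed_robots / workers * 10_000)

# Illustrative: a density of 470 per 10,000 workers corresponds to,
# e.g., 1.786 million robots across a 38-million-person workforce.
print(robot_density(1_786_000, 38_000_000))  # → 470
```

Because density normalizes by workforce size, it lets small, highly automated economies such as South Korea and Singapore rank above much larger markets in absolute installations.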
Global Industrial Robot Market Cools as China's Counter-Trend Growth Becomes the Biggest Bright Spot
Di Yi Cai Jing· 2025-08-09 07:13
Core Insights
- The global industrial robot market saw new installations decline by 3% in 2023 to approximately 523,000 units, affecting major markets in Asia, Europe, and the Americas [1][4]
- China remains the only bright spot, with an expected 5% growth in new installations for 2024, reaching around 290,000 units and increasing its global market share from 51% in 2023 to 54% [1][2]
- The market structure is changing: general industrial applications increased their share from 38% five years ago to 53%, while the electronics sector's share decreased from 45% to 28% [1]

Regional Performance
- Japan's industrial robot installations fell by 7% to 43,000 units, with only the automotive sector showing growth, at 11% [4]
- The U.S. market shrank by 9%, with the automotive industry contributing nearly 40% of installations [4]
- Europe declined by 6% but still achieved its second-highest installation level in history at 86,000 units, with the plastics, chemicals, and food sectors emerging as new growth areas [4]

Industry Trends
- Robot density per 10,000 workers indicates varying levels of automation, with South Korea (1,012 units), Singapore (770 units), and China (470 units) leading the way, the latter surpassing Japan and Germany [4]
- Despite geopolitical tensions and tariff disputes, the Asian market is expected to grow, with a mild recovery in the electronics sector anticipated in early 2025 [4]
- Future trends include a focus on AI integration, advances in digital twin technology, and improved human-robot interaction through visual language models [4]
Performance Surges 30%! CUHK's ReAL-AD: A Human-Like-Reasoning End-to-End Algorithm (ICCV'25)
自动驾驶之心· 2025-08-03 23:32
Core Viewpoint
- The article discusses the ReAL-AD framework, which integrates human-like reasoning into end-to-end autonomous driving systems, enhancing decision-making through a structured approach that mimics human cognitive functions [3][43]

Group 1: Framework Overview
- ReAL-AD employs a reasoning-enhanced learning framework based on a three-layer human cognitive model: driving strategy, decision-making, and operation [3][5]
- The framework incorporates a visual-language model (VLM) to improve environmental perception and structured reasoning capabilities, allowing for more nuanced decision-making [3][5]

Group 2: Components of ReAL-AD
- The framework consists of three main components:
  1. Strategic Reasoning Injector: utilizes the VLM to generate insights for complex traffic situations, forming high-level driving strategies [5][11]
  2. Tactical Reasoning Integrator: converts strategic intentions into executable tactical choices, bridging the gap between strategy and operational decisions [5][14]
  3. Hierarchical Trajectory Decoder: simulates human decision-making by establishing rough motion patterns before refining them into detailed trajectories [5][20]

Group 3: Performance Evaluation
- In open-loop evaluations, ReAL-AD demonstrated significant improvements over baseline methods, achieving over 30% better performance in L2 error and collision rates [36]
- The framework achieved the lowest average L2 error of 0.48 meters and a collision rate of 0.15% on the nuScenes dataset, indicating enhanced learning efficiency in driving capabilities [36]
- Closed-loop evaluations showed that ReAL-AD significantly improved driving scores and successful path completions compared to baseline models [37]

Group 4: Experimental Setup
- The evaluation utilized the nuScenes dataset, which includes 1,000 scenes sampled at 2 Hz, and the Bench2Drive dataset, covering 44 scenarios and 23 weather conditions [34]
- Metrics included L2 error, collision rates, driving scores, and success rates, providing a comprehensive assessment of the framework's performance [35][39]

Group 5: Ablation Studies
- Removing the Strategic Reasoning Injector led to a 12% increase in average L2 error and a 19% increase in collision rates, highlighting its importance in guiding decision-making [40]
- The Tactical Reasoning Integrator reduced average L2 error by 0.14 meters and collision rates by 0.05%, emphasizing the value of tactical commands in planning [41]
- Replacing the Hierarchical Trajectory Decoder with a multi-layer perceptron increased both L2 error and collision rates, underscoring the necessity of hierarchical decoding for trajectory prediction [41]
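The average L2 error reported above is the mean Euclidean distance between predicted and ground-truth trajectory waypoints. A minimal sketch with illustrative (x, y) waypoints, not values taken from the paper:

```python
import math

def average_l2_error(pred, gt):
    """Mean Euclidean distance, in meters, between predicted and
    ground-truth (x, y) trajectory waypoints of equal length."""
    assert len(pred) == len(gt), "trajectories must align waypoint-for-waypoint"
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists)

# Illustrative 3-waypoint trajectories (meters in the ego frame).
pred = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.5)]
gt   = [(0.0, 0.3), (1.0, 2.4), (2.0, 5.1)]
print(round(average_l2_error(pred, gt), 2))  # → 0.43
```

Averaging over waypoints (and then over scenes) is what makes figures like "0.48 m average L2" comparable across methods, provided every method is evaluated at the same prediction horizon.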