Vision-Language-Action Models
 Li Auto: How VLM and VLA Differ in Blind-Zone Deceleration
 理想TOP2· 2025-10-18 08:44
I am writing up the scene-level differences between VLM and VLA; here is the simplest example: blind-zone deceleration. Original author: Weibo user 大懒货. Original link: https://weibo.com/2062985282/Q95d6BJkn Original content: What you can feel here is that the end-to-end model decelerates only after receiving a deceleration command from the VLM, so the behavior feels disjointed and rule-like [it always decelerates to 8-12 km/h, regardless of differences between intersection scenes], etc. VLA follows a different logic: it uses an in-house foundation model to understand the scene, so it directly builds a scene-level understanding of blind zones. The workflow is: video is encoded into the LLM; the LLM jointly judges the road scene, its width, traffic flow, etc.; and it then directly outputs an Action. That is why, by feel, VLA has far more blind-zone deceleration levels (close to continuous), and in particular the deceleration G-value varies a lot across different roads, matching the local traffic flow much better. It no longer feels like the old E2E obeying a VLM. This is a "native" deceleration Action, not the command-driven feel of a dual system. How is the E2E+VLM strategy built? First, the VLM is a vision-language model, so the R&D team collects a large number [actually not that many; this is just an LLM trait] of T-junction scene videos and images, so that the Qwen base model acquires the ability to understand T-junction scenes. The VLM's working logic is then: when it perceives no ...
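The contrast the post describes, one coarse VLM command versus a scene-conditioned near-continuous action, can be sketched as a toy model. All numbers, feature names, and formulas below are illustrative assumptions, not Li Auto's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scene:
    """Toy scene descriptors the post mentions: road width, traffic flow."""
    road_width_m: float
    traffic_flow: float  # 0.0 (empty) .. 1.0 (dense)
    has_blind_zone: bool

def vlm_command_speed(scene: Scene) -> Optional[float]:
    """Dual-system style: the VLM emits one coarse, rule-like command.
    Every blind zone maps to the same 8-12 km/h band, regardless of context."""
    if scene.has_blind_zone:
        return 10.0  # fixed target speed in km/h -> the 'rule-like' feel
    return None      # no command; the E2E planner drives normally

def vla_action_speed(scene: Scene) -> Optional[float]:
    """Single-system style: the model outputs a near-continuous action
    conditioned on the whole scene, so deceleration varies road by road."""
    if not scene.has_blind_zone:
        return None
    base = 8.0
    width_bonus = 0.8 * max(scene.road_width_m - 3.5, 0.0)  # wider road -> less caution
    flow_penalty = 6.0 * scene.traffic_flow                  # denser traffic -> slower
    return max(base + width_bonus - flow_penalty, 5.0)

narrow_busy = Scene(road_width_m=3.5, traffic_flow=0.9, has_blind_zone=True)
wide_quiet = Scene(road_width_m=7.0, traffic_flow=0.1, has_blind_zone=True)

# The VLM command is identical in both scenes; the VLA action differs per scene.
print(vlm_command_speed(narrow_busy), vlm_command_speed(wide_quiet))  # 10.0 10.0
print(vla_action_speed(narrow_busy), vla_action_speed(wide_quiet))
```

The point of the sketch is the shape of the output space: the dual-system path collapses every blind zone to one band, while the single-system path maps scene features to a continuous value, which is the "more levels, different G per road" feel the post describes.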
 A Major Upgrade for Robot Perception: Lightweight Injection of Geometric Priors Lifts Success Rates by 31%
 量子位· 2025-09-28 11:54
Current enhancement schemes based on explicit depth input are effective, but they rely on extra sensors or depth-estimation networks, which brings deployment difficulty and precision noise. Contributed by the Evo-0 team to 量子位 | WeChat account QbitAI. In robot learning, getting AI to truly "understand" the three-dimensional world has long been a hard problem. VLA models are usually built on pretrained vision-language models (VLMs) trained only on 2D image-text data, so they lack the 3D spatial understanding that real-world manipulation requires. To address this, Shanghai Jiao Tong University and the University of Cambridge propose Evo-0, a lightweight method for strengthening the spatial understanding of vision-language-action (VLA) models by implicitly injecting 3D geometric priors, with no explicit depth input and no extra sensors. The method uses the visual geometry foundation model VGGT to extract 3D structural information from multi-view RGB images and fuses it into the original vision-language model, yielding a marked gain in spatial perception. In RLBench simulation experiments, on 5 tasks requiring fine manipulation, Evo-0's average success rate exceeds the pi0 baseline by 15% and OpenVLA-OFT by 31%. Evo-0: fusing 2D and 3D representations. Evo-0 uses VGGT as a spatial encoder, taking the 3D tokens that VGGT extracts for 3D-structure tasks during its training. These tokens carry geometric information such as depth context and cross-view spatial correspondences. The model introduces a cross- ...
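The excerpt is cut off at "cross-", but the fusion it is describing is a cross-attention layer in which the VLM's visual tokens query VGGT's 3D tokens. A minimal numpy sketch of such a layer follows; the token counts, dimensions, and residual placement are illustrative assumptions, not Evo-0's published architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(vlm_tokens, geo_tokens, Wq, Wk, Wv):
    """VLM visual tokens attend (as queries) to VGGT 3D tokens (keys/values),
    injecting geometric context without any explicit depth input."""
    Q = vlm_tokens @ Wq                                       # (N, d) queries
    K = geo_tokens @ Wk                                       # (M, d) keys
    V = geo_tokens @ Wv                                       # (M, d) values
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)   # (N, M) weights
    fused = attn @ V                                          # (N, d) geometry-aware update
    return vlm_tokens + fused  # residual add keeps the original 2D pathway intact

rng = np.random.default_rng(0)
d = 16
vlm_tokens = rng.standard_normal((8, d))  # 8 image-patch tokens from the VLM
geo_tokens = rng.standard_normal((4, d))  # 4 3D tokens from VGGT (hypothetical count)
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out = cross_attention_fuse(vlm_tokens, geo_tokens, Wq, Wk, Wv)
print(out.shape)  # (8, 16): token count and width unchanged, content enriched
```

Because the output keeps the same shape as the VLM's token stream, a module like this can be dropped between existing layers, which is what makes the approach "lightweight" relative to retraining on explicit depth.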
 Humanoid Robots: Faster, Higher, Stronger
 Ren Min Ri Bao· 2025-09-01 01:03
 Core Insights - The sales of humanoid robots in China are expected to exceed 10,000 units this year, representing a year-on-year growth of 125% [1] - The development of the humanoid robot industry is characterized by rapid innovation and application across various sectors, including industrial manufacturing, retail delivery, and restaurant services [1][2]   Trend 1: Faster Innovation and Application - The Chinese government has included "embodied intelligence" in its work report, emphasizing the importance of humanoid robots as a typical application in the "Artificial Intelligence+" initiative [3] - Various local policies are being implemented to support humanoid robot development, with significant funding and investment initiatives announced in cities like Beijing, Shanghai, and Hangzhou [3] - Experts indicate that the industry has reached a "turning point" for large-scale production, with improvements in hardware and intelligence capabilities [3][4]   Trend 2: Higher Technical Standards - The development of humanoid robots relies on the synergy of hardware innovation, advanced algorithms, and high-quality data accumulation [7] - The industry is witnessing rapid advancements in core components, such as actuators and sensors, which are becoming more standardized and cost-effective [4][7] - The integration of technologies like satellite navigation and 5G communication is enhancing the capabilities of humanoid robots [8][9]   Trend 3: Stronger Comprehensive Performance - Humanoid robots are evolving towards full autonomy, moving away from remote control operations to self-sufficient decision-making and execution [11] - The complexity of humanoid robot development involves multiple fields, including mechanical structure, drive systems, and artificial intelligence [12] - The potential applications of humanoid robots are expanding, with roles in production, service industries, and even family settings, addressing diverse needs [12]
 Yuanrong Qixing's VLA Model Heads for Mass Production in Q3: Can It Break Through the Market and Technical Barriers?
 Nan Fang Du Shi Bao· 2025-06-13 15:04
 Core Insights - Yuanrong Qixing announced its VLA model will be launched to consumers in Q3 2025, with five vehicle models expected to be on the road within the year [1] - The VLA model features four key capabilities: blind spot detection, obstacle recognition, road sign interpretation, and voice control, generating significant interest in the industry [1][3]   Company Overview - Yuanrong Qixing, established in 2018 and based in Shenzhen, has focused on autonomous driving and vehicle networking technologies [3] - The VLA model, or Vision Language Action Model, is considered the company's "secret weapon" and offers a differentiating factor compared to traditional end-to-end models by addressing the "black box problem" [3][4]   Technology and Innovation - The VLA model enhances transparency by clearly displaying the reasoning process behind its decisions, which increases user trust in the autonomous driving system [4] - In Q4 2024, Yuanrong Qixing captured over 15% market share in the high-level intelligent driving assistance sector with a single mass-produced model [6] - The company has optimized costs through collaboration with Qualcomm, achieving complex scenario operations on a 100 TOPS platform, significantly reducing the price of its intelligent driving solutions [7]   Market Challenges - The intelligent driving sector is highly competitive, with many players already established and partnerships formed with automotive manufacturers [8] - Yuanrong Qixing faces challenges in gaining market recognition and acceptance for the VLA model amidst increasing consumer caution due to recent accidents and stringent regulations [8] - The company has successfully raised $100 million in its C1 financing round in November 2024, but still faces financial pressures in a cooling investment environment [8][9]   Strategic Considerations - The push to market the VLA model represents both a technological showcase and a market challenge, as the company shifts focus from L4 to L2 capabilities, potentially sacrificing some advanced technology for mass production [9] - The need for ongoing funding is critical to avoid disruptions in technology development and to maintain competitive positioning in the market [9]