VLA Models
Musk disses Nvidia's autonomous driving: wait another five or six years
Sou Hu Cai Jing· 2026-01-09 08:00
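One is the "omnipotent" Iron Man who "steers the course of global technology"; the other is the leather-jacketed chief who holds the core computing power of artificial intelligence. Both have close ties to the White House. Musk and Jensen Huang, Silicon Valley's two successive tech idols, finally have real grounds to compete.
At CES 2026, Nvidia launched its Alpamayo autonomous driving platform and presented it to the world as an "AI reasoning" moment. In C次元's view, the real aim is to sign up automakers as customers; in essence, Nvidia wants to become yet another Android of the auto industry.
People used to assume that Tesla builds cars and Nvidia sells chips, so the two would never compete.
But once Musk set his mind on "pitching the FSD (Full Self-Driving) system to other automakers", a showdown with Huang was bound to come sooner or later. Besides, after losing face at the White House, Musk will not want to lose another round over who is the real godfather of tech.
"Well, that is exactly what Tesla is doing," Musk wrote on X about Nvidia's Alpamayo. The remark sounds dismissive and is by no means a compliment. "They will find that getting to 99% is easy, but solving the long tail of the distribution is super hard."
Nvidia "teaches others to fish": VLA and chain of thought
To understand Musk's disdain, one first has to understand what Nvidia has actually put forward. Alpamayo is not a complete autonomous driving system that can be installed in a car and driven on the road; it is a development paradigm plus infrastructure.
Its core innovation lies in being the first to bring vision-language ...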
The VLA+RL technical discussion group is here
具身智能之心· 2026-01-08 04:23
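The 具身智能之心 VLA technical discussion group is here. Anyone working on VLA models, VLA+RL, or lightweight models and deployment is welcome to join. Add the assistant on WeChat (AIDriver005) with the note: nickname + institution + join group. ...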
Why has the π series had such a large impact on the industry?
具身智能之心· 2026-01-07 07:02
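Edited by 具身智能之心
But pi seems "disobedient": it is hard to tune and keeps falling short of expectations. Many readers keep complaining about this, and quite a few say they have "wasted" a great deal of time on pitfalls.
Building a complete pipeline on the pi series, from data through VLA model training and optimization to deployment, is very hard for many beginners. Some have struggled for half a year without truly getting started, let alone achieving good results.
★ Its model capabilities are leading general-purpose robots out of the lab and into real-world applications such as industrial manufacturing and home services, making it a core reference for many VLA models in the industry since 2025. Learning the π series means mastering the foundations of VLA models, and it also helps with research, job hunting, and industrial deployment.
Many companies build their own real-robot demos on the pi series, such as folding clothes or unpacking boxes, or improve on this approach. Every new release from Physical Intelligence draws a response from the industry.
★ 2024.10 π0: pioneered Flow Matching for continuous action trajectory prediction, breaking through the precision bottleneck of traditional discrete actions and providing a millimeter-level operating basis for precision manufacturing, autonomous driving, and similar scenarios; 2025.04 π0.5: heterogeneous task ...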
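The Flow Matching idea credited to π0 above can be illustrated with a minimal sketch. It assumes the common linear-path formulation (noise at t=0, target action at t=1), which may differ from the convention Physical Intelligence actually uses. The function names `fm_training_pair`, `euler_sample`, and `oracle_velocity` are hypothetical, and the "oracle" velocity field stands in for a trained network purely to show that integrating the field carries noise to a continuous action.

```python
import random


def fm_training_pair(action, rng):
    """Build one flow-matching training example for a target action vector.

    Assumed linear path: x_t = (1 - t) * noise + t * action,
    whose regression target is the constant velocity (action - noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # time in [0, 1)
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    return x_t, t, target_velocity


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so the oracle below never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target):
    """Hypothetical stand-in for a trained network: the exact conditional field."""
    def fn(x, t):
        return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]
    return fn


rng = random.Random(0)
target_action = [0.25, -0.5, 1.0]  # illustrative continuous action
start_noise = [rng.gauss(0.0, 1.0) for _ in target_action]
sampled = euler_sample(oracle_velocity(target_action), start_noise, steps=10)
```

In a real system the oracle would be replaced by a network trained to regress `target_velocity` from `(x_t, t)` plus the observation; the output stays continuous, which is the contrast with discrete action tokens drawn in the entry above.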
Behind the Unitree "green channel suspended" storm: who is pouring cold water on the robotics sector?
Tai Mei Ti APP· 2026-01-05 01:21
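By 观潮科技Pro | Author: Gao Heng
Unitree Robotics was rumored to have had its "A-share green channel halted". Although the company quickly denied it, the controversy has again put the humanoid robot sector under regulatory and investor scrutiny.
On January 4, Unitree was hit by a rumor that its "green channel had been halted", making it the robotics industry's first media flashpoint of the year. According to NetEase Tech and other media, Unitree was still in the listing queue, but the "green channel" it had relied on was halted by regulators. The claim quickly drew wide attention and was broadly read as a signal that regulators want to cool an overheated robotics sector.
Unitree's response, however, was more direct. The company stated: "The content of the report concerning the status of our listing work is inconsistent with the facts. We have not applied for any green channel." The company's listing work is proceeding normally, and progress will be disclosed in accordance with laws and regulations.
That afternoon, Unitree CEO Wang Xingxing wrote on his WeChat Moments: "It's made-up news; don't take it seriously."
According to the China Securities Regulatory Commission (CSRC) website, Unitree did complete its pre-listing tutoring in November 2025 and plans an A-share IPO, with CITIC Securities as the tutoring institution. This means Unitree's listing path already follows the standard process, and a green channel is not a requirement.
Li Rui, executive director of 企事界北京科技有限公司, told 观潮科技Pro that the so-called "green channel" is an accelerated review ... for specific types of companies (such as national-level specialized and sophisticated "little giant" enterprises) ...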
Unitree's IPO green channel halted? Wang Xingxing responds: "Made-up news"
Sou Hu Cai Jing· 2026-01-04 13:08
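Source: 钛媒体
The author asked Unitree to verify the report. The company replied: "The content of the report concerning the status of our listing work is inconsistent with the facts; we have not applied for any 'green channel'. The report misleads the public and has seriously infringed our legitimate rights and interests. We have reported it to the competent authorities and urged the parties concerned to retract the false report. We hereby solemnly declare that we reserve the right to pursue liability through legal means."
On January 4, a report claimed that Unitree's green channel for an A-share listing had been halted, although the listing itself had not. A person familiar with the matter said: "The state wants to cool the robotics sector down a bit; the bubble is too big."
The so-called green channel for listing is a priority review mechanism for specific types of companies (such as national-level specialized and sophisticated "little giant" enterprises) that can significantly shorten the time to listing.
This does not mean, however, that Unitree's listing schedule has been halted. The same person said Unitree fully meets the listing requirements and can simply queue and go through the normal process.
The company's statement added: "At present, our listing work is proceeding normally, and progress will be disclosed in accordance with laws and regulations. We thank all sectors of society for their concern and support."
Separately, according to WeChat chat records circulating online, Unitree founder Wang Xingxing also responded via social media: "It's made-up news from several weeks ago that has been blown up again. Don't take it seriously, and there's no need to explain it to outsiders."
According to the CSRC website, Unitree completed its pre-listing tutoring in November. The filing shows that Unitree plans to apply for a domestic IPO, with CITIC Securities ...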
Latest work from Wang He's team: addressing VLA models' reliance on single-view images and lack of precise geometric information
具身智能之心· 2026-01-04 08:58
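Edited by 具身智能之心
In robot manipulation, VLA models map visual input and language instructions to actions through an end-to-end framework, enabling diverse skill learning. However, existing VLA models mostly rely on single-view RGB images and lack precise spatial geometric information, so they struggle to meet the demands of high-precision manipulation.
StereoVLA, proposed jointly by teams from Galbot, Peking University, the University of Hong Kong, and others, fuses the rich geometric cues of stereo vision. Through a technical pipeline of "geometric-semantic feature extraction, interaction-region depth estimation, multi-scenario validation", it is presented as the first systematic solution to the core problem of insufficient spatial perception in VLA models, offering a new approach to precise robot manipulation.
Paper title: StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
Among existing solutions, wrist cameras have a limited field of view, are easily occluded, and increase collision risk; depth sensors produce noisy measurements on transparent or specular objects; multi-camera setups add hardware ...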
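Why a stereo pair carries geometric cues that a single RGB view lacks can be seen from the classical rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below shows only that textbook relation; it is not StereoVLA's method (which the summary describes as learned feature extraction), and the function name and camera numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classical pinhole stereo: depth (m) = focal (px) * baseline (m) / disparity (px).

    Assumes a rectified stereo pair; the values used below are hypothetical.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(700.0, 0.12, 42.0)  # larger disparity, closer point
far = depth_from_disparity(700.0, 0.12, 21.0)   # smaller disparity, farther point
```

Because depth is inversely proportional to disparity, small matching errors matter more for distant points, which is one reason close-range manipulation is a natural fit for stereo input.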
AgiBot (智元) releases GenieReasoner, an integrated embodied brain-cerebellum system
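人民财讯, January 1: AgiBot's Embodied Research Center has launched GenieReasoner, its second-generation integrated embodied brain-cerebellum system. To address the modality alignment problem between semantic reasoning and action control in VLA models, the center proposes a model architecture that supports unified discretized pretraining, and uses flow matching to ease the action precision bottleneck of traditional discrete tokenizers. ...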
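The "precision bottleneck" of a discrete action tokenizer can be made concrete with a small sketch. It assumes a uniform-binning tokenizer with 256 bins over [-1, 1], which is a common design but not one the announcement specifies; `tokenize` and `detokenize` are hypothetical names. The point is that any value decoded from a bin index can be off by up to half a bin width, which continuous decoding such as flow matching does not impose.

```python
def tokenize(value, low=-1.0, high=1.0, bins=256):
    """Map a continuous action value to a bin index (assumed uniform binning)."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)  # the upper edge falls into the last bin


def detokenize(token, low=-1.0, high=1.0, bins=256):
    """Decode a bin index back to the bin-center value."""
    width = (high - low) / bins
    return low + (token + 0.5) * width


bin_width = 2.0 / 256
worst_case_error = bin_width / 2  # 0.00390625 of the normalized action range
round_trip = detokenize(tokenize(0.1234))
```

More bins shrink the error but enlarge the vocabulary the model must predict over, which is the trade-off a continuous action head avoids.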
2025 Commercial Embodied Intelligence White Paper
艾瑞咨询· 2025-12-31 22:34
Core Insights
- Embodied intelligence has gained significant traction globally, with Figure achieving a valuation of $39 billion despite zero revenue, while domestic players are securing commercial orders and projecting substantial revenue growth [1][4]
- The Chinese market is integrating embodied intelligence into its strategic development plans, indicating a shift towards a trillion-dollar market potential [1][9]

Definition and Understanding
- Embodied intelligence is recognized as a crucial development in artificial intelligence, characterized by agents that interact with their environment through a physical body, showcasing autonomy and adaptability [2]
- It represents a convergence of machine learning, computer vision, and robotics, marking a significant step towards practical AI applications [2]

Commercial Scene Classification
- Different forms of embodied intelligent robots are evolving to meet diverse needs across retail, dining, manufacturing, logistics, education, and healthcare [4]
- Commercial applications focus on enhancing service experiences in dynamic environments, while industrial applications emphasize precision and stability in structured settings [4]

Strategic Significance
- Embodied intelligence is pivotal in narrowing the technological gap between China and the U.S., driving innovation across various sectors including manufacturing and healthcare [6]
- The competition in advanced technology between the two nations highlights the importance of breakthroughs in embodied intelligence for economic and competitive advantages [6]

Policy Incentives
- The Chinese government is actively promoting the development of embodied intelligence through various policies, funding, and standardization efforts [9]
- Local governments are also implementing initiatives to support industry growth, including funding for humanoid robots and establishing collaborative platforms [9]

Development Stages
- The evolution of embodied intelligence can be categorized into three phases: conceptual development (1950s), technological accumulation (2000-2020), and application expansion driven by large models (2020 onwards) [11]
- The current phase sees the U.S. leveraging its advantages in computational power and capital, while China accelerates its catch-up through policy support and industry collaboration [11]

Bottlenecks and Challenges
- The transition from experimental to commercial applications faces challenges such as data scarcity, high costs, and technical limitations in dexterity and generalization [13][16]
- The industry is exploring solutions to overcome these challenges, including the establishment of data collection training grounds and innovative data acquisition methods [19]

Model Evolution
- The VLA model is emerging as a consensus for the development of embodied intelligence, integrating reasoning capabilities with real-world perception and action [21]
- This evolution is expected to lead to a significant leap in capabilities, akin to the breakthroughs seen with large language models [21]

Commercialization Breakthroughs
- The path to large-scale commercialization of embodied intelligence hinges on advancements in five key dimensions: endurance, latency, execution, reliability, and economic viability [29]
- Initial applications are focusing on low-complexity, high-ROI scenarios, with future expansions into more complex environments as technology matures [31]

Global Market Predictions
- The global market for embodied intelligence is projected to reach 19.2 billion RMB by 2025, with a compound annual growth rate of 73% over the next five years [46]
- China's market is expected to experience significant growth, potentially exceeding 280 billion RMB by 2035, driven by a robust industrial ecosystem [50]

Competitive Landscape
- The competition in the embodied intelligence sector is characterized by three main players: AI-native challengers like Figure, traditional industrial players like ABB, and cross-industry giants like Tesla [55]
- The market is anticipated to undergo consolidation as product homogeneity increases, leading to a potential first wave of industry shakeout [57]

Initial Player Strategies
- Startups in the sector must leverage their agility and innovation capabilities to survive against established giants, focusing on strategic partnerships and long-term value creation [59]
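For the global market projection above (19.2 billion RMB in 2025, 73% compound annual growth over the next five years), the implied end value follows from the standard compounding formula, value × (1 + rate)^years. The sketch below assumes 2025 is the base year and that growth compounds for exactly five annual periods; the resulting figure is derived arithmetic, not a number stated in the white paper.

```python
def project_market(base_value, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1.0 + cagr) ** years


# Assumed reading of the summary: 19.2 billion RMB base, 73% CAGR, 5 years.
implied_2030 = project_market(19.2, 0.73, 5)
print(f"Implied global market after 5 years: {implied_2030:.1f} billion RMB")
```

Under these assumptions the growth multiple is about 15.5x, giving roughly 297.5 billion RMB; shifting the base year or the number of compounding periods by one changes the result by a factor of 1.73.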
Interview with DaXiao Robotics chairman Wang Xiaogang: betting on world models, not VLA
Sou Hu Cai Jing· 2025-12-25 07:59
Core Insights
- The current technological routes in embodied intelligence, particularly the VLA model, have significant flaws in understanding the physical world and its laws [4][11]
- Many companies are building robot bodies, but few products can truly understand the world and solve real problems [5]
- In 2025, the domestic market is expected to see a surge in instant-retail warehousing applications, which require 24/7 service, presenting an opportunity for robots to excel [5]

Group 1: Company Strategy
- Wang Xiaogang, chairman of DaXiao Robotics, emphasizes a restrained approach: the company is neither entering the crowded robot-body market nor betting on VLA, and instead focuses on the world model as a consensus direction in the industry [6][8]
- DaXiao Robotics aims to integrate software and hardware to address the shortcomings of existing technology routes, particularly the VLA model, which does not require a true understanding of the physical world [11][12]
- The company's world model consists of three parts: multi-modal understanding, long-term dynamic interaction scenes, and predictive capabilities, which form the core of its technology [13]

Group 2: Market Position and Opportunities
- The industry is still maturing and leading positions have not yet been settled, leaving significant opportunities for new startups because of existing technological flaws [17]
- The company sees a unique opportunity in the integration of hardware and software, leveraging its extensive client base from previous years to achieve rapid scaling in the robotics field [18]
- Short-term goals include deploying four-legged robotic dogs with navigation and AI capabilities, while the mid-term focus will be on commercial service scenarios such as instant-retail ("flash purchase") warehouses [19]

Group 3: Technological Differentiation
- The ACE research paradigm proposed by DaXiao Robotics is seen as a revolutionary change that could provide a competitive edge in the market [18]
- The world model approach is believed to be more adaptable and capable of covering a wider range of scenarios compared to VLA, which is limited by its embodiment [21]
- The company plans to open-source its model to gather diverse feedback and data, differentiating its development path from other countries [22]
The industry's first RL+VLA roundup: how does reinforcement learning push VLA toward the real world?
自动驾驶之心· 2025-12-24 09:22
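MindDrive WAM-Diff
Paper title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Paper link: https://arxiv.org/abs/2512.13636
Project page: https://xiaomi-mlab.github.io/MindDrive/
Institutions: Huazhong University of Science and Technology, Xiaomi Auto (小米汽车)
One-sentence summary: To address inefficient exploration of the continuous action space in online reinforcement learning for VLA models, MindDrive uses a dual-expert architecture (decision expert + action expert) that converts the action space into a discrete language decision space, enabling efficient online RL training.
Core contributions:
- Designs a dual-LoRA adapter architecture in which the decision expert handles scene reasoning and language decisions and the action expert maps decisions to feasible trajectories, establishing a dynamic language-to-action mapping.
- Builds an online closed-loop RL framework on the CARLA simulator, using sparse rewards and the PPO algorithm, with KL regularization to avoid catastrophic forgetting.
- On the Bench2Drive benchmark, achieves a driving score of 78.04 and a success rate of 55.09% with the lightweight Qwen-0.5B model, surpassing SOTA models of the same scale.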
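The combination of PPO with KL regularization named in the contributions above can be sketched as a per-sample loss. This is the generic clipped PPO surrogate plus a penalty toward a frozen reference policy, not MindDrive's actual objective, which the summary does not give. The function name, the default coefficients, and the choice of KL estimator (the non-negative form r - 1 - log r, with r = p_ref / p_new) are assumptions for illustration.

```python
import math


def ppo_kl_loss(logp_new, logp_old, logp_ref, advantage,
                clip_eps=0.2, kl_coef=0.05):
    """Per-sample PPO loss with a KL penalty toward a reference policy.

    logp_new: log-prob of the taken decision under the policy being trained
    logp_old: log-prob under the policy that collected the rollout
    logp_ref: log-prob under a frozen reference (e.g. pre-RL) policy
    """
    # Clipped surrogate: limit how far one update can move the policy.
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)

    # Non-negative per-sample KL estimate between new and reference policy.
    log_r = logp_ref - logp_new
    kl_estimate = math.exp(log_r) - 1.0 - log_r

    # Minimize: negative surrogate plus the regularizer.
    return -surrogate + kl_coef * kl_estimate


# A decision whose probability rose sharply: the gain is capped at (1 + clip_eps).
loss = ppo_kl_loss(logp_new=0.0, logp_old=-1.0, logp_ref=-0.5, advantage=1.0)
```

The KL term grows as the trained policy drifts from the reference, which is the mechanism usually meant when KL regularization is said to guard against catastrophic forgetting; with a discrete language decision space, the log-probabilities are simply those of the chosen decision tokens.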