Former Li Auto CTO founds an embodied-intelligence startup; proven mass-production capability is hard currency
具身智能之心· 2025-09-17 03:14
Edited by 具身智能之心. This article is shared for academic purposes only; in case of infringement, contact us for removal.

Wang Kai, investment partner at 元璟资本 (Vision Plus Capital) and former CTO of Li Auto, has gone all-in on embodied-intelligence entrepreneurship, and a senior executive from a leading autonomous-driving team is about to join as well. Within months of founding, the company has attracted a range of investors: Sequoia Capital, 蓝驰 (Lanchi), and others have put in a cumulative US$50 million.

Beyond the current heat of the embodied-intelligence track, it is the founder's mass-production capability that investors value most. Wang Kai joined Li Auto in 2020 to lead intelligent-driving research spanning the cockpit, autonomous driving, operating systems, and platforms, and he drove the Horizon Robotics (地平线) chip solution to mass production. He left Li Auto in 2022 to become an investment partner at 元璟资本.

The other executive has led end-to-end and VLA mass-production work at a leading new-energy-vehicle maker. What the embodied-intelligence field needs right now is exactly this kind of production-hardened talent to push commercialization forward.

For more, join our embodied community, the 具身智能之心知识星球 (Embodied Intelligence Knowledge Planet), to follow industry developments first.
VLA-Adapter: new heights of robot intelligence with only 0.5B parameters, and no pretraining required
具身智能之心· 2025-09-17 03:14
Benchmark comparison from the paper:

| | OpenVLA-OFT (SOTA) | VLA-Adapter (Ours) | Ratio |
| --- | --- | --- | --- |
| Backbone ↓ | 7B | 0.5B | 1/14× |
| Fine-tuning Cost ↓ | 304 GPU·h | 8 GPU·h | 1/38× |
| Training VRAM (batch 8) ↓ | 62 GB | 24.7 GB | 0.4× |
| Throughput (8-dim chunk) ↑ | 71.4 Hz | 219.2 Hz | 3× |
| Performance (LIBERO) ↑ | 97.1% | 97.3% | Maintained |

[Figure: architecture diagram with a frozen VLM connected through a Bridge to a trainable Policy, taking RGB and instruction inputs] ...
星动纪元 is hiring! Multiple openings in embodied multimodality, reinforcement learning, and more
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article outlines job descriptions and requirements for positions in multi-modal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning technologies [6][14][15].

Group 1: Job Descriptions
- Research, design, and implement cutting-edge multi-modal reinforcement learning algorithms to address complex real-world problems [6].
- Collect, process, clean, and analyze multi-modal data to create high-quality training datasets [14].
- Develop and optimize multi-modal models, including training, fine-tuning, and enhancing performance across different tasks [6][15].

Group 2: Job Requirements
- Candidates should hold a master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13].
- Proficiency in programming languages such as Python and deep learning frameworks like PyTorch is essential, along with strong engineering implementation skills [13].
- Publications at top academic conferences (e.g., CVPR, NeurIPS) and contributions to open-source projects are preferred [13][19].

Group 3: Additional Qualifications
- Familiarity with multi-modal data cleaning, labeling, and loading, plus an understanding of data optimization techniques, is required [14].
- Experience with large language models and multi-modal models, including knowledge of their capabilities and applicable scenarios [14].
- High standards for data quality and attention to detail, along with proficiency in data processing tools such as Pandas and NumPy [14].
A P7's advice on switching from autonomous driving to embodied intelligence......
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article discusses the transition from autonomous driving to embodied intelligence, highlighting shared challenges such as data scarcity, algorithm maturity, and deployment strategy [1][6].

Data
- The lack of data motivates real-to-sim and sim-to-real approaches, along with self-collection schemes in which robots gather and filter their own data [2].

Algorithms
- For commercialization, prioritize proven technologies while waiting for newer ones to mature, and use reinforcement learning methods where applicable [3][4].

Deployment Strategies
- Deployment concerns are minimal, as the industry is adept at lightweight solutions, with further advances in data and algorithms expected [5].

Differences from Autonomous Driving
- Unlike autonomous driving, embodied intelligence relies heavily on the physical body, with a greater risk of hardware damage during deployment, so robust safety measures are necessary [6].

Career Transition
- The switch is easier for those with relevant experience in autonomous driving or traditional robotics; newcomers should follow a structured learning path [8].

Community and Resources
- The "Embodied Intelligence Knowledge Planet" community offers a comprehensive platform for learning and sharing, with nearly 2,000 members and plans to grow to 10,000, providing resources on data collection, algorithm deployment, and job opportunities [10][18].

Research Directions
- The community has compiled over 30 technical routes, giving both beginners and advanced practitioners access to benchmarks and learning pathways [11][12].

Industry Insights
- The community connects members with industry leaders and provides insights into job opportunities, academic advances, and practical applications in embodied intelligence [18][21].

Resource Compilation
- Open-source projects, datasets, and technical learning routes are available to support research and development in embodied intelligence [31][37].

Networking Opportunities
- Members can engage in discussions, ask questions, and share solutions, fostering a collaborative environment for tackling challenges in the field [78].
Unitree open-sources UnifoLM-WMA-0: a cross-embodiment world-model + action framework
具身智能之心· 2025-09-16 03:29
UnifoLM-WMA-0 is Unitree's open-source world-model-action architecture, designed for general-purpose robot learning across multiple robot embodiments. Its core component is a world model capable of understanding the physical interaction between a robot and its environment, providing two key functions: (a) a simulation engine, running as an interactive simulator that generates synthetic data for robot learning; and (b) policy enhancement, connecting to the action module and further improving decision performance by predicting future interactions with the world model.

Project link: https://unigen-x.github.io/unifolm-world-model-action.github.io/

Architecture

UnifoLM-WMA-0 is a policy architecture with an embedded world model. The framework allows the world model to run in two modes: (1) decision mode, predicting future physical-interaction information to assist the policy in generating actions; and (2) simulation mode, generating high-fidelity environment feedback conditioned on the robot's actions.

Fine-tuning the video generation model: first, we ...
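The two operating modes described above can be captured in one small interface. The sketch below is illustrative only: the names (`WorldModelAction`, `decision_mode`, `simulation_mode`) are assumptions, not Unitree's actual API, and a toy drift function stands in for the fine-tuned video world model.

```python
# Illustrative sketch of the two world-model modes: decision mode ranks
# candidate actions by imagined futures; simulation mode generates synthetic
# feedback for a given action. All names are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class WorldModelOutput:
    predicted_obs: np.ndarray   # imagined future observation
    reward_estimate: float      # crude value of that imagined future

class WorldModelAction:
    def simulation_mode(self, obs, action):
        """Simulation mode: synthesize environment feedback for one action."""
        return self._imagine(obs, action)

    def decision_mode(self, obs, candidate_actions):
        """Decision mode: score candidate actions by imagined futures, pick the best."""
        scores = [self._imagine(obs, a).reward_estimate for a in candidate_actions]
        return int(np.argmax(scores))

    def _imagine(self, obs, action):
        # Toy stand-in for the world model: drift the observation by the action mean.
        nxt = np.asarray(obs, dtype=float) + float(np.mean(action))
        return WorldModelOutput(predicted_obs=nxt,
                                reward_estimate=-float(np.abs(nxt).sum()))
```

In decision mode the policy proposes candidate actions and the world model ranks them; in simulation mode the same model doubles as a data generator for learning.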
A robot clocks in at the laundromat and starts earning money! Built by former Apple AI executives
具身智能之心· 2025-09-16 00:03
Edited by 量子位

The first workplace where a robot earns money folding clothes: a laundromat.

The worker is Isaac from Weave Robotics, a robot built specifically for household chores, and it is already on the job at the paid laundromat Tumble Laundry. Notably, Weave Robotics was founded by a former Apple team and had closed three funding rounds before its product was formally unveiled.

Plenty of robots are learning to fold clothes these days, but most are still showing off on exhibition stands. Isaac is already at the next level: the first general-purpose robot that can handle paid laundry folding.

So what does its workflow look like?

It folds clothes and organizes them too

First, the laundromat takes in customers' laundry and handles washing and drying; then Isaac takes over the folding step, which is the most labor-intensive and has real requirements on neatness. Isaac's folding service is no rough-and-ready operation: Weave Robotics has explicit standards for an acceptable fold, and folded garments must come out tidy. Take a shirt, for example, ...
The embodied-AI tech leads who dare to break the wind......
具身智能之心· 2025-09-16 00:03
Core Viewpoint
- The article emphasizes the rapid development and commercialization of embodied intelligence, highlighting key figures and companies driving innovation in this field globally [2].

Group 1: Key Figures in Embodied Intelligence
- Wang Xingxing, CEO and CTO of Yushu Technology, has over 10 years of experience in quadruped robot development, leading the launch of multiple products and holding over 100 related patents [4].
- Zhao Xing, an assistant professor at Tsinghua University, has made significant contributions to embodied intelligence and multimodal learning, including developing the first mass-produced autonomous-driving large model [6].
- Xu Huazhe, also an assistant professor at Tsinghua University, focuses on visual deep reinforcement learning and has published over 60 papers in top journals and conferences [9][10].
- Wang He, founder of Galaxy General, specializes in embodied intelligence and 3D vision, leading the development of a versatile wheeled robot [12][13].
- Luo Jianlan, chief scientist at Zhiyuan Robotics, has developed a system achieving 100% success on real-world reinforcement learning tasks [16][18].
- Wang Hao, co-founder and CTO of Zihuan Robotics, led the development of WALL-A, the world's largest-parameter embodied-intelligence model [20][21].

Group 2: Companies in Embodied Intelligence
- Yushu Technology focuses on high-performance humanoid robot hardware, using advanced motor drives and motion-control solutions [46].
- Galaxy General aims to build versatile humanoid robots, with significant data accumulation for simulation and real-world applications [12][13].
- Zhiyuan Robotics is dedicated to solving reinforcement-learning challenges in precise robotic assembly tasks [16][18].
- Zihuan Robotics is working on integrating large models with embodied intelligence, emphasizing cost-effective paths to advanced capabilities [20][21].
- Physical Intelligence, co-founded by Sergey Levine, has raised significant funding to develop advanced AI models for a range of robotic applications [36].

Group 3: Trends and Future Directions
- The industry is shifting toward flexible, adaptive, and highly interactive embodied-intelligence systems, driven by diverse technological paths [46].
- Companies are focusing on local needs and practical applications, aiming for systems better aligned with everyday life [46].
- Collaboration between academia and industry is crucial for advancing embodied-intelligence technologies and achieving commercial viability [46].
Embodied VLA post-training: TeleAI proposes a latent-space-guided method for cross-embodiment VLA generalization
具身智能之心· 2025-09-16 00:03
Core Insights
- The article discusses the challenges of cross-embodiment adaptation for Vision-Language-Action (VLA) models, the limitations of existing approaches, and a new framework, "Align then Steer" (ATE), for improving post-training performance [1][2][10].

Group 1: Challenges in VLA Models
- Current VLA models require extensive target-domain data for post-training, often dozens to hundreds of hours, and suffer significant mismatches between the action distributions of pre-training and post-training [1][10].
- The marginal returns of simply stacking more data during post-training diminish rapidly, making it ineffective for fitting the target scenario's action distribution [1][11].

Group 2: ATE Framework Introduction
- The ATE framework, proposed by the TeleAI team, aligns action distributions in latent space, enabling efficient adaptation of VLA models without altering their core architecture [2][10].
- ATE shifts the focus from adjusting model architecture to adjusting distributions, significantly reducing the data required for cross-embodiment adaptation [2][15].

Group 3: ATE Framework Mechanism
- ATE consists of two phases: aligning action distributions in latent space, then using a classifier to guide policy updates during post-training [14][19].
- In the alignment phase, two small Variational Autoencoders (VAEs) embed action data into a unified latent space, ensuring that adapted actions closely follow the pre-trained distribution [18][19].
- In the guiding phase, a classifier guidance function measures the difference between generated actions and the target distribution, steering model outputs toward the desired action distribution [21][22].

Group 4: Experimental Results
- In simulation, ATE improved multi-task success rates by 9.8% on average over direct post-training, with a maximum success-rate gain of 32% in real-world scenarios [23][24].
- The framework remained robust under lighting changes and external disturbances, maintaining task-related focus and recovery capability [29][30].

Group 5: Conclusion and Future Directions
- ATE addresses data scarcity and cross-embodiment adaptation in VLA models, enabling efficient, robust training without extensive data collection or full model retraining [30].
- The framework can serve as a plug-and-play module compatible with mainstream VLA models, enhancing their post-training cross-embodiment generalization [30].
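The two-phase mechanism described above (align action distributions in a shared latent space, then steer generated latents with a classifier score) can be illustrated with a toy sketch. Everything here is hypothetical: simple linear maps stand in for the paper's small VAEs and classifier, so this shows the mechanics only, not TeleAI's actual method.

```python
# Toy "Align then Steer" sketch: per-embodiment encoders map action chunks into
# one latent space; a guidance score then nudges latents toward the target
# domain. Linear stand-ins replace the real VAEs and classifier.
import numpy as np

rng = np.random.default_rng(0)

class ActionEncoder:
    """Linear stand-in for a small VAE encoder mapping action chunks to latents."""
    def __init__(self, action_dim, latent_dim=8):
        self.W = rng.normal(size=(action_dim, latent_dim)) / np.sqrt(action_dim)

    def encode(self, actions):
        return actions @ self.W

def steer(z, w, step=0.1, n_steps=5):
    """Classifier guidance with a linear score s(z) = z @ w: the gradient with
    respect to z is simply w, so each step moves the latent up the score."""
    for _ in range(n_steps):
        z = z + step * w
    return z

# Align: one encoder per embodiment maps both action spaces into the same space.
src_enc, tgt_enc = ActionEncoder(7), ActionEncoder(14)
z = src_enc.encode(np.ones((1, 7)))        # source-embodiment action chunk
tgt_z = tgt_enc.encode(np.ones((1, 14)))   # target embodiment lands in the same space

# Steer: push the latent toward a (hypothetical) target-domain direction w.
w = np.ones(8)
z_steered = steer(z, w)
```

In the real framework the two VAEs are trained so that pre-training and target-domain action distributions actually align in latent space, and a learned classifier score replaces the fixed direction `w` used here.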
It really took ages to compile these large-model technical routes......
具身智能之心· 2025-09-16 00:03
Core Insights
- The article emphasizes the transformative impact of large models across industries, highlighting their role in enhancing productivity and driving innovation in fields such as autonomous driving, embodied intelligence, and generative AI [2][4].

Group 1: Large Model Technology Trends
- The large-model industry is undergoing significant change, characterized by technological democratization, vertical applications, and open-source ecosystems [2].
- Demand is growing for talent skilled in technologies such as RAG (Retrieval-Augmented Generation) and AI Agents, which are becoming core competencies for AI practitioners [2][4].
- The article introduces a comprehensive large-model learning community offering videos, articles, learning paths, and job-exchange opportunities [2][4].

Group 2: Learning Pathways
- The community provides detailed learning pathways covering RAG, AI Agents, and multimodal models [4][5].
- Specific routes include Graph RAG, Knowledge-Oriented RAG, and Reasoning RAG, among others, aimed at both beginners and advanced learners [4][5].
- The pathways are designed to facilitate systematic learning and peer networking [5].

Group 3: Community Benefits
- Membership offers access to the latest academic advances, industrial applications, and job opportunities in the large-model sector [7][9].
- The community aims to create a collaborative environment for knowledge sharing and professional networking [7][9].
- Live sessions with industry leaders are planned to further enrich the community's offerings [65][66].
Working on VLA? Some reference directions......
具身智能之心· 2025-09-15 10:00
VLA is valuable, but getting started is hard

VLA, the Vision-Language-Action model, is the new paradigm in embodied intelligence: given a language instruction and visual signals, it directly generates actions a robot can execute. This paradigm breaks the old limitation of training a large model on a single task, and points robot models toward greater generality and broader scene generalization. The importance of VLA models in academia and industry lies chiefly in how they integrate visual information, language instructions, and action decisions, markedly improving a robot's ability to understand and adapt to complex environments.

VLA breaks the single-task limitation of traditional methods, enabling robots to make autonomous decisions in diverse scenarios and flexibly handle unseen environments, with wide applications in manufacturing, logistics, and household services. VLA has also become a research hotspot, driving frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA and fostering collaboration between academia and industry. Its adaptability spans robotic arms, quadrupeds, and humanoid robots, providing broad potential and practical value for intelligent robots of all kinds and making it a key driving force in the field.

From an industry perspective, embodied intelligence at home and abroad is booming: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab to commercialization, while tech giants such as Huawei, JD.com, and Tencent ...
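The input/output contract described above (language instruction plus visual signal in, executable action chunk out) can be sketched minimally. Everything below is a hypothetical stand-in: `TinyVLA`, its feature reduction, and the fixed random action head are for illustration only and bear no resemblance to how pi0, RT-2, or OpenVLA actually work internally.

```python
# Minimal, hypothetical sketch of the VLA interface: image + instruction in,
# a chunk of executable joint-space actions out.
import numpy as np

class TinyVLA:
    """Illustrative stand-in showing the VLA contract, not a real model."""
    def __init__(self, action_dim=7, chunk=8, seed=0):
        self.rng = np.random.default_rng(seed)
        self.action_dim, self.chunk = action_dim, chunk
        # Fixed random "action head" mapping a 3-dim feature to an action chunk.
        self.W = self.rng.normal(size=(3, chunk * action_dim)) * 0.01

    def __call__(self, image, instruction):
        # Stand-in for a VLM encoder: reduce image + text to a tiny feature vector.
        feat = np.array([image.mean(), image.std(), float(len(instruction))])
        # Action head: features -> a chunk of joint-space actions.
        return (feat @ self.W).reshape(self.chunk, self.action_dim)

policy = TinyVLA()
actions = policy(np.zeros((224, 224, 3)), "pick up the red cup")
print(actions.shape)  # prints (8, 7): a chunk of 8 seven-dim actions
```

A real VLA replaces the feature hack with a pretrained vision-language backbone and the random head with a trained action decoder, but the calling convention (observation and instruction in, action chunk out) is the same shape.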