Former Li Auto CTO starts up in the embodied field: proven mass-production capability is hard currency
具身智能之心· 2025-09-17 03:14
Wang Kai, former Li Auto CTO and investment partner at 元璟资本 (Vision Plus Capital), has moved into embodied intelligence entrepreneurship; a senior executive from a leading autonomous-driving team is about to join as well. Within months of founding, the company has attracted interest from investors of many kinds, with Sequoia Capital, Lanchi Ventures (蓝驰资本), and several others putting in a cumulative USD 50 million.

Beyond the currently hot embodied intelligence track itself, what capital values most is the founder's mass-production capability. Wang Kai joined Li Auto in 2020 to lead intelligent-driving work spanning the cockpit, autonomous driving, operating systems, and platforms, and he drove the mass production of the Horizon Robotics chip solution. He left Li Auto in 2022 to become an investment partner at Vision Plus Capital.

The other autonomous-driving executive has worked on end-to-end and VLA mass production at a leading new-energy-vehicle maker. The embodied field right now genuinely needs such production-proven heavyweights to push commercialization forward.
VLA-Adapter: a new level of robot intelligence with 0.5B parameters, and no pre-training required
具身智能之心· 2025-09-17 03:14
Core Viewpoint
- The VLA-Adapter model, developed by leading institutions, represents a breakthrough in Vision-Language-Action (VLA) models for robotics, offering a lightweight design with 0.5 billion parameters while achieving performance comparable to far larger models, thus lowering the barriers to training and deployment in robotic applications [4][11][30].

Summary by Sections

Introduction to VLA-Adapter
- The VLA-Adapter model has been jointly developed by top institutions and is designed to enhance the efficiency and intelligence of robots in understanding environments and executing tasks [4][11].

Challenges in VLA Models
- Traditional VLA models rely on large-scale pre-trained models and carry high computational costs, which hinder practical applications [3][11].

VLA-Adapter's Innovations
- VLA-Adapter introduces a new bridging paradigm that efficiently transmits multimodal information to the action space, significantly reducing model size and training costs [11][12].
- The model uses a lightweight backbone of only 0.5 billion parameters, achieving performance comparable to 7-billion-parameter models without requiring extensive pre-training on robotic datasets [11][12].

Key Technologies
- The Bridge Attention mechanism is central to VLA-Adapter's success, efficiently connecting visual-language representations to action generation [12][14].
- Training is efficient: the model completes training in about 8 hours on a single consumer-grade GPU, versus the days or weeks traditional models may take [15][19].

Experimental Validation
- VLA-Adapter achieves an average success rate of 97.3% on the LIBERO benchmark, outperforming several baseline models [19][20].
- In zero-shot generalization tasks, it reaches an average task-completion length of 4.42, indicating strong adaptability to unseen environments [21][22].

Real-World Applications
- The model performs robustly in real-world tasks, including complex operations with a 6-DOF robot, demonstrating potential for industrial automation, smart homes, and medical assistance [23][28].

Future Potential
- Its lightweight design and high efficiency make VLA-Adapter promising for real-time applications and make VLA development and deployment accessible to smaller research institutions and companies [28][30].
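To make the bridging idea more concrete, here is a minimal PyTorch sketch of a bridge-attention style module: a set of learnable action queries cross-attends to frozen vision-language tokens and is decoded into an action chunk. Everything here (the name BridgeAttention, the dimensions, the 7-dimensional action head) is an assumption for illustration; it shows the general pattern the summary describes, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class BridgeAttention(nn.Module):
    """Illustrative sketch: learnable action queries cross-attend to
    frozen vision-language (VL) tokens, bridging perception to action."""
    def __init__(self, vl_dim=896, act_dim=256, n_queries=64, n_heads=8):
        super().__init__()
        # Learnable queries, later decoded into a chunk of robot actions.
        self.action_queries = nn.Parameter(torch.randn(n_queries, act_dim))
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=act_dim, kdim=vl_dim, vdim=vl_dim,
            num_heads=n_heads, batch_first=True)
        self.norm = nn.LayerNorm(act_dim)
        self.action_head = nn.Linear(act_dim, 7)  # e.g. 6-DoF pose + gripper

    def forward(self, vl_feats):
        # vl_feats: (B, T, vl_dim) tokens from a small frozen VLM backbone.
        q = self.action_queries.unsqueeze(0).expand(vl_feats.shape[0], -1, -1)
        attended, _ = self.cross_attn(q, vl_feats, vl_feats)
        return self.action_head(self.norm(attended))  # (B, n_queries, 7)

# Usage with random stand-in features:
bridge = BridgeAttention()
actions = bridge(torch.randn(2, 128, 896))  # -> torch.Size([2, 64, 7])
```

A design like this trains only the small bridge and action head while the backbone stays frozen, which is one way such a model could remain trainable on a single consumer-grade GPU.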
星动纪元 (RobotEra) is hiring! Multiple directions including embodied multimodality and reinforcement learning
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article outlines job descriptions and requirements for positions in multi-modal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning technologies [6][14][15].

Group 1: Job Descriptions
- Responsibilities include research, design, and implementation of cutting-edge multi-modal reinforcement learning algorithms to address complex real-world problems [6].
- Involvement in the collection, processing, cleaning, and analysis of multi-modal data to create high-quality training datasets [14].
- Development and optimization of multi-modal models, including training, fine-tuning, and enhancing performance across different tasks [6][15].

Group 2: Job Requirements
- Candidates should hold a master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13].
- Proficiency in programming languages such as Python and deep learning frameworks such as PyTorch is essential, along with strong engineering implementation skills [13].
- Publications at top academic conferences (e.g., CVPR, NeurIPS) and contributions to open-source projects are preferred [13][19].

Group 3: Additional Qualifications
- Familiarity with multi-modal data cleaning, labeling, and loading, as well as an understanding of data optimization techniques, is required [14].
- Candidates should have experience with large language models and multi-modal models, including knowledge of their capabilities and applicable scenarios [14].
- High standards for data quality and attention to detail are necessary, along with proficiency in data processing tools such as Pandas and NumPy [14].
A P7's advice on switching from autonomous driving to embodied intelligence......
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article discusses the transition from autonomous driving to embodied intelligence, highlighting shared challenges such as data scarcity, algorithm maturity, and deployment strategy [1][6].

Data
- The lack of data motivates real-to-sim and sim-to-real approaches, along with self-collection methods in which robots gather and filter their own data [2].

Algorithms
- For commercialization, it is advised to prioritize proven technologies while waiting for newer ones to mature, using reinforcement learning methods where applicable [3][4].

Deployment Strategies
- Concerns about deployment are minimal, as the industry is adept at lightweight solutions, with further advances in data and algorithms expected [5].

Differences from Autonomous Driving
- Unlike autonomous driving, embodied intelligence depends heavily on the physical body, with a greater risk of hardware damage during deployment, necessitating robust safety measures [6].

Career Transition
- The transition is easier for those with relevant experience in autonomous driving or traditional robotics, while newcomers should follow a structured learning path [8].

Community and Resources
- The "Embodied Intelligence Knowledge Planet" community offers a comprehensive platform for learning and sharing, with nearly 2,000 members and plans to grow to 10,000, providing resources for data collection, algorithm deployment, and job opportunities [10][18].

Research Directions
- The community has compiled over 30 technical routes, giving both beginners and advanced practitioners access to benchmarks and learning pathways [11][12].

Industry Insights
- The community connects members with industry leaders and provides insights into job opportunities, academic advances, and practical applications in embodied intelligence [18][21].

Resource Compilation
- A variety of resources, including open-source projects, datasets, and technical learning routes, is available to support research and development in embodied intelligence [31][37].

Networking Opportunities
- Members can discuss, ask questions, and share solutions, fostering a collaborative environment for tackling challenges in the field [78].
Unitree open-sources UnifoLM-WMA-0: a cross-embodiment world model + action framework
具身智能之心· 2025-09-16 03:29
UnifoLM-WMA-0 is Unitree's open-source world-model-action architecture, designed for general-purpose robot learning across multiple robot embodiments. Its core component is a world model capable of understanding the physical interaction between a robot and its environment, providing two key functions: (a) a simulation engine, which runs as an interactive simulator and generates synthetic data for robot learning; (b) policy enhancement, which connects to the action module and further optimizes decision performance by predicting future interactions with the world model.

Project link: https://unigen-x.github.io/unifolm-world-model-action.github.io/

Architecture
UnifoLM-WMA-0 is a policy architecture with an embedded world model. The framework lets the world model operate in two modes: (1) decision mode, predicting future physical-interaction information to assist the policy in generating actions; (2) simulation mode, generating high-fidelity environment feedback from the robot's actions.

Fine-tuning the video generation model: first, we ...
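As a reading aid for the two modes above, the following Python sketch shows how a single world-model component can serve both roles. All interfaces here (WorldModelAction, propose, score_rollout, predict_next) are hypothetical placeholders, not code from the UnifoLM-WMA-0 repository.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WorldModelAction:
    """Hypothetical sketch of the dual-mode world-model-action loop
    described above; all interfaces are assumed placeholders."""
    world_model: object  # predicts/scores future observations
    policy: object       # proposes candidate action sequences

    def decision_mode(self, obs):
        # Decision mode: rank candidate action sequences by the world
        # model's predicted future interaction and execute the best one.
        candidates = self.policy.propose(obs, n=8)
        scores = [self.world_model.score_rollout(obs, a) for a in candidates]
        return candidates[int(np.argmax(scores))]

    def simulation_mode(self, obs, actions):
        # Simulation mode: act as an interactive simulator, rolling the
        # world model forward to produce synthetic training trajectories.
        trajectory = [obs]
        for a in actions:
            obs = self.world_model.predict_next(obs, a)
            trajectory.append(obs)
        return trajectory
```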
The embodied-intelligence tech leads who dare to break the wind at the front......
具身智能之心· 2025-09-16 00:03
Core Viewpoint
- The article emphasizes the rapid development and commercialization of embodied intelligence, highlighting key figures and companies driving innovation in this field globally [2].

Group 1: Key Figures in Embodied Intelligence
- Wang Xingxing, CEO and CTO of Yushu Technology (Unitree), has over 10 years of experience in quadruped robot development, has led the launch of multiple products, and holds over 100 related patents [4].
- Zhao Xing, an assistant professor at Tsinghua University, has made significant contributions to embodied intelligence and multimodal learning, including the development of the first mass-produced autonomous driving large model [6].
- Xu Huazhe, also an assistant professor at Tsinghua University, focuses on visual deep reinforcement learning and has published over 60 papers in top journals and conferences [9][10].
- Wang He, founder of Galaxy General, specializes in embodied intelligence and 3D vision, leading the development of a versatile wheeled robot [12][13].
- Luo Jianlan, chief scientist at Zhiyuan Robotics, has developed a system achieving 100% success in real-world reinforcement learning tasks [16][18].
- Wang Hao, co-founder and CTO of Zihuan Robotics, led the development of WALL-A, the world's largest-parameter-scale embodied intelligence model [20][21].

Group 2: Companies in Embodied Intelligence
- Yushu Technology focuses on high-performance humanoid robot hardware, using advanced motor drives and motion-control solutions [46].
- Galaxy General aims to build versatile humanoid robots, with significant data accumulation for simulation and real-world applications [12][13].
- Zhiyuan Robotics is dedicated to solving reinforcement learning challenges in precise robotic assembly tasks [16][18].
- Zihuan Robotics is working to integrate large models with embodied intelligence, emphasizing cost-effective paths to advanced capabilities [20][21].
- Physical Intelligence, co-founded by Sergey Levine, has raised significant funding to develop advanced AI models for a range of robotic applications [36].

Group 3: Trends and Future Directions
- The industry is shifting toward flexible, adaptive, and highly interactive embodied intelligence systems, driven by diverse technological paths [46].
- Companies are focusing on local needs and practical applications, aiming to build systems better aligned with everyday life [46].
- Collaboration between academia and industry is crucial for advancing embodied intelligence technologies and achieving commercial viability [46].
A robot clocks in at the laundromat and starts earning a wage! Built by former Apple AI executives
具身智能之心· 2025-09-16 00:03
Core Insights
- The article discusses the launch of Isaacs, a laundry-folding robot developed by Weave Robotic, the first of its kind to operate in a paid laundry facility, Tumble Laundry [4][5].

Group 1: Company Overview
- Weave Robotic was founded by former Apple team members Evan Winelan and Kaan Dogrusoz, who have significant experience in AI and product development [16][17][21].
- The company completed three rounds of financing before the official product launch, indicating strong investor confidence [5].

Group 2: Technology and Functionality
- Isaacs is designed not only to fold laundry but also to organize and store clothes, keeping the workspace tidy [6][8].
- The robot uses a proprietary vision-language-action (VLA) model to precisely identify clothing types and folding techniques, achieving 70% autonomous folding with human assistance only when necessary [20].
- Isaacs is built on a high-performance network stack that allows remote human intervention in complex scenarios, enhancing its operational efficiency [20].

Group 3: Future Prospects
- The robot is intended to evolve beyond laundry folding to capabilities such as organizing miscellaneous items and home security, addressing diverse household needs [14].
- The design incorporates privacy features, such as automatically folding away its camera when idle [15].
Embodied VLA post-training: TeleAI proposes a latent-space-guided cross-embodiment generalization method for VLA
具身智能之心· 2025-09-16 00:03
Core Insights
- The article discusses the challenges of cross-embodiment adaptation for Vision-Language-Action (VLA) models, the limitations of existing models, and a new framework, "Align then Steer" (ATE), that enhances performance in post-training scenarios [1][2][10].

Group 1: Challenges in VLA Models
- Current VLA models require extensive target-domain data for post-training, often dozens to hundreds of hours, and suffer significant mismatch between the action distributions of the pre-training and post-training phases [1][10].
- Simply stacking more data during post-training yields rapidly diminishing returns and fails to fit the action distribution of the target scenario [1][11].

Group 2: ATE Framework Introduction
- The ATE framework, proposed by the TeleAI team, aligns action distributions in latent space, enabling efficient adaptation of VLA models without altering their core architecture [2][10].
- ATE shifts the focus from adjusting model architecture to adjusting distributions, significantly reducing the data required for cross-embodiment adaptation [2][15].

Group 3: ATE Framework Mechanism
- The framework has two main phases: aligning action distributions in latent space, then using a classifier to guide post-training policy updates [14][19].
- In the alignment phase, two small Variational Autoencoders (VAEs) embed action data into a unified latent space, ensuring that adapted actions closely follow the pre-trained distribution [18][19].
- In the guidance phase, a classifier guidance function measures the difference between generated actions and the target distribution, steering model outputs toward the desired action distribution [21][22].

Group 4: Experimental Results
- In simulation evaluations, ATE raised multi-task success rates by an average of 9.8% over direct post-training, with a maximum success-rate gain of 32% in real-world scenarios [23][24].
- The framework remained robust under lighting changes and external disturbances, maintaining task-related focus and recovery capability [29][30].

Group 5: Conclusion and Future Directions
- ATE offers a viable solution to data scarcity and cross-embodiment adaptation in VLA models, enabling efficient and robust training without extensive data collection or full model retraining [30].
- It can serve as a plug-and-play module compatible with mainstream VLA models, enhancing their post-training cross-embodiment generalization [30].
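To illustrate the two-phase mechanism summarized above, here is a minimal PyTorch sketch: a small action VAE per embodiment whose KL term pulls latents toward one shared unit-Gaussian prior (a deliberate simplification of ATE's latent alignment across two VAEs), followed by a classifier-guided step that steers generated actions toward the target distribution. Names, shapes, and the shared-prior simplification are assumptions for illustration; this is not the TeleAI implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionVAE(nn.Module):
    """Small VAE embedding action chunks into a latent space; one such
    VAE per embodiment. Dimensions here are illustrative assumptions."""
    def __init__(self, act_dim=7, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(act_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def forward(self, a):
        mu, logvar = self.enc(a).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def vae_loss(vae, actions, beta=1.0):
    # Phase 1 (align): reconstruct actions while a shared unit-Gaussian
    # prior keeps both embodiments' latents in one common space, a
    # simplification of ATE's latent-space alignment.
    recon, mu, logvar = vae(actions)
    rec = F.mse_loss(recon, actions)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return rec + beta * kl

def steer(actions, classifier, step_size=0.1):
    # Phase 2 (steer): nudge generated actions along the gradient of a
    # classifier that scores closeness to the target action distribution.
    actions = actions.detach().requires_grad_(True)
    score = classifier(actions).sum()
    grad, = torch.autograd.grad(score, actions)
    return (actions + step_size * grad).detach()
```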
The large-model technical routes that truly took ages to compile......
具身智能之心· 2025-09-16 00:03
Core Insights
- The article emphasizes the transformative impact of large models across industries, highlighting their role in enhancing productivity and driving innovation in fields such as autonomous driving, embodied intelligence, and generative AI [2][4].

Group 1: Large Model Technology Trends
- The large-model industry is undergoing significant change, characterized by technological democratization, vertical applications, and open-source ecosystems [2].
- Demand is growing for talent skilled in technologies such as RAG (Retrieval-Augmented Generation) and AI Agents, which are becoming core competencies for AI practitioners [2][4].
- The article introduces a comprehensive learning community focused on large models, offering videos, articles, learning paths, and job-exchange opportunities [2][4].

Group 2: Learning Pathways
- The community provides detailed learning pathways covering RAG, AI Agents, and multimodal models [4][5].
- Specific routes include Graph RAG, Knowledge-Oriented RAG, and Reasoning RAG, among others, aimed at both beginners and advanced learners [4][5].
- The pathways are designed to support systematic learning and peer networking [5].

Group 3: Community Benefits
- Members gain access to the latest academic advances, industrial applications, and job opportunities in the large-model sector [7][9].
- The community aims to foster a collaborative environment for knowledge sharing and professional networking [7][9].
- Live sessions with industry leaders are planned to further enrich the community's offerings [65][66].
Grinding away at VLA? Some reference directions......
具身智能之心· 2025-09-15 10:00
Core Insights
- The Vision-Language-Action (VLA) model represents a new paradigm in embodied intelligence, enabling robots to generate executable actions from language instructions and visual signals and thus adapt to complex environments [1][3].
- VLA breaks the traditional single-task limitation, allowing robots to make autonomous decisions in diverse scenarios, with applications in manufacturing, logistics, and home services [3].
- VLA has become a research hotspot, driving collaboration between academia and industry, with cutting-edge projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA emerging [3][5].

Industry Development
- The embodied intelligence sector is growing robustly, with teams such as Unitree, Zhiyuan, Xinghaitu, Galaxy General, and Zhujidongli moving from laboratories to commercialization [5].
- Major tech companies such as Huawei, JD.com, and Tencent are investing actively in the field, alongside international firms such as Tesla and Figure AI [5].

Educational Initiatives
- A specialized VLA research guidance course has been launched to help students quickly enter or transition into VLA research, addressing the complexity of the related systems and frameworks [5].
- The course focuses on the perception-cognition-action loop, providing a comprehensive understanding of VLA's theoretical foundations and practical applications [7][8].

Course Structure and Outcomes
- The curriculum covers the full research process, from theoretical foundations to experimental design and paper writing, building independent research capability [15].
- Students learn to identify research opportunities and analyze unresolved challenges in the field, with personalized guidance tailored to their backgrounds and interests [15].
- The course aims to help each student produce a complete research idea, a preliminary experimental validation, and a draft of a high-quality academic paper [15][18].