具身智能之心
The Companies Building Embodied "Brains", at Home and Abroad...
具身智能之心· 2025-09-13 04:03
Core Insights
- The article surveys the emerging field of embodied intelligence, highlighting the development of general-purpose robotic "brain" systems and multi-modal perception-decision systems, which are drawing significant attention from both capital and industry [2][3].

Domestic Companies
- **Xinghaitu**: Founded in 2023, it focuses on developing a general embodied large model trained on real-world data to give robots fine manipulation capabilities, and has completed 8 financing rounds in less than two years. Its flagship WALL-A model, launched in October 2024, is claimed to be the largest-parameter embodied intelligence model globally, integrating visual, language, and motion-control signals [6].
- **UBTECH**: Established in 2012, it is a leader in humanoid-robot commercialization with comprehensive in-house R&D capabilities. Its Thinker model, released in 2025, achieved top rankings in international benchmark tests, significantly enhancing robots' perception and planning in complex environments [10].
- **ZhiYuan Robotics**: Founded in February 2023, it aims to create world-class general embodied intelligent robots. Its Genie Operator-1 model, released in March 2025, integrates multi-modal large-model and mixture-of-experts techniques, improving task success rates by 32% over comparable models on the market [12].
- **Galaxy General**: Established in May 2023, it focuses on multi-modal large models driven by synthetic data. Its VLA model is claimed to be the first general embodied large model globally, built on a "brain + cerebellum" collaborative framework [14].
- **Qianxun Intelligent**: Founded in 2024, it is a leading AI + robotics company focused on deformable-object manipulation. Its Spirit V1 VLA model is the first to tackle long-horizon manipulation of deformable objects [16].
- **Star Motion Era**: A young tech company incubated by Tsinghua University, focusing on general artificial intelligence applications. Its ERA-42 model supports over 100 dynamic tasks through video training [18].
- **Zhujidongli**: Concentrates on embodied intelligent robots, developing core technologies for hardware design, whole-body motion control, and training paradigms [20].

International Companies
- **Figure AI**: Focuses on embodied-intelligence manipulation algorithms, improving data training and algorithm performance through video-generation technology [17].
- **Physical Intelligence**: Founded in January 2023, it aims to develop advanced intelligent software for a wide range of robots. Its π0 model, released in October 2024, is a universal robot foundation model [22].
- **Google DeepMind**: Formed from the 2023 merger with Google Brain, it focuses on general artificial intelligence research. Its Gemini Robotics model can control robots to perform complex tasks without task-specific training [20].
- **Skild AI**: A leading US robot-"brain" company, aiming to build a universal robot operating system that enables intelligent operation across diverse scenarios [26].
No One in My Group Works on Embodied AI, So My Advisor Sent Me to Scout the Pitfalls First...
具身智能之心· 2025-09-12 16:03
Core Viewpoint
- The article emphasizes the importance of building a solid foundation in hardware and algorithms for embodied-intelligence research, particularly for newcomers to the field [1][12].

Group 1: Research Guidance
- For teams without prior experience in embodied intelligence, it is recommended to start with simpler tasks using robotic arms before tackling more complex humanoid robots [1].
- Those with a background in large models should focus on specific downstream tasks such as VLA (Vision-Language-Action) and VLN (Vision-Language Navigation) to bridge the gap between theory and practical applications [1].
- It is advised to solidify knowledge of reinforcement learning before attempting to develop humanoid robots, as this area is still underdeveloped in the domestic market [1].

Group 2: Community and Resources
- The "Embodied Intelligence Knowledge Planet" community serves as a comprehensive knowledge-sharing platform, with nearly 2,000 members and a goal of reaching 10,000 within two years [3][12].
- The community provides resources including technical roadmaps, Q&A, and job opportunities, making it valuable for both beginners and advanced researchers [4][13].
- Members can access over 30 technical roadmaps, open-source projects, and datasets related to embodied intelligence [12][26].

Group 3: Technical Insights
- The community addresses practical issues such as data collection, model deployment, and the challenges of sim-to-real transfer in robotics [4][5].
- It offers insights into models and frameworks, including VLA and reinforcement learning, and discusses their applications in robotic tasks [5][6].
- The community also organizes forums and live discussions to keep members updated on the latest trends and challenges in the embodied-intelligence field [4][11].
Once I Actually Started on VLA, I Found It Really Hard...
具身智能之心· 2025-09-12 12:02
Core Insights
- The Vision-Language-Action (VLA) model represents a new paradigm in embodied intelligence, enabling robots to generate executable actions from language instructions and visual signals, thus enhancing their adaptability to complex environments [1][3].
- VLA breaks traditional single-task limitations, allowing robots to make autonomous decisions in diverse scenarios, with applications in manufacturing, logistics, and home services [3].
- The VLA model has become a research hotspot, driving collaboration between academia and industry, with cutting-edge projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA emerging [3][5].

Industry Development
- The embodied-intelligence sector is growing rapidly, with teams such as Unitree, Zhiyuan, Xinghaitu, Galaxy General, and Zhujidongli transitioning from laboratories to commercialization [5].
- Major tech companies such as Huawei, JD.com, and Tencent are actively investing in this field, alongside international firms like Tesla and Figure AI [5].

Educational Initiatives
- A specialized VLA research guidance course has been launched to help individuals quickly enter or transition within the VLA research domain, addressing the complexity of the related systems and frameworks [5].
- The course is built around the perception-cognition-action loop, providing a comprehensive understanding of VLA's theoretical foundations and practical applications [7][8].

Technical Evolution
- The course will analyze the technical evolution of the VLA paradigm, from early grasp-pose detection to recent advances such as Diffusion Policy and multimodal foundation models [8].
- It will also explore core challenges in embodied intelligence, such as cross-domain generalization and long-term planning, while integrating large language models with robotic control systems [9].

Course Structure and Outcomes
- The curriculum emphasizes a full-chain training approach, covering theoretical foundations, simulation-environment setup, experimental design, and paper writing [15].
- Students will gain skills in academic research methodology, including literature review, extracting innovations, and identifying valuable research directions [15].
- The course aims to help students develop research ideas, conduct preliminary experiments, and produce high-quality academic papers [15].
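The perception-cognition-action loop described above can be sketched minimally: a VLA-style policy fuses visual features and a language instruction into a single context and regresses a robot action. The sketch below is purely illustrative, with random projections standing in for trained encoders; the 32-dimensional features and 7-DoF action layout are assumptions, not any particular model's design.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in visual encoder: flatten pixels, project to a feature vector."""
    w = rng.standard_normal((image.size, 32))
    return image.reshape(-1) @ w

def encode_instruction(tokens: list) -> np.ndarray:
    """Stand-in language encoder: mean of per-token embeddings."""
    table = {t: rng.standard_normal(32) for t in set(tokens)}
    return np.mean([table[t] for t in tokens], axis=0)

def policy(vis: np.ndarray, lang: np.ndarray) -> np.ndarray:
    """Fuse the two modalities and regress a 7-DoF action (xyz, rpy, gripper)."""
    fused = np.concatenate([vis, lang])
    w = rng.standard_normal((fused.size, 7))
    return np.tanh(fused @ w)  # tanh keeps the action bounded in [-1, 1]

image = rng.random((8, 8, 3))
action = policy(encode_image(image), encode_instruction(["pick", "up", "the", "cup"]))
print(action.shape)
```

In a real VLA model the two encoders and the policy head share a large pretrained backbone and are trained end to end on demonstration data; the structure of the loop, observation plus instruction in, low-level action out, is the same.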
A New Paradigm! LLaDA-VLA: The First VLA Model Built on Large Language Diffusion Models
具身智能之心· 2025-09-12 00:05
Core Viewpoint
- The article discusses advances in Vision-Language Models (VLMs) and introduces LLaDA-VLA, the first Vision-Language-Action model built on large language diffusion models, which demonstrates superior multi-task performance in robotic action generation [1][5][19].

Group 1: Introduction to LLaDA-VLA
- LLaDA-VLA brings Masked Diffusion Models (MDMs) into robotic action generation, fine-tuning pre-trained multimodal large language diffusion models and enabling parallel prediction of action trajectories [5][19].
- The architecture consists of three core modules: a vision encoder for RGB feature extraction, a language-diffusion backbone that integrates visual and language information, and a projector that maps visual features into the language token space [10][7].

Group 2: Key Technical Innovations
- Localized Special-token Classification (LSC) reduces cross-domain transfer difficulty by classifying only action-related special tokens, improving training efficiency [8][12].
- Hierarchical Action-Structured Decoding (HAD) explicitly models the hierarchical dependencies between actions, yielding smoother and more plausible generated trajectories [9][13].

Group 3: Performance Evaluation
- LLaDA-VLA outperforms state-of-the-art methods across environments including SimplerEnv, CALVIN, and the real WidowX robot, achieving significant improvements in success rates and task-completion metrics [4][21].
- In task-level evaluations, LLaDA-VLA achieved an average success rate of 58% across multiple tasks, surpassing previous models [15].

Group 4: Experimental Results
- The model showed notable gains in task-completion rates and average task length over baseline models, validating the effectiveness of the proposed LSC and HAD strategies [18][14].
- In one comparative analysis, LLaDA-VLA reached a 95.6% success rate on a specific task, significantly higher than other models [14][18].

Group 5: Research Significance and Future Directions
- LLaDA-VLA establishes a solid foundation for applying large language diffusion models to robotic manipulation, paving the way for future research in this domain [19][21].
- Its design strategies not only enhance model performance but also open new avenues for exploration in embodied intelligence [19].
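The parallel trajectory prediction mentioned above rests on how masked diffusion models decode: start from a fully masked token sequence and iteratively commit the most confident positions, instead of emitting tokens left to right. The toy sketch below illustrates only that iterative-unmasking idea; the random logits stand in for the real language-diffusion backbone, and the vocabulary size, chunk length, and step count are made-up values, not LLaDA-VLA's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
MASK = -1
VOCAB = 16    # hypothetical size of a discretized action-token vocabulary
SEQ_LEN = 8   # hypothetical length of one action chunk

def predict_logits(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion backbone: one logit row per position."""
    return rng.standard_normal((tokens.size, VOCAB))

def masked_diffusion_decode(steps: int = 4) -> np.ndarray:
    tokens = np.full(SEQ_LEN, MASK)
    per_step = SEQ_LEN // steps
    for _ in range(steps):
        logits = predict_logits(tokens)
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        conf = probs.max(axis=1)
        conf[tokens != MASK] = -np.inf            # keep committed tokens fixed
        for idx in np.argsort(conf)[-per_step:]:  # unmask most confident slots
            tokens[idx] = probs[idx].argmax()
    return tokens

actions = masked_diffusion_decode()
print(actions)
```

Because several positions are committed per step, a whole action chunk is produced in a handful of backbone passes; LSC and HAD then constrain which tokens are classified and in what hierarchical order positions are unmasked.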
BAAI Evaluation: Using Data to Decode Embodied Intelligence in Robot Soccer
具身智能之心· 2025-09-12 00:05
The following article is from BAAI具身智能, by BAAI具身智能.

BAAI具身智能: the embodied intelligence team of the Beijing Academy of Artificial Intelligence (BAAI), dedicated to moving human society toward a smarter, more efficient, and more humane future, driving technological innovation and industrial upgrading while offering new perspectives and solutions for real-world problems.

Edited by BAAI具身智能

At the 2025 World Humanoid Robot Games (WHRG), many teams showcased the latest results of deeply integrating embodied-intelligence algorithms with robot hardware: the degrees of freedom, stability, and controllability of robot bodies have improved markedly, and embodied-intelligence algorithms endow robots with perception, reasoning, planning, and decision-making capabilities that let them take on more complex tasks in dynamic environments. Precisely because humanoid robots have evolved into complex systems spanning both the body and the intelligence model, how to evaluate their overall capability scientifically and systematically has become a key bottleneck for the industry. Traditional outcome-oriented evaluation, such as simple win/loss or task-completion results, can no longer adequately reflect how embodied intelligence performs when supporting a robot body in complex, dynamic, and highly adversarial environments. Taking soccer matches as an example, the various phenomena that emerge ...
HKU Team Debuts a New Embodied Representation Paradigm, Building a Task-Adaptive Perception Framework
具身智能之心· 2025-09-12 00:05
Edited by 机器之心

The co-first authors of this paper are Sun Li and Wu Jiefeng, PhD students in the InfoBodied AI Lab at the University of Hong Kong, with collaborators Liu Ruizhe and Chen Feng. The corresponding author is Yang Yanchao, assistant professor at HKU's Institute of Data Science and Department of Electrical and Electronic Engineering. In recent years the InfoBodied AI Lab has published multiple representative works at top venues including CVPR, ICML, NeurIPS, and ICLR, and collaborates widely with well-known universities and research institutions at home and abroad.

Motivation and Background

In embodied intelligence, policy learning usually relies on a scene representation. Yet in most existing multi-task manipulation methods, representation extraction is task-agnostic: whether the embodied agent is "closing a drawer" or "stacking blocks," the system extracts features in exactly the same way (using the same network parameters). Imagine a robot in a kitchen that must both precisely grasp fragile eggs and carry heavy pots. Traditional methods make the robot look at different task scenes with the same "eyes," which causes the scene representation to contain ...
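One common way to make a representation task-adaptive, as opposed to the task-agnostic extraction criticized above, is to let a task embedding modulate the scene features. The sketch below uses FiLM-style (feature-wise linear modulation) conditioning purely as an illustration of the idea; it is a hypothetical stand-in, not the HKU paper's actual framework, and all dimensions and weights are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

def task_conditioned_features(scene_feat, task_emb, w_gamma, w_beta):
    """FiLM-style conditioning: the task embedding produces a per-feature
    scale (gamma) and shift (beta), so different tasks reshape the same
    scene features differently."""
    gamma = task_emb @ w_gamma
    beta = task_emb @ w_beta
    return gamma * scene_feat + beta

dim_feat, dim_task = 16, 8
w_gamma = rng.standard_normal((dim_task, dim_feat))
w_beta = rng.standard_normal((dim_task, dim_feat))

scene = rng.standard_normal(dim_feat)  # one fixed scene observation
feat_close_drawer = task_conditioned_features(
    scene, rng.standard_normal(dim_task), w_gamma, w_beta)
feat_stack_blocks = task_conditioned_features(
    scene, rng.standard_normal(dim_task), w_gamma, w_beta)

# Same scene, different tasks -> different representations.
print(np.allclose(feat_close_drawer, feat_stack_blocks))
```

The point is exactly the kitchen example: the same observation yields one representation when the task is grasping an egg and another when it is carrying a pot, because the task signal gates which aspects of the scene the features emphasize.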
Robots Head into Factories and Mines: This Robot Vocational Skills Competition on the Bund Was Meaningful!
具身智能之心· 2025-09-12 00:05
Core Viewpoint
- The AI Science and Technology Competition showcased practical applications of robotics in industrial inspection and emergency rescue, highlighting advances in embodied intelligence and its potential to extend human capabilities in hazardous environments [2][9].

Group 1: Event Overview
- The competition featured a "Robot Vocational Skills Performance Competition" held on September 10, organized by Ant Group, with participation from four embodied-intelligence manufacturers [2].
- The competition included challenging tasks simulating real industrial and rescue scenarios, demonstrating the robots' capabilities and earning applause from the audience [2][3].

Group 2: Robot Performances
- The first robot, Qiteng, successfully completed a "dangerous terrain crossing" task, showcasing rapid response and a strong algorithmic foundation, which is crucial for exploration in remote areas [3][6].
- The Shuangying Aviation and Qiuzhi Technology team presented a robotic dog that excelled at industrial inspection, performing six complex actions with high precision, and later successfully "rescued" a simulated infant in a rescue scenario [5][9].
- The final robot, Zhongke Huiling, tackled a simulated mine-blasting task, achieving millimeter-level precision when inserting explosives and demonstrating effective real-time correction and collaboration [7][10].

Group 3: Expert Insights
- Experts emphasized that industrial inspection and emergency rescue are the most valuable application scenarios for robots; current robotic capabilities are maturing but still face challenges in fine manipulation [6][9].
- The competition highlighted the importance of practical applications of technology, focusing on real-world problems and scenarios to drive industry collaboration and innovation [9].

Group 4: Competition Impact
- The competition attracted over 8,000 teams and nearly 20,000 participants from roughly 20 countries and regions, providing a platform for innovators and companies to showcase advances in AI hardware and applications [9].
- The event underscored a commitment to moving robotics from mere demonstrations to practical industrial applications, aligning technology development with human needs [9].
What Do We Really Mean When We Talk About the Embodied "Brain" and "Cerebellum"?
具身智能之心· 2025-09-11 05:53
Core Viewpoint
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1][3].

Industry Analysis
- In the past two years, numerous star teams in embodied intelligence have emerged, founding highly valued companies such as Xinghaitu, Galaxy General, and Zhujidongli and driving advances in embodied "brain" and "cerebellum" technologies [3].
- Major domestic companies like Huawei, JD.com, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build an embodied-intelligence ecosystem, while international players such as Tesla and US investment institutions focus on foundation models and humanoid-robot prototypes [5].

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
  - The first stage focused on grasp-pose detection, which struggled with complex tasks due to a lack of context modeling [6].
  - The second stage involved behavior cloning, allowing robots to imitate human demonstrations but revealing weak generalization and poor performance in multi-target scenarios [6].
  - The third stage introduced Diffusion Policy methods, improving stability and generalization through sequence modeling [6][7].
  - The fourth stage, emerging in 2025, explores integrating VLA models with reinforcement learning and tactile sensing, addressing limitations in feedback and future-prediction capability [9][11][12].

Product and Market Development
- The evolution of embodied-intelligence technology has produced a range of products, including humanoid robots, robotic arms, and quadrupedal robots, serving industries such as manufacturing, home services, and healthcare [14].
- As the industry shifts from research to deployment, demand for engineering and systems capability is rising, requiring stronger engineering skills [17].

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full spectrum of embodied-intelligence algorithms, from basic tasks to advanced models such as VLA and its integrations [14][20].
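The second stage above, behavior cloning, is at heart supervised regression from observed states to expert actions, which is also why its generalization is weak: it only matches the demonstration distribution. A minimal sketch with synthetic data (the linear expert, state and action dimensions, and noise level are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical demonstrations: states (e.g. joint readings) -> expert actions.
states = rng.standard_normal((200, 4))
true_w = rng.standard_normal((4, 2))
actions = states @ true_w + 0.01 * rng.standard_normal((200, 2))

# Behavior cloning as supervised regression; here the "policy" is linear,
# so ordinary least squares recovers it in closed form.
w_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)

mse = float(np.mean((states @ w_bc - actions) ** 2))
print(round(mse, 4))
```

The fit is near-perfect on the demonstration distribution, yet nothing constrains the policy on states the expert never visited; Diffusion Policy and the VLA + RL stage are, in different ways, responses to exactly that limitation.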
Cook Squeezes the Toothpaste Dry! The 5,999-Yuan iPhone 17 Gets a High-Refresh Display; New Earbuds Offer Heart-Rate Monitoring and Live Translation
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article covers Apple's September launch event, highlighting the iPhone 17 series, AirPods Pro 3, and Apple Watch Series 11, with an emphasis on design, performance upgrades, and new features across the lineup [2][14][100].

iPhone 17 Series
- The iPhone 17 series includes four models priced from 5,999 to 9,999 yuan, with the standard model now featuring an adaptive 120Hz ProMotion display [14][24].
- The A19 chip offers a 20% performance improvement over the A18, built on a 3nm process with enhanced AI capabilities [22][23].
- The camera system features a 48MP dual-camera setup and an upgraded 18MP Center Stage front camera, enhancing photo and video capabilities [25][28].
- Battery life is extended, with the iPhone 17 capable of 30 hours of video playback, plus fast-charging options [36].

iPhone 17 Air
- The iPhone 17 Air is the thinnest iPhone yet, measuring 5.6mm and weighing 165g, with a 6.5-inch 120Hz display [39][44].
- It is powered by the A19 Pro chip, with peak performance claimed at three times that of the A18 Pro, and supports WiFi 7 and Bluetooth 6 [46][49].
- The camera system mirrors that of the iPhone 17, and the device uses eSIM exclusively [58].

iPhone 17 Pro/Pro Max
- The Pro models use upgraded materials for better heat dissipation and a more robust design, with the Pro Max offering up to 39 hours of video playback [71][75].
- Camera capabilities are significantly upgraded, with up to 8x optical zoom and support for ProRAW and ProRes video formats [81][84].

AirPods Pro 3
- The AirPods Pro 3 deliver double the active noise cancellation of the previous generation and add heart-rate monitoring aimed at fitness users [89][90].
- They also support real-time translation and offer 6-10 hours of battery life depending on mode [98].

Apple Watch Series 11
- The Series 11 is the thinnest and most comfortable Apple Watch yet, starting at 2,999 yuan, and now supports 5G connectivity [101][105].
- New health features include high-blood-pressure notifications and sleep-quality scoring, with 24-hour battery life [110][120].
- The lightweight SE 3 also supports 5G and adds new health-monitoring features [122][128].

Conclusion
- The article closes by reflecting on the significance of these launches and their potential market impact, inviting readers to share which product they find most appealing [135].
Latest from Westlake University! ARFM: Combining the Strengths of VLA Imitation Learning and Reinforcement Learning
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the limitations of current vision-language-action (VLA) models on complex tasks and introduces Adaptive Reinforcement Flow Matching (ARFM), which enhances their performance by combining reinforcement learning (RL) signals with the strengths of flow matching [1][2][4].

Summary by Sections

Current Status of VLA Models
- Flow-matching-based VLA models have shown excellent performance on general robotic-manipulation tasks, validated by large-scale pre-trained systems such as RT-1 and PaLM-E, but their reliance on imitation learning limits action precision on complex downstream tasks [4][5].

Existing Solutions and Limitations
- Previous attempts to fine-tune VLA models with offline RL, such as ReinboT, have had limited effect because they guide action prediction only indirectly, highlighting the need for more effective offline RL fine-tuning methods [4][5].

Main Contributions
- ARFM is a novel offline RL post-training approach designed specifically for VLA flow models, addressing the challenge of extracting useful signal from data of varying quality and improving the efficiency of offline RL fine-tuning [6][7].

Methodological Innovation
- ARFM introduces an adaptive scaling factor into the loss function to balance the RL advantage signal against gradient variance, yielding better generalization, robustness to disturbances, and few-shot learning [6][8].

Experimental Validation
- Extensive experiments on the LIBERO simulation benchmark and a UR5 robotic-arm platform show that ARFM outperforms existing methods in generalization, robustness to dynamic disturbances, and few-shot learning efficiency [6][8][29].

Core Algorithm Design
- The ARFM framework is built around an energy-weighted loss that injects RL signals and an adaptive mechanism that keeps training stable, overcoming the limitations of plain imitation learning and of existing offline RL fine-tuning methods [8][11].

Experimental Setup
- Experiments used the LIBERO benchmark, which includes four core task suites, plus real-world UR5 scenarios covering various manipulation tasks under different conditions [29][30].

Key Experimental Results
- ARFM outperformed baseline models in multi-task learning, robustness to action perturbations, few-shot learning efficiency, and continual learning, confirming its practical value for real-world robotic applications [32][35][38].

Conclusion
- ARFM effectively balances retaining the RL advantage signal against controlling the variance of the flow-loss gradient, improving VLA flow models across tasks and conditions and demonstrating real-world applicability [49][47].
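The energy-weighted loss with an adaptive scaling factor can be illustrated with a toy sketch: each sample's flow-matching error is reweighted by an exponential of its RL advantage, and the scaling factor is chosen so the weights do not blow up the gradient variance. This is loosely inspired by the description above, not ARFM's actual formulation; the weighting scheme, candidate values, and variance cap are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def flow_matching_loss(pred_v, target_v):
    """Per-sample flow-matching regression error (MSE over action dims)."""
    return np.mean((pred_v - target_v) ** 2, axis=1)

def energy_weighted_loss(pred_v, target_v, advantage, alpha):
    """Up-weight samples with higher RL advantage via exp(alpha * A)."""
    w = np.exp(alpha * advantage)
    w = w / w.mean()  # normalize so the overall loss scale stays stable
    return float(np.mean(w * flow_matching_loss(pred_v, target_v)))

def pick_alpha(advantage, candidates, max_var=0.5):
    """Adaptive scaling: largest alpha whose normalized weights keep
    variance below a cap, limiting gradient variance from the RL signal."""
    best = candidates[0]
    for a in candidates:
        w = np.exp(a * advantage)
        if np.var(w / w.mean()) <= max_var:
            best = a
    return best

# Hypothetical batch: predicted vs. target velocity fields, plus advantages.
pred_v = rng.standard_normal((64, 7))
target_v = pred_v + 0.1 * rng.standard_normal((64, 7))
advantage = rng.standard_normal(64)

alpha = pick_alpha(advantage, [0.1, 0.5, 1.0, 2.0])
loss = energy_weighted_loss(pred_v, target_v, advantage, alpha)
print(alpha, loss)
```

The trade-off the sketch makes visible is the one the article names: a larger alpha leans harder on the RL advantage signal but inflates weight (hence gradient) variance, so the scaling must adapt to the advantage distribution in the batch.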