具身智能之心
Judging from 300+ works: is VLA the inevitable path to general embodied intelligence?
具身智能之心· 2025-10-17 16:02
Core Insights
- The emergence of Vision-Language-Action (VLA) models signifies a shift from traditional policy-based control to a paradigm of general-purpose robotics, transforming vision-language models (VLMs) from passive sequence generators into active agents capable of manipulation and decision-making in complex, dynamic environments [2]

Group 1: VLA Overview
- The article discusses a comprehensive survey of advanced VLA methods, providing a clear taxonomy and a systematic review of existing research [2]
- VLA methods are categorized into several main paradigms: autoregressive, diffusion-based, reinforcement-based, hybrid, and specialized methods, with detailed examination of their motivations, core strategies, and implementations [2]
- The survey integrates insights from over 300 recent studies, outlining the opportunities and challenges that will shape the development of scalable, general VLA methods [2]

Group 2: Future Directions and Challenges
- The review addresses key challenges and future development directions for advancing VLA models and generalizable robotic technologies [2]
- The live discussion will explore the origins of VLA, its research subdivisions, and the field's hot topics and future trends [5]

Group 3: Event Details
- The live event is scheduled for October 18, 19:30-20:30, focusing on VLA as a prominent research direction in artificial intelligence [5]
- Key highlights include the classification of VLA research fields, the integration of VLA with reinforcement learning, and the Sim2Real concept [6]
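The autoregressive paradigm in this taxonomy typically discretizes continuous robot actions into a fixed token vocabulary so that a language-model backbone can emit them one by one; a minimal sketch of such uniform binning (the 256-bin vocabulary and the action ranges are illustrative assumptions, not details from the survey):

```python
# Uniform action discretization, as used by autoregressive VLA models:
# each continuous action dimension maps to one of N integer tokens.

N_BINS = 256  # illustrative per-dimension vocabulary size

def discretize(value, lo, hi, n_bins=N_BINS):
    """Map a continuous value in [lo, hi] to an integer token in [0, n_bins-1]."""
    value = max(lo, min(hi, value))      # clamp to the valid range
    frac = (value - lo) / (hi - lo)      # normalize to [0, 1]
    return min(int(frac * n_bins), n_bins - 1)

def undiscretize(token, lo, hi, n_bins=N_BINS):
    """Map a token back to the center of its bin."""
    return lo + (token + 0.5) * (hi - lo) / n_bins

# Round-tripping loses at most half a bin width of precision.
gripper_cmd = 0.731
tok = discretize(gripper_cmd, 0.0, 1.0)
recovered = undiscretize(tok, 0.0, 1.0)
```

Diffusion- and flow-based paradigms instead keep actions continuous and denoise them, avoiding the half-bin-width quantization error this scheme incurs.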
Qunche Intelligent receives Alibaba investment, accelerating breakthroughs across the full embodied-intelligence technology chain
具身智能之心· 2025-10-17 08:12
Core Viewpoint
- Qunche Intelligent, led by Professor Lu Cewu, a leader in the field of embodied intelligence, combines academic excellence with industry experience, possessing full-stack capabilities from technology research and development to commercial delivery [1]

Group 1: Company Overview
- Qunche Intelligent focuses on force-based embodied intelligence "brain" technology, breaking through traditional trajectory-control frameworks [1]
- The company has developed a comprehensive autonomous decision-making system covering perception, cognition, planning, and execution [1]
- It leverages multimodal large models and a rich accumulation of force-perception data to achieve high-dimensional understanding and flexible manipulation of the physical world [1]

Group 2: Recent Developments
- Qunche Intelligent recently announced the completion of a new round of financing, with Alibaba Group as the investor and several existing shareholders participating [1]
- The new funding will be used to accelerate technology and product development, land embodied applications, and expand the industry ecosystem [1]
Exclusive | Qunche Intelligent secures a new Alibaba round; SJTU professor Lu Cewu at the helm, with breakthroughs in embodiment-free data collection connecting the full embodied-intelligence chain
具身智能之心· 2025-10-17 07:46
Core Insights
- Qunche Intelligent recently completed a new round of financing led by Alibaba Group, with multiple existing shareholders participating. The funds will be used to accelerate technology and product development, land embodied applications, and expand the industry ecosystem [2][4]

Group 1: Company Overview
- Qunche Intelligent was established at the end of 2023 and had previously completed several rounds of financing, including Pre-A++ and Pre-A+++ rounds totaling hundreds of millions of yuan [4]
- The company focuses on embodied intelligence technology, rapidly iterating its self-developed large models for the physical world, and launched the upgraded product Noematrix Brain 2.0 this year [4][8]

Group 2: Technological Advancements
- Qunche Intelligent has made significant breakthroughs in key technology areas, including an embodiment-free data collection scheme, a universal end-to-end model scheme, and a large-scale deployment system for human-machine collaboration [4]
- The company aims to streamline the entire pipeline from data collection to deployment, covering the complete technical chain from data acquisition and model pre-training to post-training [4]

Group 3: Market Position and Collaborations
- Qunche has established partnerships with several leading companies in the retail and home sectors to promote mass delivery of integrated hardware-software embodied intelligence solutions [6]
- The company plans to leverage its advanced large-model products and data-to-model closed-loop capabilities to continuously provide innovative, practical embodied intelligence solutions to clients and partners [6]

Group 4: Leadership and Vision
- Qunche Intelligent is led by Professor Lu Cewu, a prominent figure in embodied intelligence with both academic depth and industry experience, giving the company full-stack capabilities from technology research and development to commercial delivery [8]
- The company's core technology is based on force-driven embodied intelligence, breaking through traditional trajectory-control frameworks to build a comprehensive autonomous decision-making system covering perception, cognition, planning, and execution [8]
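Force-driven control of this kind is commonly formulated as admittance control, in which measured contact force, rather than a pre-planned trajectory alone, shapes the commanded motion; a generic single-axis sketch (the inertia, damping, and stiffness gains are illustrative assumptions, not Qunche's actual control law):

```python
# Generic 1-DoF admittance controller: M*a + B*v + K*x = F_ext.
# External contact force F_ext reshapes the commanded position x,
# instead of the robot rigidly tracking a fixed reference.

def admittance_step(x, v, f_ext, dt, M=1.0, B=8.0, K=20.0):
    """Integrate one time step; returns updated (position, velocity)."""
    a = (f_ext - B * v - K * x) / M   # acceleration from the admittance law
    v = v + a * dt                    # semi-implicit Euler integration
    x = x + v * dt
    return x, v

# A sustained 2 N push: the commanded position yields toward F/K = 0.1 m.
x, v = 0.0, 0.0
for _ in range(5000):                 # 10 s at dt = 2 ms
    x, v = admittance_step(x, v, f_ext=2.0, dt=0.002)
```

When the contact force vanishes, the same law springs the arm back toward its reference, which is what makes force-driven schemes compliant where pure trajectory tracking is rigid.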
VLA can give reinforcement learning smarter application scenarios...
具身智能之心· 2025-10-17 04:01
Core Insights
- The article discusses the importance of reinforcement learning (RL) in the development of embodied intelligent robots, highlighting its applications in complex tasks such as stair climbing, running, and dancing [3][9]
- It emphasizes the challenges newcomers face in reinforcement learning, particularly in producing quality research papers, given the complexity and breadth of the subject [6][10]
- To address these challenges, a specialized 1v6 mentoring course on reinforcement learning has been introduced, aimed at helping students produce publishable research papers [7][10]

Group 1: Reinforcement Learning Applications
- Reinforcement learning is crucial for gait control in humanoid and quadruped robots, enabling them to perform tasks in challenging environments [3][9]
- The VLA+RL approach for robotic arms is gaining popularity in academia, enhancing the efficiency and smoothness of robotic operations [4][9]

Group 2: Course Structure and Objectives
- The 1v6 mentoring course is designed for graduate students and others needing guidance on research papers, featuring weekly live sessions and dedicated teaching assistants [8][10]
- The course spans 14 weeks of intensive online training followed by 8 weeks of maintenance support, covering the stages of research-paper production: idea confirmation, project implementation, and writing refinement [10][18]

Group 3: Course Content and Deliverables
- The curriculum includes reinforcement learning fundamentals, simulation environments, and writing guidance, with a focus on producing a research paper suitable for top conferences and journals [10][19]
- Students receive structured templates and support for the writing and submission process, ensuring they meet the standards of leading academic publications [10][29]

Group 4: Instructor and Support
- The course is led by experienced instructors with backgrounds in embodied intelligence and robotics, providing both theoretical knowledge and practical insights [27]
- Continuous support is offered through a dedicated WeChat group for real-time Q&A, enhancing the learning experience [18][27]
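The tabular case that reinforcement-learning fundamentals usually start from fits in a few lines; a self-contained Q-learning sketch on a toy chain environment (the environment, reward, and hyperparameters are illustrative, not course material):

```python
import random

# Toy 5-state chain MDP: action 1 moves right, action 0 moves left;
# reaching the last state yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:
                a = rng.randrange(2)            # epsilon-greedy exploration
            else:                               # greedy with random tie-break
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s2, r, done = step(s, a)
            # One-step Q-learning update.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# The learned greedy policy moves right toward the goal from every state.
```

Gait-control pipelines replace this table with a neural policy and the chain with a physics simulator, but the reward-driven update loop is the same idea.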
AIR Research | X-VLA goes open source, sweeping robot benchmark records
具身智能之心· 2025-10-17 00:04
Core Insights
- The article discusses the launch of X-VLA, a groundbreaking open-source model for embodied intelligence that completed a 120-minute autonomous folding task with only 0.9 billion parameters, setting a new performance benchmark in the field [3][19]

Group 1: Performance Breakthrough
- X-VLA is the first fully open-source model to achieve long-duration autonomous tasks, overcoming key challenges in complex autonomous operation [8]
- The model demonstrates superior efficiency, achieving state-of-the-art (SOTA) performance across five authoritative simulation benchmarks with only 0.9 billion parameters [8][19]

Group 2: Innovative Technology
- The model introduces a Soft-Prompt mechanism to address variability across robot platforms, enhancing adaptability and training efficiency [10]
- A multimodal encoding strategy optimizes resource allocation while preserving information integrity during task execution [10]
- The action-generation module uses flow matching for probabilistic modeling of robot action sequences, improving trajectory smoothness and robustness in uncertain environments [10]

Group 3: Data Pre-training and Quality
- A balanced data-sampling strategy ensures equitable training across heterogeneous datasets, preventing model bias [12]
- A rigorous data-preprocessing pipeline enhances the temporal consistency and overall quality of state-action sequences [12]
- Training is guided by a semantic-action alignment standard, ensuring that the learned behaviors are causally grounded rather than superficial associations [12]

Group 4: Training Process and Techniques
- In the pre-training phase, performance grows linearly as model parameters and training data scale up, validating the effectiveness of the Soft-Prompt mechanism [15]
- During fine-tuning, X-VLA shows high data efficiency, requiring only small-scale task-specific data to reach SOTA performance [16]
- Adaptive learning-rate adjustment and a progressive warm-up strategy are employed to optimize training stability and efficiency [17]

Group 5: Experimental Results
- X-VLA performs strongly on real robot platforms, successfully completing various tasks, including unlimited-duration autonomous folding, showcasing its capability on complex long-horizon tasks [19]
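The flow matching mentioned for the action-generation module trains a network to predict the velocity field that transports noise samples onto action sequences; the training target and inference loop can be sketched with a linear path (this rectified-flow-style formulation is a common convention, assumed here rather than taken from the X-VLA paper):

```python
# Conditional flow matching with a linear path:
#   x_t = (1 - t) * x0 + t * x1,   target velocity v = x1 - x0,
# where x0 is a noise sample and x1 is a ground-truth action chunk.

def flow_matching_target(x0, x1, t):
    """Return (x_t, v_target) for one training example at time t in [0, 1]."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v = [b - a for a, b in zip(x0, x1)]        # constant along the linear path
    return x_t, v

def integrate_flow(x0, velocity_fn, n_steps=10):
    """Inference: Euler-integrate dx/dt = v(x, t) from t=0 to t=1."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x

# With an oracle velocity field, integration transports noise to the action.
x1 = [0.3, -0.2, 0.5]                          # a toy 3-dim action
x0 = [1.0, -1.0, 0.0]                          # stands in for a noise draw
oracle = lambda x, t: [b - a for a, b in zip(x0, x1)]
recovered = integrate_flow(x0, oracle)         # ~= x1
```

In practice the oracle is replaced by a learned network conditioned on observations and language, and the integrated output is an action chunk rather than a single vector.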
Beating NVIDIA with only three to five samples: China's first ultra-few-shot embodied model debuts
具身智能之心· 2025-10-17 00:04
Core Insights
- The article discusses a breakthrough in embodied intelligence: the release of FAM-1, the first general-purpose few-shot embodied manipulation model, by the domestic startup FiveAges, bridging the gap between vision-language models and 3D robotic manipulation [2][5][18]

Data Scarcity and Challenges
- Embodied intelligence faces a significant data-scarcity challenge compared with natural language and vision: real-world robotic operation involves complex physical interactions and real-time feedback, making data collection costly and inefficient [3]
- Current vision-language-action (VLA) models rely heavily on large-scale labeled data to compensate for their limited generalization in practical applications [4]

FAM-1 Model Overview
- FAM-1 uses a novel architecture called BridgeVLA, enabling efficient knowledge transfer and spatial modeling between large vision-language models and 3D robotic control [5][7]
- The model achieves significant breakthroughs in few-shot learning, cross-scene adaptation, and complex task understanding, requiring only 3-5 robot demonstrations per task to reach an impressive 97% success rate, surpassing state-of-the-art (SOTA) models [5][14]

Technical Innovations
- The model consists of two core modules, Knowledge-driven Pretraining (KP) and 3D Few-shot Fine-tuning (FF), which enhance its ability to generalize across tasks and environments [9][12]
- The KP module builds a knowledge base from large volumes of image and video data to improve the model's understanding of operational contexts, while the FF module aligns VLM and VLA outputs using 3D heatmaps, significantly reducing the dependency on labeled data [9][12]

Experimental Results
- FAM-1 outperformed SOTA models on several international benchmarks, achieving an average success rate of 88.2% on tasks such as "Insert Peg" and "Open Drawer", an improvement of over 30% in average success rate over competitors [11]
- In real-world deployments, FAM-1 achieved a 97% success rate on basic tasks using only 3-5 samples, demonstrating robustness against various environmental challenges [15]

Future Directions
- FiveAges aims to enhance the generalization, reliability, and adaptability of its foundational manipulation models, promote their application in industrial settings, and develop general-purpose models for navigation tasks [20]
- The company is also exploring self-supervised learning from unlabeled human operation videos, which could further lower the barriers to application in robotics [19]
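Heatmap-based action readout of the kind the FF module's 3D heatmaps suggest is often implemented as a soft-argmax: the score grid is softmaxed and the expected grid coordinate is taken as the predicted position. A 2D sketch (the mechanism, grid size, and temperature are illustrative assumptions, not FAM-1's published design):

```python
import math

def soft_argmax_2d(scores, temperature=1.0):
    """Softmax an HxW score grid and return the expected (row, col) coordinate."""
    flat = [s / temperature for row in scores for s in row]
    m = max(flat)                                  # subtract max for stability
    exps = [math.exp(s - m) for s in flat]
    z = sum(exps)
    probs = [e / z for e in exps]
    w = len(scores[0])
    row = sum(p * (i // w) for i, p in enumerate(probs))
    col = sum(p * (i % w) for i, p in enumerate(probs))
    return row, col

# A sharp peak at grid cell (1, 2) dominates the expectation.
scores = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 9.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
r, c = soft_argmax_2d(scores)                      # ~(1.0, 2.0)
```

Because the readout is an expectation over the whole grid, it stays differentiable, which is what lets a heatmap head be trained end to end against position targets.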
Complete VLA deployment on a robotic arm in 3 days: algorithms & project practice (motion planning, visual feedback, imitation learning, and more)
具身智能之心· 2025-10-17 00:04
Core Insights
- The concept of "embodied intelligence" has been written into government work reports, driving a rapid increase in related projects across sectors, with companies competing for talent [1]
- Demand for skilled robotic-arm professionals is high, with a supply-demand ratio of 1:7; companies are offering attractive packages, including annual salaries exceeding one million yuan plus stock options [1]

Group 1: Challenges in Implementation
- Many researchers and engineers lack hands-on project experience and struggle when deploying algorithms from simulation environments onto real hardware [3]
- Two main causes are insufficient mastery of classic robotic-arm manipulation methods and weak engineering-practice skills, which hinder effective integration of the various methods [3]

Group 2: Training Initiatives
- Deep Blue Academy has partnered with notable figures and companies to launch a hands-on training program focused on robotic-arm manipulation and grasping techniques [5]
- The program offers practice with real robotic arms and covers key technologies such as motion planning, visual feedback, and imitation learning [5][6]

Group 3: Course Projects
- Project 1 achieves 1:1 precise mapping between RViz models and the real machine, integrating RRT* path planning and inverse-kinematics algorithms to address robotic-arm control and obstacle avoidance [6]
- Project 2 combines machine vision with rule-based algorithms and reinforcement learning for precise identification and adaptive grasping of specific target objects [6]
- Project 3 builds a 1:1 teleoperation data-collection platform, using vision-language models for efficient transfer of human operational skills to robotic arms [6]

Group 4: Instructor and Collaborating Entities
- The instructor, Qin Tong, is a prominent figure in the field, with a background in robotics and experience developing intelligent-driving systems [8]
- The collaborating company, Songling Robotics, is recognized as a leading global robotics platform and contributes to the hands-on training [10][11]
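The inverse kinematics that Project 1 integrates has a closed-form solution in the planar two-link case, a useful sanity check before tackling a full arm; a sketch (the link lengths are illustrative, not the course hardware):

```python
import math

def two_link_ik(x, y, l1=0.3, l2=0.25, elbow_up=True):
    """Closed-form IK for a planar two-link arm; returns joint angles (q1, q2)."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    q2 = math.acos(c2) * (1.0 if elbow_up else -1.0)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2

def forward(q1, q2, l1=0.3, l2=0.25):
    """Forward kinematics, used here to verify an IK solution."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

q1, q2 = two_link_ik(0.35, 0.2)
x, y = forward(q1, q2)          # reproduces the (0.35, 0.2) target
```

Real 6-DoF arms need numerical or analytical solvers per manipulator, and a planner such as RRT* then searches over the resulting joint configurations while avoiding obstacles.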
See you in Hangzhou! 具身智能之心 sponsors IROS for the first time and will present awards on site
具身智能之心· 2025-10-17 00:04
As robotic systems move into the real world, the stability, robustness, and generalization of perception systems are becoming key factors limiting deployment. Under complex conditions such as dynamic crowds, severe weather, sensor failures, and cross-platform deployment, traditional perception algorithms often suffer sharp performance drops.

To this end, the RoboSense Challenge 2025 was created. Organized by the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, HKUST (Guangzhou), and the University of Michigan, the challenge aims to systematically evaluate robots' perception and understanding in real-world scenarios, advance robustness research on multimodal perception models, and encourage cross-modal fusion and ...

Track 1: Driving with Language

| Milestone | Date |
| --- | --- |
| Registration | From June 2025 |
| Competition Server Online | June 15th, 2025 |
| Phase One Deadline | August 15th, 2025 |
| Phase Two Deadline | September 15th, 2025 |
| Award Decision @ IROS 2025 | October 19th, 2025 |
Founded just months ago, a star embodied-intelligence company has suddenly dissolved...
具身智能之心· 2025-10-16 08:05
Core Viewpoint
- The article discusses the sudden dissolution of OneStar Robotics, an embodied-intelligence startup founded only a few months ago that had recently announced significant funding and high-profile hires [3][4][6]

Company Overview
- OneStar Robotics was established on May 9, 2025, by Li Xingxing, son of Geely founder Li Shufu, and was positioned as a key player in Geely's robotics strategy [10][11]
- The company aimed to innovate in the embodied-intelligence sector, focusing on practical applications rather than flashy demonstrations [13][14]

Recent Developments
- In July, OneStar Robotics announced the completion of a multi-hundred-million-yuan "friends and family" funding round, primarily from Geely's ecosystem [16]
- The company secured a notable talent, Ding Yan, from Shanghai AI Lab, who became CTO and co-founder [17]
- In August, a partnership with Fudan University established a joint laboratory, and the first product, "Star Wheel 1", was launched [18]
- By September 17, the company had completed another multi-hundred-million-yuan seed round, attracting various investors [19]

Sudden Dissolution
- Despite its rapid growth and significant backing, OneStar Robotics reportedly dissolved its team abruptly; the reasons remain unclear [8][20]
- There are indications that the existing Geely-related platforms and businesses may revert to Geely Auto Group, while the technology team, led by Ding Yan, might pursue independent ventures [9]
Share your insights: inviting more outstanding embodied-intelligence partners to join us~
具身智能之心· 2025-10-16 07:00
Core Viewpoint
- The article emphasizes collaboration and innovation in embodied intelligence, aiming to build a platform that adds real value to the industry [2][3]

Group 1: Content Creation and Collaboration
- The community is invited to share content through platforms such as WeChat, Bilibili, and video channels, via technical talks and roundtable discussions [4]
- There is an emphasis on developing online courses and practical projects to raise the quality of content in the field [4]

Group 2: Main Directions of Focus
- Primary areas include VLA, VLN, reinforcement learning, embodied simulation, Diffusion Policy, multimodal large models, mobile manipulation, end-to-end approaches, and model deployment [5]

Group 3: Future Engagement
- The company is open to discussing compensation and collaboration models, encouraging interested parties to reach out for further communication [6]