Tactile Perception
TACTILE-VLA: Activating VLA Models' Physical Knowledge for Tactile Generalization (Latest from Tsinghua University)
自动驾驶之心· 2025-07-16 04:05
Core Insights
- The article discusses the development of Tactile-VLA, a model that integrates visual, language, action, and tactile perception to enhance robotic capabilities in contact-intensive tasks [2][6][20]

Group 1: Background and Core Issues
- Visual-language-action (VLA) models are crucial for general-purpose robotic agents, but they struggle in contact-intensive scenarios due to a lack of tactile perception [2]
- Tactile perception provides essential feedback for physical interaction, feedback that is missing from existing models [2]

Group 2: Key Innovations and Research Goals
- The core finding is that VLA models contain prior knowledge of physical interactions, which can be activated through tactile sensors to achieve zero-shot generalization in contact tasks [6]
- The Tactile-VLA framework introduces tactile perception as a primary modality, allowing direct mapping from abstract semantics to physical force control [6]
- The mixed position-force controller converts force targets into position adjustment commands, addressing the challenge of coordinating position and force control (see the control-law sketch after this summary) [6][10]
- The Tactile-VLA-CoT variant incorporates a chain-of-thought (CoT) reasoning mechanism, enabling robots to analyze failure causes and autonomously adjust their strategies [6][14]

Group 3: Overall Architecture
- Tactile-VLA's architecture features four key modules and emphasizes token-level fusion through a non-causal attention mechanism, so that semantic representations are grounded in physical reality [9]

Group 4: Mixed Position-Force Control Mechanism
- The mixed control strategy prioritizes position control and introduces force-feedback adjustments only when necessary, ensuring precision in both motion and applied force [10][12]
- The design separates the external net force from the internal grasping force, allowing the fine-grained force adjustments that contact-intensive tasks require [13]

Group 5: Chain-of-Thought Reasoning Mechanism
- Tactile-VLA-CoT improves adaptive capability by turning force adjustment into an interpretable reasoning process, which increases robustness on complex tasks [14][15]

Group 6: Data Collection Methods
- A specialized data collection system was developed to obtain high-quality tactile-language-aligned data, addressing the force feedback missing from conventional teleoperation [16][19]

Group 7: Experimental Validation and Results Analysis
- Three experimental groups were designed to validate Tactile-VLA's capabilities in instruction following, common-sense application, and adaptive reasoning [20]
- In the instruction-following experiment, Tactile-VLA learned the semantic meaning of force-related language, achieving success rates of 35% on the USB task and 90% on the charger task [23]
- The model used common-sense knowledge to adjust interaction forces according to object properties, significantly outperforming baseline models [24][30]
- In the adaptive-reasoning experiment, Tactile-VLA-CoT achieved an 80% success rate on a blackboard-erasing task, diagnosing and correcting failures autonomously [28][32]
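The mixed position-force controller described above converts a force target into a position adjustment, which is essentially an admittance-style outer loop around a position controller. A minimal sketch of that pattern; the names and gains (`hybrid_position_force_step`, `gain`, `max_step`) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def hybrid_position_force_step(x_cmd, f_target, f_measured,
                               gain=0.002, max_step=0.005):
    """One control tick of an admittance-style hybrid controller.

    Position control is the default; when a force target is active,
    the force error is converted into a small Cartesian position
    correction, so the low-level controller only ever sees position
    commands. All names and gains here are illustrative assumptions,
    not values from the paper.
    """
    f_error = np.asarray(f_target, dtype=float) - np.asarray(f_measured, dtype=float)
    dx = np.clip(gain * f_error, -max_step, max_step)  # metres per tick, saturated
    return np.asarray(x_cmd, dtype=float) + dx         # adjusted position target

# e.g. hold x/y while pressing down until 2 N of contact force is reached
x_next = hybrid_position_force_step(x_cmd=[0.30, 0.00, 0.12],
                                    f_target=[0.0, 0.0, -2.0],
                                    f_measured=[0.0, 0.0, -0.5])
```

Saturating the per-tick correction (`max_step`) is one common way to keep such a force loop from destabilizing the underlying position controller.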
TACTILE-VLA: Activating VLA Models' Physical Knowledge for Tactile Generalization (Latest from Tsinghua University)
具身智能之心· 2025-07-15 07:55
Core Insights
- The article discusses the development of Tactile-VLA, a model that integrates visual, language, action, and tactile perception to enhance robotic capabilities in contact-intensive tasks [2][6]

Group 1: Background and Core Issues
- Visual-language-action (VLA) models have strong semantic understanding and cross-modal generalization capabilities, but they struggle in contact-intensive scenarios due to a lack of tactile perception [2][6]
- Tactile perception provides critical feedback during physical interaction, such as friction and material properties, which is essential for tasks requiring fine motor control [2][6]

Group 2: Key Innovations and Research Goals
- The core finding is that VLA models contain prior knowledge of physical interactions, which can be activated by connecting this knowledge to tactile sensors, enabling zero-shot generalization on contact-intensive tasks [6][7]
- The Tactile-VLA framework introduces tactile perception as a primary modality, allowing direct mapping from abstract semantics to physical force control [7]
- The mixed position-force controller converts force targets into position adjustment commands, addressing the challenge of coordinating position and force control [7]

Group 3: Architecture and Mechanisms
- Tactile-VLA's architecture includes four key modules: instruction adherence to tactile cues, application of tactile-related common sense, adaptive reasoning from tactile feedback, and a multi-modal encoder for unified token representation [12][13]
- The mixed position-force control mechanism maintains positional precision while allowing fine-tuned force adjustments during contact tasks [13]
- The Tactile-VLA-CoT variant incorporates a chain-of-thought (CoT) reasoning mechanism, enabling robots to analyze failure causes from tactile feedback and autonomously adjust their strategies (see the retry-loop sketch after this summary) [13][14]

Group 4: Experimental Validation and Results
- Three experimental setups were designed to validate Tactile-VLA's capabilities in instruction adherence, common-sense application, and adaptive reasoning [17]
- In the instruction-adherence experiment, Tactile-VLA achieved success rates of 35% on the USB task and 90% on the charger task, significantly outperforming baseline models [21][22]
- The common-sense experiment demonstrated Tactile-VLA's ability to adjust interaction forces according to object properties, achieving success rates of 90%-100% on known objects and 80%-100% on unknown objects [27]
- The adaptive-reasoning experiment showed that Tactile-VLA-CoT completed a blackboard-erasing task with an 80% success rate, demonstrating problem-solving through reasoning [33]
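Both summaries describe the Tactile-VLA-CoT mechanism as a closed loop: attempt the task, read the tactile trace, reason about why the attempt failed, then retry with an adjusted force target. A minimal sketch of such a loop follows; `vla_policy`, `reason_about_failure`, and `task_succeeded` are hypothetical stand-ins, since the summaries do not expose an API:

```python
def run_with_cot_retries(task, vla_policy, reason_about_failure,
                         task_succeeded, max_retries=3):
    """Hypothetical sketch of a chain-of-thought retry loop.

    After a failed attempt, the model is asked to explain the failure
    from the tactile trace (e.g. "eraser slipped: contact force too
    low") and to propose an adjusted force target for the next try.
    """
    force_target = None  # first attempt uses the policy's default force
    for attempt in range(max_retries + 1):
        tactile_trace = vla_policy(task, force_target)  # execute one attempt
        if task_succeeded(tactile_trace):
            return attempt
        # CoT step: turn the tactile trace into a diagnosis plus a new target
        diagnosis, force_target = reason_about_failure(task, tactile_trace)
        print(f"attempt {attempt} failed ({diagnosis}); retrying with {force_target}")
    raise RuntimeError("task failed after all retries")
```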
Published in Information Fusion: Touch100k Unlocks a New Dimension of Tactile Perception Through Language
机器人大讲堂· 2025-06-08 08:47
Core Insights
- The article discusses the significance of touch in enhancing robots' perception and interaction capabilities, highlighting the development of the Touch100k dataset and the TLV-Link pre-training method [1][11]

Group 1: Touch100k Dataset
- Touch100k is the first large-scale dataset to integrate tactile, multi-granularity language, and visual modalities, extending tactile perception from "seeing" and "touching" to "expressing" through language [2][11]
- The dataset consists of tactile images, visual images, and multi-granularity language descriptions; the tactile and visual images are sourced from publicly available datasets, while the language descriptions are generated through human-machine collaboration [2][11]

Group 2: TLV-Link Method
- TLV-Link is a multi-modal pre-training method for tactile representation built on the Touch100k dataset, consisting of two phases: curriculum representation and modality alignment [6][11]
- The curriculum-representation phase employs a teacher-student paradigm in which a well-trained visual encoder transfers knowledge to a tactile encoder, with the teacher model's influence gradually reduced as the student improves (see the distillation sketch after this summary) [6][11]

Group 3: Experiments and Analysis
- Experiments evaluate TLV-Link on tactile representation and zero-shot tactile understanding, demonstrating its effectiveness on material-property recognition and robot grasp prediction [8][11]
- Results indicate that the Touch100k dataset is practical and that TLV-Link shows clear advantages over other models in both linear-probing and zero-shot evaluations [9][11]

Group 4: Summary
- The research establishes a foundational dataset and method for tactile representation learning, strengthening the modeling of tactile information and paving the way for applications in robotic perception and human-robot interaction [11]
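The curriculum-representation phase described above is a teacher-student distillation in which the visual teacher's influence decays as the tactile student improves. A minimal PyTorch sketch under that assumption; the linear schedule, loss mix, and encoder interfaces are illustrative guesses, not details from the paper:

```python
import torch
import torch.nn.functional as F

def tlv_link_step(tactile_student, visual_teacher, tactile_img, visual_img,
                  text_emb, step, total_steps):
    """One hypothetical training step of curriculum distillation.

    The teacher weight alpha decays linearly from 1 to 0, so early
    training imitates the frozen visual teacher and later training
    relies on tactile-language alignment alone.
    """
    alpha = max(0.0, 1.0 - step / total_steps)            # decaying teacher influence
    z_tac = tactile_student(tactile_img)                  # student tactile embedding
    with torch.no_grad():
        z_vis = visual_teacher(visual_img)                # frozen teacher embedding
    distill = F.mse_loss(z_tac, z_vis)                    # teacher-student term
    align = -F.cosine_similarity(z_tac, text_emb).mean()  # tactile-text alignment
    return alpha * distill + (1.0 - alpha) * align
```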
PaXini Receives Several Hundred Million Yuan in Funding from BYD as Embodied-Intelligence Financing Continues to Heat Up
Nan Fang Du Shi Bao· 2025-04-28 09:55
Group 1
- The core news is that PaXini Sensory Technology has received a strategic investment of several hundred million yuan from BYD, marking BYD's largest single investment in the field of embodied intelligence to date [1][3]
- The funding will be used to advance research on, and mass production of, PaXini's multi-dimensional tactile sensing technology and its humanoid-robot product matrix [1][3]
- PaXini, established in 2021, focuses on the independent development and industrialization of high-precision multi-dimensional tactile sensors, breaking the overseas technology monopoly with its 6D Hall-array tactile sensor [3]

Group 2
- Tactile sensing technology is considered a key component of the embodied-intelligence industry and ranks fourth among China's 35 critical technologies [3][4]
- Investment in humanoid robots expanded significantly in 2023, with 37 financing events in the first quarter alone totaling approximately 3.5 billion yuan [4]
- Major cities such as Beijing and Shenzhen, along with the Yangtze River Delta, remain the primary hubs for entrepreneurship and investment in this sector, with many of the companies founded in 2023 and 2024 [4]

Group 3
- Despite rising interest and investment in humanoid robots, commercializing these technologies remains challenging, with uncertainty around application scenarios and profit models [4][5]
- Balancing technological innovation with sustainable commercial development is the industry's key challenge going forward [5]
BYD's Largest Single Investment in Embodied Intelligence Lands: Over 100 Million Yuan Targeting PaXini's Tactile Perception Technology
36Ke· 2025-04-28 02:25
Core Viewpoint
- PaXini Tech, a leader in tactile perception and humanoid robotics, has received a strategic investment exceeding 100 million yuan from BYD, marking BYD's first major equity investment of the year and its largest in the embodied-intelligence sector to date [1][3][12]

Company Overview
- PaXini Tech was established in June 2021 and is one of the few companies in China with fully independent control of high-precision multi-dimensional tactile sensor technology, offering a complete "sensor - dexterous hand - humanoid robot" product matrix [1][4]
- The founding team comes from Waseda University's Kanno Robotics Laboratory, known for creating the world's first humanoid robot [1][4]

Technology and Product Development
- PaXini has developed a 6D Hall-array tactile sensor that gives robots human-like tactile perception, forming a complete embodied-intelligence technology stack from hardware packaging to algorithm integration [1][4][6]
- The high-precision array sensor captures up to 1 million tactile samples per second across 15 types of tactile information, with a force-recognition resolution of 0.01 N [5][6]

Strategic Investment Implications
- BYD's investment reflects deep recognition of PaXini's technology and a strategic move to strengthen its position in the embodied-intelligence field [3][12]
- The collaboration is expected to combine both companies' strengths, enabling rapid product iteration and industry leadership for PaXini while supporting BYD's smart-vehicle upgrades [10][12]

Market Position and Future Outlook
- PaXini's 6D Hall-array tactile sensors are already used by leading humanoid-robot companies worldwide, making it the only supplier of tactile sensors deployed at scale in humanoid robots [12]
- The investment marks a pivotal moment for the tactile perception and embodied-intelligence sectors, positioning PaXini to lead the move toward practical deployment and industrialization of embodied-intelligence technologies [11][12]