Robot Manipulation
HK Stock Movers | RoboSense (速腾聚创, 02498) rises over 4%; the company benefits from rapid ADAS and advanced intelligent-driving development, and the AC series is expected to open up long-term growth potential
智通财经网· 2026-01-06 06:29
Guosen Securities noted that on the automotive side, benefiting from the rapid development of ADAS and advanced intelligent driving, the company has an ample order backlog. To date, the company has cumulatively secured design wins for 144 vehicle models from 32 automakers and Tier 1 suppliers. On the robotics side, the company draws on its hardware, chip, and AI technologies to provide incremental components and solutions for robots. Guosheng Securities believes the AC series opens up long-term growth potential and is positive on the order demand and commercialization prospects of RoboSense's LiDAR products. According to Zhitong Finance APP, RoboSense (02498) rose more than 4%; as of press time it was up 4.12% at HK$38.96, with turnover of HK$401 million. On the news front, according to official information from RoboSense, AC2, billed as a true "eye" for robotic manipulation, will be showcased to the global robotics market at CES 2026. AC2 is reportedly the industry's first super-sensor system to integrate a fully solid-state dToF LiDAR, a binocular RGB camera, and an IMU, helping robots overcome fine-manipulation challenges across complex scenarios; it can be widely applied to humanoid robots, warehouse AGVs, home robots, digital twins, and other use cases. ...
How can robots learn to use a screwdriver and tighten nuts? UC Berkeley has an answer!
机器人大讲堂· 2025-12-08 09:03
Core Insights
- The article discusses the challenges robots face in performing precise tasks such as screwing and fastening, tasks that are relatively easy for humans but hard for robots because of complex friction dynamics and the need for tactile feedback [1]
- A new framework called DexScrew, developed by a research team from UC Berkeley, allows robots to perform these tasks without relying on vision, using tactile and temporal information instead [3]

Summary by Sections

Research Method Overview
- The DexScrew framework consists of a three-step process: simplified simulation to develop core skills, teleoperation to collect real-world data, and behavior cloning to train precise tactile policies [4][12]

Step 1: Simplified Simulation
- The first step builds a highly simplified model of the screws and nuts, focusing on the core rotational skill rather than complex details like thread structure [5][8]
- Training pairs a "prophet" (privileged teacher) policy with a sensory-motor (student) policy to quickly find optimal rotational actions and prepare for real-world deployment [8][9]

Step 2: Teleoperation for Real-World Data
- The second step uses teleoperation to gather real-world multi-sensory data, including joint movement data and tactile signals from the robot's fingertips [11][12]
- A total of 50 trajectories for nut tasks and 72 for screwdriver tasks were collected, forming the training dataset [11]

Step 3: Behavior Cloning for Tactile Policies
- The final step employs behavior cloning so that the robot mimics successful teleoperated actions while integrating tactile feedback and temporal information [12][13]
- The policy's neural network predicts future actions from past movements and tactile signals, enhancing the robot's ability to adjust in real time [13]

Performance Testing
- The DexScrew policy was tested on nuts of various shapes and achieved a fastening progress ratio exceeding 95%, with the cross-shaped nut reaching 98.75% [16][17]
- In screwdriver tasks, DexScrew achieved a progress ratio of 95% with an average completion time of 187.87 seconds, significantly outperforming traditional methods [19][20]

Robustness and Adaptability
- The policy demonstrated strong resistance to disturbances, quickly readjusting the robot's position and maintaining task continuity even under external forces [24][25]
- The article emphasizes the importance of tactile feedback for performance, particularly in complex or slippery scenarios [25][27]

Conclusion and Future Directions
- DexScrew not only addresses specific tasks but also provides a scalable recipe for dexterous manipulation, avoiding the pitfalls of traditional high-fidelity simulation [28]
- The framework lays the groundwork for future applications in industrial assembly, home services, and precision manufacturing [28]
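The article itself contains no code. Purely as an illustration of the Step 3 idea — behavior cloning on a short history of proprioception plus fingertip tactile signals, predicting a small chunk of future actions — here is a minimal PyTorch sketch. All module names, dimensions, and the action-chunk formulation are assumptions, not details taken from the DexScrew paper.

```python
# Hypothetical sketch of a tactile + temporal behavior-cloning policy,
# loosely in the spirit of DexScrew's Step 3. Shapes and names are assumptions.
import torch
import torch.nn as nn

class TactileBCPolicy(nn.Module):
    def __init__(self, joint_dim=16, tactile_dim=12, history=10, horizon=4, hidden=256):
        super().__init__()
        self.horizon = horizon
        self.joint_dim = joint_dim
        # Encode the recent history of proprioception + tactile readings.
        in_dim = (joint_dim + tactile_dim) * history
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Predict a short chunk of future joint-position targets.
        self.head = nn.Linear(hidden, joint_dim * horizon)

    def forward(self, joint_hist, tactile_hist):
        # joint_hist: (B, history, joint_dim); tactile_hist: (B, history, tactile_dim)
        x = torch.cat([joint_hist, tactile_hist], dim=-1).flatten(1)
        z = self.encoder(x)
        return self.head(z).view(-1, self.horizon, self.joint_dim)

# Behavior cloning: regress the predicted action chunk onto teleoperated
# demonstrations with a simple MSE loss.
def bc_loss(policy, batch):
    pred = policy(batch["joint_hist"], batch["tactile_hist"])
    return nn.functional.mse_loss(pred, batch["action_chunk"])
```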
SpatialActor: decoupling semantics from geometry to build robust spatial understanding into embodied intelligence
具身智能之心· 2025-12-05 16:02
Core Insights
- The article discusses the development of SpatialActor, a robust spatial representation framework for robotic manipulation that addresses challenges related to precise spatial understanding, sensor noise, and effective interaction [21][24]
- SpatialActor separates semantic information from geometric information, enhancing the robot's ability to understand tasks and accurately perceive its environment [21][6]

Methodology and Architecture
- SpatialActor employs a "dual-stream disentanglement and fusion" architecture, combining semantic understanding from vision-language models (VLMs) with precise geometric cues from 3D representations [6][21]
- The architecture includes independent visual and depth encoders, with a Semantic-Guided Geometry Module (SGM) that adaptively fuses robust geometric priors with fine-grained depth features [9][10]
- A Spatial Transformer (SPT) establishes accurate 2D-to-3D mappings and integrates multi-modal features, which is crucial for generating precise actions [12][9]

Performance Evaluation
- In simulation, SpatialActor achieved an average success rate of 87.4%, outperforming the previous state-of-the-art model RVT-2 by 6.0% [13][19]
- The model demonstrated significant robustness against noise, with performance improvements of 13.9% to 19.4% over RVT-2 across different noise levels [14][19]
- Real-world experiments showed SpatialActor consistently outperforming RVT-2 by approximately 20% across various tasks, confirming its effectiveness in complex environments [19][18]

Conclusion
- The results highlight the importance of disentangled spatial representations in building more robust and generalizable robotic systems, with SpatialActor showing superior performance across diverse conditions [21][20]
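To make the dual-stream idea concrete, the sketch below shows one plausible way a semantic-guided gate could fuse a coarse geometric prior with fine-grained depth features, as the SGM description above suggests. This is a hedged illustration only; the module name, shapes, and gating design are assumptions rather than the paper's actual architecture.

```python
# Hypothetical sketch of a "dual-stream" fusion block in the spirit of
# SpatialActor's Semantic-Guided Geometry Module (SGM). Names, shapes, and
# the gating design are assumptions for illustration.
import torch
import torch.nn as nn

class SemanticGuidedGeometry(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # A gate computed from semantic features decides how much to trust
        # the robust geometric prior versus fine-grained raw depth features.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(dim, dim)

    def forward(self, semantic_tokens, depth_prior, depth_fine):
        # All inputs: (B, N, dim) per-patch token features.
        g = self.gate(semantic_tokens)                  # gate values in [0, 1]
        geometry = g * depth_prior + (1.0 - g) * depth_fine
        return self.proj(geometry)                      # fused geometric features

# Usage: fuse per-patch features from a semantic encoder, a robust depth
# prior, and a fine depth encoder, then hand the result to a downstream
# 2D-to-3D transformer and an action head.
B, N, D = 2, 196, 256
sgm = SemanticGuidedGeometry(dim=D)
fused = sgm(torch.randn(B, N, D), torch.randn(B, N, D), torch.randn(B, N, D))
print(fused.shape)  # torch.Size([2, 196, 256])
```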
原力灵机 unveils ManiAgent! It can "act", "think", and even "collect data"!
具身智能之心· 2025-10-20 10:00
Core Insights
- The article introduces ManiAgent, an agentic framework designed for general robotic manipulation tasks, addressing the limitations of existing Vision-Language-Action (VLA) models in complex reasoning and long-horizon task planning [1][2][26]

Group 1: Framework Overview
- ManiAgent consists of multiple agents that collaboratively handle environment perception, sub-task decomposition, and action generation, enabling efficient responses to complex manipulation scenarios [2][10]
- The framework employs four key technologies: tool invocation, context engineering, real-time optimization, and automated data collection, forming a complete pipeline from perception to action execution [8][12]

Group 2: Performance Metrics
- In the SimplerEnv benchmark, ManiAgent achieved a task success rate of 86.8%, while in real-world pick-and-place tasks the success rate reached 95.8% [2][10][28]
- These high success rates indicate that ManiAgent can serve as an effective automated data collection tool, generating training data that matches the performance of models trained on manually annotated datasets [2][10]

Group 3: Methodology
- The framework includes four types of agents:
  1. A scene perception agent, which generates task-relevant scene descriptions using vision-language models [11]
  2. A reasoning agent, which evaluates task states and proposes achievable sub-tasks using large language models [11]
  3. An object-level perception agent, which identifies target objects and extracts detailed information for action generation [11]
  4. A controller agent, which generates executable action sequences from sub-task descriptions and object details [11]

Group 4: Data Collection and Optimization
- The automated data collection system operates with minimal human intervention, significantly reducing labor costs while ensuring high-quality data for VLA model training [12][21]
- The framework incorporates a context-processing mechanism to improve task relevance and information effectiveness, alongside a caching mechanism that reduces action-generation latency [12][17]

Group 5: Experimental Results
- In the SimplerEnv simulation environment, tasks showed an average success rate of 86.8%, with specific tasks reaching 95.8% [22][28]
- Real-world experiments with the WidowX 250S robotic arm covered a range of tasks with varying success rates, demonstrating the framework's versatility across operational contexts [25][28]
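The summary above describes a four-agent pipeline; the sketch below shows one hypothetical way such an orchestration loop could be wired together. The call_vlm / call_llm / call_detector / call_planner hooks are stand-ins for whatever models the framework actually uses, and none of these interfaces are taken from the paper.

```python
# Hypothetical orchestration sketch of a ManiAgent-style pipeline.
# Every callable below is an assumed placeholder, not the real system's API.
from dataclasses import dataclass

@dataclass
class SubTask:
    description: str
    target_object: str

def scene_perception_agent(image, task, call_vlm):
    # Summarize only the task-relevant parts of the scene.
    return call_vlm(image, f"Describe objects relevant to: {task}")

def reasoning_agent(scene_description, task, history, call_llm):
    # Propose the next achievable sub-task given progress so far.
    prompt = (f"Task: {task}\nScene: {scene_description}\n"
              f"Completed: {history}\nNext sub-task and target object?")
    desc, target = call_llm(prompt)
    return SubTask(desc, target)

def object_perception_agent(image, subtask, call_detector):
    # Localize the target object and extract details (pose, grasp candidates).
    return call_detector(image, subtask.target_object)

def controller_agent(subtask, object_info, call_planner):
    # Turn the sub-task plus object details into an executable action sequence.
    return call_planner(subtask.description, object_info)

def run_step(image, task, history, tools):
    scene = scene_perception_agent(image, task, tools["vlm"])
    subtask = reasoning_agent(scene, task, history, tools["llm"])
    obj = object_perception_agent(image, subtask, tools["detector"])
    return controller_agent(subtask, obj, tools["planner"])
```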
The most comprehensive robot manipulation survey to date, covering as many as 1,200 papers! Jointly released by eight institutions
自动驾驶之心· 2025-10-14 23:33
Core Insights
- The article discusses the rapid advancements in artificial intelligence, particularly in embodied intelligence, which connects cognition and action, emphasizing the importance of robot manipulation for achieving artificial general intelligence (AGI) [5][9]

Summary by Sections

Overview of Robot Manipulation
- The paper "Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey" provides a comprehensive overview of the field, detailing the evolution from rule-based control to intelligent control systems that integrate reinforcement learning and large models [6][10]

Key Challenges in Embodied Intelligence
- Robot manipulation is identified as a core challenge in embodied intelligence because it requires seamless integration of perception, planning, and control, which is essential for real-world interaction in diverse, unstructured environments [9][10]

Unified Framework
- A unified understanding framework is proposed that expands the traditional high-level planning / low-level control paradigm to include language, code, motion, affordance, and 3D representations, strengthening the semantic decision-making role of high-level planning [11][21]

Classification of Learning-Based Control
- A novel classification of low-level learned control is introduced, dividing it into input modeling, latent learning, and policy learning, providing a systematic perspective for research on low-level control [24][22]

Bottlenecks in Robot Manipulation
- The article identifies two major bottlenecks in robot manipulation: data collection and utilization, and system generalization, summarizing existing research progress and solutions for both [27][28]

Future Directions
- Four key future directions are highlighted: building a true "robot brain" for general cognition and control, breaking data bottlenecks with scalable data generation and utilization, enhancing multimodal perception for complex object interactions, and ensuring safe human-robot coexistence [35][33]
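As a rough illustration of the high-level planning / low-level control split the survey formalizes, the sketch below wires a planner that decomposes an instruction into sub-goals to a learned low-level policy. Every interface here (plan_fn, policy_fn, the env API) is an assumption made for illustration, not something prescribed by the survey.

```python
# Minimal, hypothetical sketch of a hierarchical planning / control loop.
from typing import Callable, List
import numpy as np

def high_level_plan(instruction: str, plan_fn: Callable[[str], List[str]]) -> List[str]:
    # plan_fn could be an LLM, a code generator, or an affordance predictor;
    # the survey groups all of these under "high-level planning".
    return plan_fn(instruction)

def low_level_policy(observation: np.ndarray, subgoal: str,
                     policy_fn: Callable[[np.ndarray, str], np.ndarray]) -> np.ndarray:
    # policy_fn stands in for the survey's low-level taxonomy of
    # input modeling, latent learning, and policy learning.
    return policy_fn(observation, subgoal)

def execute(instruction, env, plan_fn, policy_fn, steps_per_subgoal=50):
    obs = env.reset()
    for subgoal in high_level_plan(instruction, plan_fn):
        for _ in range(steps_per_subgoal):
            action = low_level_policy(obs, subgoal, policy_fn)
            obs, done = env.step(action)
            if done:
                return True
    return False
```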
The most comprehensive robot manipulation survey to date, covering as many as 1,200 papers! Jointly released by eight institutions including XJTU, HKUST, and Peking University
具身智能之心· 2025-10-14 03:50
Core Insights
- The article discusses the rapid advancements in artificial intelligence, particularly in embodied intelligence, which connects cognition and action, emphasizing the importance of robot manipulation for achieving artificial general intelligence (AGI) [3][4]

Summary by Sections

Overview of Embodied Intelligence
- Embodied intelligence is highlighted as a crucial frontier that enables agents to perceive, reason, and act in real environments, moving from mere language understanding to actionable intelligence [3]

Paradigm Shift in Robot Manipulation
- Research in robot manipulation is undergoing a paradigm shift, integrating reinforcement learning, imitation learning, and large models into intelligent control systems [4][6]

Comprehensive Survey of Robot Manipulation
- A comprehensive survey titled "Towards a Unified Understanding of Robot Manipulation" systematically organizes over 1,000 references, covering hardware, control foundations, task and data systems, and cross-modal generalization research [4][6][7]

Unified Framework for Understanding Robot Manipulation
- The article proposes a unified framework that extends the traditional high-level planning and low-level control classification, incorporating language, code, motion, affordance, and 3D representations [9][20]

Key Bottlenecks in Robot Manipulation
- Two major bottlenecks in robot manipulation are identified: data collection and utilization, and system generalization, with a detailed analysis of existing solutions [27][28]

Future Directions
- Four key future directions are proposed: building a true "robot brain" for general cognition and control, breaking data bottlenecks with scalable data generation and utilization, enhancing multi-modal perception for complex interactions, and ensuring safe human-robot coexistence [34]
Hardware isn't the problem, understanding is the barrier: why robots still haven't made it into your home
锦秋集· 2025-09-29 13:40
Core Viewpoint
- The article discusses the limitations of current robotics technology, emphasizing that while hardware has advanced significantly, the real challenge lies in robots' ability to understand and predict physical interactions with the world, which is essential for practical applications in everyday environments [2][20]

Group 1: Learning-Based Dynamics Models
- The article reviews the application of learning-based dynamics models in robotic manipulation, focusing on how these models predict physical interactions from sensory data, allowing robots to perform complex tasks [8][20]
- Learning-based dynamics models face challenges in designing efficient state representations, which directly affect the model's generalization ability and data efficiency [9][20]
- Various state representation methods are discussed, including raw sensory data, latent representations, particle representations, keypoint representations, and object-centric representations, each with its own advantages and disadvantages [10][11][17][20]

Group 2: Integration with Control Methods
- The article explores how dynamics models can be integrated with control methods, particularly motion planning and policy learning, enabling robots to autonomously plan and adjust operations in complex environments [12][14][20]
- Motion planning optimizes paths or trajectories to guide robots through task execution without requiring precise analytical models, while policy learning directly maps sensory data to action strategies [13][14]

Group 3: Future Research Directions
- Future research will focus on improving the robustness of learned models, especially in partially observable and complex environments, with multi-modal perception and uncertainty quantification as key areas of exploration [15][16][20]
- The article highlights the importance of state representation methods in improving the performance of learning-based dynamics models, emphasizing the need for structured prior knowledge to process information efficiently [24][25][20]
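To ground the "learned dynamics model plus planning" loop described above, here is a toy sketch in which a model f(s, a) -> s' is used inside a simple sampling-based planner (random shooting, re-planned each step in MPC fashion). The linear toy model, cost function, and dimensions are assumptions for illustration only, not content from the article.

```python
# Hypothetical sketch: a learned dynamics model used for sampling-based planning.
import numpy as np

def rollout(dynamics_fn, state, actions):
    # Predict the trajectory produced by a candidate action sequence.
    states = [state]
    for a in actions:
        states.append(dynamics_fn(states[-1], a))
    return np.stack(states)

def random_shooting_plan(dynamics_fn, cost_fn, state, horizon=10,
                         num_samples=256, action_dim=4, rng=None):
    rng = rng or np.random.default_rng(0)
    best_cost, best_first_action = np.inf, None
    for _ in range(num_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        traj = rollout(dynamics_fn, state, actions)
        c = cost_fn(traj)
        if c < best_cost:
            best_cost, best_first_action = c, actions[0]
    return best_first_action  # execute this action, then re-plan (MPC style)

# Toy usage with a linear stand-in for a learned model and a goal-reaching cost.
A, B = np.eye(4) * 0.99, np.eye(4) * 0.1
dyn = lambda s, a: A @ s + B @ a
goal = np.ones(4)
cost = lambda traj: np.linalg.norm(traj[-1] - goal)
action = random_shooting_plan(dyn, cost, state=np.zeros(4))
print(action.shape)  # (4,)
```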
New survey! A review of multimodal fusion and VLM methods for embodied robotics
具身智能之心· 2025-09-01 04:02
Core Insights
- The article discusses the transformative impact of multimodal fusion and vision-language models (VLMs) on robot vision, enabling robots to evolve from simple mechanical executors into intelligent partners capable of understanding and interacting with complex environments [3][4][5]

Multimodal Fusion in Robot Vision
- Multimodal fusion integrates various data types such as RGB images, depth information, LiDAR point clouds, language, and tactile data, significantly enhancing robots' perception and understanding of their surroundings [3][4][9]
- The main fusion strategies have evolved from early explicit concatenation to implicit collaboration within unified architectures, improving feature extraction and task prediction [10][11]

Applications of Multimodal Fusion
- Semantic scene understanding is crucial for robots to recognize objects and their relationships; multimodal fusion greatly improves accuracy and robustness in complex environments [9][10]
- 3D object detection is vital for autonomous systems, combining data from cameras, LiDAR, and radar to enhance environmental understanding [16][19]
- Embodied navigation allows robots to explore and act in real environments, spanning goal-oriented, instruction-following, and dialogue-based navigation methods [24][26][27][28]

Vision-Language Models (VLMs)
- VLMs have advanced significantly, enabling robots to understand spatial layouts, object properties, and semantic information while executing tasks [46][47]
- VLMs have evolved from basic models to more sophisticated systems capable of multimodal understanding and interaction, enhancing their applicability across tasks [53][54]

Future Directions
- The article identifies key challenges in deploying VLMs on robotic platforms, including sensor heterogeneity, semantic discrepancies, and the need for real-time performance optimization [58]
- Future research may focus on structured spatial modeling, improved system interpretability, and cognitive VLM architectures with long-term learning capabilities [58][59]
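As a concrete illustration of the two fusion styles mentioned above (explicit concatenation versus implicit fusion inside a shared architecture), here is a small PyTorch sketch. The module designs, token shapes, and dimensions are assumptions for illustration, not taken from the survey.

```python
# Hypothetical sketch of explicit (concatenation) vs implicit (cross-attention)
# multimodal fusion for RGB, depth, and language features.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    # Early fusion: concatenate pooled per-modality features and project.
    def __init__(self, dim=256, num_modalities=3):
        super().__init__()
        self.proj = nn.Linear(dim * num_modalities, dim)

    def forward(self, rgb, depth, lang):
        # rgb, depth, lang: (B, dim) pooled features per modality.
        return self.proj(torch.cat([rgb, depth, lang], dim=-1))

class CrossAttentionFusion(nn.Module):
    # Implicit fusion: let RGB tokens attend to depth and language tokens.
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens, lang_tokens):
        context = torch.cat([depth_tokens, lang_tokens], dim=1)
        fused, _ = self.attn(rgb_tokens, context, context)
        return self.norm(rgb_tokens + fused)

B, N, D = 2, 64, 256
fusion = CrossAttentionFusion(dim=D)
out = fusion(torch.randn(B, N, D), torch.randn(B, N, D), torch.randn(B, 16, D))
print(out.shape)  # torch.Size([2, 64, 256])
```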
Beyond VLA: a roundup of embodied + VA work
自动驾驶之心· 2025-07-14 10:36
Core Insights
- The article focuses on advancements in embodied intelligence and robotic manipulation, highlighting various research projects and methodologies aimed at improving robot learning and performance in real-world tasks [2][3][4]

Group 1: 2025 Research Highlights
- Numerous projects are set for 2025, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation", which aim to enhance robotic capabilities in manipulation and interaction [2]
- The "BEHAVIOR Robot Suite" aims to streamline real-world whole-body manipulation for everyday household activities, indicating a focus on practical applications of robotic technology [2]
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" shows the potential for robots to learn complex tasks from minimal demonstrations, reflecting advances in imitation learning [2]

Group 2: Methodological Innovations
- The article discusses innovative methodologies such as "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning", which aims to improve the adaptability of robots across different environments [2]
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" highlights the focus on enhancing dexterity in robotic hands, crucial for complex manipulation tasks [4]
- "Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation" points to a trend of using synthetic data to train robots, which can significantly reduce the need for real-world data collection [7]

Group 3: Future Directions
- The research agenda for 2024 and beyond includes projects like "Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching", suggesting a shift toward advanced data representations for better learning outcomes [9]
- "Zero-Shot Framework from Image Generation World Model to Robotic Manipulation" points to a future in which robots generalize from visual data without task-specific training, enhancing their versatility [9]
- The emphasis on "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" reflects growing interest in leveraging human demonstrations to improve the efficiency of robot learning [7]
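Several of the works listed above build on diffusion-based visuomotor policies. As a hedged, toy illustration of the underlying idea, the sketch below runs a DDPM-style reverse denoising loop that turns Gaussian noise into an action trajectory. The noise-prediction function is a dummy stand-in for a trained network, and the schedule, horizon, and action dimension are assumptions rather than details from any of the listed papers.

```python
# Toy sketch of the denoising loop behind diffusion-based visuomotor policies.
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample_action_trajectory(noise_pred_fn, obs, horizon=16, action_dim=7, T=50, seed=0):
    """DDPM-style ancestral sampling of an action trajectory conditioned on obs."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal((horizon, action_dim))  # start from pure noise
    for t in reversed(range(T)):
        eps = noise_pred_fn(x, t, obs)               # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # denoised action trajectory, e.g. end-effector deltas

# Usage with a dummy noise predictor (a trained network would go here).
dummy_pred = lambda x, t, obs: 0.1 * x
traj = sample_action_trajectory(dummy_pred, obs=None)
print(traj.shape)  # (16, 7)
```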