Vidar

How was it done? With just 20 minutes of real-robot data, dual-arm tasks generalize across embodiments
具身智能之心· 2025-08-11 00:14
Core Insights
- Vidar represents a significant breakthrough in embodied intelligence as the first model worldwide to transfer video understanding capabilities to physical decision-making systems [2]
- The model introduces a multi-view video prediction framework that supports collaborative dual-arm robot tasks, achieving state-of-the-art performance while exhibiting significant few-shot learning advantages [2]
- It requires only 20 minutes of real-robot data to generalize quickly to a new robot embodiment, far less data than industry-leading models demand [2][6]

Group 1
- Vidar is built on a general video model and systematically transfers its video understanding capabilities to robot control [2]
- Its data requirement is roughly one-eighth that of the leading RDT model and one twelve-hundredth that of π0.5, greatly lowering the barrier to large-scale generalization in robotics [2]
- After fine-tuning, the model performs multi-view dual-arm tasks effectively, executing commands as instructed [2]

Group 2
- The Tsinghua University team proposed a new paradigm for embodied intelligence that decomposes tasks into "prediction + execution" (see the sketch following this summary) [6]
- Prediction uses a visual generative model such as Vidar to learn goal prediction from vast amounts of internet video, while execution uses a task-agnostic inverse dynamics model such as AnyPos to produce actions [6]
- This decomposition significantly reduces dependence on large-scale paired action-instruction data, requiring only 20 minutes of task data to achieve strong generalization [6]

Group 3
- The presentation includes an overview and demonstration video, discussing the rationale for using video modalities and the design of embodied video base models [8]
- It covers the training of Vidar and the concept of task-agnostic actions with AnyPos [8]
- The speaker, Hengkai Tan, is a PhD student at Tsinghua University focusing on the integration of embodied large models and multi-modal large models [11]
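To make the "prediction + execution" decomposition concrete, here is a minimal sketch of the idea: a generative video model predicts future goal frames from the current observation and a language instruction, and a task-agnostic inverse dynamics model recovers the action between consecutive frames. The class names, toy architectures, and tensor shapes below are illustrative assumptions, not the actual Vidar or AnyPos implementations.

```python
import torch
import torch.nn as nn

class VideoPredictor(nn.Module):
    """Stand-in for a generative video model: predicts the next (encoded)
    frame from the current frame encoding and an instruction embedding."""
    def __init__(self, frame_dim: int = 512, instr_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + instr_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, frame_dim),
        )

    def forward(self, frame: torch.Tensor, instr: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([frame, instr], dim=-1))

class InverseDynamicsModel(nn.Module):
    """Task-agnostic: maps a (frame_t, frame_t+1) pair to the action that
    produces the transition. It never sees the instruction."""
    def __init__(self, frame_dim: int = 512, action_dim: int = 14):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim),  # e.g. 14 DoF for a dual-arm robot
        )

    def forward(self, frame_t: torch.Tensor, frame_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([frame_t, frame_next], dim=-1))

# Rollout: predict a visual plan, then recover the actions that realize it.
predictor = VideoPredictor()
idm = InverseDynamicsModel()

frame = torch.randn(1, 512)   # encoded current observation
instr = torch.randn(1, 128)   # encoded language instruction

actions = []
for _ in range(8):            # imagine an 8-step visual plan
    next_frame = predictor(frame, instr)
    actions.append(idm(frame, next_frame))  # instruction never reaches the IDM
    frame = next_frame

print(torch.stack(actions).shape)  # torch.Size([8, 1, 14])
```

The property the sketch illustrates is that only the video predictor is conditioned on the instruction; the inverse dynamics model works from visual transitions alone, which is what makes it task-agnostic and allows a new embodiment to be covered with very little paired action data.
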
The new products most worth watching at the 2025 World Artificial Intelligence Conference: a one-article guide
第一财经· 2025-07-29 10:35
Core Viewpoint
- The article highlights the significant advances in robotics showcased at the World Artificial Intelligence Conference (WAIC) 2025, emphasizing the shift from remote-controlled to autonomous robots, driven by new perception-action models and world models from various companies [3][4][5]

Group 1: Robotics Developments
- Nearly all humanoid robot companies, including Zhiyuan, Yushu Technology, and Galaxy General, showcased their progress at WAIC 2025, with the focus on software advances rather than hardware changes [4]
- Companies such as Tencent and SenseTime introduced perception-action models aimed at improving how robots interact with their environments, marking a paradigm shift in robotics [4][5]
- Zhiyuan's "Genie Envisioner" world model allows robots to pre-visualize actions before executing them, enhancing their operational capabilities (a generic sketch of this idea appears after this summary) [10][12][14]

Group 2: Major Product Releases
- SenseTime launched the "Wuneng" embodied intelligence platform, enabling robots to understand and interact with their environments effectively [17][18]
- Alibaba announced its first self-developed AI glasses, integrating multiple functions and aiming to enhance the user experience [19]
- Tencent released the "Hunyuan 3D World Model," which simplifies 3D scene construction and lets users generate 360-degree scenes from text or images [20][21]

Group 3: Competitive Landscape
- MiniMax and Moonshot AI (Yuezhi Anmian) are competing for dominance in the open-source model community, with both claiming leading positions on their respective model rankings [8][9]
- Major model companies have shifted their focus toward professional developers rather than general consumers, indicating a strategic pivot in their market approach [8][9]

Group 4: Industry Insights
- Industry leaders emphasize that high-precision actuators and sensor integration are essential for deploying robots in real-world applications [26][27]
- The article distinguishes world models from multimodal models: world models aim for deeper environmental understanding and proactive interaction capabilities [28]
- The current AI investment climate is robust, with a notable increase in funding and interest in AI applications, reminiscent of the mobile internet boom of 2009 to 2014 [42]
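The "pre-visualize actions before execution" behavior attributed to world models can be illustrated with a generic model-predictive-control loop: sample candidate action sequences, roll them forward through a learned dynamics model, score the imagined outcomes, and execute only the start of the best plan. This is a textbook pattern, not Zhiyuan's published Genie Envisioner method; every name, shape, and scoring choice below is an assumption.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Toy latent dynamics model: predicts the next latent state
    from the current state and a candidate action."""
    def __init__(self, state_dim: int = 64, action_dim: int = 14):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.dynamics(torch.cat([state, action], dim=-1))

def previsualize_and_act(model, state, goal, n_candidates: int = 64, horizon: int = 5):
    """Imagine n_candidates random action sequences in the world model and
    return the first action of the plan whose imagined end state is closest
    to the goal. Only that single action is actually executed."""
    actions = torch.randn(n_candidates, horizon, 14)   # candidate plans
    sim = state.expand(n_candidates, -1)               # imagined states, one per plan
    for t in range(horizon):
        sim = model(sim, actions[:, t])                # roll the model forward
    scores = -(sim - goal).pow(2).sum(dim=-1)          # closer to goal = better
    best = scores.argmax()
    return actions[best, 0]                            # execute only step one

world_model = WorldModel()
state = torch.randn(1, 64)   # encoded current observation
goal = torch.randn(1, 64)    # encoded desired outcome
print(previsualize_and_act(world_model, state, goal).shape)  # torch.Size([14])
```

In a real system the random candidate plans would come from a learned policy and the scoring from a learned value or reward model, but the loop structure, imagine first, act second, is the essence of what "pre-visualizing actions" means.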