Data Collection

A 200x cost gap! Teleoperation, simulation, UMI, video learning: which one leads the race for embodied intelligence data?
机器人大讲堂· 2025-10-03 04:04
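In 2025, investment and financing activity in the embodied intelligence industry has kept climbing, and the industry has reached a clear consensus: to make the technical leap from L1 (task-specific embodied intelligence) to L2 (compositional-task embodied intelligence), and onward to higher-order general capabilities, data collection is the core bottleneck that must be broken through. Unlike the acquisition of low-dimensional data such as language and images, embodied intelligence requires precise measurement data in the absolute coordinate frame of the physical world, and its acquisition difficulty, cost, and annotation cycle all far exceed what traditional model training requires. The core goal of embodied intelligence is to give robots a common-sense ability to generalize from one case to others in the physical world, so that when they face unfamiliar objects and tasks they can, like humans, derive the operating logic from past experience. The foundation for building this ability is high-quality, multimodal interaction data. Leading domestic companies are still in the early L1 stage and can complete single-workstation manipulation tasks in specific environments, while the π0.5 model, pre-trained on a fusion of multiple sources such as manipulation data, web data, and language instructions, has exceeded 60% accuracy on long-horizon tasks in real home environments, approaching the L2 level. The industry generally believes that pre-training is the core of technical progress in embodied intelligence, and that pre-training performance depends directly on the "quantity" and "quality" of data: on the one hand, L1-level models already require more than 10,000 hours of training data, and the scaling law has not yet plateaued in embodied intelligence, so expanding data scale can still keep pushing model performance ...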
Wang Xingxing responds on the "core problem limiting the robot boom": data collection is still at a fuzzy stage
Bei Ke Cai Jing· 2025-09-11 05:33
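Proofreader: Chen Diyan. Bei Ke Cai Jing (贝壳财经) of Beijing News (新京报), reporter Luo Yidan: On the morning of September 11, Wang Xingxing, founder and CEO of Unitree Robotics (宇树科技), responded at the opening ceremony of the Bund Conference (外滩大会) to the earlier view that "the core factor limiting the boom of the robotics industry is not insufficient data but outdated model architecture." He said that data and models are both very important: "From the data perspective, the core problem right now is that it is hard to determine the standard for high-quality data. How high-quality data should be collected, and at what scale, are both still unclear, and the utilization of data should be improved as much as possible." Editor: Yang Juanjuan ...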
What are the entry points for moving from autonomous driving to embodied intelligence?
自动驾驶之心· 2025-08-24 23:32
Core Viewpoint
- The article discusses the transition from autonomous driving to embodied intelligence, highlighting the similarities and differences in algorithms and tasks between the two fields [1].
Group 1: Algorithm and Task Comparison
- Embodied intelligence largely continues the algorithms used in robotics and autonomous driving, such as training and fine-tuning methods, as well as large models [1].
- There are notable differences in specific tasks, including data collection methods and the emphasis on execution hardware and structure [1].
Group 2: Community and Learning Resources
- A full-stack learning community named "Embodied Intelligence Heart" has been established to share knowledge related to algorithms, data collection, and hardware solutions in the field of embodied intelligence [1].
- Key areas of focus within the community include VLA, VLN, Diffusion Policy, reinforcement learning, robotic arm grasping, pose estimation, robot simulation, multimodal large models, chip deployment, sim2real, and robot hardware structure [1].
Helped yet another student land a VLA algorithm position......
具身智能之心· 2025-08-22 16:03
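Yesterday afternoon a student with a solid background, from a C9 university and about to start the third year of graduate school, came to Feng Ge (峰哥) to vent in the middle of autumn recruiting: "A labmate landed a VLA algorithm position (at a particularly well-funded embodied intelligence company), and it is too late for me to switch...... We both started out in traditional robotics, on SLAM-related work. I don't know what project he did afterwards to progress so fast, but he passed interviews at several companies. He only recommended your community to me in the last couple of days. The curriculum is very complete; I'm just afraid it's a bit late." In August, students kept coming to Feng Ge: some had received verbal offers, others wanted to switch to embodied intelligence but worried it was too late. Although autumn recruiting is approaching, the same line still holds: "It is never too late." The top priority is to fill in a complete embodied intelligence learning path as soon as possible, especially data collection, algorithms, and simulation. If you do not have strong skills in independent study and in searching for answers, you can join our embodied intelligence community, the "Embodied Intelligence Heart" (具身智能之心) Knowledge Planet (知识星球), currently the largest and most complete embodied intelligence learning platform in China. The Knowledge Planet currently combines video, illustrated articles, learning paths, Q&A, and job-hunting exchange in one comprehensive embodied intelligence community, with nearly 2,000 members. We hope to reach nearly 10,000 members within the next two years, building a gathering place for exchange and technical sharing that many beginners and advanced learners visit regularly. The community also regularly answers all kinds of practical questions: How do you use the equipment? How do you collect data effectively? How do you deploy VA and VLA models? Is the collection background too complex, or is the data rather dirt ...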
The AI stack nobody talks about: data collection as infrastructure
36Kr· 2025-08-07 07:23
Core Insights
- The performance of AI products increasingly relies on data quality and freshness rather than just model size [1][2][3].
- Companies like Salesforce and IBM are acquiring data infrastructure firms to enhance their AI capabilities with real-time, structured data [2][5][6].
- The definition of "good data" includes being domain-specific, continuously updated, structured, deduplicated, and real-time actionable [4][5][6].
Data Infrastructure Importance
- Data collection is now seen as a critical infrastructure rather than a secondary task, emphasizing the need for reliable, real-time access to data [2][9][22].
- The modern AI data stack has evolved into a value chain that includes data acquisition, transformation, organization, and storage [10][22].
- Effective data retrieval quality surpasses prompt engineering, as outdated or irrelevant data can hinder model performance [7][19].
Strategic Data Collection
- Data collection must be strategic, providing structured and immediate data for AI agents [12][13].
- It should handle dynamic user interfaces, CAPTCHAs, and mixed extraction methods to ensure comprehensive data gathering [14][15].
- Data collection infrastructure should be scalable and compliant with legal standards, moving beyond fragile scraping tools [16][22].
Future of AI Systems
- The future of AI performance will depend more on knowledge acquisition speed and context management rather than just model size [23][24].
- Companies that view data collection as a foundational capability will likely achieve faster and more cost-effective success [25].
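As a rough illustration of the "structured, deduplicated, continuously updated" criteria listed in the summary, the sketch below filters a batch of collected records. It is a minimal sketch under stated assumptions, not the article's method: the `Record` shape, the `filter_good` name, the exact-match hashing, and the one-day freshness window are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import hashlib


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape; the article does not define one.
    source: str
    text: str
    fetched_at: datetime


def content_key(record: Record) -> str:
    """Hash of case- and whitespace-normalized text, used to spot duplicates."""
    normalized = " ".join(record.text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_good(records, now, max_age=timedelta(days=1)):
    """Keep records that are non-empty, fresh, and not a repeat of an earlier one.

    max_age is an assumed freshness window, not a figure from the article.
    """
    seen = set()
    kept = []
    for record in records:
        if not record.text.strip():
            continue  # nothing usable in the payload
        if now - record.fetched_at > max_age:
            continue  # stale: older than the freshness window
        key = content_key(record)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        kept.append(record)
    return kept
```

Exact-match hashing only catches duplicates that differ in case or whitespace; near-duplicate detection would need a different technique and is outside this sketch.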
Do humanoid robots need to "go to school" too? Data collection becomes a mandatory question
21 Shi Ji Jing Ji Bao Dao· 2025-07-16 13:53
Core Viewpoint
- The scarcity of real-world data is a significant constraint on the development of the embodied intelligence industry, and data collection centers may provide a solution to this issue [1][4].
Group 1: Data Collection Initiatives
- Dematech and Zhiyuan Robotics have established the world's first logistics training factory for humanoid robots to collect data in real logistics scenarios [1].
- The Hefei City humanoid robot data collection pre-training site was launched in June, and the Pacini humanoid super data factory began operations this year [1][3].
- The establishment of data collection centers has accelerated since the second half of last year, with companies like Zhiyuan Robotics and Pacini leading the way [3].
Group 2: Data Collection Challenges
- The humanoid robots require extensive data for training, with a single scenario potentially needing millions of data points, but the industry lacks high-quality, standardized data [4].
- Two main approaches to overcome the data scarcity have emerged: generating simulation data for training and building large-scale data collection centers for high-quality real-world data [4].
- The industry is currently facing challenges such as hardware solutions not being standardized and the issue of data silos, which increases data collection costs [7][8].
Group 3: Government Involvement
- Local governments are also investing in data collection centers, with initiatives like the national and local co-built humanoid robot innovation centers [5].
- Government-led data collection centers typically serve as public service platforms, with collected data being made available to local robot companies once sufficient data is accumulated [5].
Group 4: Market Dynamics
- The humanoid robot industry is expected to see significant data collection activity in the next two years, particularly in industrial applications [7].
- A complete data collection solution typically includes robots, hardware, software, cloud data processing services, and model training platforms, with costs ranging from 400,000 to 500,000 yuan [5].
Getting started in embodied intelligence takes three elements: data + algorithms + embodiment
具身智能之心· 2025-06-23 13:54
Core Insights
- The article emphasizes the importance of three key elements in embodied intelligence: data, algorithms, and embodiment. Many individuals only understand algorithms, while data collection requires experience and effective strategies [1][2].
- The community aims to create a platform for knowledge sharing and collaboration in the field of embodied intelligence, targeting a membership of 10,000 within three years [2][6].
Data Collection
- Remote operation data collection relies on embodiment and is costly, but preprocessing and postprocessing are simpler, yielding high-quality data suitable for robotic arms [1].
- The community provides various data collection strategies and high-cost-performance robotic arm platforms to support research [1][2].
Algorithm Development
- Common technologies in embodied intelligence include VLN, VLA, Diffusion Policy, and reinforcement learning, which require continuous reading of academic papers to stay updated [1].
- The community offers a comprehensive set of learning paths and resources for newcomers and advanced researchers alike [9][12].
Hardware and Resources
- Well-funded laboratories can purchase high-cost embodiment systems, while those with limited budgets may rely on 3D printing or cost-effective hardware platforms [1].
- The community has compiled a list of over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms [9][26][28].
Community Engagement
- The community has established connections with various companies in the field, creating a bridge for academic collaboration, product development, and recruitment [2][6].
- Members can access job postings, industry insights, and a supportive environment for learning and networking [5][12].
Educational Content
- The community provides a wealth of educational materials, including summaries of research papers, books, and learning routes across various topics in embodied intelligence [10][18][20].
- Regular discussions and Q&A sessions are held to address common challenges in the field, such as data collection platforms and robot learning techniques [11][12].
Robot data collection helps intelligence advance
news flash· 2025-06-18 23:29
Core Insights
- The Zhiyuan Data Collection Center operates in Shanghai Pudong, enhancing robot intelligence through "data + AI" since its launch in September 2024 [1].
- The center has collected over one million high-quality data points covering various real-world scenarios [1].
- Zhiyuan Robotics has open-sourced the AgiBot World dataset and released the GO-1 general embodiment base model to improve robot learning efficiency [1].
- The Genie Studio platform, launched in April this year, provides a one-stop solution for developers [1].
- Zhiyuan Robotics is expected to enter a mass production phase in 2025, aiming for thousands of units shipped commercially [1].
- The company has completed a new round of financing to support its intelligence advancement [1].
Robot motion capture equipment expert
2025-05-20 15:24
Summary of Key Points from the Conference Call
Industry Overview
- The conference call discusses the robotics motion capture industry, focusing on data collection methods and challenges faced by companies in this sector [1][2][4].
Core Insights and Arguments
- **Data Collection Modes**: There are four primary modes of data collection in motion capture systems:
  1. Real human motion capture with a physical robot, yielding 30% to 50% effective data but at a high cost.
  2. Combination of real motion capture and virtual engines, allowing for 15 to 20 minutes of data collection per day at a lower cost.
  3. Pure motion capture systems without physical robots, resulting in a lower effective data ratio.
  4. Use of synthetic data for large-scale training, which is currently debated [2][19].
- **Data Validity Measurement**: Validity is assessed through initial human motion verification followed by robot posture validation. There is no industry standard, and the process involves multi-sensor information fusion to ensure reliability [5].
- **Data Collection Efficiency**: The efficiency of data collection is low, with 1,300 seconds of data requiring experienced motion capture experts to work continuously for several days. The main issues are the immaturity of virtual body software and challenges in interacting with real objects [6][3].
- **Cost of Data Collection**: The cost of effective data collection is approximately 300 yuan per second, with repeated data costing around 60 yuan per second. Future projections suggest costs may drop to around 200 yuan in 1-3 years, potentially below 100 yuan with student involvement [3][22].
- **Mapping Challenges**: The primary challenge in motion capture technology is the mapping of human actions to robotic actions. Current solutions often prioritize accuracy over posture, which can lead to discrepancies in execution [7][9].
- **Role of Data Factories**: Establishing data factories can significantly enhance data collection efficiency, allowing for the use of hundreds to thousands of devices to gather extensive data, which is crucial for training algorithms [10].
- **Customer Demand**: The most significant current demand comes from companies like Shiyuan, which has placed a large order of 1,000 sets, while most other companies remain in the verification stage [16].
Other Important but Overlooked Content
- **Application Prioritization**: Data collection priorities are determined by customer needs and application scenarios rather than specific actions [11][12].
- **Domestic Companies' Focus**: Major domestic companies are concentrating on data collection in areas such as home services, healthcare, and rescue applications, tailoring their data collection environments accordingly [13].
- **Integration of Data Types**: The integration of motion capture data with force and tactile information is being explored to enhance the capabilities of motion capture devices [18].
- **Challenges in Mapping Solutions**: Companies face challenges in understanding human biomechanics when designing mapping solutions, often outsourcing this work to specialized motion capture firms [25].
- **Future Cost Reduction Strategies**: Cost reduction in data collection can be achieved through bulk production and collaboration with educational institutions to utilize student labor [21].
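To put the per-second prices quoted in the call summary in perspective, the short calculation below applies them to the 1,300-second batch mentioned under data collection efficiency. This is only back-of-the-envelope arithmetic: the summary does not say that these prices were quoted for that particular batch, so pairing the two figures is an assumption, and the constant and function names are illustrative.

```python
# Per-second prices from the call summary, in yuan per second of data.
PRICE_EFFECTIVE_NOW = 300           # effective data, current
PRICE_REPEATED_NOW = 60             # repeated data, current
PRICE_EFFECTIVE_1_TO_3_YEARS = 200  # projected for effective data

# Batch size mentioned in the summary; applying the prices above to it
# is an assumption made for illustration.
BATCH_SECONDS = 1_300


def collection_cost(seconds: int, price_per_second: int) -> int:
    """Total cost in yuan for a given amount of collected data."""
    return seconds * price_per_second


cost_now = collection_cost(BATCH_SECONDS, PRICE_EFFECTIVE_NOW)
cost_projected = collection_cost(BATCH_SECONDS, PRICE_EFFECTIVE_1_TO_3_YEARS)
cost_repeated = collection_cost(BATCH_SECONDS, PRICE_REPEATED_NOW)
cost_per_hour_now = collection_cost(3_600, PRICE_EFFECTIVE_NOW)

print(f"1,300 s effective, today:     {cost_now:,} yuan")           # 390,000
print(f"1,300 s effective, projected: {cost_projected:,} yuan")     # 260,000
print(f"1,300 s repeated, today:      {cost_repeated:,} yuan")      # 78,000
print(f"1 hour effective, today:      {cost_per_hour_now:,} yuan")  # 1,080,000
```

At the current price, one hour of effective data comes to roughly 1.08 million yuan, which fits the summary's concern with cost reduction.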