Data Collection

What are the entry points for moving from autonomous driving to embodied intelligence?
自动驾驶之心· 2025-08-24 23:32
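If you really need it, you can follow our official account to help you learn with fewer pitfalls. Over the past few days, many students have messaged us privately to ask how to move from autonomous driving to embodied intelligence, and whether the gap is large. From an algorithmic perspective, embodied intelligence largely carries over algorithms from robotics and autonomous driving, such as training and fine-tuning methods and large models. Of course, many specific tasks also differ, such as data collection methods and the emphasis on execution hardware and structure. We have also founded a full-stack learning community for embodied intelligence, 具身智能之心, which regularly shares algorithms, data collection methods, and software and hardware solutions related to embodied intelligence. The main topics include VLA, VLN, Diffusion Policy, reinforcement learning, robotic arm grasping, pose estimation, robot simulation, multimodal large models, chip deployment, sim2real, and robot hardware structure. We also regularly share industry and recruitment content. ...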
Another student we helped has landed a VLA algorithm position......
具身智能之心· 2025-08-22 16:03
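Yesterday afternoon a young student with a solid foundation, about to start the third year of a master's program at a C9 university and in the middle of autumn recruitment, came to 峰哥 (Brother Feng) to vent: "A labmate has landed a VLA algorithm position (at a particularly well-funded embodied intelligence company), and it is too late for me to switch...... We both started out in traditional robotics, working on SLAM. I don't know what projects he took on afterwards, but he progressed so quickly that he passed interviews at several companies. He only recommended your community to me in the last couple of days. It is very well structured, but I'm afraid it's a bit late."

In August a steady stream of students came to 峰哥, some with verbal offers in hand, others wanting to move into embodied intelligence but worried it was too late. Autumn recruitment is approaching, but the same advice holds: "It is never too late." The priority is to fill in a complete embodied intelligence learning path as quickly as possible, especially data collection, algorithms, and simulation. If you lack strong skills in independent study and in searching for answers, you are welcome to join our embodied intelligence community, the 具身智能之心 Knowledge Planet (知识星球), currently the largest and most complete embodied intelligence learning platform in China.

The 具身智能之心 Knowledge Planet currently combines video, articles, learning paths, Q&A, and job-hunting exchange in one comprehensive embodied intelligence community of nearly 2,000 members. We hope to reach close to 10,000 members within the next two years, building a hub for exchange and technical sharing that many beginners and advanced learners visit regularly.

The community also regularly answers practical questions: How do you use the equipment? How do you collect data effectively? How do you deploy VA and VLA models? Is the collection background too complex, or is the data rather dirt ...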
The AI stack no one talks about: data collection as infrastructure
36Kr · 2025-08-07 07:23
Core Insights
- The performance of AI products increasingly relies on data quality and freshness rather than just model size [1][2][3]
- Companies like Salesforce and IBM are acquiring data infrastructure firms to enhance their AI capabilities with real-time, structured data [2][5][6]
- The definition of "good data" includes being domain-specific, continuously updated, structured, deduplicated, and actionable in real time [4][5][6]

Data Infrastructure Importance
- Data collection is now seen as critical infrastructure rather than a secondary task, emphasizing the need for reliable, real-time access to data [2][9][22]
- The modern AI data stack has evolved into a value chain that includes data acquisition, transformation, organization, and storage [10][22]
- Retrieval quality matters more than prompt engineering, as outdated or irrelevant data can hinder model performance [7][19]

Strategic Data Collection
- Data collection must be strategic, providing structured and immediate data for AI agents [12][13]
- It should handle dynamic user interfaces, CAPTCHAs, and mixed extraction methods to ensure comprehensive data gathering [14][15]
- Data collection infrastructure should be scalable and compliant with legal standards, moving beyond fragile scraping tools [16][22]

Future of AI Systems
- The future of AI performance will depend more on knowledge acquisition speed and context management than on model size alone [23][24]
- Companies that view data collection as a foundational capability will likely achieve faster and more cost-effective success [25]
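Two of the "good data" criteria in this summary, deduplicated and continuously updated, are mechanical enough to sketch in code. The following is a minimal illustration rather than anything described in the article: the `Record` shape, its field names, and the one-day freshness window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape; the field names are illustrative only."""
    url: str
    text: str
    fetched_at: datetime


def keep_fresh_unique(records, now, max_age=timedelta(days=1)):
    """Drop stale records and duplicate texts, keeping the newest copy of each.

    max_age is an assumed freshness window, not a value from the article.
    """
    newest = {}
    for rec in records:
        if now - rec.fetched_at > max_age:
            continue  # stale: fails the "continuously updated" criterion
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(rec.text.lower().split())
        if key not in newest or rec.fetched_at > newest[key].fetched_at:
            newest[key] = rec
    # Newest first, so downstream consumers see the freshest data at the top.
    return sorted(newest.values(), key=lambda r: r.fetched_at, reverse=True)


if __name__ == "__main__":
    now = datetime(2025, 8, 7, tzinfo=timezone.utc)
    sample = [
        Record("a", "Price is 10", now - timedelta(hours=2)),
        Record("b", "price  is 10", now - timedelta(hours=1)),
        Record("c", "Old news", now - timedelta(days=3)),
    ]
    for rec in keep_fresh_unique(sample, now):
        print(rec.url, rec.fetched_at.isoformat())
```

Exact-match deduplication on normalized text is the simplest possible choice here; a production pipeline would more likely use near-duplicate detection, which this sketch does not attempt.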
Humanoid robots need to "go to school" too? Data collection becomes a must-answer question
21st Century Business Herald · 2025-07-16 13:53
Core Viewpoint
- The scarcity of real-world data is a significant constraint on the development of the embodied intelligence industry, and data collection centers may provide a solution to this issue [1][4].

Group 1: Data Collection Initiatives
- Dematech and Zhiyuan Robotics have established the world's first logistics training factory for humanoid robots to collect data in real logistics scenarios [1].
- The Hefei City humanoid robot data collection pre-training site was launched in June, and the Pacini humanoid super data factory began operations this year [1][3].
- The establishment of data collection centers has accelerated since the second half of last year, with companies like Zhiyuan Robotics and Pacini leading the way [3].

Group 2: Data Collection Challenges
- Humanoid robots require extensive data for training, with a single scenario potentially needing millions of data points, but the industry lacks high-quality, standardized data [4].
- Two main approaches to overcoming data scarcity have emerged: generating simulation data for training, and building large-scale data collection centers for high-quality real-world data [4].
- The industry currently faces challenges such as non-standardized hardware solutions and data silos, both of which increase data collection costs [7][8].

Group 3: Government Involvement
- Local governments are also investing in data collection centers, with initiatives like the national and local co-built humanoid robot innovation centers [5].
- Government-led data collection centers typically serve as public service platforms, with collected data being made available to local robot companies once sufficient data is accumulated [5].

Group 4: Market Dynamics
- The humanoid robot industry is expected to see significant data collection activity in the next two years, particularly in industrial applications [7].
- A complete data collection solution typically includes robots, hardware, software, cloud data processing services, and model training platforms, with costs ranging from 400,000 to 500,000 yuan [5].
Getting started in embodied intelligence takes 3 elements: data + algorithms + embodiment
具身智能之心· 2025-06-23 13:54
Core Insights
- The article emphasizes the importance of three key elements in embodied intelligence: data, algorithms, and embodiment. Many individuals only understand algorithms, while data collection requires experience and effective strategies [1][2]
- The community aims to create a platform for knowledge sharing and collaboration in the field of embodied intelligence, targeting a membership of 10,000 within three years [2][6]

Data Collection
- Teleoperation data collection relies on the embodiment and is costly, but preprocessing and postprocessing are simpler, and it yields high-quality data suitable for robotic arms [1]
- The community provides various data collection strategies and cost-effective robotic arm platforms to support research [1][2]

Algorithm Development
- Common technologies in embodied intelligence include VLN, VLA, Diffusion Policy, and reinforcement learning, which require continuous reading of academic papers to stay updated [1]
- The community offers a comprehensive set of learning paths and resources for newcomers and advanced researchers alike [9][12]

Hardware and Resources
- Well-funded laboratories can purchase high-cost embodiment systems, while those with limited budgets may rely on 3D printing or cost-effective hardware platforms [1]
- The community has compiled a list of over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms [9][26][28]

Community Engagement
- The community has established connections with various companies in the field, creating a bridge for academic collaboration, product development, and recruitment [2][6]
- Members can access job postings, industry insights, and a supportive environment for learning and networking [5][12]

Educational Content
- The community provides a wealth of educational materials, including summaries of research papers, books, and learning routes across various topics in embodied intelligence [10][18][20]
- Regular discussions and Q&A sessions are held to address common challenges in the field, such as data collection platforms and robot learning techniques [11][12]
Robot data collection helps advance intelligence
news flash· 2025-06-18 23:29
Core Insights
- The Zhiyuan Data Collection Center operates in Shanghai Pudong, enhancing robot intelligence through "data + AI" since its launch in September 2024 [1]
- The center has collected over one million high-quality data points covering various real-world scenarios [1]
- Zhiyuan Robotics has open-sourced the AgiBot World dataset and released the GO-1 general embodiment base model to improve robot learning efficiency [1]
- The Genie Studio platform, launched in April this year, provides a one-stop solution for developers [1]
- Zhiyuan Robotics is expected to enter a mass production phase in 2025, aiming for thousands of units shipped commercially [1]
- The company has completed a new round of financing to support its intelligence advancement [1]
Robot motion capture equipment expert
2025-05-20 15:24
Summary of Key Points from the Conference Call

Industry Overview
- The conference call discusses the robotics motion capture industry, focusing on data collection methods and challenges faced by companies in this sector [1][2][4].

Core Insights and Arguments
- **Data Collection Modes**: There are four primary modes of data collection in motion capture systems [2][19]:
  1. Real human motion capture with a physical robot, yielding 30% to 50% effective data but at a high cost.
  2. Combination of real motion capture and virtual engines, allowing for 15 to 20 minutes of data collection per day at a lower cost.
  3. Pure motion capture systems without physical robots, resulting in a lower effective data ratio.
  4. Use of synthetic data for large-scale training, which is currently debated.
- **Data Validity Measurement**: Validity is assessed through initial human motion verification followed by robot posture validation. There is no industry standard, and the process involves multi-sensor information fusion to ensure reliability [5].
- **Data Collection Efficiency**: The efficiency of data collection is low: producing 1,300 seconds of data requires experienced motion capture experts to work continuously for several days. The main issues are the immaturity of virtual body software and challenges in interacting with real objects [6][3].
- **Cost of Data Collection**: The cost of effective data collection is approximately 300 yuan per second, with repeated data costing around 60 yuan per second. Future projections suggest costs may drop to around 200 yuan in 1-3 years, potentially below 100 yuan with student involvement [3][22].
- **Mapping Challenges**: The primary challenge in motion capture technology is the mapping of human actions to robotic actions. Current solutions often prioritize accuracy over posture, which can lead to discrepancies in execution [7][9].
- **Role of Data Factories**: Establishing data factories can significantly enhance data collection efficiency, allowing for the use of hundreds to thousands of devices to gather extensive data, which is crucial for training algorithms [10].
- **Customer Demand**: The most significant current demand comes from companies like Shiyuan, which has placed a large order of 1,000 sets, while most other companies remain in the verification stage [16].

Other Important but Overlooked Content
- **Application Prioritization**: Data collection priorities are determined by customer needs and application scenarios rather than specific actions [11][12].
- **Domestic Companies' Focus**: Major domestic companies are concentrating on data collection in areas such as home services, healthcare, and rescue applications, tailoring their data collection environments accordingly [13].
- **Integration of Data Types**: The integration of motion capture data with force and tactile information is being explored to enhance the capabilities of motion capture devices [18].
- **Challenges in Mapping Solutions**: Companies face challenges in understanding human biomechanics when designing mapping solutions, often outsourcing this work to specialized motion capture firms [25].
- **Future Cost Reduction Strategies**: Cost reduction in data collection can be achieved through bulk production and collaboration with educational institutions to utilize student labor [21].
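
The per-second prices and the 1,300-second figure quoted in this summary can be combined into a rough budget estimate. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: it assumes the projected 200 and 100 yuan figures are also per-second prices, treats "below 100 yuan" as an upper bound of 100, and applies the 30% to 50% effective-data ratio of the first collection mode to a 1,300-second target. None of these combinations is stated in the call itself, and the function names are hypothetical.

```python
import math

# Prices in yuan per second of data. The first two are quoted in the summary;
# the last two assume the projections are also per-second prices.
RATES_YUAN_PER_SECOND = {
    "effective data, current": 300,
    "repeated data, current": 60,
    "effective data, projected in 1-3 years": 200,
    "effective data, with student involvement (upper bound)": 100,
}


def total_cost(seconds, rate_yuan_per_second):
    """Cost in yuan of collecting `seconds` of data at a flat per-second rate."""
    return seconds * rate_yuan_per_second


def raw_seconds_needed(effective_seconds, effective_ratio):
    """Raw capture time needed when only `effective_ratio` of it is usable."""
    return math.ceil(effective_seconds / effective_ratio)


if __name__ == "__main__":
    target = 1300  # seconds of data, the amount mentioned in the summary
    for label, rate in RATES_YUAN_PER_SECOND.items():
        print(f"{label}: {total_cost(target, rate):,} yuan")
    for ratio in (0.3, 0.5):
        raw = raw_seconds_needed(target, ratio)
        print(f"at {ratio:.0%} effective data: {raw:,} raw seconds")
```

At the current rate, 1,300 seconds of effective data comes to 390,000 yuan, which gives a sense of why the call spends so much time on cost reduction.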