具身智能之心
Viewed through data-collection solutions: what are the requirements for embodied-AI data?
具身智能之心· 2025-09-19 16:04
Editor: 具身智能之心. This article is shared for academic purposes only; if there is any infringement, contact us for removal.

Embodied intelligence has become a new global focus. Building a general-purpose body and brain is what startups keep striving to break through, and it draws close attention from capital and industry alike. Data collection, as a foundational module, is a top priority, and good data underpins the results many algorithms achieve. This piece surveys the data-collection companies with real R&D and product strength, analyzing their technical characteristics, product lineups, and application scenarios, to give companies a panorama of the industry for strategic decision-making and business expansion.

Focus: companies specializing in data-collection equipment and solutions, including hardware collection devices, software platforms, and end-to-end solutions.

Domestic companies

Xinghaitu (星海图)
- Self-developed data-collection task-management platform: supports the full workflow of task publishing, upload, storage, and review, with visualized control, efficient management, and no fear of data loss.
- One-stop data-collection pipeline: task dispatch → collection → cleaning → re-collection → compression → upload → review → annotation → storage
- Compatible with mainstream algorithms and formats: output formats rosbag, ARIO, lerobot; supported models: ACT ...
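The one-stop pipeline can be sketched as a linear walk with a conditional re-collection branch; the stage names and the retry logic are an illustrative reading of the steps listed above, not Xinghaitu's actual platform API:

```python
def run_episode(episode_id: str, passes_cleaning: bool = True) -> list[str]:
    """Walk one recording episode through the pipeline and return its audit log."""
    log = [f"{episode_id}: task dispatch",
           f"{episode_id}: collection",
           f"{episode_id}: cleaning"]
    if not passes_cleaning:
        # Re-collection (补采): episodes flagged during cleaning are
        # re-collected and cleaned again before moving on.
        log += [f"{episode_id}: re-collection", f"{episode_id}: cleaning"]
    log += [f"{episode_id}: {stage}"
            for stage in ("compression", "upload", "review", "annotation", "storage")]
    return log

print(len(run_episode("ep_0001")))  # 8 steps on the happy path
```

Modeling re-collection as a branch back into cleaning is one plausible design; a production platform would also track per-stage failures and audit state.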
Registration is open for the embodied large-model challenge led by BAAI!
具身智能之心· 2025-09-19 16:04
Editor: BAAI具身智能

The 2025 Second Zhongguancun Embodied Intelligent Robot Application Competition: learn more and sign up! Registration for the BAAI embodied-intelligence model capability challenge is in full swing!

This year's competition, themed "具身引智 · 应用未来" (embodied intelligence, applied future), builds a stage that brings cutting-edge technology and industrial applications together. It is not only an arena for model capability but also a showcase for creativity and talent. Let's push the boundaries, improve model capability, and move embodied intelligence out of the lab and into the real world to create real value. The future is here; it's waiting for you!

Schedule:
- Preliminary round: 10.23-10.24
- Real-robot debugging and data collection: 11.02-11.16
- Finals: 11.17-11.18

Resource support:
- One-stop platform for real-robot data collection and annotation
- Ample compute support
- Robot hardware support
- Full-process technical guidance from BAAI experts
- Venue and environment support

Prizes and honors (per track):
- First prize: ¥50,000
- Second prize: ¥30,000
- Third prize: ¥20,000
- Winner's award: 4th-6th place

Advisor honors: the chance to be named a "BAAI Scholar" with dedicated research funding. Student perks: the chance at a fast track to an internship or full-time role at the BAAI Institute. At BAAI you will gain: hands-on work with real robots (humanoid robots, high-performance arms, mobile manipulation platforms), top-tier compute & free research: BAAI's ample compute and massive data ...
NeurIPS 2025 | CogVLA, aligned with human cognition, breaks through VLA's efficiency and performance bottlenecks
具身智能之心· 2025-09-19 05:43
Core Insights
- The article discusses a new model called CogVLA, which addresses the efficiency challenges and semantic degradation in Vision-Language-Action (VLA) research, driven by the capabilities of pre-trained Vision-Language Models (VLMs) [5][6][10].

Group 1: Background and Challenges
- The transition from large models to embodied intelligence faces efficiency dilemmas and semantic degradation; existing VLA methods often neglect the semantic coupling between perception, language alignment, and action decoding [5].
- Key challenges include redundant perception, instruction-semantic disconnection, and action incoherence, all of which hinder traditional VLA models [6][10].

Group 2: Proposed Solution
- CogVLA introduces a cognition-aligned three-stage design that mimics human multimodal coordination, consisting of EFA-Routing, LFP-Routing, and CAtten [12][14].
- EFA-Routing performs instruction-driven visual aggregation, LFP-Routing performs semantic pruning inside the language model, and CAtten enforces semantic consistency and action-sequence coherence [16].

Group 3: Experimental Results
- CogVLA outperforms advanced models such as OpenVLA-OFT and π0, achieving a state-of-the-art (SOTA) 97.4% success rate on LIBERO while maintaining an 8× visual compression ratio [18].
- Compared to OpenVLA, inference time is reduced 2.79×, throughput increases 22.54×, and training cost drops 2.49× [20].

Group 4: Visualization and Performance
- Visual analysis shows CogVLA focusing on task-relevant regions of the input image, demonstrating human-aligned perception even in cluttered or ambiguous scenes [21].
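As a rough intuition for the instruction-driven compression idea, one can score visual tokens against a pooled instruction embedding and keep only the top fraction. This is an illustrative toy, not CogVLA's actual EFA-Routing or LFP-Routing:

```python
import numpy as np

def aggregate_visual_tokens(vis_tokens, instr_emb, keep_ratio=0.125):
    """Toy instruction-conditioned token selection: score each visual token
    by dot-product similarity to the instruction embedding and keep the
    top fraction (keep_ratio=1/8 mirrors an 8x visual compression)."""
    scores = vis_tokens @ instr_emb               # (N,) relevance scores
    k = max(1, int(len(vis_tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]                # indices of the top-k tokens
    return vis_tokens[np.sort(keep)]              # preserve original order

rng = np.random.default_rng(0)
tokens = rng.standard_normal((256, 64))           # 256 visual tokens, dim 64
instr = rng.standard_normal(64)                   # pooled instruction embedding
kept = aggregate_visual_tokens(tokens, instr)
print(kept.shape)  # (32, 64): 8x fewer tokens reach the language model
```

A learned router would compute these scores with trainable projections rather than a raw dot product; the point here is only that compression is conditioned on the instruction, not applied uniformly.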
智平方's large-scale 2026 campus recruitment is here! Embodied algorithms, development, simulation, and more
具身智能之心· 2025-09-19 00:03
Core Insights
- The article highlights the company's advances in AI and robotics, particularly the world's first end-to-end VLA technology and the launch of the GOVLA model, which outperforms international benchmarks by 30% [2]
- The company is recognized as the only domestic entity to open-source a robot model, showcasing its technical strength [2]
- The AlphaBot series demonstrates the company's commitment to versatile robots capable of seamless task switching across scenarios [2]

Technology and Innovation
- A unified technology platform supports sustainable, compounding intelligent services across multiple real-world applications [2]
- The company has established partnerships with leading clients in automotive manufacturing, semiconductors, biotechnology, and public services [2]

Talent and Culture
- A flat organizational structure encourages input from all levels, including fresh graduates [5]
- The company seeks people with strong curiosity, learning ability, and hands-on skills, emphasizing collaboration and resilience in the face of real-world uncertainty [12]

Recruitment and Opportunities
- Positions are open across algorithms, engineering, product management, design, and manufacturing [8][9][10][11]
- The recruitment process is clearly defined (online application, resume screening, interviews) and aims to attract top talent from diverse educational backgrounds [13][14]
Letting robots do more than just walk: Nav-R1 leads a new era of navigation with reasoning
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article introduces Nav-R1, a new embodied foundation model designed to enhance robots' reasoning and navigation in 3D environments by integrating perception, reasoning, and action [5][30].

Group 1: Key Innovations
- Nav-R1 is trained on Nav-CoT-110K, a large-scale dataset of roughly 110,000 Chain-of-Thought trajectories, which provides a stable reasoning-and-action foundation before reinforcement-learning optimization [8][6].
- The model uses three reward types: a Format Reward for structured output, an Understanding Reward for semantic grounding, and a Navigation Reward for path fidelity [10][15].
- The Fast-in-Slow reasoning paradigm is inspired by human cognition: a fast system handles immediate responses while a slow system manages long-horizon planning and semantic consistency [11][16].

Group 2: Experimental Results
- Nav-R1 improved success rates and path efficiency by roughly 8% or more over other advanced methods across navigation tasks [14].
- In real-world deployments on a mobile robot platform, Nav-R1 navigated complex indoor environments robustly [19][26].

Group 3: Applications and Implications
- Service robots and home assistants: navigating cluttered environments and understanding commands improves the user experience [31].
- Healthcare: safe, reliable navigation of complex environments matters for elderly care and medical facilities [32].
- Augmented and virtual reality: virtual agents that must traverse physical spaces can use the same capabilities [33].
- Industrial and hazardous environments: Nav-R1's robustness and generalization suit task execution in unknown or dangerous settings [34].
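A minimal sketch of how three reward signals of this kind might be combined into one scalar; the weights, ranges, and function name are my own illustration, not values from the Nav-R1 paper:

```python
def nav_reward(output_is_structured: bool,
               semantic_score: float,
               path_fidelity: float,
               w_fmt: float = 0.2,
               w_sem: float = 0.4,
               w_nav: float = 0.4) -> float:
    """Toy composite reward: a binary format term (is the output
    well-structured?), an understanding term in [0, 1] (semantic
    grounding), and a navigation term in [0, 1] (path fidelity)."""
    r_fmt = 1.0 if output_is_structured else 0.0
    return w_fmt * r_fmt + w_sem * semantic_score + w_nav * path_fidelity

print(round(nav_reward(True, 0.9, 0.8), 2))  # 0.88
```

In RL fine-tuning, a scalar like this would be computed per rollout and fed to the policy-gradient update; the format term acts as a gate that keeps the model emitting parseable reasoning before the semantic and path terms can dominate.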
These few directions in embodied AI make up the so-called "cerebrum and cerebellum" (大小脑) algorithms
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article traces the evolution and current trends of embodied-intelligence technology, emphasizing the integration of multiple models and techniques to strengthen robots' capabilities in real-world environments [3][10].

Group 1: Technology Development Stages
- Embodied intelligence has progressed through several stages, from grasp-pose detection to behavior cloning, and on to diffusion policy and VLA models [7][10].
- Stage 1 focused on grasping static objects, with limited decision-making ability [7].
- Stage 2 introduced behavior cloning, letting robots learn from expert demonstrations, but struggled with generalization and error accumulation [7].
- Stage 3, marked by diffusion-policy methods, improved stability and generalization by modeling whole action sequences [8].
- Stage 4, beginning in 2025, explores combining VLA models with reinforcement learning and world models to add predictive capability and multimodal perception [9][10].

Group 2: Key Technologies and Techniques
- The key technologies (VLA, diffusion policy, and reinforcement learning) jointly enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- Integrating tactile sensing with VLA models widens robots' sensory range, enabling more precise manipulation in unstructured environments [10].

Group 3: Industry Implications and Opportunities
- These advances are raising the demand for engineering and systems skills as the field moves from theoretical research to practical deployment [10][14].
- Interest is growing in training and deploying models such as diffusion policy and VLA on platforms like MuJoCo and Isaac Gym [14].
- Job opportunities and research interest are surging, prompting many professionals to pivot toward embodied intelligence [10].
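The core of the Stage-3 diffusion-policy idea, denoising a whole action trajectory rather than predicting one action at a time, can be sketched as a DDPM-style reverse loop. The zero-noise predictor below stands in for a learned, observation-conditioned network, so this is an illustrative skeleton rather than any specific paper's implementation:

```python
import numpy as np

def denoise_actions(noisy_actions, noise_pred_fn, betas):
    """Minimal DDPM-style reverse loop over an action sequence: start
    from a noisy trajectory and iteratively remove predicted noise.
    The stochastic noise term of the full sampler is omitted for brevity."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = noisy_actions
    for t in reversed(range(len(betas))):
        eps = noise_pred_fn(x, t)  # learned network in a real policy
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    return x

rng = np.random.default_rng(0)
traj = rng.standard_normal((16, 7))       # 16 timesteps of 7-DoF actions
betas = np.linspace(1e-4, 0.02, 10)       # 10-step toy noise schedule
clean = denoise_actions(traj, lambda x, t: np.zeros_like(x), betas)
print(clean.shape)  # (16, 7)
```

Because the whole (horizon, action_dim) array is denoised jointly, consecutive actions stay temporally consistent, which is the stability benefit the stage description refers to.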
$39 billion: the top valuation in global embodied intelligence has arrived! Nvidia keeps adding to its bet
具身智能之心· 2025-09-19 00:03
Core Insights
- Figure has raised over $1 billion in Series C funding at a post-money valuation of $39 billion, a record for the embodied-intelligence field [3][33]
- The round was led by Parkway Venture Capital, with participation from major investors including Nvidia, Brookfield Asset Management, and Intel Capital [5]
- The company aims to expand humanoid-robot manufacturing and deployment in both household and commercial settings [10][22]

Funding and Valuation
- The Series C round raised over $1 billion at a $39 billion valuation, the highest publicly known in the embodied-intelligence sector [3][33]
- Previous rounds include a $675 million Series B in February 2024 at a $2.6 billion valuation [23]

Technological Advancements
- Figure developed the Helix architecture, a vision-language-action model that lets robots perceive, understand, and act like humans [18][22]
- Helix consists of two components that communicate with each other, enabling the robot to perform varied tasks with a single unified model [19]
- The new funding will support Helix development, including next-generation GPU infrastructure and advanced data-collection projects [22][21]

Recruitment and Expansion
- Figure is actively hiring across 13 areas, including AI-Helix and BotQ manufacturing, to support its growth [6]
- The company is expanding humanoid-robot production to handle household chores and commercial labor [10][22]

Market Position
- After parting ways with OpenAI and focusing on proprietary AI models, Figure has positioned itself as a leader in humanoid robotics [29][31]
- Rapid advances in technology and funding have made it a notable competitor in the embodied-intelligence landscape [32][33]
VLA papers now account for nearly half of embodied-AI research......
具身智能之心· 2025-09-18 04:00
Core Insights
- The article emphasizes the significance of Vision-Language-Action (VLA) models in embodied intelligence: they let robots make autonomous decisions in diverse environments, breaking the limits of traditional single-task training [1][4].

Industry Development
- The embodied-intelligence sector is growing rapidly: teams such as Unitree, Zhiyuan, Xinghaitu, and Yinhai General are moving from laboratory research to commercialization, while major tech companies such as Huawei, JD, and Tencent engage alongside international firms like Tesla and Figure AI [3].

Research Opportunities
- VLA is a research hotspot with many open problems, making it a promising area for academic papers; the article announces a dedicated VLA research-guidance course to help people enter or pivot within the field [3][4].

Course Content and Structure
- The course focuses on how agents interact with the physical world through a perception-cognition-action loop, covering the evolution of VLA technology from early grasp-pose detection to recent models such as Diffusion Policy and multimodal foundation models [7][8].
- It addresses core challenges in embodied intelligence, such as cross-domain generalization and long-horizon planning, and explores how to integrate large language models with robotic control systems [8].

Learning Outcomes
- On completion, participants should master the theoretical foundations and technical evolution of VLA models, gain proficiency with simulation environments, and develop independent research skills [14].
- The course guides students from idea generation to a finished, high-quality academic paper, covering how to identify research opportunities and design effective experiments [10][14].
10,000 units: Tesla's Optimus Gen3 just landed the world's largest order!
具身智能之心· 2025-09-18 01:23
Core Insights
- Tesla's Optimus Gen3 has secured its first external order: 10,000 units from PharmAGRI, aimed at automating drug-production processes for precision and efficiency [1]
- Elon Musk invested $10 billion in Tesla stock, tied to a performance-based compensation plan that could unlock $1.2 trillion in stock rewards if 1 million Optimus units are eventually delivered [1]
- Optimus Gen3+ has demonstrated a 30% efficiency gain over human labor in Tesla's factories, with future per-unit cost potentially dropping below $20,000, signaling both capability and affordability [2]

Summary by Sections
- **Order Acquisition**: Optimus Gen3 received a historic 10,000-unit order from PharmAGRI for automating drug production [1]
- **Investment and Incentives**: Musk's $10 billion stock purchase is tied to a compensation plan that rewards him substantially if Tesla hits the 1-million-unit delivery target [1]
- **Efficiency and Cost**: Optimus Gen3+ has been validated as 30% more efficient than human workers, with a potential cost reduction to below $20,000 per unit, highlighting its economic viability [2]
Embodied-AI capabilities are surging while safety lags behind? The first framework and roadmap for safe and trustworthy EAI!
具身智能之心· 2025-09-18 00:03
Editor: 机器之心 (Machine Heart)

In recent years, embodied artificial intelligence (EAI), exemplified by humanoid robots and autonomous driving, has advanced at unprecedented speed, striding from the digital world into physical reality. But when the cost of an error is no longer a line of garbled text on a screen but potential physical harm in the real world, an urgent question confronts us: how do we ensure that these increasingly capable embodied agents are safe and trustworthy?

The reality is that capability and safety, two tracks that should advance in lockstep, are showing a worrying "decoupling". As shown in Figure 1, industry foundation models iterate rapidly on capability while largely neglecting matching safety-alignment mechanisms; academia has explored the problem, but its results are scattered and unsystematic.

To bridge this critical gap, a research team from the Shanghai AI Laboratory and East China Normal University wrote this Position Paper, aiming to establish a systematic theoretical framework and development blueprint for the emerging field of "safe and trustworthy embodied intelligence", moving the field from fragmented research toward holistic construction.

Figure 1: EA ...