Embodied Intelligence

Dobot Unveils Its New-Generation Embodied-Intelligence Humanoid Robot
Mei Ri Jing Ji Xin Wen· 2025-08-11 00:28
NBD AI Flash, August 8 - The 2025 World Robot Conference ("WRC") opened in Beijing. At the WRC product launch event, Dobot chief scientist Lang Xulin unveiled the company's new-generation embodied-intelligence humanoid robot, DOBOT Atom II. Atom features advanced coordinated control of the upper and lower limbs, instantaneous reaction speed, and comprehensively upgraded manipulation capabilities, advancing the multi-scenario deployment of industry-leading embodied-intelligence technology. ...
A-Share Pre-Market Briefing | US-Russia Summit Set for the 15th; Beijing Yizhuang Releases "Ten Measures for Embodied-Intelligence Robots"
智通财经网· 2025-08-11 00:28
Huawei will release a breakthrough in AI inference on August 12 that may reduce Chinese AI inference's dependence on HBM (high-bandwidth memory), improve the inference performance of domestic large models, and complete a key part of China's AI inference ecosystem. Industry insiders say the AI industry has shifted from "pushing the limits of model capability" to "maximizing application value," making inference the focus of AI's next stage.

Pre-Market Headlines
1. US-Russia summit to be held on the 15th, focused on a plan for lasting peace in Ukraine. Type: macro. Sentiment: positive. Trump announced that he will meet Russian President Putin in Alaska on August 15 to discuss the Ukraine crisis. Russian presidential aide Ushakov said Putin and Trump will concentrate on a long-term peaceful resolution of the crisis.
2. Huawei to release breakthrough results in AI inference, completing a key part of China's AI inference ecosystem. Type: industry. Sentiment: positive.
3. CSRC: will continue to strictly gatekeep IPO listings; no large-scale expansion will occur. Type: market. Sentiment: positive. Addressing concerns that greater inclusiveness could trigger a large-scale IPO expansion, the CSRC said it will continue to strictly control the listing gateway and conduct counter-cyclical adjustment, so no large-scale expansion will occur. Major global markets are adapting to technology trends and stepping up institutional innovation; continuously absorbing quality companies and improving regulatory services has in turn boosted secondary-market activity and strength.
4. Beijing Yizhuang releases ...
Musk: May Lose Control of Tesla; He Xiaopeng: Took Lei Jun's Advice, New P7 to Undergo 24-Hour Endurance Test; Huawei Reportedly to Release AI Inference Breakthrough | Geek Morning Digest
Sou Hu Cai Jing· 2025-08-11 00:27
Group 1
- Elon Musk expressed concern about his control over Tesla, stating that his current 12.8% stake may not be enough to maintain his dominant position, especially with the potential production of millions of robots in the future [1]
- A recent post claimed Musk's stake was 21.2% and suggested that most of those shares were pledged as loan collateral [3]
- Musk clarified that he currently has no personal loans secured by Tesla stock, and noted that the tax rate on his stock options is close to 45%, resulting in a net increase of only about 4% in voting control [4]

Group 2
- Musk aims to hold roughly 25% of the company's shares, enough influence to guide its development direction [5]
- The unemployment rate for recent computer-science graduates in the U.S. is reported at 6.1%-7.5%, more than double that of biology and art-history graduates [6]
- The spread of AI programming tools is gradually eliminating entry-level positions, intensifying job-market competition as major tech companies such as Amazon, Meta, and Microsoft lay off employees [6]

Group 3
- Intel CEO Lip-Bu Tan is expected to visit the White House to discuss potential collaboration between Intel and the U.S. government [6]
- XPeng's new P7 electric vehicle is set to undergo a 24-hour endurance test, a demanding assessment of its performance [8]
- The new XPeng P7 features an 800V high-voltage architecture, with 10 minutes of charging providing 525 km of range and a maximum range of 820 km [8]

Group 4
- Unitree CEO Wang Xingxing highlighted current challenges in the robotics industry, particularly the lack of unified models and architectures, comparing the moment to the years before ChatGPT emerged [10]
- Wang emphasized the need over the next 2-5 years for a unified, end-to-end intelligent robot model and lower-cost, longer-lifespan hardware [12]
- Apple is reportedly testing a new AI voice-control feature for Siri that would let iPhone users perform precise operations using voice commands alone [14]

Group 5
- AOL announced it will officially discontinue its dial-up internet service on September 30, 2025, ending 34 years of operation [17]
- Despite the prevalence of broadband, some remote areas still lack coverage: 23.3% of rural residents and 27.7% of tribal-area residents lack access to fixed broadband [17]
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- The article discusses Genie Envisioner, a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation in a single video-generation framework [3][27]

Group 1: Platform Overview
- Genie Envisioner is built on a core component called GE-Base, which captures the spatial, temporal, and semantic dynamics of robot interactions [5][27]
- The platform also includes GE-Act, a world action model that enables instruction-conditioned policy reasoning, and GE-Sim, a video world simulator that supports closed-loop execution [6][21]

Group 2: Key Components
- GE-Base is a large-scale video diffusion model that captures real-world robot interaction features in a structured latent space [3][27]
- GE-Act uses a lightweight 160-million-parameter decoder to provide real-time control, achieving under 10 ms latency across diverse robotic tasks [15][27]
- GE-Sim constructs a high-fidelity environment for closed-loop policy development, extending the framework's capabilities [21][27]

Group 3: Evaluation Framework
- EWMBench is introduced as a standardized evaluation suite to assess the fidelity and utility of video-based world models in real-world robotic operation [23][27]
- Evaluation focuses on visual scene consistency, motion correctness, and semantic alignment, ensuring rigorous assessment of task-oriented scenarios [23][27]

Group 4: Training and Adaptation
- GE-Base is trained on a large dataset of roughly 1 million instruction-aligned video sequences, enabling robust model performance [11][27]
- GE-Act uses a three-phase training strategy to derive action policies from the GE-Base model, optimizing for specific tasks and environments [17][19][27]

Group 5: Performance and Contributions
- The combination of GE-Base, GE-Act, and GE-Sim has demonstrated superior performance on complex tasks such as fabric folding and packing, showing strong generalization [27]
- The platform establishes a powerful foundation for building general-purpose, instruction-driven embodied-intelligence systems [27]
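The three-part loop the summary describes (GE-Base perceives, GE-Act decides, GE-Sim rolls the imagined world forward) can be sketched as a closed control loop. Everything below is an illustrative toy under assumed interfaces: the class names, latent shape, toy dynamics, and 7-DoF action are stand-ins, not the Genie Envisioner API.

```python
"""Illustrative toy of a Genie-Envisioner-style closed loop.
All classes, shapes, and dynamics are hypothetical stand-ins."""
import numpy as np

class GEBase:
    """Stand-in world model: encodes a frame history into a latent vector."""
    def encode(self, frames: np.ndarray) -> np.ndarray:
        # Average the flattened frames; a real model would use a video encoder.
        return frames.reshape(len(frames), -1).mean(axis=0)

class GEAct:
    """Stand-in lightweight action decoder conditioned on the latent."""
    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)
    def decode(self, latent: np.ndarray) -> np.ndarray:
        # Emit a bounded 7-DoF action; scale loosely by latent spread.
        return np.tanh(self.rng.standard_normal(7) * (latent.std() + 0.1))

class GESim:
    """Stand-in video simulator: rolls frames forward under an action."""
    def step(self, frames: np.ndarray, action: np.ndarray) -> np.ndarray:
        return np.clip(frames + action.mean() * 0.01, 0.0, 1.0)

def closed_loop(frames: np.ndarray, steps: int = 3) -> np.ndarray:
    base, act, sim = GEBase(), GEAct(), GESim()
    actions = []
    for _ in range(steps):
        latent = base.encode(frames)       # perceive
        action = act.decode(latent)        # decide
        frames = sim.step(frames, action)  # imagine the next frames
        actions.append(action)
    return np.stack(actions)

traj = closed_loop(np.linspace(0.0, 1.0, 256).reshape(4, 8, 8))
```

The point of the sketch is the closed loop itself: the simulator's imagined frames feed back into the encoder, so policies can be developed and evaluated without touching hardware.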
China's First Full-Stack Hands-On Tutorial on Embodied Brain + Cerebellum Algorithms
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- The push toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focused on how intelligent agents interact with and adapt to physical environments [1][6]

Industry Analysis
- Over the past two years, numerous star teams have emerged in embodied intelligence, founding valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli and driving advances in embodied "brain" and "cerebellum" technologies [3]
- Major domestic companies including Huawei, JD.com, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build an embodied-intelligence ecosystem, while internationally Tesla and U.S. investment institutions back companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5]

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
  - The first stage focused on grasp-pose detection, which struggled with complex tasks for lack of context modeling [6]
  - The second stage used behavior cloning, letting robots learn from expert demonstrations but revealing weak generalization and poor performance in multi-target scenarios [6]
  - The third stage introduced Diffusion Policy methods, improving stability and generalization in task execution through sequence modeling [7]
  - The fourth stage, emerging in 2025, explores combining VLA models with reinforcement learning and tactile sensing to overcome limits in feedback and future-prediction capability [8]

Product and Market Development
- The evolution from grasp-pose detection through behavior cloning to advanced VLA models marks a shift toward agents that can perform complex tasks in open environments, driving a surge of products across industrial, home, dining, and healthcare settings [9]
- As the industry moves from research to deployment, demand for engineering and systems capability is rising, requiring higher engineering standards [12]

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full spectrum of embodied-intelligence algorithms, from basic tasks to advanced models such as VLA and its integrations [9][12]
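The behavior-cloning stage described above reduces to supervised regression on (state, action) pairs drawn from expert demonstrations. A minimal sketch with a synthetic linear expert (all data, shapes, and the expert policy are made up for illustration):

```python
import numpy as np

# Synthetic "expert demonstrations": states and the expert's actions.
rng = np.random.default_rng(0)
states = rng.standard_normal((200, 4))            # 200 demos, 4-dim state
W_true = np.array([[0.5], [-1.0], [0.3], [2.0]])  # hidden expert policy
actions = states @ W_true                         # 1-dim action per state

# Behavior cloning = fit a policy by supervised regression on (state, action).
W_hat, residuals, rank, _ = np.linalg.lstsq(states, actions, rcond=None)

# The clone matches the expert on the demonstration distribution, but, as the
# article notes, nothing guarantees generalization to unseen multi-target
# scenarios -- the weakness that motivated later Diffusion Policy methods.
```

A real pipeline would replace the least-squares fit with a neural policy trained by gradient descent, but the supervised-learning structure is the same.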
A Look at DreamVLA: Letting Robots Look First, Think, Then Act
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- The article introduces DreamVLA, a new Vision-Language-Action model that improves robotic decision-making by integrating comprehensive world knowledge, allowing robots to predict dynamic environments and make more accurate action decisions [1][27]

Group 1: Background and Need for Advanced VLA Models
- Traditional VLA models map visual inputs and language commands directly to actions, which can let irrelevant information interfere in complex environments [3][5]
- DreamVLA addresses this by adding a layer of "thinking" that predicts world knowledge, including dynamic regions, depth information, and semantic features, before planning actions [5][27]

Group 2: Model Architecture and Functionality
- DreamVLA runs a "perception-prediction-action" cycle, treating the task as an inverse-dynamics problem: derive the necessary actions from predicted future states [7][27]
- The model takes three input types (visual images, language commands, and the robot's own state), each processed by a dedicated encoder [10][14]

Group 3: World Knowledge Prediction
- Rather than predicting actions directly, DreamVLA predicts world knowledge: dynamic regions, depth maps, and semantic features [11][18]
- Dynamic-region prediction uses CoTracker to identify moving objects and generate masks that highlight relevant areas while filtering out static background [12][15]
- Depth prediction estimates the spatial relationships of objects, generating depth maps that assist obstacle avoidance [13][17]
- Semantic prediction uses DINOv2 and SAM to extract high-level semantic information, which is encoded into a unified "world embedding" for action generation [18][22]

Group 4: Action Generation
- The action-generation component uses a diffusion Transformer to produce future action sequences from the latent action embedding derived from multi-modal inputs [23][27]
- A structured attention mechanism ensures coherent multi-step action reasoning and prevents cross-modal knowledge leakage [19][31]

Group 5: Performance and Validation
- DreamVLA achieved an average task-completion length of 4.44 on the CALVIN ABC-D benchmark, outperforming prior methods by 3.5%, with a 76.7% success rate on real-world tasks [25][27]
- Ablation studies confirmed the contribution of each component, demonstrating the model's robustness and generalization [25][31]
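DreamVLA's "perceive, predict world knowledge, then infer the action as an inverse-dynamics problem" cycle can be sketched abstractly. The functions below are toy stand-ins, not the paper's models: real dynamic masks come from CoTracker, real depth from a depth head, and real semantics from DINOv2/SAM features.

```python
import numpy as np

def predict_world_knowledge(obs: np.ndarray):
    """Toy stand-ins for DreamVLA's three predicted modalities:
    a dynamic-region mask, a depth map, and a semantic embedding."""
    mask = (obs > obs.mean()).astype(float)       # "dynamic region" mask
    depth = 1.0 / (1.0 + obs)                     # fake inverse-depth map
    semantic = np.array([obs.mean(), obs.std()])  # tiny "world embedding"
    return mask, depth, semantic

def inverse_dynamics(obs: np.ndarray, predicted_next: np.ndarray) -> float:
    """Inverse dynamics: recover the action that carries the current
    state to the predicted future state (here, a scalar mean shift)."""
    return float((predicted_next - obs).mean())

obs = np.linspace(0.0, 1.0, 16).reshape(4, 4)     # toy camera frame
mask, depth, semantic = predict_world_knowledge(obs)
predicted_next = obs + 0.1   # pretend the world model predicted this frame
action = inverse_dynamics(obs, predicted_next)
```

The design point the sketch illustrates: the action is computed *from* the predicted future, so prediction quality, not raw pixel-to-action mapping, bounds the policy.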
How Is It Done? 20 Minutes of Real-Robot Data Enables Cross-Embodiment Generalization of Dual-Arm Tasks
具身智能之心· 2025-08-11 00:14
Core Insights
- Vidar represents a significant breakthrough in embodied intelligence as the first model worldwide to transfer video-understanding capabilities into a physical decision-making system [2]
- The model innovatively builds a multi-view video-prediction framework that supports collaborative dual-arm robot tasks, achieving state-of-the-art performance with significant few-shot learning advantages [2]
- It needs only 20 minutes of real-robot data to generalize quickly to a new robot embodiment, sharply reducing data requirements compared with industry-leading models [2][6]

Group 1
- Vidar is based on a general video model and achieves systematic transfer of video-understanding capabilities [2]
- Its data requirement is roughly one-eighth that of the leading RDT model and one twelve-hundredth that of π0.5, greatly lowering the barrier to large-scale generalization in robotics [2]
- After fine-tuning, the model performs multi-view dual-arm tasks effectively, executing commands as instructed [2]

Group 2
- The Tsinghua University team proposed a new paradigm for embodied intelligence that decomposes tasks into "prediction + execution" [6]
- The approach uses visual generative models such as Vidar to learn target prediction from vast amounts of internet video, while employing a task-agnostic inverse-dynamics model, AnyPos, for action execution [6]
- This significantly reduces dependence on large-scale paired action-instruction data: only 20 minutes of task data are needed for strong generalization [6]

Group 3
- The presentation includes an overview and demonstration video, discusses the rationale for using the video modality, and considers embodied video base models [8]
- It covers the training of Vidar and the concept of task-agnostic actions with AnyPos [8]
- The speaker, Hengkai Tan, is a PhD student at Tsinghua University focusing on the integration of embodied large models and multi-modal large models [11]
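The "prediction + execution" decomposition described above separates a video predictor (what should the world look like next?) from a task-agnostic inverse-dynamics model (what action gets us there?). A toy sketch under assumed interfaces; the real Vidar and AnyPos are large learned models, and these functions only illustrate the division of labor:

```python
import numpy as np

def video_predictor(frames: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for a Vidar-style generative predictor: imagines the
    goal frame for the instructed task (here, a fixed brightness shift)."""
    shift = 0.05 if "pick" in instruction else 0.02
    return np.clip(frames + shift, 0.0, 1.0)

def inverse_dynamics_model(frame_t: np.ndarray,
                           frame_next: np.ndarray) -> np.ndarray:
    """Stand-in for an AnyPos-style task-agnostic IDM: maps a pair of
    frames to an action, one component per (toy) arm."""
    delta = frame_next - frame_t
    left, right = np.split(delta, 2, axis=1)  # pretend each half is one arm
    return np.array([left.mean(), right.mean()])

frames = np.full((8, 8), 0.5)                      # current camera frame
goal = video_predictor(frames, "pick up the cup")  # predicted future frame
action = inverse_dynamics_model(frames, goal)      # action to realize it
```

Because the inverse-dynamics model never sees the instruction, it can be trained once per embodiment and reused across tasks, which is what lets the task-specific data shrink to minutes.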
Chairman of New Central SOE Pays a Visit to Ren Zhengfei
第一财经· 2025-08-11 00:13
2025.08.11

This new central SOE was created by spinning off from the former China South Industries Group and has 117 subsidiaries. Its main businesses cover complete vehicles and components, vehicle sales, financial and logistics services, and motorcycles. The head of China Changan Automobile Group said the new SOE will focus on building new productive forces such as intelligent vehicle robots, flying cars, and embodied intelligence; explore a new integrated land-sea-air mobility ecosystem; and accelerate global expansion into five regional markets: Southeast Asia, the Middle East and Africa, Central and South America, Eurasia, and Europe.

Changan Automobile Group is the third automotive central SOE, after China FAW Group and Dongfeng Motor Group.

Source: Chang'anjie Zhishi (长安街知事)

Cover image: Ren Zhengfei (center) with Zhu Huarong (right). Source: @Changan Auto Chairman (Weibo)

On the evening of August 9, Changan Automobile chairman Zhu Huarong posted on Weibo that he had traveled to Shenzhen on the 8th to visit Huawei founder Ren Zhengfei for an exchange on the industry's competitive situation and future competitive landscape.

"Mr. Ren also offered targeted, instructive advice on supporting Changan Automobile and the Avatr brand. His vision, breadth of mind, wisdom, and passion left a deep impression on us; we benefited greatly and are full of admiration," Zhu said, adding that he also spoke with Huawei executives including Xu Zhijun and Yu Chengdong.

In July ...
They Entertain and They Work! Guangdong's Hardcore Tech Draws Attention at the World Robot Conference
21 Shi Ji Jing Ji Bao Dao· 2025-08-11 00:08
Photo caption: audience members interacting with robots on site. Photo courtesy of interviewees.

Some played the drums, some cooked food on the spot, some demonstrated logistics sorting... On August 9, the second day of the 2025 World Robot Conference in Beijing, humanoid robots promoted by Guangdong robotics companies continued to show off new skills.

Selling while exhibiting is a new pattern for many exhibitors. "On the first day of the show a customer ordered three units on the spot; we expect to sell about 20 during the event, and our full-year shipments should reach several hundred units," said Lin Degang, chairman of Shenzhen Daxiang Antai Technology.

Robots move toward commercial deployment

"One hour of charging, eight hours of work": robots are proving themselves worthy "born workers." On site, the "Bumblebee" robot showed off a bipedal humanoid that can work for eight hours on a single charge while its two arms cooperate to carry 30 kg.

Why eight hours? Because human work patterns are mostly built around the eight-hour workday, and a humanoid robot cast as a "human-machine collaborator" or "labor substitute" needs endurance that matches that rhythm.

But robots are no longer satisfied with working only eight hours. Shenzhen-based UBTECH demonstrated a hot-swappable autonomous battery-exchange system for humanoid robots: with no human intervention or shutdown, a robot can swap its own battery in just three minutes, giving it 24/7 uninterrupted working capability.

Beyond growing stamina, robots' capability boundaries keep expanding. Zhuji Dongli (LimX Dynamics), also from Shenzhen, demonstrated a teleoperation system based on its humanoid robot platform, which can ...
[Financial Morning Briefing] 003008 Plans a Dividend of 3 Yuan per 10 Shares
Zhong Guo Zheng Quan Bao· 2025-08-11 00:08
Group 1: Company News
- Industrial Fulian reported first-half revenue of 360.76 billion yuan, up 35.58% year-on-year, and net profit of 12.11 billion yuan, up 38.61% [3]
- Jinghua New Materials announced first-half revenue of 947 million yuan, up 10.53% year-on-year, but net profit of 3.77 million yuan, down 7.30% [4]
- Yanjing Beer reported first-half revenue of 8.558 billion yuan, up 6.37% year-on-year, and net profit of 1.103 billion yuan, up 45.45% [4]
- Bawei Storage reported first-half revenue of 3.912 billion yuan, up 13.70% year-on-year, but a net loss of 226 million yuan [4]
- Kaipu Testing announced first-half revenue of 111 million yuan, up 3.23% year-on-year, and net profit of 40.79 million yuan, up 3.73% [4]
- Fangsheng Pharmaceutical's subsidiary received approval for a clinical trial of its innovative traditional Chinese medicine, marking progress in its R&D efforts [4]
- Jiachuan Video announced a change of control, which may affect its future operations and governance [5]
- Chunguang Technology plans to invest up to 1 billion yuan in a new clean-appliance project, expanding its operating capacity [5]
- Shiyun Circuit plans to invest 125 million yuan in Shenzhen New Sound Semiconductor for a 3.8238% stake, reflecting its strategy to strengthen its technological capabilities [5]
- Wantong Development plans to invest 854 million yuan to acquire a 62.98% stake in Shudu Technology, in line with its strategy of transitioning into digital technology [6]

Group 2: Industry Insights
- The A-share market will see 34 stocks face share unlocks this week, with a total unlock volume of 3.057 billion shares, up 149.66% week-on-week [2]
- The medical-device industry is seeing significant growth in international business, with many companies posting high growth rates in overseas markets [7]
- The medical-device sector is expected to reach a performance inflection point in the second half of the year, driven by policy optimization and improving market conditions [7]
- The market lacks a clear main narrative, but sectors such as pharmaceuticals and overseas computing are identified as potential high-growth areas [7]