Embodied World Models
With just "action silhouettes," bridging video generation and robot world models! BridgeV2W teaches robots to "rehearse the future"
机器之心· 2026-02-21 02:57
Core Insights
- The article discusses "embodied world models," which let robots simulate the outcome of an action before executing it, much as humans mentally rehearse a movement [2][3]
- BridgeV2W, developed jointly by a robotics company and the Chinese Academy of Sciences, aims to bridge the gap between video generation models and robotic action representation [2][5]

Challenges in Embodied World Models
- Three main challenges are identified:
  1. The "language barrier" between robot actions (joint angles and poses) and video generation models (pixels) [6]
  2. The same action looks different from different camera angles, which can degrade prediction quality [7]
  3. Different robotic platforms need custom architectures, which makes a unified world model hard to build [7]

Innovative Solution: Embodiment Masks
- BridgeV2W introduces "embodiment masks," which render robot actions as binary silhouettes in video frames, mapping action coordinates directly into pixel space [9][10]
- This design addresses all three challenges by giving the video model a pixel-level signal that aligns robot actions with its native input [15]

Experimental Validation
- The research team validated BridgeV2W across various settings, demonstrating robustness to unseen camera viewpoints and scenes [12][13]
- On DROID, a large-scale real-world robot manipulation dataset, BridgeV2W outperformed state-of-the-art methods on key metrics such as PSNR and SSIM [13][14]

Downstream Applications
- BridgeV2W is not just a video generation model; it has practical applications in policy evaluation and in action planning based on visual goals [20]
- Candidate policies can be rolled out inside the world model instead of on a real robot, significantly reducing the cost of policy iteration [20]

Scalability and Generalization
- The model can use vast amounts of unannotated human video data, which allows scalable training without extensive geometric prior knowledge [21][25]
- The architecture lets BridgeV2W benefit from advances in video generation technology, improving its predictive capabilities [25]

Future Prospects
- BridgeV2W is expected to improve as video generation models and training datasets expand, suggesting significant advances in robotic "pre-execution" capabilities [28]
- The article posits that combining video generation models with embodiment masks could lead to a new era of general embodied intelligence in robotics [25][28]
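The embodiment-mask idea summarized above can be illustrated with a small sketch. This is not BridgeV2W's renderer: it assumes the robot's geometry has already been reduced to a handful of 3D points in the camera frame and that a simple pinhole camera applies. The function names `project_point` and `render_mask` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_point(p: Point3D, fx: float, fy: float, cx: float, cy: float) -> Tuple[int, int]:
    """Pinhole projection of a camera-frame 3D point to an integer pixel (u, v)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v


def render_mask(
    points: Sequence[Point3D],
    width: int,
    height: int,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    radius: int = 1,
) -> List[List[int]]:
    """Rasterize projected points into a binary silhouette (1 = robot, 0 = background).

    Hypothetical stand-in for a real renderer: each point is drawn as a small
    square of side 2 * radius + 1, clipped to the image bounds.
    """
    mask = [[0] * width for _ in range(height)]
    for p in points:
        u, v = project_point(p, fx, fy, cx, cy)
        for dv in range(-radius, radius + 1):
            for du in range(-radius, radius + 1):
                uu, vv = u + du, v + dv
                if 0 <= uu < width and 0 <= vv < height:
                    mask[vv][uu] = 1
    return mask


if __name__ == "__main__":
    # Two made-up "link" points, one metre in front of the camera.
    demo = render_mask([(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)], 8, 8, 4.0, 4.0, 4.0, 4.0, radius=0)
    for row in demo:
        print("".join("#" if v else "." for v in row))
```

Changing the camera intrinsics or the pose of the points changes the silhouette without any change to the downstream video model, which is the property the summary attributes to pixel-space conditioning.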
Why aren't the Spring Festival Gala robots "stiff" anymore? Embodied intelligence is undergoing a brain evolution
机器人大讲堂· 2026-02-19 00:00
Core Viewpoint
- Humanoid robots are moving from performing in controlled environments to practical use in real-world scenarios, which requires them to understand and predict physical interactions [5][6][26].

Group 1: Humanoid Robot Performance
- Humanoid robots at the Spring Festival Gala have advanced significantly; previous years already featured coordinated movements and complex formations [1][2].
- This year's robots showed a level of agility and responsiveness that suggests a breakthrough in control algorithms and hardware integration [5].

Group 2: Challenges in Real-World Applications
- Despite these advances, moving from staged performances to real-world applications remains difficult, because robots must cope with unpredictable environments such as factories and homes [5][6].
- Current humanoid robots do not understand physical laws, which limits their effectiveness in dynamic settings [13][22].

Group 3: VLA Paradigm and Industry Anxiety
- The dominant paradigm for embodied intelligence is the Vision-Language-Action (VLA) model, a field that is currently highly competitive [7].
- Companies such as Ant Group and Horizon are developing advanced VLA models that improve spatial awareness and adaptability across different robot configurations [8][10].

Group 4: Transition to World Models
- The industry recognizes the need to evolve from VLA to embodied world models that let robots simulate and predict physical interactions [14][15].
- Ant Group's LingBot-World is a notable example, providing a high-fidelity simulation environment in which robots can learn and adapt without real-world consequences [16].

Group 5: Impact on Industry Scalability
- The shift from action mapping to physical pre-simulation is expected to cut the data needed to train a new skill significantly, from thousands of examples to just 30-50 [23].
- Robots equipped with predictive capabilities have shown high success rates in complex tasks, exceeding 91% in multi-task scenarios [24].

Group 6: Conclusion and Future Directions
- Humanoid robots are moving from demonstrations to practical applications, with a focus on understanding physical laws and improving operation in real-world environments [26][28].
- The debate over the best approach to robotic intelligence continues, with various strategies being explored to improve performance in unpredictable settings [27].
With just "action silhouettes," bridging video generation and robot world models! BridgeV2W teaches robots to "rehearse the future"
AI科技大本营· 2026-02-11 06:50
Core Insights
- The article discusses BridgeV2W, which aims to improve robots' predictive capabilities by using embodiment masks to bridge the gap between video generation models and embodied world models [2][4][22].

Group 1: Challenges in Robotic Prediction
- Current embodied world models face three major challenges: the language barrier between robot actions and video generation models, the variability of actions seen from different perspectives, and the need for customized architectures for different robot types [4][5][6].

Group 2: Core Innovations of BridgeV2W
- BridgeV2W introduces embodiment masks, which render robot actions as binary silhouettes in video frames, allowing a direct mapping between coordinate space and pixel space [8][9].
- The model uses ControlNet-style bypass injection: the masks are fed as conditioning signals into a pre-trained video generation model, which helps it understand robot actions while retaining its strong visual priors [9].

Group 3: Experimental Validation
- The research team validated BridgeV2W across various settings, showing that it works with different robot platforms and in unseen scenarios and outperforms state-of-the-art methods [11][12].
- On the DROID dataset, BridgeV2W outperformed existing methods on key metrics such as PSNR and SSIM, and did particularly well in unseen-viewpoint tests [12][14].

Group 4: Generalization and Adaptability
- The framework supports cross-embodiment generalization: different types of robots can use the same model architecture simply by providing their URDF [13][16].
- On the AgiBot-G1 dataset, the model achieved prediction quality comparable to that on single-arm robots without any change to the model structure [16].

Group 5: Practical Applications
- BridgeV2W is not just a model for generating visually appealing videos; it has practical applications in real-world tasks and can draw on vast amounts of unannotated human video data for training [19][20].
- Using human video data in training demonstrates the approach's potential for scalability and accuracy in robotic applications [21][22].

Group 6: Future Prospects
- The article suggests that the capabilities demonstrated by BridgeV2W are just the beginning, with future advances in video generation models and training data expected to significantly improve robots' predictive abilities [25].
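9点1氪 | Trump announces his nominee for the next Fed chair; Guangdong is the top province for births for the 8th consecutive year; “黑白颠周媛” placed under formal investigation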
36Kr · 2026-01-31 01:21
Group 1
- Trump nominates Kevin Warsh as the next Federal Reserve Chairman, replacing Jerome Powell [2]
- Warsh, who joined the Fed in 2006, initially held a hawkish stance on monetary policy but has shifted to support Trump's tariff policies and faster rate cuts [2]
- The nomination requires Senate approval following the President's selection [3]

Group 2
- Guangdong has been China's top province for births for eight consecutive years, with a birth rate of 7.82‰ and natural population growth of 290,000 in 2025 [3]
- The province's GDP reached 14.58 trillion yuan in 2025, accounting for over 10% of the national economy [3]

Group 3
- Kimi's overseas revenue has surpassed its domestic revenue, with a fourfold increase in global paid users following the release of the K2.5 model [4]
- The K2.5 model has gained significant traction, ranking third on OpenRouter, behind Claude Sonnet 4.5 and Gemini 3 Flash [4]

Group 4
- The latest Forbes China Rich List shows Zhang Yiming, Zhong Shanshan, and Ma Huateng in the top three positions, with Lei Jun ranking tenth with a wealth of $30.4 billion [5]
- Jack Ma follows Lei Jun with a wealth of $29.6 billion [5]

Group 5
- Sora, an AI video generation app developed by OpenAI, has seen a 45% drop in mobile downloads and a 32% decrease in consumer spending [7]
- The app previously reached the top of the Apple App Store but is now experiencing significant user attrition [7]

Group 6
- Apple is prioritizing the production of high-end iPhone models for 2026 due to marketing strategy adjustments and supply chain constraints [7]
- The company plans to delay the release of standard models [7]

Group 7
- Morgan Stanley predicts gold prices could rise to $8,000 to $8,500 in the coming years, driven by retail investors seeking to hedge against stock market declines [7]
- However, there are warnings about potential short-term risks due to overbought conditions in gold and silver [7]

Group 8
- Xiaomi's SU7 Ultra sales plummeted to 45 units in December 2025, down from over 3,000 units in March [12]
- The model, launched in February 2025, is part of Xiaomi's high-end automotive strategy [12]

Group 9
- Vanke expects a loss of approximately 82 billion yuan for 2025, a significant increase from the previous year's loss of 49.48 billion yuan [22]
- The company faces substantial operational pressures and historical burdens [22]

Group 10
- 360 expects a net profit of 213 million to 318 million yuan for 2025, marking a turnaround from previous losses [22]
- The anticipated profit is attributed to gains from long-term equity investments [22]

Group 11
- Light Media forecasts a net profit of 1.5 billion to 1.9 billion yuan for 2025, representing year-on-year growth of 413.67% to 550.65% [23]
- The increase is driven by the success of the film "Nezha: Birth of the Demon Child" and related IP operations [23]

Group 12
- Adidas reported preliminary fourth-quarter revenue of 6.08 billion euros, up from 5.97 billion euros the previous year [24][25]
- The company plans a stock buyback of 1 billion euros following the revenue increase [25]

Group 13
- New Yi Sheng expects a net profit increase of 231.24% to 248.86% for 2025, driven by growing demand for high-speed products [26]
- The company anticipates a net profit of 9.4 billion to 9.9 billion yuan [26]

Group 14
- Western Gold expects a net profit increase of 46.78% to 69.23% for 2025, attributed to higher sales volumes and prices of gold products [26]
- The projected net profit is between 425 million and 490 million yuan [26]

Group 15
- Duyue City anticipates a net loss of 2.1 billion to 2.7 billion yuan for 2025, an improvement on the previous year's loss of 2.977 billion yuan [27]
- The loss is linked to adjustments in pricing strategies and potential asset impairments [27]

Group 16
- Yiteng Medical has completed nearly 100 million yuan in Series A financing, aimed at advancing high-end CT tube development [28]
- The funding round was led by Honghui Fund with participation from other investors [28]

Group 17
- In Situ New Materials has completed several million yuan in angel round financing, with funds directed towards capacity expansion and technology development [29]
- The financing was led by Chery Group's venture capital platform [29]
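Several of the profit forecasts above (Groups 11, 13 and 14) give both a profit range and a growth range, so the prior-year figure they imply can be backed out as current / (1 + growth). The sketch below does that arithmetic as a consistency check. The helper name `implied_prior_value` is hypothetical, and the inputs are simply the digest's numbers converted to millions of yuan.

```python
def implied_prior_value(current: float, growth_pct: float) -> float:
    """Back out last year's figure from this year's figure and the stated growth in percent."""
    factor = 1.0 + growth_pct / 100.0
    if factor <= 0:
        raise ValueError("growth of -100% or less leaves no positive base to recover")
    return current / factor


# Figures from the digest, in millions of yuan:
# (low profit, low growth %), (high profit, high growth %)
forecasts = {
    "Light Media": ((1500.0, 413.67), (1900.0, 550.65)),
    "New Yi Sheng": ((9400.0, 231.24), (9900.0, 248.86)),
    "Western Gold": ((425.0, 46.78), (490.0, 69.23)),
}

for name, (low, high) in forecasts.items():
    base_low = implied_prior_value(*low)
    base_high = implied_prior_value(*high)
    # Both ends of a range should imply (nearly) the same prior-year profit.
    print(f"{name}: implied prior-year profit {base_low:.1f} to {base_high:.1f} million yuan")
```

For Light Media, both ends of the range imply a prior-year profit of about 292 million yuan, so the stated profit and growth ranges are mutually consistent. The same holds for the other two companies.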
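9点1氪: Trump announces his nominee for the next Fed chair; Guangdong is the top province for births for the 8th consecutive year; “黑白颠周媛” placed under formal investigation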
36氪· 2026-01-31 01:21
Group 1
- The article discusses President Trump's nomination of Kevin Warsh to replace Jerome Powell as the next Chairman of the Federal Reserve, highlighting Warsh's previously hawkish stance on monetary policy and his recent support for Trump's tariff policies and faster interest rate cuts [3]
- Warsh was the youngest member of the Federal Reserve Board when he joined in 2006 and was previously considered for the Chairman position before Powell was ultimately nominated [3]

Group 2
- The article describes the legal process for appointing the Federal Reserve Chairman, under which the President's nominee must be approved by the Senate [4]
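Unitree's Wang Xingxing: whoever builds the large model for robots will be the world's best AI company and robotics company; Ant Lingbo open-sources embodied world model LingBot-VA | AIGC Daily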
创业邦· 2026-01-31 01:12
Group 1
- Kimi K2.5 has quickly gained popularity, becoming the most-called model on Kilo Code and ranking in the global top three on OpenRouter; overseas revenue has surpassed domestic revenue, and global paid users have grown fourfold [2]
- Ant Group's LingBot-VA has been open-sourced. It introduces an autoregressive video-action world modeling framework that combines large-scale video generation with robotic control, allowing robots to "predict and act" simultaneously [2]
- DeepMind's AlphaGenome model targets the "dark genome," the roughly 98% of the human genome that does not code for proteins but matters for health, which may help in understanding genetic diseases and improve genetic testing for new therapies [2]
- Wang Xingxing, founder of Unitree (宇树科技), said that whoever builds a large model for robots will be the world's leading AI and robotics company, an achievement he considers worthy of a Nobel Prize [2]
Ant Lingbo open-sources embodied world model LingBot-VA
Xin Lang Cai Jing· 2026-01-30 02:13
Core Insights
- Ant Lingbo Technology announced the open-sourcing of the embodied world model LingBot-VA, which introduces a novel autoregressive video-action world modeling framework [1]
- The model combines large-scale video generation with robotic control: it generates the "next world state" while simultaneously predicting and outputting the corresponding action sequence [1]
- This allows robots to "simulate and act" much as humans do, improving their operational efficiency and adaptability [1]
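NextX Series: Disruptive Technology Weekly, Issue 2 (2025.1.02-2026.01.16): University of Waterloo proposes an "encrypted qubit cloning" protocol that achieves quantum-state replicability without violating the no-cloning theorem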
Investment Rating
- The report does not explicitly provide an investment rating for the industry

Core Insights
- The report highlights significant advances across several technology sectors, including semiconductors, artificial intelligence, and quantum technology, indicating a robust investment landscape in these areas

Summary by Sections

1. Financing Overview
- From January 1 to January 16, 2026, there were 296 financing events in the technology sector globally, 248 domestic and 48 international. The leading sectors for domestic financing were advanced manufacturing (137 events), artificial intelligence (63 events), and enterprise services (25 events) [11]

2. IPO Updates
- Notable IPOs included:
  - Zhaoyi Innovation listed on the Hong Kong main board on January 13, 2026; it focuses on integrated circuit design and has a strong market presence in several chip categories [14][15]
  - OmniVision Technologies listed on January 12, 2026; it is a global fabless semiconductor design company specializing in image sensors and display solutions [17][18]
  - MiniMax listed on January 9, 2026; it is an AI large-model company aiming to raise productivity through advanced AI technologies [20][21]
  - Tensu Zhixin listed on January 8, 2026; it provides general-purpose GPU products and AI computing solutions [23][24]

3. Market Performance Tracking
- Stock market performance was mixed: the Shanghai Composite Index declined by 0.45%, while the Shenzhen Component Index and the ChiNext Index rose by 1.14% and 1.00%, respectively. The semiconductor index rose 4.92% over the week [31][32]

4. Advanced Semiconductor Developments
- Significant advances include:
  - Xidian University's breakthrough in "ion implantation induced nucleation" of aluminum nitride, which addresses thermal bottlenecks in third- and fourth-generation semiconductors [38][39]
  - Wolfspeed's successful production of single-crystal 300 mm silicon carbide wafers, a milestone in silicon carbide technology [42][43]
  - Tsinghua University's progress in pixelated array lithography, which improves manufacturing capabilities for infrared polarization imaging systems [44][45]

5. Quantum Technology Innovations
- Key developments in quantum technology include:
  - The University of Waterloo's proposal for an "encrypted qubit cloning" protocol that achieves quantum-state replicability without violating the no-cloning theorem [4]
  - The Weizmann Institute's observation of Aharonov–Bohm interference in quantum Hall states, providing insights into non-Abelian anyons [4]
Welcoming experts in embodied world models and data collection to join us!
具身智能之心· 2025-11-05 09:00
Group 1
- The article highlights embodied world models, robotic control, and data collection as valuable industry directions that nonetheless have barriers to entry [2]
- The company seeks to collaborate with experts in the field to develop courses or practical projects on these topics, aiming to provide insights for professionals currently working in these areas [2][3]
- Individuals with at least one year of industry experience or a publication at a CCF-A level conference are encouraged to take part [3]

Group 2
- The company offers competitive salaries and resource sharing for collaborators, with opportunities for part-time involvement [5]
Recruiting partners in world models, humanoid motion control, and data collection!
具身智能之心· 2025-11-02 04:00
Group 1
- The article highlights embodied world models, robotic control, and data collection as valuable directions in the industry, despite existing barriers to entry [2]
- The company seeks to collaborate with experts in the field to develop courses or practical projects on these topics, aiming to provide insights for professionals currently working in these areas [2]
- Interested parties are encouraged to contact the company to discuss course design and presentation materials on embodied world models, control, and data collection [3]

Group 2
- The company is looking for people engaged in embodied-intelligence research who have either published a paper at a CCF-A level conference or have more than one year of industry experience [4]
- The company offers competitive salaries and resource sharing, with opportunities for part-time involvement [6]
- Specific requirements for collaboration are outlined, with a focus on expertise and experience in the relevant fields [7]