RoboBrain
2025: Do China's Large Models No Longer Believe That "Brute Force Works Miracles"?
36Ke · 2025-12-19 11:06
In December 2025, at Tencent Technology's HiTechDay, a roundtable forum themed "Models Evolve Again: 2025, Intelligence Redefines the World" was organized around three threads of large-model evolution: depth, dimensionality, and efficiency. Xiong Yuxuan, assistant professor at the Faculty of Artificial Intelligence in Education, Central China Normal University, moderated. The three panelists each read 2025's large-model evolution from the vantage point of their own field: Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (BAAI); Liu Zhiyuan, co-founder and chief scientist of ModelBest (面壁智能); and Chen Shi, investment partner at FreeS Fund (峰瑞资本).

Wang Zhongyuan argued that large-model evolution is undergoing a qualitative change "from Learning from Text to Learning from Video." Video data carries rich spatiotemporal information and cues about dynamic interaction, making it a key data source for models to learn how the physical world evolves. It is also the category of multimodal data that is currently easiest to acquire at scale, a critical bridge for AI to "move from the digital world into the physical world," and the foundation on which embodied AI can build "world models."

Liu Zhiyuan's "Densing Law" holds that, like Moore's Law for chips, AI's future lies in continually raising the "intelligence density" per unit of parameters. He boldly predicted that the future compute landscape will be "cloud for planning, edge for doing (execution)," and that by 2030 we may even be able to run GPT-5-level ...
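For readers who want the Densing Law in symbols: the sketch below uses our own notation (rho, N, N_eff are not from the article) and follows the published definition of capability density as the ratio of a model's effective parameter size to its actual parameter size; the published paper reports a doubling time of roughly 3.3 months, which we leave symbolic here.

```latex
% Capability density (a sketch; the notation is ours, not the article's):
%   N(M)     : actual parameter count of model M
%   N_eff(M) : parameters a reference model family would need to match
%              M's benchmark performance (read off a fitted scaling curve)
\rho(M) = \frac{N_{\mathrm{eff}}(M)}{N(M)}
% The Densing Law is then the empirical claim that the maximum density
% of released models grows exponentially with time t, with doubling period T:
\rho_{\max}(t) \propto 2^{\,t/T}
```

On this reading, "GPT-5-level on an edge device by 2030" is a claim that rho keeps doubling: the same capability (fixed N_eff) fits into an ever smaller actual parameter count N.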
How Real Are Embodied Intelligence's Big Commercial Orders? Even Practitioners Can't Tell
Nan Fang Du Shi Bao · 2025-11-23 05:50
Since the second half of this year, the embodied-intelligence robotics industry has announced a steady stream of commercial orders in the hundred-million-yuan range, projecting an optimistic outlook for deployment. Yet some practitioners say plainly that they cannot tell what is real behind certain orders.

At BAAI's "Embodied Open Day" on November 20, Tang Wenbin, founder and CEO of 原力灵机, posed a string of questions: "What problems do these orders actually solve? Have they truly closed the (commercial) loop? Is the scenario value they create real?" 原力灵机 is an embodied-intelligence startup founded in March 2025; in mid-November it closed an A+ round worth several hundred million yuan, led by Alibaba.

Wang Zhongyuan suggested that government should lean toward policy support and guidance rather than directly specifying demand, because real demand always comes from the enterprise and user side.

Behind the "difficult birth" of embodied-intelligence models, data scarcity remains the perennial problem, and it has sparked a still-ongoing dispute in the industry between the real-robot-data and simulation-data camps.

BAAI's embodied training ground. Image: BAAI

Although the technology is still immature, embodied-intelligence companies have pushed hard into commercialization this year. Behind this lies both pressure from investors testing whether startups can generate revenue or close a commercial loop, and robot makers' practical need to uncover problems and iterate products in real scenarios. In terms of application scenarios, companies have crowded into handling, sorting, and security in industry and logistics, and into guided tours, shopping guidance, and entertainment performances in commercial settings.

The limits of robot capability were also within Li Kai's expectations. As ...
Even 10 Billion Yuan Isn't Enough to Burn! Robot Company CEOs Offer a New Verdict: Embodied Intelligence Can't Just Copy LLMs
Sou Hu Cai Jing · 2025-11-22 02:41
Core Insights
- The event highlighted the latest advancements in embodied intelligence by the Zhiyuan Research Institute (BAAI), focusing on the importance of world models and the development of a comprehensive embodied brain system [2][3]

Group 1: Zhiyuan's Full-Stack Layout
- Zhiyuan introduced the native multimodal world model Emu3.5, which expanded training data from 15 years of video to 790 years and increased parameter size from 8 billion to 34 billion, enhancing video and image generation speed [5]
- The institute is constructing a cross-embodiment intelligence system spanning heterogeneous robot bodies, including RoboBrain, RoboOS, and RoboBrain-0, deployed across various robotic forms for tasks ranging from navigation to complex interactions [5]

Group 2: Key Elements of Embodied Intelligence
- The role of world models in embodied intelligence was debated, with experts emphasizing the need for models that predict the next state based on the robot's form and goals, rather than merely generating videos [7][10]
- There is a consensus that embodied intelligence should not follow the current language-first paradigm but rather adopt a structure centered on action and perception [10][12]
- The importance of real data was highlighted, with discussions on the necessity of combining real, simulated, and video data for effective learning in robots [15][17]

Group 3: Investment Priorities
- When asked how to allocate 10 billion yuan, experts prioritized talent acquisition, computational power, and data engines as key investment areas [19][21]
- There were differing views on the importance of infrastructure versus model development, with some advocating for a focus on creating a comprehensive data engine for continuous digitalization [21][22]

Group 4: Human-like Robots and Hardware Limitations
- The debate on whether human-like robots represent the ultimate form of embodied intelligence concluded that neither models nor hardware define each other; rather, the specific application scenarios dictate the requirements [22][24]
- Experts suggested adopting a layered structure for embodied intelligence, in which higher-level models can be reused across different robotic forms while lower-level models must be tailored to specific hardware (a minimal sketch of this split follows this summary) [23][24]

Conclusion
- The discussions at the event signaled a proactive search for solutions to achieve a closed-loop system in embodied intelligence, emphasizing the need for models, hardware, and scaling to evolve together [24]
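To make the layered split described in Group 4 concrete, here is a minimal Python sketch: one embodiment-agnostic planner reused across robot forms, with hardware-specific low-level controllers underneath. All names (SharedPlanner, Subtask, WheeledBase, HumanoidBody) are illustrative assumptions, not BAAI's actual RoboBrain/RoboOS APIs.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Subtask:
    """One step of a plan, expressed in embodiment-agnostic terms."""
    skill: str   # e.g. "navigate", "grasp"
    target: str  # e.g. "kitchen", "red cup"


class LowLevelController(Protocol):
    """Hardware-specific layer: must be tailored per robot body."""
    def execute(self, subtask: Subtask) -> bool: ...


class SharedPlanner:
    """Embodiment-agnostic layer: the same planner is reused across bodies."""
    def plan(self, instruction: str) -> list[Subtask]:
        # A real system would query a multimodal model here; we use a toy rule.
        if "cup" in instruction:
            return [Subtask("navigate", "kitchen"), Subtask("grasp", "red cup")]
        return [Subtask("navigate", instruction)]


class WheeledBase:
    """Example controller for a wheeled robot (illustrative only)."""
    def execute(self, subtask: Subtask) -> bool:
        print(f"[wheeled] {subtask.skill} -> {subtask.target}")
        return subtask.skill == "navigate"  # no arm: cannot grasp


class HumanoidBody:
    """Example controller for a humanoid (illustrative only)."""
    def execute(self, subtask: Subtask) -> bool:
        print(f"[humanoid] {subtask.skill} -> {subtask.target}")
        return True


def run(planner: SharedPlanner, body: LowLevelController, instruction: str) -> None:
    for step in planner.plan(instruction):
        if not body.execute(step):
            print(f"hardware cannot perform {step.skill}; replanning needed")
            break


if __name__ == "__main__":
    planner = SharedPlanner()                         # one planner...
    run(planner, WheeledBase(), "fetch the red cup")  # ...two bodies
    run(planner, HumanoidBody(), "fetch the red cup")
```

The design point is that only the bottom layer touches hardware details: swapping WheeledBase for HumanoidBody leaves the planner untouched, which is the cross-form reuse the panelists described.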
A Roundup of VLA Foundation Models and Large-Scale Training Tasks
具身智能之心 · 2025-10-08 02:49
Core Insights
- The article summarizes several research papers related to Vision-Language-Action (VLA) models and their training strategies, highlighting advancements in embodied intelligence and robotics [2][3][5][7][9][11][13][15][17][19]

Group 1: Training Strategies and Model Improvements
- "Training strategies for efficient embodied reasoning" discusses the use of Chain of Thought (CoT) reasoning to enhance the performance and generalization of VLA models, achieving a threefold increase in reasoning speed compared to standard methods (a minimal sketch of the CoT pattern follows this summary) [3]
- "CAST: Counterfactual labels improve instruction following in vision-language-action models" introduces a method for generating counterfactual labels that significantly improves the instruction-following capabilities of VLA models, with a 27% increase in navigation task success rates [5]
- "RoboBrain: A unified brain model for robotic manipulation" presents a new dataset, ShareRobot, which enhances robots' planning and trajectory-prediction capabilities, leading to state-of-the-art performance across various tasks [7]

Group 2: Dataset Development and Evaluation
- The "DROID" dataset is introduced as a large-scale, diverse dataset for robot manipulation, containing 76,000 demonstration trajectories collected over 350 hours, improving the performance and generalization of trained policies [9]
- "ViSA-Flow" proposes a framework for learning from large-scale video data, achieving state-of-the-art performance in robot skill learning, particularly in low-data scenarios [11]
- The "CORTEXBENCH" benchmark evaluates pre-trained visual representations for embodied AI, revealing that no single representation excels across all tasks, though task-specific adaptations can yield significant performance improvements [13]

Group 3: Generalist Robot Policies and Learning Frameworks
- "Effective tuning strategies for generalist robot manipulation policies" identifies key factors influencing the performance of Generalist Manipulation Policies (GMPs) during fine-tuning, establishing a new benchmark for future research [15]
- The "CACTI" framework focuses on scalable multi-task learning in robotic systems, demonstrating effective training across various kitchen tasks in both real and simulated environments [17]
- "R3m: A universal visual representation for robot manipulation" shows that pre-trained visual representations enable data-efficient learning in real-world environments, improving task success rates by over 20% compared to training from scratch [19]
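Here is a minimal Python sketch of the CoT pattern from Group 1: the policy emits an explicit reasoning trace first and conditions its action on that trace. The toy rules and names (CoTVLAPolicy, Observation) are our assumptions for illustration; the actual paper decodes both the trace and the action from a trained VLA model rather than from hardcoded rules.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    image_desc: str  # stand-in for camera pixels
    instruction: str


class CoTVLAPolicy:
    def reason(self, obs: Observation) -> str:
        # Real systems decode this trace from a VLM; we hardcode a toy rule.
        if "blocked" in obs.image_desc:
            return "path is blocked, so detour before approaching the goal"
        return "path is clear, go straight to the goal"

    def act(self, obs: Observation, thought: str) -> str:
        # Action decoding conditioned on both observation and reasoning.
        return "turn_left" if "detour" in thought else "move_forward"

    def step(self, obs: Observation) -> tuple[str, str]:
        thought = self.reason(obs)  # explicit intermediate reasoning
        return thought, self.act(obs, thought)


if __name__ == "__main__":
    policy = CoTVLAPolicy()
    for scene in ("hallway, blocked by a cart", "hallway, clear"):
        obs = Observation(scene, "go to the charging dock")
        thought, action = policy.step(obs)
        print(f"{scene!r}: thought={thought!r} -> action={action!r}")
```

The intermediate trace is what the training strategy supervises; at inference it makes the action choice auditable, at the cost of extra decoding (hence the interest in speeding it up).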
A Conversation with BAAI's Wang Zhongyuan: The "Group Stage" of Embodied Intelligence Has Only Just Begun; Robots Need an "Android," Not an iOS
AI科技大本营 · 2025-06-07 09:42
When Wudao 1.0 was released, academia had not yet reached a unified conclusion on whether "large models are the technical route to AGI." Embodied intelligence today is at that same stage.

By Wang Qilong | Produced by AI科技大本营 (ID: rgznai100)

Beneath the large-model boom, a subtle sense of hitting a ceiling is becoming industry consensus. "The 'war of a hundred models' we used to talk about was mostly a competition among large language models," BAAI president Wang Zhongyuan said, getting straight to the point in a conversation with CSDN on the eve of the BAAI Conference. "Large language models are constrained by the available internet data; their performance is still improving, but far more slowly than before."

Where is the way out? In Wang's view, for AI to break through its ceiling, it must, after "reading ten thousand books" (internet data), go "travel ten thousand miles" (the physical world).

This is not an isolated judgment. In March this year, NVIDIA CEO Jensen Huang pointed the way for AI's second half at the GTC conference: build "AI factories" and embrace the era of "physical AI," letting AI step out of the screen and interact with the real world.

As thinking converged, action followed. On June 6, at the BAAI Conference in Beijing, CSDN witnessed the answer Wang gave in his keynote. If the 2021 "Wudao" series represented an exploration of the technical path (the "dao"), then the brand-new "Wujie" series he unveiled declares a new ambition: to use ...
BAAI President Wang Zhongyuan: Multimodal Large Models Will Bring New Variables to Embodied Intelligence
Xin Jing Bao · 2025-03-30 10:00
Core Insights
- The topic of embodied intelligence is a major focus at the 2025 Zhongguancun Forum, with the introduction of the RoboOS framework and the open-source RoboBrain model [1][3]
- Multimodal large-model technology is expected to enhance the intelligence of robots, allowing them to better understand and interact with the physical world [2][3]

Group 1: Multimodal Large Models
- Multimodal large models enable AI to perceive and understand the world through various data types, such as medical imaging and sensor data, facilitating the transition from digital to physical environments [2]
- The performance improvement of large language models has slowed due to the exhaustion of available internet text data, necessitating the integration of multimodal capabilities [2]

Group 2: RoboBrain and RoboOS
- RoboBrain and RoboOS are designed to support cross-scenario, multi-task deployment and collaboration among different types of robots, enhancing their general intelligence [3]
- RoboBrain can interpret human commands and visual inputs to generate actionable plans based on real-time feedback, supporting various robotic configurations (see the sketch after this summary) [3]

Group 3: Industry Development and Challenges
- The open-source approach is seen as a key driver for rapid development in the AI industry, allowing for collaboration among hardware, model, and application vendors [4]
- Despite the potential of humanoid robots, there are significant challenges in their industrial application, with many still in the early stages of development [5]
- The realization of Artificial General Intelligence (AGI) is projected to take an additional 5-10 years, influenced by advancements in embodiment capabilities and data accumulation [5]
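As a rough illustration of the closed loop described in Group 2 (command plus visual input in, plan out, revised on feedback), here is a small Python sketch. The function names and toy logic are assumptions for illustration only; they are not the actual RoboBrain/RoboOS interface.

```python
import random


def perceive() -> str:
    """Stand-in for visual input; a real system would process camera frames."""
    return random.choice(["cup visible on table", "cup occluded by box"])


def make_plan(command: str, scene: str) -> list[str]:
    """Toy planner; the real planner would be a multimodal large model."""
    steps = ["approach table"]
    if "occluded" in scene:
        steps.append("move box aside")  # extra step driven by what was seen
    steps.append(f"grasp target for: {command}")
    return steps


def execute(step: str) -> bool:
    print(f"executing: {step}")
    return True  # assume success in this toy run


if __name__ == "__main__":
    random.seed(0)
    command = "bring me the cup"
    for attempt in range(3):    # real-time feedback loop
        scene = perceive()      # fresh visual input each cycle
        plan = make_plan(command, scene)
        print(f"attempt {attempt}: scene={scene!r}, plan={plan}")
        if all(execute(s) for s in plan):
            break               # plan completed; the loop closes
```

The point of the loop structure is that the plan is regenerated from fresh perception on every cycle, so execution failures or scene changes feed directly back into planning.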