Embodied AI
A deep survey that systematically maps the 20 major challenges of VLA, with clearly visible research directions and weekly updates to help you stay on top of the latest breakthroughs!
AI科技大本营· 2025-12-25 01:18
[Editor's note] Vision-Language-Action (VLA) models are pushing robots that can "understand what they see, explain what they mean, and carry out what they say" from demos toward real-world systems. But the explosive growth of models, datasets, and paradigms has also created a practical dilemma: newcomers don't know where to start learning, and practitioners find it hard to judge along which dimensions to systematically improve capability. This recent survey, jointly produced by 树根科技, SANY Group's 耘创新实验室, King's College London, The Hong Kong Polytechnic University, TU Darmstadt, the University of Agder (Norway), Imperial College London, and other institutions, offers a clear "panoramic map of the problems" and a learning roadmap, along with a continuously updated online reference framework.

Embodied AI has recently become one of the most active, and still most open, frontier directions in AI and robotics. From demos of GPT-like robot assistants to multimodal large models gradually landing on real robot platforms, "making machines see, understand, and act" is moving from proof of concept to systematic exploration.

Yet as model scale balloons and datasets and methods keep multiplying, a structural confusion has surfaced within the field: researchers who have just entered this direction often cannot tell where to begin, while practitioners already in it face a more concrete question: along which dimensions, and in what order, should VLA capabilities be systematically improved? With rapid expansion and diverging paths coexisting, simply listing models and methods can no longer provide ...
A Chinese PhD in the UK builds a disruptive human-machine interaction "skin", already deployed in the automotive and medical industries
创业邦· 2025-12-20 01:09
This article originally appeared on 快鲤鱼, the AGI-focused channel under 创业邦 that seeks out innovative, high-growth AGI companies at home and abroad and chronicles the journeys of AGI business leaders. Author: 卜松; editor: 刘恒涛; images courtesy of 触零科技. Entrepreneurial philosophy: start from zero; the greatest truths are the simplest.

Guo Liucheng's résumé reads as a classic elite academic track: a PhD from Imperial College London, a master's degree from Peking University, and later further studies at Stanford Graduate School of Business. After finishing his PhD, Guo chose to start a company in the UK.

In the London of 2015, AlphaGo had not yet beaten Lee Sedol and the Transformer architecture was still two years from publication, but sharp instincts told Guo that AI would be at the core of the next storm. "Back then, everyone doing AI did it at any cost. Not enough compute? Add GPUs. Not enough accuracy? Add data. Nobody cared about power consumption, and nobody cared about cost," Guo says. He turned his attention instead to edge devices, believing that AI-enabled small hardware would find enormous application space in everyday life.

May 2025, 10 Downing Street, the Prime Minister's residence. As co-founder and CTO of TG0 (触零科技), Dr. Guo Liucheng stood on the "Future Fifty" stage. It is one of the highest honors in UK tech, aiming to select Britain's most promising 25 to 50 ...
Cathie Wood ("木头姐") takes a stand: Not a bubble! AI is replaying the internet's wealth-explosion moment
Jin Shi Shu Ju· 2025-11-26 04:13
Core Viewpoint
- The current AI wave is not a bubble but a technological revolution similar to the early internet era, expected to drive global GDP growth to 7% to 8% over the next decade [1][8].

Group 1: AI Bubble Assessment
- The market is not in a bubble, as there is significant demand for AI products: around 1 billion AI chatbot users today, projected to grow to 4 to 5 billion by the end of the decade [2][3].
- The underlying tools for knowledge workers are expected to become ten times more powerful in the coming years, leading to a 50-fold increase in user capabilities [2].
- Current revenue for AI foundational model companies is approximately $30 billion, against a potential monetization scale of about $1.5 trillion [2].

Group 2: Historical Context and Comparisons
- The current situation is compared to the internet's 1995 moment, when significant growth potential existed before the market correction [3].
- Historical examples include the cost of sequencing a human genome, which was $2.7 billion and took 13 years, contrasting with today's technological readiness [3].

Group 3: Valuation and Growth Justification
- Companies in exciting fields are expected to see their current premiums diminish significantly within five years due to overwhelming revenue growth and profit margin expansion [4].
- Palantir's U.S. commercial revenue growth reached 123%, exceeding even aggressive expectations based on cost reduction and scaling [4].
- OpenAI is projected to reach annualized revenue of approximately $20 billion by the end of this year, potentially growing to $40 to $50 billion next year and $100 billion by 2027 [5].

Group 4: Major Opportunities in Technology
- The largest opportunity lies in embodied AI, with revenues from Robotaxi services projected to grow from under $1 billion to $8 to $10 trillion over the next 5 to 10 years [6].
- The software stack's PaaS layer is expected to be as large as the foundational model layer, with companies like Palantir encroaching on SaaS players [6].

Group 5: Market Impact and Investment Strategy
- Many non-AI companies are being penalized by the market for failing to accelerate revenue growth, indicating a shift in market dynamics [7].
- Companies with significant cash reserves are increasing capital expenditures, while those showing revenue growth are being rewarded [7].
- The transportation cost of autonomous trucks is expected to fall below that of rail, potentially leaving stranded assets in traditional sectors [7].

Group 6: Future Growth Projections
- The market is expected to compound at over 10% annually until the end of the decade, with disruptive innovations growing at rates near 50% [8].
- If the thesis of a technological revolution is correct, real GDP growth could accelerate to around 5% over the next 5 to 10 years, contributing to global GDP growth of 7% to 8% [8].
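The projections above imply very steep compounding, which is easy to verify with a couple of lines of arithmetic. The sketch below is a quick sanity check using only figures quoted in this summary; the helper function is our own, not anything from the underlying report.

```python
def implied_cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0

# OpenAI figures cited above: ~$20B annualized revenue by the end of this
# year, ~$100B projected by 2027 (two years later).
print(f"OpenAI implied CAGR: {implied_cagr(20e9, 100e9, 2):.0%}")  # ~124% per year

# Disruptive innovation compounding at ~50% per year for five years:
print(f"Cumulative growth at 50% for 5 years: {1.5 ** 5:.1f}x")    # ~7.6x
```

Notably, the 124% annual growth implied by the OpenAI projection is of the same order as the 123% U.S. commercial growth cited for Palantir, so the claims are at least internally consistent.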
DeepMind recruits former Boston Dynamics CTO; Hassabis praises Unitree
机器之心· 2025-11-22 07:03
Core Insights
- Google DeepMind has hired Aaron Saunders, former CTO of Boston Dynamics, signaling a strategic push into robotics and a notable talent return [2][3][6].
- Saunders aims to address the foundational hardware issues standing between AGI and its potential in the physical world [3][9].

Historical Context
- Boston Dynamics is currently owned by Hyundai, which acquired it from SoftBank, which in turn had purchased it from Alphabet in 2017 amid a lack of short-term commercialization prospects [6].
- The return of a key Boston Dynamics figure to Google highlights a cyclical relationship in the tech industry and underscores that embodied intelligence requires understanding both the "brain" and the "body" [6][9].

Industry Shift
- Saunders notes a paradigm shift in robotics from high mobility toward general operational capabilities, emphasizing the need for robots to perform a far wider range of tasks [9].
- The focus is on responsibly solving embodied AI challenges, collaborating with partners to overcome hardware limitations [9].

Strategic Vision
- DeepMind's CEO, Demis Hassabis, envisions Gemini as an operating system for physical robots, akin to Android for smartphones [11][13].
- The goal is a versatile AI system that can operate across various robotic forms, humanoid and non-humanoid alike [13].

Competitive Landscape
- The components and expertise required to build bipedal robots have become far more accessible, with companies like Agility Robotics and Figure AI emerging in the market [14].
- Chinese company Unitree Technology has surpassed Boston Dynamics in supplying quadrupedal robots to industries such as manufacturing and construction [14].

Future Outlook
- Hassabis expresses confidence in a breakthrough moment for AI-driven robotics in the coming years, with Saunders' return seen as a crucial addition toward achieving that vision [15].
ICCV 2025 Highlight | UnrealZoo: a large-scale embodied simulation platform
具身智能之心· 2025-11-13 02:05
Core Insights
- The article introduces UnrealZoo, a high-fidelity virtual environment platform designed to advance embodied AI research by providing over 100 diverse, realistic 3D scenes [5][12][72].
- UnrealZoo aims to address the limitations of existing simulators by offering a flexible, rich training environment that supports varied tasks and improves the adaptability of AI agents in complex, dynamic settings [7][8][72].

Summary by Sections

Introduction to UnrealZoo
- UnrealZoo is built on Unreal Engine and includes over 100 high-quality, realistic scenes, ranging from indoor settings to large-scale industrial environments [5][12].
- The platform features 66 customizable embodied entities, including humans, animals, and vehicles, enabling diverse interactions and training scenarios [5][12].

Purpose and Necessity
- The rapid development of embodied AI calls for a platform that can simulate diverse, high-fidelity environments to improve the adaptability and generalization of AI agents [7][8].
- Existing simulators often confine AI training to specific tasks, hindering the development of agents capable of functioning in unpredictable real-world scenarios [7][8].

Features of UnrealZoo
- UnrealZoo provides a comprehensive set of tools, including an optimized Python API and enhanced communication protocols, to facilitate data collection, environment customization, and multi-agent interaction [5][48]. A sketch of what such an interface could look like follows this summary.
- The platform supports tasks such as visual navigation and active target tracking, demonstrating the importance of diverse training environments for improving model generalization [5][72].

Experimental Results
- Experiments conducted on UnrealZoo highlight the significant impact of environment diversity on the performance and robustness of AI agents, particularly in complex navigation and social interaction tasks [72].
- Results indicate that while reinforcement learning methods show promise, a substantial gap remains between AI agents and human performance in navigating intricate environments [72].

Future Directions
- Ongoing development will focus on expanding the variety of scenes, entities, and interaction tasks to further strengthen embodied AI in real-world applications [72].
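To make the "optimized Python API" concrete, here is a minimal sketch of driving an agent in a simulator of this kind through a Gym-style wrapper. The environment ID, observation format, and loop structure below are illustrative assumptions for this article, not UnrealZoo's documented API; consult the project's own examples for the real entry points.

```python
import gym  # assuming the platform exposes a Gym-style registry, as many UE-based simulators do

# Hypothetical environment ID: an active tracking task in one of the indoor scenes.
env = gym.make("UnrealTrack-IndoorScene-v0")

obs = env.reset()
episode_reward = 0.0
for step in range(200):
    action = env.action_space.sample()           # placeholder random policy
    obs, reward, done, info = env.step(action)   # classic 4-tuple Gym step
    episode_reward += reward
    if done:                                     # episode ended: target lost or time limit reached
        obs = env.reset()
        episode_reward = 0.0
env.close()
```

In practice, the observation would be an RGB (or RGB-D) frame rendered by Unreal Engine, and a learned policy would replace `env.action_space.sample()`; the same loop shape covers both the visual navigation and active target tracking tasks mentioned above.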
ICCV 2025 Highlight | UnrealZoo: a large-scale embodied simulation platform
机器之心· 2025-11-11 17:11
Core Insights
- UnrealZoo is a high-fidelity virtual environment platform designed to advance embodied AI research, providing over 100 diverse, realistic 3D scenes that serve a wide range of research needs [2][5][9].
- The platform received a Highlight Award at ICCV 2025, underscoring its significance in the field [2].

Group 1: Platform Features
- UnrealZoo includes more than 100 high-quality, realistic scenes spanning indoor settings, urban landscapes, and natural environments, supporting a broad range of research applications [5][13].
- The platform features 66 customizable embodied entities, including humans, animals, vehicles, and drones, which can interact with both the environment and other agents [5][24].
- It provides an easy-to-use Python interface and tools for data collection, environment enhancement, and distributed training, with optimized rendering and communication efficiency [7][15][42].

Group 2: Research Implications
- The platform addresses the limitations of existing simulators by offering a diverse, high-fidelity environment that improves the adaptability and generalization of embodied agents in complex, dynamic settings [8][9].
- Experiments conducted on UnrealZoo demonstrate the importance of environmental diversity for the generalization and robustness of agents, particularly in navigation and social interaction tasks [64][55].
- The research highlights the difficulties that current reinforcement-learning and vision-language-model-based agents face in open-world scenarios, underscoring the need for further development in these areas [8][64].

Group 3: Future Directions
- Future work will focus on expanding the variety of scenes, entities, and interaction tasks within UnrealZoo to further support embodied AI in real-world scenarios [64].