Vision Language Models
Humanoid robots take over CES in Las Vegas as tech industry touts future of AI
CNBC· 2026-01-09 13:00
Core Insights
- The CES trade show in Las Vegas showcased advances in humanoid robots, signaling a significant year for physical artificial intelligence [3][4]
- Nvidia announced new vision language models for humanoid robots, highlighting the potential for robots to reach human-level capabilities [4][5]
- The market for general-purpose robotics is projected to reach $370 billion by 2040, with applications across various sectors [7]

Company Developments
- Nvidia introduced GR00T, a vision language model for humanoid robots, and emphasized partnerships with companies such as Boston Dynamics and Caterpillar [4][5]
- AMD showcased the GENE.01 robot, which uses its chips and AI technology, and plans to deploy it in industrial settings [10]
- Qualcomm presented Dragonwing, a new line of robot chips aimed at enhancing robot capabilities through vision language models [14]

Industry Trends
- The humanoid robotics sector is growing rapidly, with 40 companies mentioning humanoid robots at CES [9]
- Generative AI technologies, such as those behind ChatGPT, are being leveraged to enhance robot functionality [6][13]
- Experts caution that while humanoid robots are drawing attention, practical commercial deployment remains a significant challenge [8][12]
Robotics lab tour with Hannah Fry | Bonus episode!
Google DeepMind· 2025-12-10 16:20
Robotics Advancements
- Google DeepMind's robotics research has made significant progress over the last four years, particularly in visual generalization, enabling robots to operate effectively across diverse lighting conditions and backgrounds [3][4]
- Integrating large vision language models (VLMs) lets robots understand general human concepts and generalize better to new scenes, visuals, and instructions [5]
- Vision language action models (VLAs) let robots model sequences of physical actions, enabling action generalization and longer-horizon tasks, such as packing luggage after checking the weather at the destination [7][9]
- Robotics is applying the principle of "thinking before acting," familiar from language models, to improve performance on basic manipulation tasks [12][13]

Capabilities and Demonstrations
- Robots can now perform complex, long-horizon tasks that require millimeter-level precision, such as packing a lunchbox, demonstrating improved dexterity [15][17]
- Robots can carry out general tasks by understanding spoken instructions, interacting with novel objects, and chaining short tasks into longer, more useful sequences [24][29]
- Demonstrations include sorting trash according to San Francisco's rules and sorting laundry, showcasing the ability to reason and act in complex scenarios [30][33]
- The "thinking and acting" model lets robots output their thoughts before taking action, giving insight into their decision-making process (see the sketch after this section) [34][36]

Future Directions
- Progress in robotics is currently limited by the amount of physical-interaction data available, which is far smaller than the text corpora that language models train on [45][48]
- A major breakthrough is needed for robots to learn more data-efficiently, potentially by learning from videos of human manipulation [43][46]
- Current advances are seen as foundational building blocks toward general-purpose robotics, but further work is needed on safety and task mastery [42][45]
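The "thinking before acting" pattern described above can be made concrete with a minimal sketch. This is a hypothetical stand-in, not DeepMind's Gemini Robotics API: all names here (Observation, ThinkThenActPolicy, run_episode, the 7-joint action stub) are assumptions for illustration. The idea it shows is the one the episode describes: the policy first emits a natural-language thought, then produces a low-level action chunk conditioned on that thought, so the robot's intermediate reasoning is inspectable.

```python
# Illustrative sketch only: a real VLA would decode reasoning tokens and
# action tokens from a learned model; here both are stubbed.
from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes       # camera frame (stubbed)
    instruction: str   # spoken/typed task, e.g. "pack the lunchbox"


@dataclass
class Action:
    joint_deltas: list  # low-level motor command chunk


class ThinkThenActPolicy:
    """Interleaves a natural-language 'thought' with each action chunk."""

    def think(self, obs: Observation) -> str:
        # A real model would generate reasoning text here; we stub it.
        return f"To '{obs.instruction}', first locate the target object."

    def act(self, obs: Observation, thought: str) -> Action:
        # A real model would decode an action chunk conditioned on the
        # image, the instruction, and the emitted thought.
        return Action(joint_deltas=[0.0] * 7)  # 7-DoF arm placeholder


def run_episode(policy: ThinkThenActPolicy, obs: Observation, max_steps: int = 3):
    for step in range(max_steps):
        thought = policy.think(obs)        # reason in language first...
        print(f"[step {step}] thought: {thought}")
        action = policy.act(obs, thought)  # ...then emit motor commands
        print(f"[step {step}] action:  {action.joint_deltas}")


if __name__ == "__main__":
    run_episode(ThinkThenActPolicy(),
                Observation(image=b"", instruction="sort the laundry by color"))
```

Chaining these think/act steps is also what the episode credits for longer-horizon behavior: short reasoned sub-tasks composed into sequences like packing a lunchbox, with each printed thought offering a window into the decision process.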