World Models
DeepMind's Hassabis: Agents Can Operate in Worlds Generated by Genie in Real Time
QbitAI · 2025-08-13 07:02
Core Insights
- The article discusses the advancements in AI, particularly focusing on DeepMind's Genie 3 and its capabilities in creating a "world model" that understands physical laws [4][5][10]
- The conversation highlights the rapid development pace at DeepMind, with new releases almost daily, indicating a significant momentum in AI research and applications [9][18][19]
- The need for improved evaluation benchmarks for AI models is emphasized, as current models show inconsistent performance across different tasks [11][45][46]

Group 1: Genie 3 and World Models
- Genie 3 is designed to generate virtual worlds that operate in a realistic manner, aiming to create a comprehensive understanding of the physical world [4][5][33]
- The model's ability to generate and interact with its own environments allows for innovative training methods, where one AI operates within another AI's generated world [38][39]
- The development of Genie 3 is seen as a step towards achieving AGI, as it requires a deep understanding of physical interactions and behaviors [33][34]

Group 2: DeepMind's Development Pace
- DeepMind is experiencing a rapid release cycle, with significant advancements in AI technologies such as DeepThink and Gemini [15][19]
- The excitement surrounding these developments is palpable, with internal teams struggling to keep up with the pace of innovation [18][19]
- The focus on creating models that can think, plan, and reason is crucial for advancing towards AGI [10][25]

Group 3: Evaluation and Benchmarking
- There is a pressing need for new and more challenging evaluation benchmarks to accurately assess AI capabilities, particularly in understanding physical and intuitive reasoning [45][46]
- The introduction of the Kaggle Game Arena aims to provide a platform for testing AI models in various games, which could lead to significant improvements in their performance [41][50]
- The article suggests that traditional evaluation methods are becoming saturated, and innovative approaches are necessary to measure AI's cognitive abilities effectively [45][56]
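The training setup described under Group 1, where one AI operates inside a world generated by another, can be pictured as an ordinary agent-environment loop. The sketch below is a toy illustration only: `ToyWorldModel`, `GreedyAgent`, and `run_episode` are hypothetical names, a 1-D track stands in for a generated environment, and nothing here reflects Genie 3's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a generated world: a 1-D track with a goal cell."""
    size: int = 10
    goal: int = 7

    def step(self, position: int, action: int) -> Tuple[int, float, bool]:
        # Clamp the move to the track, then report reward and termination.
        new_position = max(0, min(self.size - 1, position + action))
        done = new_position == self.goal
        reward = 1.0 if done else -0.1
        return new_position, reward, done


class GreedyAgent:
    """Hypothetical agent that always moves toward the goal it is told about."""

    def act(self, position: int, goal: int) -> int:
        if position < goal:
            return 1
        if position > goal:
            return -1
        return 0


def run_episode(world: ToyWorldModel, agent: GreedyAgent,
                start: int = 0, max_steps: int = 50) -> Tuple[int, int, float]:
    """Let the agent act inside the world until it reaches the goal or times out."""
    position, total_reward, steps = start, 0.0, 0
    for _ in range(max_steps):
        action = agent.act(position, world.goal)
        position, reward, done = world.step(position, action)
        total_reward += reward
        steps += 1
        if done:
            break
    return position, steps, total_reward
```

In a real system the world's `step` would itself be a learned generative model and the agent a learned policy; the point of the sketch is only that the agent never touches anything outside the generated world.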
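Venture Capital Monthly | Xichuang Investment: Managing a 2 Billion Yuan Low-Altitude Economy Fund of Funds, Investing Again After Four Years in 3D Graphics Engine Developer Particle Boundary Technology (粒界科技)
Sina Securities · 2025-08-13 04:29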
Group 1
- The number of newly registered private equity and venture capital fund managers in July 2025 surged by 77.8% compared to June, reaching four times the number in July 2024 [1]
- A total of 552 financing events occurred in the domestic primary equity investment market, up 5.1% year-on-year and 11.7% month-on-month, with a disclosed total financing amount of approximately 71.756 billion yuan, a 142.0% increase compared to July 2024 [1]
- The average single financing amount reached nearly 130 million yuan, the highest level in nearly seven months [1]

Group 2
- Xichuang Investment, a state-owned equity investment institution, manages over 240 billion yuan in capital and has invested nearly 90 billion yuan across more than 1,000 companies, focusing on strategic emerging industries such as biomedicine and advanced manufacturing [3]
- The Jiangsu Wuxi Low-altitude Economy and Aerospace Industry Special Fund of Funds has a registered capital of 2 billion yuan and focuses on the low-altitude economy and commercial aerospace sectors [4]
- Xichuang Investment disclosed six equity investment events during the reporting period, a 200% increase year-on-year, although this was a 25% decrease from the previous month [4]

Group 3
- Xichuang Investment focuses primarily on early-stage investments, with over 66% of its investments in angel and A-round financing [6]
- Approximately two-thirds of the projects invested in by Xichuang Investment are located in Wuxi, Jiangsu, while one-third are registered in Shanghai [8]
- Particle Boundary Technology, a 3D graphics engine provider, completed a multi-million dollar B3 round of financing led by Xichuang Investment, which had previously invested in the company during its A3 round [10]
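The "nearly 130 million yuan" average in Group 1 can be checked from the two figures quoted beside it. This is a quick sanity check under one stated assumption: that the disclosed total is spread over all 552 events, even though in practice not every event discloses an amount. The function name is illustrative.

```python
# Figures quoted in the summary above.
TOTAL_FINANCING_YUAN = 71.756e9  # approximately 71.756 billion yuan disclosed
FINANCING_EVENTS = 552           # financing events in July 2025


def average_deal_size(total_yuan: float, events: int) -> float:
    """Mean financing amount per event, in yuan (assumes every event is counted)."""
    if events <= 0:
        raise ValueError("events must be positive")
    return total_yuan / events


average_yuan = average_deal_size(TOTAL_FINANCING_YUAN, FINANCING_EVENTS)
print(f"Average per event: {average_yuan / 1e6:.2f} million yuan")
```

The result sits just under 130 million yuan, which is consistent with the article's wording.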
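Interview with Galaxea's Zhao Hang: A Flashy Demo Is Not Generalization, and Embodied Intelligence Will Still Be Decided by Data Volume
36Kr · 2025-08-13 03:37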
Core Insights
- The demonstration of bed-making by the robot at the 2025 WRC highlights the complexity of seemingly simple tasks, showcasing the robot's capabilities in flexible object manipulation and full-body control [1][2][4]
- The newly released G0 model by the company aims to enhance generalization capabilities in embodied intelligence, moving beyond previous smaller models that struggled with scalability [2][4][11]
- The company emphasizes the importance of high-quality data collection and engineering processes to support the development of robust models, with a focus on real-world data [4][19][28]

Group 1: Technology and Model Development
- The G0 model utilizes a three-stage training framework that has shown a 20% improvement over the previous PI 0 model in average metrics [9][10]
- The company plans to open-source a dataset of 500 hours of real-world data to establish a high-quality benchmark for the industry, facilitating comparisons and algorithm validation [5][30]
- The focus on data collection involves training personnel and addressing various challenges in real-time data acquisition, which is considered foundational for model training [19][22][24]

Group 2: Industry Context and Future Directions
- The company believes that the scaling laws observed in large language models can also apply to embodied intelligence, suggesting a potential for significant advancements in the field [14][16]
- The VLA paradigm is seen as a primary industrial path, with ongoing exploration of additional technologies such as tactile sensing and world modeling for future applications [32][39]
- The collaboration between academia and industry is viewed as beneficial, with the potential for academic insights to drive industrial advancements and vice versa [45][46]
VLA: When Will It Be Deployed at Scale?
Core Viewpoint
- The discussion around VLA (Vision-Language-Action model) is intensifying, with contrasting opinions on its short-term feasibility and potential impact on the automotive industry [2][12].

Group 1: VLA Technology and Development
- The Li Auto i8 is the first vehicle to feature the VLA driver model, positioning it as a key selling point [2].
- Bosch's president for intelligent driving in China, Wu Yongqiao, expressed skepticism about the short-term implementation of VLA, citing challenges in multi-modal data acquisition and training [2][12].
- VLA is seen as an "intelligent enhanced version" of end-to-end systems, aiming for a more human-like driving experience [2][5].

Group 2: Comparison of Driving Technologies
- There are two main types of end-to-end technology: modular end-to-end and one-stage end-to-end, with the latter being more advanced and efficient [3][4].
- The one-stage end-to-end model simplifies the process by directly mapping sensor data to control commands, reducing information loss between modules [3][4].
- VLA is expected to outperform traditional end-to-end models by integrating multi-modal capabilities and enhancing decision-making in complex scenarios [5][6].

Group 3: Challenges and Requirements for VLA
- The successful implementation of VLA relies on breakthroughs in three key areas: cross-modal feature alignment, world model construction, and dynamic knowledge base integration [7][8].
- Current automotive chips are not designed for AI large models, leading to performance limitations in real-time decision-making [9][11].
- The industry is experiencing a "chip power battle," with companies like Tesla and Li Auto developing their own high-performance AI chips to meet VLA's requirements [11][12].

Group 4: Future Outlook and Timeline
- Some industry experts believe 2025 could be a pivotal year for VLA technology, while others suggest it may take 3-5 years for widespread adoption [12][13].
- Initial applications of VLA are expected to be in controlled environments, with broader capabilities emerging as chip technology advances [14].
- Long-term projections indicate that advancements in AI chip technology and multi-modal alignment could lead to significant breakthroughs in VLA deployment by 2030 [14][15].
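The distinction drawn in Group 2 between modular and one-stage end-to-end systems can be made concrete with a toy contrast. The sketch below is an assumption-laden illustration, not any vendor's architecture: the sensor frame is a hypothetical list of obstacle distances in metres, the control command is a hypothetical `(steering, throttle)` pair, and a fixed linear map stands in for a learned network.

```python
from typing import List, Sequence, Tuple

SensorFrame = List[float]      # hypothetical: obstacle distances in metres
Control = Tuple[float, float]  # hypothetical: (steering, throttle in [0, 1])


def perceive(frame: SensorFrame) -> float:
    """Perception module: keeps only the nearest obstacle, discarding the rest."""
    return min(frame)


def plan(nearest_m: float) -> float:
    """Planning module: stop if closer than 5 m, otherwise scale speed with distance."""
    return 0.0 if nearest_m < 5.0 else min(1.0, nearest_m / 50.0)


def control(target_speed: float) -> Control:
    """Control module: hold the wheel straight and apply the planned throttle."""
    return (0.0, target_speed)


def modular_pipeline(frame: SensorFrame) -> Control:
    # Each hand-off passes a narrow summary, so information is lost between modules.
    return control(plan(perceive(frame)))


def one_stage_policy(frame: SensorFrame, weights: Sequence[float], bias: float) -> Control:
    """Single mapping from the full sensor frame to a command (linear stand-in)."""
    raw = sum(w * x for w, x in zip(weights, frame)) + bias
    throttle = max(0.0, min(1.0, raw))
    return (0.0, throttle)
```

For example, `modular_pipeline([30.0, 4.0, 80.0])` brakes because the perception stage forwards only the 4 m reading, whereas the one-stage function sees every reading at once and its behaviour depends entirely on the (here hand-picked) weights.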
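Red Hot! China Has Nearly 1 Million Robotics Companies and Over 24 Billion Yuan in Funding, but Three "Non-Consensus" Debates over Embodied Intelligence Remain
TMTPost App · 2025-08-12 23:25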
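Image: UBTECH Walker robot on display

China's robotics industry is truly red hot. "There are just too many people." This was the opening line almost everyone used when meeting at this year's World Robot Conference. In temperatures above 30 degrees Celsius, many adults brought children to the exhibition area, a sign that attention in China to robotics, especially humanoid robots and embodied intelligence, has risen markedly.

First, the number of robotics companies is growing quickly. According to Qichacha data obtained by the author, as of August 12 this year China had 958,000 existing robotics-related companies, close to 1 million. Of these, 193,200 were registered in 2024, up 4.59% year-on-year; in the first seven months of 2025, registrations had already reached 152,800, up 43.81% year-on-year, far exceeding the growth rate of new registrations for all of last year. By region, East China accounts for 39.64% of the country's robotics-related companies. On the supply chain side, China has more than 160 humanoid robot complete-machine platforms, over 50% of the global total, and more than 600 core component suppliers.

Second, financing is hot. From January to July this year, there were more than 200 investment events in embodied intelligence and robotics, with total financing exceeding 24 billion yuan, far more than the full-year 2024 total. For full-year 2025, China's humanoid robot market is expected to exceed 8.2 billion yuan, more than 50% of the global total.

Finally, the market outlook is broad, and China is gradually becoming the focus of the global humanoid robot market. According to Citi's forecast, by 2050 ...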
The Inflection Point Has Arrived: 70% of the Value of "AI+" Comes from IoT as AI Returns to the Physical World
36Kr · 2025-08-12 11:07
Core Insights
- The recent advancements in AI, particularly with the release of Google’s Genie 3 and OpenAI’s GPT-5, highlight the increasing importance of the Internet of Things (IoT) in driving AI applications and capabilities [1][2]
- The prediction that 70% of the value from "Artificial Intelligence+" will ultimately belong to IoT is gaining validation as the AI industry matures [1][19]
- IoT is becoming a crucial driver for AI deployment across various sectors, providing 67%-72% of the raw data necessary for AI applications [1][2]

AI and IoT Integration
- IoT is not just a data collector but a vital bridge for AI to interact with the real world, enabling continuous learning and feedback [2][7]
- The latest AI models, such as GPT-5 and Genie 3, are transitioning from relying solely on virtual data to actively perceiving and interacting with the physical world [2][7]
- The limitations of large models in virtual environments are prompting a shift towards utilizing real-world data for AI advancements [7][11]

Data Quality Over Quantity
- The focus is shifting from merely accumulating large datasets to acquiring high-quality, structured data that accurately reflects physical realities [11][12]
- "Good data" must be physically authentic, semantically understandable, and capable of covering diverse scenarios to enhance AI's generalization and reasoning abilities [11][12]

Evolution of AI Models
- The trend of scaling AI models has reached a point where mere increases in parameters and computational power are yielding diminishing returns [5][11]
- The emergence of AIoT (Artificial Intelligence of Things) is seen as essential for overcoming the limitations of current AI models and enabling them to operate effectively in complex real-world environments [7][12]

Future of AI and Industry
- The AI industry is at a pivotal moment where the competition is shifting from model capabilities to integrated platforms that encompass hardware and software solutions [15][16]
- AIoT is redefining its role from a simple connectivity tool to a foundational element that empowers physical devices to become intelligent agents [16][18]
- The integration of AI and IoT is expected to drive significant advancements in various sectors, leading to a new era of intelligent economic systems [16][19]
Li Auto's VLA "Long March"
Economic Observer Online · 2025-08-12 10:04
Core Insights
- The core philosophy of Li Auto's CEO, Li Xiang, emphasizes a long-term approach to success, advocating for patience and resilience in the face of industry challenges [1]
- The launch event for the Li Auto i8 highlighted the introduction of the VLA driver model, which reflects the company's commitment to long-term innovation rather than short-term gains [1][3]

Group 1: VLA Driver Model
- The VLA driver model distinguishes itself from traditional end-to-end architectures by utilizing reinforcement learning to enhance machine understanding of driving decisions [4][11]
- The goal for VLA is to significantly improve safety metrics, aiming for an accident rate of one in 600 million kilometers, compared to current figures of 350-400 million kilometers for Li Auto's assisted driving [4][8]
- VLA's ability to adapt to individual driving styles through continuous learning is a key feature, allowing for a personalized driving experience [4][8]

Group 2: Testing and Efficiency
- Li Auto has opted for simulation testing over extensive real-world testing, achieving over 40 million kilometers of simulated driving by mid-2025, with daily peaks of 300,000 kilometers [5][9]
- The company has focused on creating a robust simulation environment to address the limitations of real-world testing, which cannot fully replicate extreme driving scenarios [9][10]
- The efficiency of VLA's testing process is a critical factor in its development, with a strong emphasis on transforming research and development workflows [5][9]

Group 3: Technical Challenges
- Li Auto's approach to developing the VLA model involves overcoming significant challenges in data, algorithms, computing power, and engineering capabilities [19]
- The company has accumulated 4.3 billion kilometers of assisted driving data and 1.2 billion kilometers of valid feedback data, which are essential for refining the VLA model [9]
- The VLA model's architecture is designed to provide logical reasoning capabilities, addressing the shortcomings of traditional end-to-end models [11][12]

Group 4: Market Response and Future Goals
- The market response to the VLA model has been positive, with a 72.4% trial rate and a 92% satisfaction rate reported for Li Auto's intelligent driving features [8]
- Li Auto aims to enhance its MPI takeover mileage to 400-500 kilometers by the end of 2025, with aspirations to reach 1,000 kilometers in the near future [8]
- The company's commitment to long-term innovation is reflected in its strategic decisions, prioritizing safety and effective computing power over immediate performance metrics [25][26]
A Conversation with Robotera's Chen Jianyu: The Open Road and the Long Journey for Humanoid Robots
Huanqiu.com News · 2025-08-12 10:01
Core Insights
- The core viewpoint of the article is that the robotics industry is experiencing a significant convergence towards the "end-to-end" VLA (Vision-Language-Action) paradigm, which is becoming the foundational technology for embodied intelligence [1][2].

VLA Paradigm
- The VLA paradigm is defined as a complete closed loop encompassing perception (Vision), understanding (Language), and action (Action), allowing robots to perform tasks in the physical world [2].
- The recent focus on "world models" is seen as an important evolution within the VLA framework, aimed at enhancing robots' precision, generalization, and cognitive abilities [2].

Efficiency and Collaboration
- Current humanoid robots still lag behind human efficiency, but there is optimism as some industrial applications have achieved over 70% efficiency compared to humans, with expectations to reach 90% next year [3].
- The end-to-end architecture facilitates real-time feedback and control, breaking the traditional phase delays in recognition, planning, and execution, which is crucial for efficiency improvements [3].
- Deep collaboration between software and hardware is emphasized, with a focus on self-developed dexterous hands that have achieved stable mass production and significant cost reductions [3].

Application Pathway
- The pathway to killer applications for humanoid robots is outlined as starting with B-end (business applications) before moving to household applications, with industrial scenarios serving as a necessary phase for technology validation and data accumulation [4].
- The next five years are predicted to be a critical window for the explosion of household robots, with simple forms expected to become widespread and high-net-worth families potentially being the first to adopt general-purpose humanoid robots [4].

Ecosystem Development
- The company advocates for a "software defines hardware" approach, where models can adapt to different hardware, but hardware sets the upper limits of model capabilities [5].
- Open-source initiatives are highlighted as a strategic choice, with the company's humanoid robot reinforcement learning framework "Humanoid Gym" and generative large model "VPP" gaining significant attention in the community [5].
- The belief in ecosystem co-prosperity is emphasized, suggesting that improvements made by others on their work will ultimately benefit the company as well [5].

Future Aspirations
- The company continues to strive for world-class achievements, with the founder expressing humility about not yet reaching the set standards [6].
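SenseTime's Wang Xiaogang: World Models Will Speed AI's Move from Digital Space into the Physical World, and "Wuneng" Wants to Be That Bridge
Jiqizhixin (Synced) · 2025-08-12 07:34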
Core Viewpoint
- The article discusses the emergence of embodied intelligence and the significance of the "world model" as a core component in advancing AI towards human-like intelligence, highlighting the competitive landscape in the AI industry as it evolves towards embodied intelligence [1][2].

Industry Developments
- Major companies like Google, Huawei, and ByteDance are launching various embodied intelligence platforms and models, indicating a rapid evolution in this field [3].
- SenseTime, leveraging its expertise in computer vision and multi-modal large models, aims to empower the industry through its "Wuneng" embodied intelligence platform, which integrates years of technological accumulation [3][5].

Technical Challenges
- The industry faces challenges such as data scarcity, difficulty in large-scale production, and the need for generalization in embodied intelligence applications [5][13].
- The reliance on computer vision expertise is seen as a potential solution to enhance the learning of world models and improve the capabilities of embodied intelligence [14].

World Model Significance
- The world model is recognized as a crucial element for predicting and planning in autonomous systems, enabling robots to interact intelligently with their environments [12][17].
- SenseTime's "Kaigu" world model is designed to provide extensive data and facilitate simulation-based learning, significantly reducing data collection costs [17][20].

Platform Features
- The "Wuneng" platform offers a comprehensive approach by combining first-person and third-person perspectives for robot learning, enhancing the understanding of robot behavior [27][29].
- The platform aims to address the data challenges in the industry by providing synthetic data and facilitating the development of various robotic applications [26][31].

Future Implications
- As embodied intelligence matures, it is expected to transform human-robot interactions and create new social networks involving robots, enhancing their roles in daily life [36][37].
- The integration of embodied intelligence into common environments like homes and workplaces is anticipated to unlock significant value and functionality [39].
WRC 2025 Focus (2): Humanoid Robots Near Their "ChatGPT Moment," with Model Architecture as the Core Breakthrough
Sina Finance · 2025-08-12 06:33
Core Insights
- The humanoid robot industry is on the brink of a "ChatGPT moment," with significant breakthroughs expected within 1-2 years driven by policy and demand [1]
- The average growth rate for domestic humanoid robot manufacturers and component suppliers is projected to be between 50-100% in the first half of 2025 [1]
- The main challenge in the industry is not hardware but the architecture of embodied intelligent AI models, with the VLA model having inherent limitations [1][4]

Short-term Outlook (1-2 years)
- The domestic market is expected to maintain rapid growth due to policy subsidies and the expansion of application scenarios, with high visibility of orders for complete machines and core components [2]
- Key players like Tesla and Figure AI could accelerate global supply chain division and standardization once they achieve mass production [2]

Mid-term Outlook (2-5 years)
- The integration of end-to-end embodied intelligent models with world models and RL Scaling Law could become the mainstream architecture, facilitating the transition from prototype to large-scale commercialization [2]
- Distributed computing is anticipated to become a critical supporting infrastructure, collaborating with 5G/6G and edge computing providers [2]
- Investment opportunities include hardware manufacturers entering the mass production phase, AI companies with video generation world model capabilities, and distributed computing centers and edge cloud service providers [2]

Long-term Outlook (5+ years)
- If end-to-end embodied intelligence and low-latency distributed computing are realized, the market for household and industrial humanoid robots could expand rapidly, potentially reaching annual shipment volumes in the millions [2]
- The focus of competition is expected to shift from technological breakthroughs to cost control and ecosystem development [2]

Hardware Status
- Current humanoid robot hardware can meet most application needs, although optimization is still required in mass production and engineering [3]

AI Model Challenges
- The VLA model is considered a "foolproof architecture" but struggles with real-world interactions due to insufficient data, and its effectiveness remains limited even after reinforcement learning training [4]
- The video generation/world model approach is seen as more promising, allowing for task simulation before real-world application, which may lead to faster convergence [4]

RL Scaling Law
- Current reinforcement learning training lacks transferability, requiring new tasks to be trained from scratch, which is inefficient [5]
- Achieving a scaling law similar to that of language models could significantly accelerate the learning speed of new skills [5]

Distributed Computing Trends
- Humanoid robots are limited by size and power consumption, with onboard computing equivalent to a few smartphones [6]
- Future developments will rely on localized distributed servers to reduce latency, ensure safety, and lower the cost of individual computing units [6]
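The "task simulation before real-world application" idea credited above to the world-model approach can be sketched as planning by imagined rollouts. This is a minimal illustration under stated assumptions: `toy_dynamics` is a hypothetical hand-written model of a 1-D position, whereas a real world model would be learned from video or sensor data, and exhaustive search over short plans stands in for whatever planner a real system uses.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Dynamics = Callable[[int, int], int]  # (state, action) -> predicted next state


def rollout(state: int, plan: Sequence[int], dynamics: Dynamics) -> int:
    """Apply a candidate plan inside the model only; nothing moves in the real world."""
    for action in plan:
        state = dynamics(state, action)
    return state


def plan_with_world_model(state: int, goal: int, dynamics: Dynamics,
                          horizon: int = 3,
                          actions: Sequence[int] = (-1, 0, 1)) -> Tuple[int, ...]:
    """Score every action sequence by its imagined outcome and return the best one."""
    best_plan: Tuple[int, ...] = ()
    best_cost = float("inf")
    for candidate in product(actions, repeat=horizon):
        cost = abs(goal - rollout(state, candidate, dynamics))
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
    return best_plan


def toy_dynamics(position: int, action: int) -> int:
    """Hypothetical world model: the position simply shifts by the action."""
    return position + action
```

Calling `plan_with_world_model(0, 2, toy_dynamics)` returns a three-step plan whose simulated end state is 2; only that chosen plan would then be executed on the robot, which is where the hoped-for savings in real-world trial and error come from.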