World Models
Stunning: For the First Time, a World Model Simulates the Real World Hyper-Realistically; Google's Genie 3 Stole OpenAI's Thunder Last Night
36Kr · 2025-08-06 03:17
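At 10 p.m. last night, Google DeepMind announced that its Genie series of world models has officially reached its third generation.

"Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless."

Compared with the previous-generation Genie 2 world model, the diffusion-based game generation engine GameNGen, and the video generation model Veo, the new Genie 3 has clear advantages on several characteristics.

| | GameNGen | Genie 2 | Veo | Genie 3 |
| --- | --- | --- | --- | --- |
| Resolution | 320p | 360p | 720p to 4K | 720p |
| Domain | Game-specific | 3D Environments | General | General |
| Control | Game-specific | Limited keyboard / mouse actions | Video-level description* | Navigation; Promptable world events ...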
First Time in Six Years! OpenAI Releases Two Open-Weight AI Reasoning Models! Altman Calls Them "the World's Best Open Models"
Mei Ri Jing Ji Xin Wen· 2025-08-05 22:57
Core Insights
- OpenAI has made a significant move toward open models by releasing gpt-oss, the first open-weight models the company has introduced in six years [1][5]

Model Details
- On August 5, OpenAI released two open-weight AI reasoning models: gpt-oss-120b, with 117 billion parameters, which can run on a single NVIDIA professional data center GPU, and gpt-oss-20b, with 21 billion parameters, which can run on consumer-grade laptops with 16GB of memory [3][6]
- Both models are released under the permissive Apache 2.0 license, which allows businesses to use them commercially without paying fees or obtaining a license in advance [5]

Performance Evaluation
- gpt-oss-120b performs comparably to OpenAI's o4-mini on core reasoning benchmarks, while gpt-oss-20b matches or exceeds o3-mini [7]
- gpt-oss-120b activates 5.1 billion parameters per token and gpt-oss-20b activates 3.6 billion; both support context lengths of up to 128k [6][7]

Market Context
- OpenAI's release of open-weight models is largely driven by competitive pressure in the market, and the company emphasizes safety and security in the deployment of these models [12]
- Amazon has announced that it will offer OpenAI's models on its Bedrock and SageMaker platforms, the first time Amazon has provided OpenAI products [6]

Technical Architecture
- Both models use advanced pre-training and post-training techniques and a mixture-of-experts (MoE) architecture, with a focus on inference efficiency and practicality across deployment environments [6][7]

Limitations
- Because their world knowledge is more limited than that of larger models, these smaller models produce more "hallucinations": gpt-oss-120b and gpt-oss-20b hallucinate on 49% and 53% of questions, respectively [11]
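As a rough illustration of what a mixture-of-experts design means for per-token compute, the sketch below computes the share of parameters that are active per token. It uses only the gpt-oss-20b figures quoted above (21 billion total, 3.6 billion active); the helper name `active_fraction` is hypothetical, and the calculation ignores which layers the active parameters belong to.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of a model's parameters used for each token.

    Hypothetical helper for illustration only; it treats the model as a
    single pool of parameters and ignores per-layer details.
    """
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected 0 <= active_params <= total_params, total_params > 0")
    return active_params / total_params


# Figures quoted in the summary for gpt-oss-20b.
frac_20b = active_fraction(total_params=21e9, active_params=3.6e9)
print(f"gpt-oss-20b: {frac_20b:.1%} of parameters active per token")
```

The point of the ratio is that per-token compute scales with the active parameters, while memory requirements still scale with the total.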
Pony AI (PONY): Revolutionizing Transportation as Robotaxi Drives Toward the Future
Soochow Securities· 2025-08-05 13:30
Investment Rating
- The report initiates coverage of the company with a "Buy" rating [1].

Core Insights
- The company is positioned as a leader in the Robotaxi sector and is expected to benefit from improved policy frameworks, breakthroughs in autonomous driving technology, and cost reductions across the industry. Its unit economics are expected to turn positive, enabling rapid scaling and profitability [9][14].
- The company has a strong technical foundation and is actively expanding its market presence at home and abroad, with significant partnerships and operating licenses in key cities [9][14].

Summary by Sections

1. Company Overview
- The company was established in December 2016 and focuses on providing safe and advanced autonomous driving technology. Its core businesses are autonomous ride-hailing services, autonomous truck logistics, and intelligent driving solutions [14].
- The company launched the first Robotaxi service in China in 2018 and has since reached significant milestones, including being the first to receive a taxi operating license for autonomous vehicles [14][18].

2. Financial Projections
- Revenue projections for the company are as follows:
  - 2023: $71.90 million
  - 2024: $75.03 million
  - 2025: $77.58 million
  - 2026: $104.91 million
  - 2027: $342.42 million
- Projected revenue growth from 2026 to 2027 is 226.39% [1].

3. Cost Reduction and Safety Improvements
- The company has cut the cost of its Robotaxi operations significantly, with the bill-of-materials (BOM) cost falling to around 300,000 yuan. This is attributed to mass production and advances in technology [9][57].
- The safety of the autonomous driving system has been enhanced through a multi-sensor fusion approach, which significantly reduces accident rates compared with human drivers [44][52].

4. Market Expansion and Partnerships
- The company is focusing on expanding its operations in major cities such as Beijing, Shanghai, Guangzhou, and Shenzhen, while also pursuing international opportunities in markets such as the United States and Singapore [9][14].
- Strategic partnerships with major players such as Uber and with local transportation companies are being used to increase market penetration and operational efficiency [9][14].

5. Technical Advancements
- The company has developed a robust technical framework, including the PonyWorld system, which has generated over 10 billion kilometers of testing data, contributing to the safety and reliability of its autonomous driving solutions [9][14].
- The seventh-generation autonomous driving system is set to enter mass production, further solidifying the company's position in the market [9][14].
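The 226.39% growth figure in the financial projections can be reproduced from the revenue numbers listed above. The sketch below is a plain year-over-year calculation; the dictionary and function names are hypothetical and only the revenue values come from the summary.

```python
# Projected revenue in millions of US dollars, as listed in the summary.
revenue_musd = {2023: 71.90, 2024: 75.03, 2025: 77.58, 2026: 104.91, 2027: 342.42}


def yoy_growth(revenue: dict[int, float]) -> dict[int, float]:
    """Return year-over-year growth in percent, keyed by the later year."""
    years = sorted(revenue)
    return {
        year: (revenue[year] / revenue[prev] - 1.0) * 100.0
        for prev, year in zip(years, years[1:])
    }


growth = yoy_growth(revenue_musd)
for year, pct in growth.items():
    print(f"{year}: {pct:.2f}%")
# The 2027 entry prints 226.39%, matching the report's figure.
```

Most of the projected growth is concentrated in the final year, which is consistent with the report's expectation that scaling follows once unit economics turn positive.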
Scaling Law Questioned Again: Is "Degenerative AI" the Endgame?
Hu Xiu· 2025-08-04 12:14
Group 1
- The large model industry is in the grip of a "scaling law" trend, with tech companies and research institutions investing heavily to achieve better model performance through larger data scales [1][2]
- Scholars P.V. Coveney and S. Succi warn that scaling laws are seriously flawed as a way to reduce the predictive uncertainty of large language models (LLMs), and that blindly expanding data may lead to "Degenerative AI," characterized by catastrophic accumulation of errors and inaccuracies [2][4]
- The core mechanism supporting LLM learning, which generates non-Gaussian output from Gaussian input, may be the fundamental cause of error accumulation and information disasters [5]

Group 2
- Current LLMs exhibit impressive capabilities in natural language processing, but the research team argues that machine learning fundamentally operates as a "black box" and lacks understanding of the underlying physics, which limits its application in scientific and social fields [7][9]
- Only a few AI tech companies can train large state-of-the-art LLMs, and although their energy demands are extremely high, performance improvements appear to be limited [10][11]
- The research team identifies a low scaling exponent as a root cause of poor LLM performance, meaning that the ability to improve with larger datasets is extremely limited [14]

Group 3
- Despite the hype surrounding large models, even advanced AI chatbots produce significant errors and do not meet the precision standards required in most scientific applications [15][23]
- The research team shows that even with more computational resources, accuracy may not improve and could decline significantly once a certain threshold is crossed, indicating the presence of "barriers" to scalability [16][17]
- The accuracy of machine learning applications depends heavily on the homogeneity of the training datasets, and accuracy problems can arise even in homogeneous training scenarios [18][19]

Group 4
- The limitations of LLMs in reliability and energy consumption are evident, yet discussions of their technical details are scarce [24]
- The tech industry is exploring large reasoning models (LRMs) and agentic AI to make outputs more credible, although these approaches still rest largely on empirical foundations [25][26]
- The research team suggests that a more constructive direction would be to use LLMs for generative tasks, turning their uncertainty into exploratory value [27][28]

Group 5
- "Degenerative AI" poses a significant risk, particularly in LLMs trained on synthetic data, where it leads to catastrophic error accumulation [29][30]
- The current scaling exponent is low but positive, so the industry has not yet entered a phase where more data yields less information, but it is in a stage of "extreme diminishing returns" [32]
- The research team emphasizes that relying solely on brute force and unsustainable computational expansion could make Degenerative AI a reality [33][34]
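To see why a low but positive scaling exponent implies "extreme diminishing returns," consider a minimal sketch that assumes error falls as a power law in dataset size, error ∝ N^(-alpha). Both the power-law form and the exponent values below are illustrative assumptions, not figures taken from the Coveney and Succi paper.

```python
def data_multiplier_to_halve_error(alpha: float) -> float:
    """Factor by which the dataset must grow to halve the error.

    Assumes error ~ N ** (-alpha). Halving the error then requires
    (k * N) ** (-alpha) = 0.5 * N ** (-alpha), i.e. k = 2 ** (1 / alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive for error to shrink with data")
    return 2.0 ** (1.0 / alpha)


# Hypothetical exponents chosen only to show the trend.
for alpha in (0.5, 0.1, 0.05):
    k = data_multiplier_to_halve_error(alpha)
    print(f"alpha={alpha}: need about {k:,.0f}x more data to halve the error")
```

Under these assumptions, the multiplier grows exponentially as the exponent shrinks, so each further halving of the error becomes dramatically more expensive.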
Embodied Intelligence and Humanoids in the AI Wave: Is China Trailing or Running Neck and Neck?
Guan Cha Zhe Wang· 2025-08-03 05:35
Group 1
- The discussion centers on "embodied intelligence" and its significance for the development of humanoid robots and AGI (Artificial General Intelligence) [1][2]
- The conversation highlights advances in humanoid robots, focusing on companies such as Tesla and Boston Dynamics and their impact on the global robotics landscape [1][2][3]
- The panelists discuss China's position in the AI race, asking whether it is merely following the US or is on the verge of overtaking it [1][2]

Group 2
- Midea's entry into humanoid robotics is driven by its existing technological advantages in components and a complete product line, marking a strategic shift from its traditional home appliance business [4][5]
- The acquisition of KUKA Robotics in 2016 allowed Midea to expand its capabilities in industrial technology and automation, serving sectors including automotive and logistics [4][5]
- The discussion emphasizes application-driven development in humanoid robotics, with Midea exploring both full humanoid and wheeled robots for different use cases [13][15]

Group 3
- Panelists from several companies, including Grasping Deep Vision and Zhenge Fund, share insights on the evolution of AI and robotics, focusing on the integration of computer vision and machine learning in their products [5][6][8]
- Grasping Deep Vision, a pioneer in AI computer vision, has developed applications across finance, security, and education, showing the versatility of AI technologies [5][6]
- Zhenge Fund's investment strategy emphasizes early-stage funding in cutting-edge technology sectors, including AI and robotics, with the aim of supporting innovative startups [6][8]

Group 4
- The discussion of humanoid robots covers the historical context, citing milestones such as Honda's ASIMO and Boston Dynamics' Atlas and contrasting them with recent advances in China and the US [8][10]
- The panelists note that the complexity of humanoid robots, with an average of 40 joints, poses significant engineering challenges, but advances in reinforcement learning are simplifying development [9][10]
- The future of humanoid robots is seen as promising, with rapid progress expected over the next 5 to 10 years, driven by technological breakthroughs and application demand [9][10]

Group 5
- The conversation touches on the debate between wheeled and bipedal humanoid robots, with arguments for the practicality of wheeled robots in industrial settings and the necessity of bipedal robots in complex environments [13][16]
- The panelists discuss the potential of "super humanoid robots" designed for specific industrial applications, which aim to exceed human efficiency in tasks such as assembly and logistics [15][16]
- The importance of dexterous hands in humanoid robots is emphasized, with a focus on the trade-offs between complexity, cost, and functionality across applications [21][25]

Group 6
- "Embodied intelligence" is defined as the ability of robots to interact with the physical world, moving beyond traditional control methods toward more autonomous decision-making [28][30]
- The panelists explore the role of world models and video models in enhancing humanoid robots, suggesting that these models can improve a robot's understanding of dynamic environments [35][39]
- Reinforcement learning is highlighted as a crucial component in the development of humanoid robots, with discussion of how to optimize reward systems to improve learning outcomes [41][42]
Track Hyper | XPeng Robotics Center Sets Up an Intelligent Mimetic Department
Hua Er Jie Jian Wen· 2025-08-03 03:44
Core Viewpoint
- XPeng Motors has established a new Intelligent Mimetic Department focused on multimodal robotics. It aims to develop cutting-edge technologies such as native multimodal large models for embodied intelligence, world models, and spatial intelligence [1][11].

Group 1: Department Leadership and Structure
- The department is led by Ge Yixiao, a notable figure with a strong background in multimodal research who previously served as a technical expert at Tencent [2].
- The department currently has three members and is actively recruiting for positions such as "Research Scientist (Multimodal Direction)" to expand its team [2].

Group 2: Research Directions
- The first research direction is native multimodal large models for embodied intelligence, which aim to enhance robots' perception and interaction capabilities by processing multiple sensory inputs simultaneously [4][5].
- The second is the construction of world models that allow robots to understand the rules by which their environment operates, improving their adaptability to new tasks and environments [6][7].
- The third is spatial intelligence, which emphasizes the precise understanding and efficient use of three-dimensional spatial information by robots [7][9].

Group 3: Strategic Value of Multimodal Technology
- XPeng Motors has been investing in humanoid robotics for five years and plans to invest up to 100 billion yuan in the future, with a goal of mass-producing L3 humanoid robots by 2026 [10].
- Establishing the Intelligent Mimetic Department is a critical strategic move for XPeng, as multimodal technology is seen as a core element in enhancing robotic intelligence and expanding application scenarios [11].

Group 4: Technical Challenges
- Developing these advanced models faces significant technical challenges, including the need for algorithm optimization, more computational power, and high-quality data acquisition [12].
- Competition in robotics is intense, with many companies and research institutions racing to make advances, so XPeng's focus on multimodal technology could become a differentiating factor [13].
An Interview with AgiBot's Luo Jianlan! Data Collection, Simulation, Scenarios, and Engineering for Embodied Intelligence
自动驾驶之心· 2025-08-01 16:03
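具身智能之心 was invited to the "智启具身" forum at WAIC 2025 and had the opportunity to interview Dr. Luo Jianlan, chief scientist of AgiBot. Below are the questions Dr. Luo highlighted and discussed during the interview.

Embodied intelligence data discussion

1. Everyone knows that data is the fuel for improving intelligence and that sensors are the key to collecting data. What are AgiBot's plans for sensor R&D and procurement? How will you make product data more usable?

Luo Jianlan: We have already partnered with several sensor suppliers, focusing on joint R&D of visuo-tactile and high-density sensors. At the same time, we are building a cross-platform data collection API that provides a unified mapping of task semantics, giving model training standardized, trainable data inputs.

2. You said just now that world models are quite useful, and that once a world model is added, collecting some additional data can make it better. After that step, how far is it from real application? What hurdles remain between collecting the data and deploying it?

Luo Jianlan: There is still performance. A robot's performance has to be very high before it becomes truly useful. In your home, whether it is a robot that sweeps the floor or one that loads the dishwasher, it needs a 95% success rate across 1 million households, and that is a very hard problem.

3. Sergey Levine recently published an article that puts forward the "Sporks of AGI" view: simulation will hinder the scaling of embodied intelligence. I would like to know ...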
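To give a sense of scale for the 95% success rate across 1 million households mentioned in the interview, the sketch below counts expected failed task attempts per day. The assumption of one task attempt per household per day, along with the function name, is hypothetical and not taken from the interview.

```python
def expected_failures(households: int, tasks_per_day: float, success_rate: float) -> float:
    """Expected number of failed task attempts per day across all households."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return households * tasks_per_day * (1.0 - success_rate)


# Assumed workload: one task attempt per household per day.
failures = expected_failures(households=1_000_000, tasks_per_day=1, success_rate=0.95)
print(f"Expected failed attempts per day: {failures:,.0f}")
```

Under that assumption, even a 95% success rate leaves tens of thousands of failures every day, which helps explain why the bar for household deployment is described as hard to reach.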
World Models, AI's New Battlefield: China Is Already a Step Ahead!
Sou Hu Cai Jing· 2025-08-01 08:14
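Risk notice: The content above represents only the views of the author or guest. It does not represent any position of Hexun and does not constitute any Hexun-related investment advice. Before making any investment decision, investors should consider the risk factors of the investment product in light of their own circumstances and consult a professional investment adviser where necessary. Hexun has made every effort but cannot verify the truthfulness, accuracy, or originality of the content above, and makes no guarantee or commitment in this regard.

Hexun watchlist writer. This article was generated by an AI algorithm and is for reference only; it does not constitute investment advice, and any use is at your own risk.

Stock name, sector name: world models, stages of AI development, SenseTime ...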
After ChatGPT Peaks, World Models Are AI's New Battlefield: China Is Already a Step Ahead!
老徐抓AI趋势· 2025-07-31 01:03
Core Viewpoint
- The article discusses the transition from large language models (LLMs) to "world models" as the next competitive focus in AI, highlighting the limitations of LLMs and the potential of world models to reshape AI's future and drive economic growth [2][5][28].

Summary by Sections

AI's Evolution
- AI development is divided into three stages: perceptual AI, generative AI, and embodied AI, each representing a significant technological advance [5][18].

Stage One: Perceptual AI
- The breakthrough in perceptual AI came in 2012, when Geoffrey Hinton's team achieved a step change in image recognition accuracy on ImageNet, but its capabilities were limited to recognition, without reasoning or cross-domain learning [7][9].

Stage Two: Generative AI
- The introduction of the Transformer architecture in 2017 marked a qualitative leap, enabling AI to train on vast amounts of text data and greatly expanding its knowledge base [12][13]. However, this growth is nearing a limit, with predictions that usable internet data for training will peak around 2028 [15].

Stage Three: Embodied AI
- The next phase is embodied AI, in which AI learns through interaction with the real world rather than from text alone, which requires the development of world models [16][18].

What is a World Model?
- A world model is a high-precision simulator that adheres to physical laws, allowing AI to learn through trial and error in a virtual environment and greatly reducing the data collection costs of real-world training [19][20].

Challenges of World Models
- Unlike simple video generation, a world model must stay consistent with physical laws to be useful for training AI, which means eliminating physical inconsistencies in generated scenarios [20][22].

Breakthroughs by SenseTime
- SenseTime's "KAIWU" world model lets users describe scenarios in natural language and generates videos that comply with physical laws, changing how autonomous driving and robotics systems are trained [22][24].

Implications of World Models
- The shift to world models will change how data is produced, improve training efficiency, and transform industries such as autonomous driving, robotics, manufacturing, healthcare, and education [28].

Future Outlook
- The emergence of world models is expected to accelerate economic growth, with a potential "ChatGPT moment" in the next 1-2 years, driven by unprecedented investment and innovation in the AI sector [28][29].
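The idea of learning by trial and error inside a physically consistent simulator can be sketched with a toy example. Everything below is a hypothetical illustration rather than SenseTime's or anyone else's method: the "world model" is a hand-written projectile simulator, and the "learning" is a plain random search for a launch speed that hits a target distance.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate_throw(speed: float, dt: float = 0.001) -> float:
    """Toy 'world model': step a ball launched at 45 degrees until it lands.

    Returns the horizontal distance travelled in metres.
    """
    vx = vy = speed * (0.5 ** 0.5)  # equal components at a 45 degree launch
    x = y = 0.0
    while True:
        x += vx * dt
        vy -= G * dt  # gravity is the only force in this toy model
        y += vy * dt
        if y <= 0.0:
            return x


def learn_speed(target: float, trials: int = 200, seed: int = 0) -> tuple[float, float]:
    """Trial and error in simulation: keep the speed whose landing point is closest."""
    rng = random.Random(seed)
    best_speed, best_err = 0.0, float("inf")
    for _ in range(trials):
        speed = rng.uniform(1.0, 30.0)
        err = abs(simulate_throw(speed) - target)
        if err < best_err:
            best_speed, best_err = speed, err
    return best_speed, best_err


speed, err = learn_speed(target=20.0)
print(f"best speed {speed:.2f} m/s, miss distance {err:.2f} m")
```

All 200 trials here cost nothing in the real world. That only helps if the simulator's physics are right, which is why the summary stresses consistency with physical laws.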
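How to Prepare for Autumn Recruitment in End-to-End, Large Models, and World Models? We Have Set Up a Job-Seeking Discussion Group...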
自动驾驶之心· 2025-07-30 23:33
Core Viewpoint
- There is a growing gap between academic knowledge and the practical skills required in the workplace, particularly for job seekers preparing for campus recruitment [1]

Group 1: Industry Observations
- Many people with work experience are exploring opportunities in large models and world models, indicating a shift in industry focus [1]
- Traditional regulatory frameworks are being reconsidered as the industry moves toward more embodied approaches [1]

Group 2: Community Building
- The company aims to create a comprehensive platform that connects talent across the industry and supports growth and collaboration [1]
- A new community has been established to discuss industry topics, including company developments, product research, and job seeking [1]
- The community encourages networking among industry peers and aims to provide timely insight into industry trends [1]