World Models
When Will Robots Have Their Own "DeepSeek Moment"?
Hu Xiu APP · 2025-10-24 09:53
Core Viewpoint
- The article discusses the evolution of AI from "cognition" to "action," emphasizing the importance of experience-driven control in achieving practical applications in autonomous driving and robotics [5][6].

Group 1: Experience-Driven Control
- The transition from traditional mathematical modeling to experience-driven control is highlighted as essential for real-world applications in complex environments [9][10].
- Experience-driven control allows AI systems to learn from historical data, enabling effective decision-making without precise mathematical models [10][11].

Group 2: Embodied Intelligence
- Embodied intelligence is described as more complex than autonomous driving because of its higher dimensionality, which requires stronger understanding and generalization capabilities [12][14].
- The current state of embodied intelligence is compared to the "DeepSeek moment," indicating that while significant progress has been made, a breakthrough akin to ChatGPT has not yet occurred [15][16].

Group 3: World Models
- World models are identified as crucial for enabling robots to understand and interact with the physical world, serving as a foundational element for embodied intelligence [21][25].
- The article outlines three primary uses of world models: facilitating a feedback loop with the robot's brain, generating trajectory data, and integrating physical understanding into robot operations [25][26].

Group 4: Future Directions
- The industry's need for world models is emphasized, particularly for enhancing the generalization capabilities of robots in complex environments [28][31].
- The article suggests that world models are still at an early stage, with ongoing developments aimed at improving their application in robotic training and task execution [29][30].
American AI Steps Into the "Revolving Door"
Hu Xiu · 2025-10-23 09:56
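To create a legend, Sora 2 needed only one night. Despite a triple barrier of "invite code + iOS only + available only in the US and Canada," Sora went viral the moment it launched and topped the US App Store chart in under five days. Bioethics in chaos, history overwriting reality, human limits vanishing. Sora 2 has pushed the AI race to a new match point. This time, though, users around the world got a little too enthusiastic. Top-tier IP from Hollywood and Nintendo was mangled one after another, and Japanese anime was thrown into a "hodgepodge": not only can Conan play baseball with Luffy, Luffy can also send Goku flying with a single punch. The legal risk from copyright disputes goes without saying, and handled this way, Sora's strategy for attracting users will inevitably suffer as well. For OpenAI, if the Disneys of the world will not license their rights, it can of course simply decline to open derivative-creation rights for classic IP to users. Profitability is already this poor anyway, so one more problem hardly matters: as the sayings go, with enough lice you no longer itch, and a cracked jar does not fear being dropped. This time Sora 2 not only offers native audio but also achieves audio-video synchronization and a degree of storytelling. With that, users worldwide have finally collected the three great tools of TikTok, ChatGPT, and Sora, and can recreate a virtual parallel world online. Novice players use Sora 2 to make cats drive race cars and tractors, while veterans have already brought Altman's virtual likeness to China, having him perform crosstalk and time-travel skits on the major video sites. Some people raise tigers and dinosaurs in their videos, a dog tries to flee after a traffic violation, and an elderly person, after the "one-handed spouse lift" exercise, discovers that they are ...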
Foreseeing the Future: White Paper "Preliminary Vision and Exploratory Practice of the AI Car" Released
Core Insights
- The article discusses the release of the automotive industry's first white paper themed "AI Car," which outlines the product definition and key technological foresight for the transition to the AI Car era [3][4].

Group 1: AI Car Definition and Key Technologies
- The white paper defines the AI Car as a super intelligent entity composed of multiple sub-intelligent agents, including driving, cabin, chassis, and power agents [3][4].
- It emphasizes that AI technology will fundamentally reshape the development paradigm and user experience of smart terminals [3].
- The paper identifies ten key judgments regarding the future of AI Cars, including the transformation of autonomous driving system design logic and capabilities through VLA [3][4].

Group 2: Future Directions and Strategic Implications
- AI will enable the formation of a larger end-to-end system combining intelligent driving and chassis, thereby redefining the driving experience [8][9].
- The transition of power batteries towards intelligent battery systems capable of real-time perception and autonomous decision-making is highlighted [9].
- The white paper suggests that AI-driven product transformation will alter the survival and development logic of enterprises, shifting their strategic goals from "making good cars" to "operating intelligent entities" [10][11].

Group 3: Recommendations for Enterprises
- Companies are advised to define the unique personality and value proposition of their intelligent entities to rejuvenate brand identity [10].
- Enterprises are advised to enhance their data value across the entire process and to establish a cross-functional AI development team to ensure systematic research and development of AI Cars [11].
- The white paper proposes that automakers accelerate the construction of comprehensive ecological resource integration capabilities to strengthen user engagement and create competitive barriers in the AI era [11].
Up to 90% Labor Savings, Yet AI-Made Games Are Criticized as "Soulless"
Di Yi Cai Jing · 2025-10-22 10:12
Core Viewpoint
- The article discusses the significant impact of AI on the gaming industry, particularly in enhancing development efficiency and changing production methods. It highlights the ongoing exploration of AI tools that can streamline game creation processes, potentially reducing the time and cost involved in game development [3][4][5].

Group 1: AI's Impact on Game Development Efficiency
- AI tools can save approximately 70% to 80% of the workload in game development, especially in art asset processing, with animation and modeling being the most labor-intensive areas [5].
- The use of AI in animation can reduce the time required for tasks such as skinning from 1.5 to 3.5 days to just 1 to 3 hours, achieving a labor saving of 70% to 90% [5].
- AI can also automate the generation of smooth animations from keyframes, increasing efficiency by 3 to 5 times compared to traditional methods [5][6].

Group 2: Adoption and Implementation of AI Tools
- Tencent has reported a 40% reduction in character animation production cycles due to AI tools, with some projects reducing prototype validation time from 2 weeks to 3 days [6].
- Over 50 external companies, including major players in the gaming industry, are currently utilizing Tencent's AI tools, which are also being tested by companies in Japan, South Korea, and Europe [6].
- AI tools are particularly beneficial for small to medium-sized teams, allowing them to achieve results that previously required larger teams [11].

Group 3: Cost Reduction and Production Quality
- AI can significantly lower production costs; for example, in high-quality 3D games like "Black Myth: Wukong," AI tools can handle 20% to 30% of secondary resources, potentially saving millions in production costs [7].
- The cost of using AI tools is relatively low compared to human labor, making them an attractive option for game developers [7].

Group 4: Industry Perspectives on AI
- There are mixed opinions within the industry regarding AI's ability to replace human creativity, with some believing that AI lacks the "soul" necessary for compelling game design [8][10].
- However, some industry professionals have begun to recognize the potential of AI to enhance creativity and provide new avenues for game development [10][11].

Group 5: Future of Game Development with AI
- The integration of AI tools is expected to evolve the game development pipeline without completely disrupting existing workflows [12].
- New technologies, such as Google's interactive world models, are emerging, which could further enhance game development processes by allowing for quicker and more effective communication of game concepts [13][14].
- The future may see a convergence of different AI paths, leading to unique workflows in game development over the next few years [14].
Up to 90% Labor Savings, Yet AI-Made Games Are Criticized as "Soulless"
Di Yi Cai Jing· 2025-10-22 09:15
Core Insights
- The gaming industry is experiencing significant efficiency improvements due to AI tools, which can reduce art production costs by 20% to 30% in high-budget 3D games, leading to savings of millions [5][6][11].
- AI is transforming game development processes, allowing for faster production timelines and reducing the reliance on traditional labor-intensive methods [3][4][10].

Group 1: AI Impact on Game Development
- AI tools can handle 70% to 80% of the art asset processing workload in game development, particularly in animation and modeling [3][4].
- The use of AI in animation can reduce the time required for tasks such as skinning from 1.5 to 3.5 days down to just 1 to 3 hours, achieving a labor saving of 70% to 90% [3][4].
- AI-generated animations can enhance efficiency by producing 60 frames of smooth animation from just 5 to 10 keyframes, increasing productivity by 3 to 5 times [3][4].

Group 2: Adoption and Usage of AI Tools
- Tencent has developed and opened its AI tools to over 50 external companies, including major players in the gaming industry [4][11].
- The tools have been successfully implemented in Tencent's internal projects, resulting in a 40% reduction in character animation production cycles [4][11].
- Smaller teams are more likely to adopt AI tools, as they can significantly enhance workflow efficiency and reduce production costs [10][11].

Group 3: Industry Perspectives on AI
- There are mixed opinions within the industry regarding AI's ability to replace human creativity, with some believing AI lacks the "soul" necessary for compelling game design [8][9].
- Despite skepticism, some industry professionals have noted AI's surprising advancements in generating engaging narratives and creative content [9][10].
- AI is seen as a tool that democratizes game development, enabling smaller teams to achieve results that previously required larger, more experienced groups [10][11].

Group 4: Future of AI in Gaming
- The integration of AI tools is expected to evolve, with new technologies like interactive world models potentially reshaping game production workflows [11][12].
- The gaming industry is likely to see a coexistence of various AI tools for the foreseeable future, as companies explore different approaches to automation and intelligence in game development [12].
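The keyframe claim above (60 frames from 5 to 10 keyframes) can be illustrated with a minimal sketch. The article does not describe how the AI tools fill in frames, so plain linear interpolation is swapped in here as a stand-in for a learned in-betweening model; the function name `inbetween` and the single-value "pose" are hypothetical simplifications.

```python
def inbetween(keyframes, total_frames):
    """Expand a few keyframe values (e.g. one joint angle) into total_frames values.

    Assumption: linear interpolation stands in for the learned model;
    a production tool would predict full poses, not a single number.
    """
    if len(keyframes) < 2 or total_frames < 2:
        raise ValueError("need at least two keyframes and two output frames")
    last = len(keyframes) - 1
    frames = []
    for f in range(total_frames):
        # Map the output frame index onto the keyframe index axis.
        t = f * last / (total_frames - 1)
        i = min(int(t), last - 1)   # left keyframe of the current segment
        w = t - i                   # blend weight toward the right keyframe
        frames.append((1 - w) * keyframes[i] + w * keyframes[i + 1])
    return frames


# Six hand-authored keyframes expanded to 60 frames.
angles = inbetween([0.0, 15.0, 40.0, 35.0, 10.0, 0.0], 60)
```

Under this sketch the artist authors 6 values and receives 60, which is the kind of ratio the summary describes; the quality of the in-between frames is exactly where a learned model would differ from the linear stand-in.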
In-Depth Report on Intelligent Driving: World Model and VLA Technology Routes Develop in Parallel
Guoyuan Securities· 2025-10-22 08:56
Investment Rating
- The report does not explicitly state an investment rating for the smart driving industry.

Core Insights
- The smart driving industry is evolving rapidly, driven by "end-to-end" architectures and "smart driving equity" (broad access to driver-assistance features), with significant growth in both new energy vehicle sales and smart driving functionalities [3][4][9].
- The penetration rate of L2-level smart driving in new energy vehicles in China has increased from approximately 7% in 2019 to around 65% by the first half of 2025, indicating a strong correlation between new energy vehicle sales and the adoption of smart driving technologies [9][10].
- The smart driving market is projected to exceed 5 trillion yuan by 2030, with growth driven by technological advancements and increased consumer acceptance [15][16].

Summary by Sections

1. "Equity + End-to-End" Accelerating Smart Driving Evolution
- The significant increase in new energy vehicle sales has created a positive feedback loop for the adoption of smart driving technologies [9][10].
- The penetration of L2-level smart driving features in new energy vehicles has increased rapidly, reflecting growing consumer acceptance and market expansion [9][10].

2. End-to-End Smart Driving Review
- The evolution of end-to-end smart driving can be categorized into four main stages, with advancements in perception, decision-making, and control processes [30][32].
- The introduction of the "occupancy network" has enhanced environmental perception capabilities, allowing for more accurate and stable decision-making in complex driving scenarios [46][47].

3. VLA Technology Route
- The VLA (Vision-Language-Action) model is emerging as a key driver of paradigm shifts in autonomous driving, integrating visual, linguistic, and action modalities into a cohesive framework [70][71].
- The VLA model's development is divided into four stages, with significant advancements in task understanding and execution capabilities [76][77].

4. World Model Technology Route
- The world model approach emphasizes physical reasoning and spatial understanding, representing a long-term evolution path for smart driving technologies [69][70].
- The integration of world models with cloud computing is expected to enhance the iterative optimization of end-to-end smart driving systems [65][66].
Tesla's Latest Technical Talk Reveals FSD's Core Architecture
36Kr · 2025-10-22 08:00
Core Insights
- Tesla has publicly shared its FSD (Full Self-Driving) core architecture at the ICCV conference, indicating a significant development in its autonomous driving technology [1][4].
- The presentation by Ashok Elluswamy has sparked discussion about whether Tesla uses a VLA (Vision-Language-Action) model in its systems, amid an ongoing industry debate between VLA and world models [1][7].

Technical Developments
- The FSD architecture integrates a large neural network capable of processing multimodal inputs, including camera video, navigation data, vehicle motion status, and sound, with outputs that include panoramic segmentation, 3D occupancy networks, and language [6][10].
- The architecture's ability to output language information suggests a shift towards a more advanced model capable of understanding and reasoning with long-term data [7][10].

Industry Context
- The debate between VLA and world models is prominent: VLA proponents argue that it can leverage vast internet data for knowledge accumulation and reasoning, while world model advocates claim that their approach addresses the core challenges of autonomous driving more effectively [7][10].
- The industry is moving towards larger model parameters, with Tesla's upcoming smart driving chip expected to reach 2000 TOPS, indicating a significant increase in computational power and model capabilities [10][12].

Recent Updates
- The latest FSD update (V14.1.3) includes enhancements for safety and personalization, improving obstacle avoidance and navigation capabilities [12].
- Tesla has reintroduced the "Mad Max Mode," which allows for a more aggressive driving style, showcasing the system's adaptability in various driving scenarios [11][14].
Harvard & MIT: AI Can Predict, but It Still Can't Explain "Why"
36Kr · 2025-10-22 00:56
Core Insights
- The core question in the field of Artificial General Intelligence (AGI) is whether large language models (LLMs) can learn a "world model" or if they are merely playing a "next word prediction" game [1][2].
- A recent experiment by Harvard and MIT tested LLMs using orbital mechanics to determine if they could derive the underlying laws of physics, specifically Newton's laws, from their predictions [2][4].
- The results indicated a disconnect between prediction and explanation, as the AI models could accurately predict planetary trajectories but failed to encode the underlying physical laws [4][6].

Experiment Design and Findings
- The research utilized 10 million simulated solar system coordinate sequences (totaling 20 billion tokens) to train a small Transformer model [4].
- The hypothesis was that if the model could make accurate predictions without understanding Newton's laws, it would not possess a complete "world model" [2][4].
- The findings showed that while the AI could predict trajectories well, the derived force vectors were chaotic and unrelated to Newton's laws, indicating a lack of a stable guiding framework [6][8].

Implications for AI Development
- The inability of AI models to maintain consistent errors across different samples suggests they do not possess a stable world model, which is essential for scientific discovery [8][9].
- The research highlights a fundamental limitation in current AI models, as they can achieve high accuracy in predictions but lack the capability to construct a reality-based world model [10][11].
- Future AI development may require a combination of larger models and new methodologies to enhance understanding and generalization capabilities [12][13].

Broader Context
- The study reflects a classic scientific debate about whether the essence of science lies in precise predictions or in understanding the underlying reasons for phenomena [12][14].
- The quest for AI to evolve from being merely a "prediction machine" to a "thinker" capable of understanding the logic of the world is crucial for its future impact on scientific discovery [14].
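The distinction between predicting a trajectory and encoding the force law can be made concrete with a small sketch. This is not the study's code or probing method: it assumes a single planet around a central mass fixed at the origin, units where G times the central mass equals 1, and hypothetical helper names. It reads off the acceleration a sequence of positions implies (by second differences) and compares it with Newton's inverse-square law.

```python
import math

GM = 1.0  # assumed units: gravitational constant times the central mass


def newton_acceleration(x, y):
    """Inverse-square acceleration toward a central mass fixed at the origin."""
    r = math.hypot(x, y)
    return (-GM * x / r**3, -GM * y / r**3)


def simulate_orbit(x, y, vx, vy, dt, steps):
    """Velocity-Verlet integration; returns the list of (x, y) positions."""
    positions = [(x, y)]
    ax, ay = newton_acceleration(x, y)
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        new_ax, new_ay = newton_acceleration(x, y)
        vx += 0.5 * (ax + new_ax) * dt
        vy += 0.5 * (ay + new_ay) * dt
        ax, ay = new_ax, new_ay
        positions.append((x, y))
    return positions


def implied_acceleration(positions, i, dt):
    """Central second difference: the acceleration the path implies at step i."""
    (x0, y0), (x1, y1), (x2, y2) = positions[i - 1], positions[i], positions[i + 1]
    return ((x2 - 2 * x1 + x0) / dt**2, (y2 - 2 * y1 + y0) / dt**2)


def max_force_law_error(positions, dt):
    """Largest gap between implied and Newtonian acceleration along the path."""
    worst = 0.0
    for i in range(1, len(positions) - 1):
        ax, ay = implied_acceleration(positions, i, dt)
        nx, ny = newton_acceleration(*positions[i])
        worst = max(worst, math.hypot(ax - nx, ay - ny))
    return worst


dt = 0.01
path = simulate_orbit(1.0, 0.0, 0.0, 1.0, dt, 1000)  # near-circular orbit
clean_gap = max_force_law_error(path, dt)

# A "prediction" whose positions are off by only 1e-3, alternating in sign.
noisy = [(x + 1e-3 * (-1) ** i, y) for i, (x, y) in enumerate(path)]
noisy_gap = max_force_law_error(noisy, dt)
```

On the simulated path the gap stays at floating-point level, while the perturbed path, whose positions differ by only 1e-3 on an orbit of radius about 1, implies accelerations that miss by roughly 4e-3 / dt². Small position errors are therefore compatible with implied forces that bear little relation to Newton's law, which is the kind of mismatch the summary reports.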
From Horizon's 2025 Autonomous Driving Work, We See the Ambition of HSD...
Zidong Jiashi Zhi Xin · 2025-10-22 00:03
Core Insights
- Horizon is advancing in the autonomous driving sector by focusing on large-scale production of the new HSD system and reshaping the foundational logic of autonomous driving through cutting-edge research papers [2][3].
- The company is transitioning from a technology supplier to a standard-defining entity in the industry, supported by capital influx following its Hong Kong listing [2].

Group 1: End-to-End Autonomous Driving
- ResAD introduces a normalized residual trajectory modeling framework that simplifies the learning task and enhances model performance, achieving a PDMS score of 88.6 on the NAVSIM benchmark [8].
- CorDriver enhances safety in end-to-end autonomous driving by explicitly defining safe passage areas, resulting in a 66.7% reduction in collision rates with traffic participants [11].
- TTOG unifies motion prediction and path planning tasks, demonstrating a 36.06% reduction in average L2 error on the nuScenes dataset [15].
- MomAD addresses trajectory prediction consistency and stability issues by introducing momentum mechanisms, showing significant improvements in collision rates and trajectory smoothness [19].
- GoalFlow generates high-quality multimodal trajectories by using precise target point guidance, achieving a PDMS score of 90.3 on the NAVSIM benchmark [22].
- RAD employs a large-scale 3DGS-based reinforcement learning framework to enhance safety, achieving a collision rate three times lower than pure imitation learning methods [26].
- DiffusionDrive utilizes a truncated diffusion model for real-time end-to-end autonomous driving, achieving a PDMS score of 88.1 and significantly improving planning quality [30].

Group 2: Autonomous Driving Scene Generation & World Models
- Epona is an autoregressive diffusion world model that achieves high-resolution, long-term future scene generation and trajectory planning, outperforming existing methods on the nuScenes dataset [33].
- UMGen generates diverse, multimodal driving scenes, supporting user-controlled scenario generation and demonstrating superior authenticity and controllability compared to existing methods [38].
- DrivingWorld constructs a world model for autonomous driving via a video GPT framework, generating high-fidelity videos with strong temporal consistency and structural integrity [41].

Group 3: Autonomous Driving VLM & VLA
- AlphaDrive integrates reinforcement learning and reasoning into visual language models for high-level planning in autonomous driving, improving planning accuracy by 25.52% compared to standard fine-tuning models [45].
- The company has established a community of nearly 4,000 members and over 300 autonomous driving companies and research institutions, focusing on various autonomous driving technology stacks [49].
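Several results above are stated as an "average L2 error" and as percentage reductions. A minimal sketch of how such figures are commonly computed follows; it assumes the error is the mean Euclidean distance between planned and ground-truth waypoints, which may differ in detail from the exact evaluation protocol each paper uses. The function names and sample numbers are hypothetical.

```python
import math


def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints (same units as input)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Hypothetical waypoints in metres, for illustration only.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
plan = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.6)]
error = average_l2_error(plan, truth)
```

Read this way, a reported "36.06% reduction" means `relative_reduction(baseline_error, new_error)` evaluates to about 36.06, so the new error is roughly 64% of the baseline rather than 36% of it.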
We Are Looking for Partners in the Autonomous Driving Field...
Zidong Jiashi Zhi Xin · 2025-10-22 00:03
Group 1
- The article announces the recruitment of 10 outstanding partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2].
- The main areas of expertise sought include large models, multimodal models, diffusion models, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation, and model deployment and quantization [3].
- Candidates are preferred from QS200 universities with a master's degree or higher, especially those with significant contributions to top conferences [4].

Group 2
- The compensation package includes resource sharing for job seeking, doctoral recommendations, and study abroad opportunities, along with substantial cash incentives and collaboration on entrepreneurial projects [5].
- Interested parties are encouraged to add WeChat for consultation, specifying "organization/company + autonomous driving cooperation inquiry" [6].