World Models
From World Models to VLA to Reinforcement Learning: This Is How Embodied "Brain and Cerebellum" Algorithms Work!
具身智能之心· 2025-10-26 04:02
Core Insights
- The article discusses the evolution and current state of embodied intelligence, focusing on the roles of the brain and cerebellum in robotics, where the brain handles perception and planning, while the cerebellum is responsible for execution [3][10].

Technical Evolution
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection, moving to behavior cloning, and now advancing to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but faced challenges in generalization and error accumulation [8].
- The third stage, marked by the introduction of diffusion policy, improved stability and generalization by modeling action sequences [8].
- The fourth stage, emerging in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance robots' predictive and interactive capabilities [9][10].

Current Trends and Applications
- The integration of VLA with reinforcement learning enhances robots' trial-and-error learning and self-improvement abilities, while the combination with world models allows for future prediction and better planning [10].
- The article highlights the growing demand for embodied intelligence applications across various sectors, including industrial, home, restaurant, and medical rehabilitation, leading to increased job opportunities and research interest in the field [10].

Educational Initiatives
- The article outlines a structured learning program aimed at equipping individuals with comprehensive knowledge of embodied intelligence algorithms, including practical applications and real-world projects [11][14].
- The course targets individuals with a foundational understanding of embodied intelligence and aims to bridge the gap between theoretical knowledge and practical deployment [18][24].
Tesla Finally Shares Something, and Its World Model and Closed-Loop Evaluation Are Frighteningly Strong...
自动驾驶之心· 2025-10-25 16:03
Core Insights
- Tesla has shared insights into its architecture, emphasizing the use of a large model and extensive data, which allows for a fixed computation time and high-frequency actions in its Full Self-Driving (FSD) system [5][6].

Group 1: Reasons for End-to-End Approach
- The complexity of human driving behavior makes it difficult to define a single evaluation function, leading to challenges in rule-based optimization [8].
- The interface definition between perception, prediction, and planning is problematic, resulting in information loss [8].
- An end-to-end approach is better suited for scalability and addressing long-tail problems [8].
- Fixed computation time based on neural networks reduces latency compared to traditional methods [8].
- Philosophically, reliance on computational power and data is preferred over human experience [8].

Group 2: Challenges of End-to-End Systems
- The three main challenges faced by end-to-end systems include evaluation, the curse of dimensionality, and ensuring interpretability and safety [19][20].
- The curse of dimensionality leads to insufficient supervisory signals when transitioning from high-dimensional to low-dimensional spaces [21].
- Ensuring interpretability and safety is crucial, as the model must genuinely understand driving behavior rather than just fitting shortcuts [23].

Group 3: Evaluation Challenges
- High-quality datasets cannot solely describe performance through loss metrics, indicating a need for more comprehensive evaluation methods [39].
- Open-loop evaluations cannot replace closed-loop assessments, highlighting the necessity for real-world testing [39].
- Driving behavior is multimodal, requiring evaluation metrics that encompass various driving actions [39].
- One proposed method involves predicting the consequences of actions, potentially using a critic to assess model performance [39].
- Balancing the evaluation dataset is essential for accurate assessments [39].

Group 4: World Model Simulator
- Tesla introduced a world model simulator that generates subsequent videos based on real scenarios, indicating a high barrier to entry for this technology [41].
- The simulator allows for replaying previous issues to assess improvements, akin to two-stage simulations [44].
- This technology can also be applied to humanoid robots, enabling reinforcement training and simulation [46].
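The point under Evaluation Challenges that open-loop evaluation cannot replace closed-loop assessment can be illustrated with a deliberately contrived toy. The sketch below is not Tesla's method: the one-dimensional lane-offset dynamics, both policies, and every constant are hypothetical, chosen only to show how a policy can look accurate when scored on logged expert states yet drift badly once its own actions determine the states it sees.

```python
# Toy 1-D lane keeping. All dynamics, policies, and constants are hypothetical.

def expert_action(offset):
    """Assumed expert: steers halfway back toward the lane centre each step."""
    return -0.5 * offset


def learned_action(offset):
    """Assumed imperfect learned policy: close to the expert near offset 0,
    but with a small bias and no corrective behaviour."""
    return 0.02 + 0.1 * offset


def rollout(policy, start, steps):
    """Closed-loop simulation: each action changes the next state."""
    states = [start]
    for _ in range(steps):
        states.append(states[-1] + policy(states[-1]))
    return states


STEPS = 50
START = 0.05

# Open-loop score: compare actions on states the expert visited.
expert_states = rollout(expert_action, START, STEPS)
open_loop_error = sum(
    abs(learned_action(s) - expert_action(s)) for s in expert_states
) / len(expert_states)

# Closed-loop score: let the learned policy drive and measure where it ends up.
closed_loop_states = rollout(learned_action, START, STEPS)
closed_loop_final = abs(closed_loop_states[-1])

print(f"mean open-loop action error: {open_loop_error:.4f}")
print(f"final closed-loop offset:    {closed_loop_final:.2f}")
```

In this toy the mean open-loop action error stays small because the expert's logged states sit near the lane centre, while the closed-loop offset grows every step because small errors feed back into the state. The same mechanism underlies the error accumulation attributed to behavior cloning.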
VLA / World Model / WA / End-to-End: A Divergence in Marketing, Not in Technical Route
理想TOP2· 2025-10-25 05:21
Core Viewpoints
- Many people are unaware that there is no universally accepted definition of VLA/world model/end-to-end [1].
- Leading autonomous driving companies share more commonalities in their exploration of autonomous driving than the differences portrayed online, with the core being promotional divergence rather than technical route divergence [1][2].
- Language plays a significant role in autonomous driving, particularly in long reasoning, user interaction value alignment, and understanding the world [1].
- Those who believe that predicting the next token is more than just a probability distribution are more likely to accept that language can understand the world [1].

Group 1: VLA/World Model/End-to-End
- VLA, world model, and end-to-end all require the ability to generate road video data that appears real, focusing on visual information input and ultimately controlling vehicle actions [2].
- The distinction lies in the involvement of language, its depth of participation, and the architectural form it takes, with future language-related tokens potentially being LLM's text tokens or photon tokens [2].
- The narrative that VLA and world models represent different technical routes is misleading, as both need to generate a world model and understand the physical world [4].

Group 2: End-to-End Definitions
- The definition of end-to-end is often debated, with some believing it requires a core framework where input and output are clearly defined [5].
- Tesla's approach, which involves visual input and outputting trajectory rather than direct control signals, raises questions about the true nature of their end-to-end definition [5][6].
- The output of precise trajectories is preferred over direct control signals, suggesting a more effective design approach [6].

Group 3: Tesla's Approach and Future Directions
- Tesla's historical context and style suggest that their approach to end-to-end definitions may not have a universally accepted exclusivity [7].
- Long-term predictions indicate that AI model inputs and outputs may predominantly involve photons, which could significantly reduce computational loads [10].
- The ideal VLA model is defined as having visual or multimodal input, language participation, and ultimately directing actions in a broad sense [11].

Group 4: Understanding Language and AI Potential
- There are fundamental differences in views regarding LLM, particularly concerning the understanding of predicting the next token [12].
- Those who see predicting the next token as more than mere statistics are more inclined to recognize the potential of LLM and AI [12][19].
- The ability to predict the next token effectively implies an understanding of the underlying reality that generates the token, which is a deeper question than it appears [18].
CVPR 2026 Countdown, Day 21: Going After This Direction Is an Overwhelming Advantage!
自动驾驶之心· 2025-10-24 16:03
Core Viewpoint
- The article emphasizes the importance of targeted guidance and mentorship for students aiming to publish high-quality research papers in top conferences like CVPR and ICRA, highlighting the need for strategic focus in the final stages of the submission process [2][3].

Group 1: Submission Insights
- The current submission volume for CVPR 2026 has exceeded 2000, indicating a competitive landscape similar to ICLR [1].
- Historical trends show that successful submissions often focus on specific breakthroughs and verifiable improvements rather than broad themes, aligning closely with the main topics of the conference [1].
- The anticipated main theme for CVPR 2026 is likely to revolve around "world models," suggesting a strategic direction for potential submissions [1].

Group 2: Mentorship and Guidance
- The organization offers specialized mentorship programs aimed at helping students navigate the complexities of research paper writing and submission, particularly for those in the fields of autonomous driving and AI [2][3].
- With over 300 dedicated instructors from top global universities, the organization provides a wealth of academic resources and expertise to assist students in producing high-quality research [3].
- The mentorship program includes personalized guidance through the entire research process, from topic selection to submission, ensuring that students are well-prepared for the rigorous demands of top-tier conferences [11].

Group 3: Student Support and Outcomes
- The organization addresses common challenges faced by students, such as lack of guidance, fragmented knowledge, and difficulties in understanding the research process [5].
- Students are encouraged to develop a systematic understanding of both classic and cutting-edge algorithms, enhancing their practical skills and research capabilities [5].
- Successful participants in the program may receive recommendations from prestigious institutions and direct job placements in leading tech companies, emphasizing the program's potential impact on students' academic and professional trajectories [16].
自动驾驶之心 Is Recruiting Partners!
自动驾驶之心· 2025-10-24 16:03
Group 1
- The article announces the recruitment of 10 outstanding partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2].
- The main areas of expertise sought include large models, multimodal models, diffusion models, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation, and model deployment and quantization [3].
- Candidates are preferred from QS200 universities with a master's degree or higher, especially those with significant contributions to top conferences [4].

Group 2
- The compensation package includes resource sharing for job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives and opportunities for entrepreneurial project collaboration [5].
- Interested parties are encouraged to add WeChat for consultation, specifying "organization/company + autonomous driving cooperation inquiry" [6].
When Will Robots Have Their Own "DeepSeek Moment"?
虎嗅APP· 2025-10-24 09:53
Core Viewpoint
- The article discusses the evolution of AI from "cognition" to "action," emphasizing the importance of experience-driven control in achieving practical applications in autonomous driving and robotics [5][6].

Group 1: Experience-Driven Control
- The transition from traditional mathematical modeling to experience-driven control is highlighted as essential for real-world applications in complex environments [9][10].
- Experience-driven control allows AI systems to learn from historical data, enabling effective decision-making without precise mathematical models [10][11].

Group 2: Embodied Intelligence
- The complexity of embodied intelligence is noted, with a focus on its higher dimensionality compared to autonomous driving, requiring advanced understanding and generalization capabilities [12][14].
- The current state of embodied intelligence is compared to the "DeepSeek moment," indicating that while significant progress has been made, a breakthrough akin to ChatGPT has not yet occurred [15][16].

Group 3: World Models
- World models are identified as crucial for enabling robots to understand and interact with the physical world, serving as a foundational element for embodied intelligence [21][25].
- The article outlines three primary uses of world models: facilitating a feedback loop with the robot's brain, generating trajectory data, and integrating physical understanding into robot operations [25][26].

Group 4: Future Directions
- The need for world models in the industry is emphasized, particularly for enhancing the generalization capabilities of robots in complex environments [28][31].
- The article suggests that the evolution of world models is still in its early stages, with ongoing developments aimed at improving their application in robotic training and task execution [29][30].
American AI Steps Into the "Revolving Door"
Hu Xiu· 2025-10-23 09:56
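It took Sora 2 just one night to create a legend. Despite a triple barrier to entry (invitation code required, iOS only, available only in the US and Canada), Sora flooded social feeds the moment it launched and topped the US App Store chart in under five days. Bioethics thrown into chaos, history overwriting reality, the limits of the human body gone. Sora 2 has pushed the AI race to a new match point.

This time, though, internet users worldwide got a little too enthusiastic. Top-tier IP from Hollywood and Nintendo was abused one after another, and Japanese anime was thrown into a mash-up: Conan plays baseball with Luffy, and Luffy sends Goku flying with a single punch. The legal risk from copyright disputes goes without saying; with all this going on, Sora's strategy for attracting users will inevitably be affected too. For OpenAI, if Disney and its peers will not grant licenses, it can of course simply decline to let users make derivative works of classic IP. Its profitability is already this poor anyway; as the sayings go, with enough lice you stop itching, and a cracked pot does not fear being dropped.

This time Sora 2 not only has native audio but also achieves audio-video synchronization and a degree of narrative. With that, users worldwide have finally collected the three killer tools of TikTok, ChatGPT, and Sora, and can rebuild a virtual parallel world online. Newcomers use Sora 2 to put cats behind the wheel of race cars and tractors, while veterans have already brought Sam Altman's virtual likeness to China, where he performs crosstalk comedy and time-travels across the major video sites. Some people keep tigers and dinosaurs in their videos, a dog tries to flee after a traffic violation, and an elderly man, after the "lift your spouse with one hand" exercise, discovers that he is ...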
Foreseeing the Future: White Paper "Preliminary Vision and Exploratory Practice of the AI Car" Released
Core Insights
- The article discusses the release of the first white paper themed "AI Car" in the automotive industry, which outlines the product definition and key technological foresight for the transition to the AI Car era [3][4].

Group 1: AI Car Definition and Key Technologies
- The white paper defines AI Car as a super intelligent entity composed of multiple sub-intelligent agents, including driving, cabin, chassis, and power agents [3][4].
- It emphasizes that AI technology will fundamentally reshape the development paradigm and user experience of smart terminals [3].
- The paper identifies ten key judgments regarding the future of AI Cars, including the transformation of autonomous driving system design logic and capabilities through VLA [3][4].

Group 2: Future Directions and Strategic Implications
- AI will enable the formation of a larger end-to-end system combining intelligent driving and chassis, thereby redefining the driving experience [8][9].
- The transition of power batteries towards intelligent battery systems capable of real-time perception and autonomous decision-making is highlighted [9].
- The white paper suggests that the product transformation driven by AI will alter the survival and development logic of enterprises, shifting their strategic goals from "making good cars" to "operating intelligent entities" [10][11].

Group 3: Recommendations for Enterprises
- Companies are advised to define the unique personality and value proposition of their intelligent entities to rejuvenate brand identity [10].
- It is recommended that enterprises enhance their data value across the entire process and establish a cross-functional AI development team to ensure systematic research and development of AI Cars [11].
- The white paper proposes that automakers should accelerate the construction of comprehensive ecological resource integration capabilities to strengthen user engagement and create competitive barriers in the AI era [11].
Up to 90% Labor Savings, Yet AI-Made Games Are Criticized as Having "No Soul"
第一财经· 2025-10-22 10:12
Core Viewpoint
- The article discusses the significant impact of AI on the gaming industry, particularly in enhancing development efficiency and changing production methods. It highlights the ongoing exploration of AI tools that can streamline game creation processes, potentially reducing the time and cost involved in game development [3][4][5].

Group 1: AI's Impact on Game Development Efficiency
- AI tools can save approximately 70% to 80% of the workload in game development, especially in art asset processing, with animation and modeling being the most labor-intensive areas [5].
- The use of AI in animation can reduce the time required for tasks such as skinning from 1.5 to 3.5 days to just 1 to 3 hours, achieving a labor saving of 70% to 90% [5].
- AI can also automate the generation of smooth animations from keyframes, increasing efficiency by 3 to 5 times compared to traditional methods [5][6].

Group 2: Adoption and Implementation of AI Tools
- Tencent has reported a 40% reduction in character animation production cycles due to AI tools, with some projects reducing prototype validation time from 2 weeks to 3 days [6].
- Over 50 external companies, including major players in the gaming industry, are currently utilizing Tencent's AI tools, which are also being tested by companies in Japan, South Korea, and Europe [6].
- AI tools are particularly beneficial for small to medium-sized teams, allowing them to achieve results that previously required larger teams [11].

Group 3: Cost Reduction and Production Quality
- AI can significantly lower production costs; for example, in high-quality 3D games like "Black Myth: Wukong," AI tools can handle 20% to 30% of secondary resources, potentially saving millions in production costs [7].
- The cost of using AI tools is relatively low compared to human labor, making them an attractive option for game developers [7].

Group 4: Industry Perspectives on AI
- There are mixed opinions within the industry regarding AI's ability to replace human creativity, with some believing that AI lacks the "soul" necessary for compelling game design [8][10].
- However, some industry professionals have begun to recognize the potential of AI to enhance creativity and provide new avenues for game development [10][11].

Group 5: Future of Game Development with AI
- The integration of AI tools is expected to evolve the game development pipeline without completely disrupting existing workflows [12].
- New technologies, such as Google's interactive world models, are emerging, which could further enhance game development processes by allowing for quicker and more effective communication of game concepts [13][14].
- The future may see a convergence of different AI paths, leading to unique workflows in game development over the next few years [14].
Up to 90% Labor Savings, Yet AI-Made Games Are Criticized as Having "No Soul"
Di Yi Cai Jing· 2025-10-22 09:15
Core Insights
- The gaming industry is experiencing significant efficiency improvements due to AI tools, which can reduce art production costs by 20% to 30% in high-budget 3D games, leading to savings of millions [5][6][11].
- AI is transforming game development processes, allowing for faster production timelines and reducing the reliance on traditional labor-intensive methods [3][4][10].

Group 1: AI Impact on Game Development
- AI tools can handle 70% to 80% of the art asset processing workload in game development, particularly in animation and modeling [3][4].
- The use of AI in animation can reduce the time required for tasks such as skinning from 1.5 to 3.5 days down to just 1 to 3 hours, achieving a labor savings of 70% to 90% [3][4].
- AI-generated animations can enhance efficiency by producing 60 frames of smooth animation from just 5 to 10 keyframes, increasing productivity by 3 to 5 times [3][4].

Group 2: Adoption and Usage of AI Tools
- Tencent has developed and opened its AI tools to over 50 external companies, including major players in the gaming industry [4][11].
- The tools have been successfully implemented in Tencent's internal projects, resulting in a 40% reduction in character animation production cycles [4][11].
- Smaller teams are more likely to adopt AI tools, as they can significantly enhance workflow efficiency and reduce production costs [10][11].

Group 3: Industry Perspectives on AI
- There are mixed opinions within the industry regarding AI's ability to replace human creativity, with some believing AI lacks the "soul" necessary for compelling game design [8][9].
- Despite skepticism, some industry professionals have noted AI's surprising advancements in generating engaging narratives and creative content [9][10].
- AI is seen as a tool that democratizes game development, enabling smaller teams to achieve results that previously required larger, more experienced groups [10][11].

Group 4: Future of AI in Gaming
- The integration of AI tools is expected to evolve, with new technologies like interactive world models potentially reshaping game production workflows [11][12].
- The gaming industry is likely to see a coexistence of various AI tools for the foreseeable future, as companies explore different approaches to automation and intelligence in game development [12].
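To make the "60 frames from 5 to 10 keyframes" figure concrete, here is a minimal sketch of what filling in frames between keyframes means. The article does not say how the AI tools generate in-between frames, so this uses plain linear interpolation as a stand-in baseline, not the tools' actual method; the function name `inbetween`, the single joint-angle channel, and the keyframe values are all hypothetical.

```python
def inbetween(keyframes, n_frames):
    """Linearly interpolate (time, value) keyframes into n_frames evenly
    spaced samples. Keyframes must be sorted by strictly increasing time."""
    start, end = keyframes[0][0], keyframes[-1][0]
    frames = []
    seg = 0  # index of the keyframe segment containing the current time
    for i in range(n_frames):
        t = start + (end - start) * i / (n_frames - 1)
        # Advance to the segment whose right endpoint is at or after t.
        while seg < len(keyframes) - 2 and t > keyframes[seg + 1][0]:
            seg += 1
        (t0, v0), (t1, v1) = keyframes[seg], keyframes[seg + 1]
        alpha = (t - t0) / (t1 - t0)
        frames.append(v0 + alpha * (v1 - v0))
    return frames


# Hypothetical data: 5 keyframes of one joint angle (degrees) over one second.
keyframes = [(0.0, 0.0), (0.25, 30.0), (0.5, 45.0), (0.75, 20.0), (1.0, 0.0)]
frames = inbetween(keyframes, 60)
print(len(frames), "frames generated from", len(keyframes), "keyframes")
```

Linear interpolation produces visible corners at each keyframe, which is one reason production pipelines use splines or learned models instead; the sketch only shows the input and output shape of the task.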