World Models
Xiaomi Experienced & Campus Hiring | Algorithm Researcher, Autonomous Driving and Robot Embodied Intelligence (VLA Track)
具身智能之心· 2025-07-01 12:07
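Job description: We are looking for an outstanding researcher/scientist to join our frontier exploration team and help define and build the "brain" of next-generation autonomous driving and robotics. You will work on breakthrough research into an Embodied Foundation Model that deeply integrates vision-language-action (VLA) capabilities and has excellent spatial perception and spatial reasoning abilities.

Core responsibilities include:
- Frontier algorithm research and construction: design and implement leading embodied multimodal large models. Your research will go beyond existing VLA frameworks to explore how to build a World Model that understands the complex 3D world and performs long-horizon, multi-step task planning.
- Core model capabilities: lead breakthroughs in the following key capabilities:
  - Multimodal scene understanding: fuse vision, language, radar, and other information sources to achieve deep understanding and spatial perception of dynamic, open environments.
  - Complex semantic reasoning and decision-making: enable the model to understand vague, abstract human instructions and, combined with spatial reasoning about the physical world, generate safe, reasonable, and explainable action sequences.
  - Learning and adaptation mechanisms: study reinforcement learning (RL), imitation learning (IL), and self-supervised learning methods in depth, so that the model keeps learning and evolving from massive data and from interaction with the environment.
- Technical vision and roadmap: lead the construction of a generalizable, efficient embodied intelligence foundation model, providing core ... for technical evolution over the next 1-3 years ...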
“Commercialization Within Three Years”: How Can Ha Luo (哈啰) Make Robotaxi Work?
21 Shi Ji Jing Ji Bao Dao· 2025-07-01 10:03
Core Insights
- The article discusses the competitive landscape of the Robotaxi industry, highlighting the shift from technology development to commercialization and scaling [1]
- Ha Luo's entry into the Robotaxi market is supported by its user data and local operational experience, as well as a significant investment partnership with Ant Group and CATL [2][6]
- The company aims to achieve commercialization within three years, focusing initially on the domestic market before expanding internationally [9][15]

Company Strategy
- Ha Luo plans to adopt a differentiated competition strategy by creating a multi-layered, accessible operational platform that integrates various car manufacturers and technology partners [4]
- The platform will allow for resource sharing among partners, reducing operational costs and lowering the barriers for cities to implement Robotaxi services [4]
- The company emphasizes the importance of data acquisition, particularly focusing on long-tail data to enhance model training for autonomous driving [5]

Investment and Partnerships
- The joint venture with Ant Group and CATL involves an initial investment of over 3 billion yuan, aimed at advancing L4 autonomous driving technology [2][6]
- Ant Group will contribute to AI infrastructure and algorithm research, while CATL will provide battery technology and operational support [7]

Technical Development
- Ha Luo acknowledges the challenges in developing L4 technology, particularly in acquiring functional cases and long-tail data [9]
- The company is exploring a dual approach to technology, combining AI-driven methods with traditional sensor technologies like LiDAR for enhanced reliability [13][14]

Market Positioning
- The company positions itself as a latecomer with unique advantages, leveraging the maturity of the industry to make targeted investments [3]
- Ha Luo aims to create a commercially viable L4 product that is not only technologically sound but also economically feasible for consumers [8][12]
In the Second Half of AI, Large Models Should Talk Less and Do More
Hu Xiu· 2025-07-01 01:33
Core Insights
- The article discusses the rapid advancements in AI models in China, particularly highlighting the performance improvements of DeepSeek and other models over the past year [1][3][5]
- The establishment of the "Fangsheng" benchmark testing system aims to standardize AI model evaluations and address issues of cheating in rankings [2][44]
- The competitive landscape of AI models is characterized by frequent updates and rapid changes in rankings, with Chinese models increasingly dominating the top positions [4][5][8]

Group 1: AI Model Performance
- DeepSeek has shown significant performance improvements, moving from a lower ranking in April 2024 to becoming the top model by December 2024 [1]
- The current landscape features approximately six Chinese models in the top ten, indicating a strong domestic presence in AI development [3]
- The frequency of updates has increased, leading to shorter durations for models to maintain top positions, with rankings changing as often as every few days [5][7]

Group 2: Benchmark Testing
- The "Fangsheng" benchmark testing system was introduced to provide a standardized method for evaluating AI models, addressing the lack of consistency in existing tests [2][44]
- The testing framework includes a diverse set of questions, focusing on real-world applications rather than traditional academic assessments [43][46]
- The system aims to enhance the practical capabilities of AI models, ensuring they can effectively contribute to the economy [44][53]

Group 3: Future of AI and Agents
- The concept of Agents, which operate on top of AI models, is gaining traction, allowing for more autonomous and intelligent functionalities [20][21]
- Future developments may lead to the emergence of specialized Agents for various tasks, potentially transforming individual productivity and collaboration with AI [25][26]
- The integration of databases and knowledge repositories with AI models is essential for improving accuracy and reducing misinformation [17][19]

Group 4: Industry Implications
- The advancements in AI models and the establishment of benchmark testing are expected to drive significant changes in various industries, enhancing operational efficiency and innovation [35][52]
- Companies are encouraged to focus on the practical applications of AI, moving beyond mere content generation to deeper analytical capabilities [52][53]
- The competitive landscape remains fluid, with no single company holding a definitive advantage, as multiple players vie for user engagement and market share [28]
Small-Group Discussion with Leading Robotaxi Experts
2025-07-01 00:40
Summary of Key Points from the Conference Call

Industry Overview
- The conference call primarily discusses the **L4 level autonomous driving** industry, focusing on various companies and their technological approaches, including **Tesla**, **Vivo**, **Baidu**, and **Pony** [1][2][6][7].

Core Insights and Arguments
- **Current Autonomous Driving Models**: The mainstream approach for autonomous driving combines local end-to-end two-stage models, utilizing CNN and LLM for perception and prediction, while planning and control rely on rule-based methods to ensure safety [1][2].
- **Tesla's Technology**: Tesla employs a pure end-to-end visual model, which offers fast response times and excels in complex scenarios. However, it faces challenges such as complex training processes and difficulties in data labeling, leading to potential dangerous behaviors in unseen data [3][4].
- **Domestic L4 Systems**: Domestic L4 autonomous driving systems outperform Tesla in driving comfort, safety in complex road conditions, and path planning in sharp turns. Companies like Baidu and Pony enhance perception capabilities through multi-sensor fusion, making them more suitable for complex domestic traffic environments [6][7].
- **Lidar Necessity**: Lidar is deemed essential for L4 autonomous driving, especially in low visibility conditions, as it effectively identifies object shapes, addressing the shortcomings of pure visual systems [9].
- **Cost and Performance of Chips**: The performance and stability of chips are critical for L4 functionality. While domestic chips are improving, they still lag behind Nvidia in peak performance and ecosystem support. However, U.S. sanctions are driving a trend towards domestic alternatives, significantly reducing costs [12][13].
- **Testing and Simulation**: L4 companies utilize extensive testing and simulation technologies to address common issues, moving away from solely relying on real-world testing, which is labor-intensive and limited [14].

Additional Important Points
- **Regulatory Environment**: The operation of Robotaxi services requires prior data submission to government authorities for area approval, indicating a structured regulatory framework [17][18].
- **Challenges in Scaling**: The high cost of individual vehicles, regulatory restrictions, and the need for infrastructure development are significant barriers to scaling operations for companies like Pony and WeRide [16].
- **Talent Acquisition**: Companies are focusing on recruiting high-end talent from both domestic and international sources, with a strong emphasis on graduates from top Chinese universities [25][26].
- **Future Technological Iterations**: While no major technological shifts are expected in the short term, the integration of large language models into autonomous driving systems is anticipated to significantly enhance capabilities [28].

This summary encapsulates the key discussions and insights from the conference call, highlighting the current state and future prospects of the L4 autonomous driving industry.
AI Expert Pours Cold Water on Altman: Pure LLMs Have Never Truly Understood the World, and Building AGI on Them Is Hopeless
36 Ke· 2025-06-30 09:29
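Key points: June 29. OpenAI CEO Sam Altman is full of optimism and believes the dawn of artificial general intelligence is close at hand. His view has acted like a shot in the arm for his many followers, filling them with excitement about the coming age of intelligence. However, American cognitive scientist and AI expert Gary Marcus has poured cold water on this enthusiasm.

Marcus recently published a long essay, "Generative AI's crippling and widespread failure to induce robust models of the world", which resonated strongly in academic and technology circles. The essay opens with an absurd AI-generated video, in which a chess player moves the opponent's piece sideways by several squares, and uses it to introduce his deepest criticism of current generative AI: these models can "imitate thinking", but they have never built a stable, reliable understanding of the world.

This is not the first time someone has pointed out serious flaws in the reasoning of large language models. A research paper published by Apple this month, "Illusion of Thinking", systematically documented cases in which large language models frequently make mistakes in logical reasoning and mathematical calculation. However, as Marcus ...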
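LeCun Releases His Latest World Model: 16-Second Coherent Scene Prediction for the First Time, and Embodied Agents Master the First-Person View! It Also Uses a VAE, Contradicting His Own Earlier Stance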
量子位· 2025-06-30 06:38
Core Viewpoint
- Yann LeCun, a prominent figure in AI and deep learning, is focusing on developing a new model called PEVA, which aims to enhance embodied agents' predictive capabilities, allowing them to anticipate actions similarly to humans [2][10].

Group 1: PEVA Model Development
- The PEVA model enables embodied agents to learn predictive abilities, achieving coherent scene predictions for up to 16 seconds [2][6].
- The model integrates structured action representation with 48-dimensional kinematic data of human joints and a conditional diffusion Transformer [3][20].
- PEVA utilizes first-person perspective video and full-body pose trajectories as inputs, moving away from abstract control signals [4][12].

Group 2: Technical Innovations
- The model addresses computational efficiency and delay issues in long-sequence action prediction through random time jumps and cross-historical frame attention [5][24].
- PEVA captures both "overall movement" and "fine joint movements" using high-dimensional structured data, which traditional models fail to represent accurately [16][18].
- The architecture employs a hierarchical tree structure for motion encoding, ensuring translation and rotation invariance [25].

Group 3: Performance Metrics
- PEVA outperforms baseline models in various tasks, showing lower LPIPS and FID values, indicating higher visual similarity and better generation quality [33][35].
- In single-step predictions, PEVA's LPIPS value is 0.303, and FID is 62.29, demonstrating its effectiveness compared to the CDiT baseline [33][35].
- The model's ability to predict visual changes within 2 seconds and generate coherent videos for up to 16 seconds marks a significant advancement in embodied AI [40].

Group 4: Practical Applications
- PEVA can intelligently plan actions by evaluating multiple options and selecting the most appropriate sequence, mimicking human trial-and-error planning [42].
- The model's capabilities could lead to more efficient robotic systems, such as vacuum cleaners that can anticipate obstacles and navigate more effectively [51].
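
The planning idea in Group 4 (evaluate several candidate action sequences, keep the best one) can be illustrated with a small sketch. This is not PEVA's implementation: `toy_world_model`, the one-dimensional state, and the distance-to-goal score are all hypothetical stand-ins chosen only to show the pattern of rolling candidates through a predictive model and selecting among them.

```python
def toy_world_model(state, action):
    """Hypothetical stand-in for a learned predictor: next state = state + action."""
    return state + action


def rollout(state, actions):
    """Predict the final state reached by applying an action sequence."""
    for action in actions:
        state = toy_world_model(state, action)
    return state


def plan(state, goal, candidates):
    """Return the candidate whose predicted final state is closest to the goal."""
    return min(candidates, key=lambda seq: abs(rollout(state, seq) - goal))


# Four hypothetical candidate sequences of unit steps on a line.
candidates = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [-1, 1, 0, 1],
    [1, 1, 1, 1],
]
best = plan(state=0, goal=3, candidates=candidates)
print(best)
```

In a real system the candidates would typically be sampled rather than listed by hand, and the score would compare predicted observations against a goal observation instead of comparing scalars.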
AI Starts to "Freely Use the Computer"! Jilin University Proposes the "ScreenExplorer" Agent
机器之心· 2025-06-27 04:02
Core Viewpoint
- The article discusses the development of a vision-language model (VLM) agent named ScreenExplorer, which is designed to autonomously explore and interact within open graphical user interface (GUI) environments, marking a significant step towards achieving general artificial intelligence (AGI) [2][3][35].

Group 1: Breakthroughs and Innovations
- The research introduces three core breakthroughs in the training of VLM agents for GUI exploration [6].
- A real-time interactive online reinforcement learning framework is established, allowing the VLM agent to interact with a live GUI environment [8][11].
- The introduction of a "curiosity mechanism" addresses the sparse feedback issue in open GUI environments, motivating the agent to explore diverse interface states [10][12].

Group 2: Training Methodology
- The training involves a heuristic and world model-driven reward system that encourages exploration by providing immediate rewards for diverse actions [12][24].
- The GRPO algorithm is utilized for reinforcement learning training, calculating the advantage of actions based on rewards obtained [14][15].
- The training process allows for multiple parallel environments to synchronize reasoning, execution, and recording, enabling "learning by doing" [15].

Group 3: Experimental Results
- Initial experiments show that without training, the Qwen2.5-VL-3B model fails to interact effectively with the GUI [17].
- After training, the model demonstrates improved capabilities, successfully opening applications and navigating deeper into pages [18][20].
- The ScreenExplorer models outperform general models in exploration diversity and interaction effectiveness, indicating a significant advancement in autonomous GUI interaction [22][23].

Group 4: Skill Emergence and Conclusion
- The training process leads to the emergence of new skills, such as cross-modal translation and complex reasoning abilities [29][34].
- The research concludes that ScreenExplorer effectively enhances GUI interaction capabilities through a combination of exploration rewards, world models, and GRPO reinforcement learning, paving the way for more autonomous agents and progress towards AGI [35].
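
The summary says GRPO calculates the advantage of actions from the rewards obtained. A minimal sketch of the group-relative advantage as GRPO is commonly formulated follows: each reward is normalized against the mean and standard deviation of its own group of rollouts. The reward values, the use of the population standard deviation, and the `eps` constant are assumptions for illustration; the summary does not state how ScreenExplorer configures this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and standard deviation.

    eps (an assumed constant) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some variants use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical exploration rewards for four rollouts from the same starting screen.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that earn more than their group's average get a positive advantage and are reinforced; those below average get a negative one. Because the baseline is the group mean, no separate value network is needed.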
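New Breakthrough in Embodied World Models: Horizon Robotics (地平线) & GigaAI (极佳) Propose a Geometry-Consistent Video World Model to Enhance Robot Policy Learning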
机器之心· 2025-06-26 04:35
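In recent years, as artificial intelligence has evolved from perceptual intelligence toward decision-making intelligence, World Models have become an important research direction in robotics. A world model aims to let an agent model its environment and predict future states, enabling more efficient planning and decision-making.

At the same time, embodied data has attracted explosive attention. Current embodied algorithms depend heavily on large-scale demonstration data from real robots, and collecting that data is costly and time-consuming, which severely limits scalability and generalization. Simulation platforms offer a relatively low-cost way to generate data, but the significant visual and dynamics differences between simulated environments and the real world (the sim-to-real gap) make policies trained in simulation hard to transfer directly to real robots, limiting their practical effectiveness. How to efficiently acquire, generate, and use high-quality embodied data has therefore become one of the core challenges in robot learning today.

Project page: https://horizonrobotics.github.io/robot_lab/robotransfer/

Imitation Learning has become one of the important methods in robot manipulation. By having a robot "imitate" the behavior shown in expert demonstrations, an effective policy model can be built quickly for complex tasks. However, such methods typically depend on large amounts of high-quality real robot ...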
University of Twente's Vanessa Evers: Building a "World Model" for Robots Is the Key to Achieving Social Intelligence
Qi Lu Wan Bao· 2025-06-25 06:38
Group 1
- The event "Dancing with Social Robots" was held at the National Exhibition and Convention Center in Tianjin, focusing on the cultural phenomenon of robots entering various domains such as classrooms and public spaces [1]
- Experts discussed the coexistence with social intelligent robots and the underlying reasons for their integration into society [1]

Group 2
- Professor Vanessa Evers from the University of Twente emphasized the need to build a "world model" for achieving social intelligence in robots, using the example of fishing to illustrate the complexity of sensory inputs required for decision-making [3]
- Current limitations include the need to digitalize the entire world: existing trials are confined to limited environments like classrooms and hospitals, which makes implementation challenging despite the availability of various sensors [3]
- Evers highlighted that robots can learn human expressions and etiquette by analyzing YouTube videos, but their operational methods do not need to mimic humans exactly, suggesting the use of optimized mechanical arms instead of human-like ones [3]
- The ultimate goal of developing social robots raises questions about their integration into human life versus providing a space for self-expression, with concerns about misuse prompting a call for public and governmental discussions on technology's development and application boundaries [3]
- Evers pointed out that energy issues pose significant challenges in the laboratory, particularly for soft robots that require efficient energy transmission similar to human blood, while battery technology is progressing slowly [3]
[Private Equity Research Notes] Shenzhen Lingfeng Asset Researches Siwei Tuxin (四维图新)
Zheng Quan Zhi Xing· 2025-06-25 00:10
Group 1: Company Insights
- Shenzhen Lingfeng Asset recently conducted research on the listed company Siwei Tuxin, highlighting the trend of intelligent driving equality becoming a key industry focus [1]
- The company noted that mid-to-high-level assisted driving functions are gradually being integrated into lower-end models, establishing intelligent driving as a leading business segment [1]
- Siwei Tuxin's data compliance business shows a clear growth trend, with AI-enhanced data loops aiding automakers in rapid algorithm iteration and optimization [1]

Group 2: Product Development and Market Trends
- The world model is being utilized for behavior prediction and trajectory generation, with productization aimed at OEMs and Tier 1 suppliers [1]
- The company emphasized the need for intelligent driving orders to achieve certain sales volumes to realize economies of scale, alongside internal cost control and operational efficiency improvements positively impacting profitability [1]
- The implementation of new national standards for two-wheeled vehicles is expected to create new market demands for Jiefa Technology's SoC cockpit products, aligning with leading automakers' overseas expansion needs [1]

Group 3: Financial Projections and Growth
- Jiefa Technology anticipates a revenue growth of over 12% in 2024, with an additional 3 million sets of basic driving point products and 600,000 sets of cockpit products expected to be secured by Q1 2025 [1]
- The company is confident in achieving significant loss reduction by 2025, supported by the successful launch of its fifth-generation SoC product, the AC8025AE [1]
- Jiefa Technology's automotive-grade MCU chip AC7870 has been successfully launched, meeting ISO 26262 ASIL-D functional safety standards, applicable across various scenarios [1]