World Models

Planning to recruit several experts to co-build the platform (4D annotation / world models / VLA and other directions)
自动驾驶之心· 2025-09-23 23:32
Business partners: 自动驾驶之心 is recruiting business partners! The team plans to bring on 10 outstanding partners at home and abroad this year, responsible for autonomous-driving course development, paper-tutoring business development, and hardware R&D. Main directions: large models / multimodal large models, diffusion models, VLA, end-to-end driving, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation with 3DGS, and large-model deployment with quantization-aware inference; candidates in these areas are welcome to join. Requirements: universities within the QS top 200, a master's degree or above; experts holding top-conference publications are preferred. Compensation: shared autonomous-driving resources (job hunting, PhD applications, study-abroad recommendations, etc.), generous cash incentives, and startup-project cooperation and referrals. Contact: add us on WeChat for details, with the note "organization/company + autonomous driving cooperation inquiry". ...
3DGS reconstruction! A source-code walkthrough of the gsplat library
自动驾驶之心· 2025-09-23 23:32
Author | 微卷的大白  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1952449084788029155 A few days ago, while reading about Marble, the new work from Fei-Fei Li's World Labs, I mentioned wanting to spend more time on 3DGS / reconstruction work afterwards. That said, if a newcomer really wants to dive in, gsplat's documentation and maintenance are somewhat better than gaussian-splatting's, so I personally recommend this library. Compared with the gaussian-splatting library released with the 3DGS paper, nerfstudio-project/gsplat applies a number of optimizations on top of the official implementation; see the migration notes at https://docs.gsplat.studio/main/migration/migration_inria.html. Searching Zhihu, though, I found plenty of posts explaining the principles and improvements of the 3DGS paper, and I revisited the CUDA kernel source myself in the first half of the year (重温经典之 3DGS CUDA 源码解析), but the other commonly used library, gsplat ...
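Since the walkthrough is about the gsplat library itself, a minimal rendering call may help orient a first-time reader before digging into the source. This is only a sketch based on the `rasterization` entry point described in gsplat's documentation; argument names and defaults can differ across versions, and the Gaussian parameters below are random stand-ins rather than a trained scene.

```python
import torch
from gsplat import rasterization  # pip install gsplat (needs a CUDA-capable build)

device = "cuda"
N = 10_000  # number of Gaussians

# Random Gaussian parameters as stand-ins for a trained scene.
means = torch.rand(N, 3, device=device) * 2 - 1                                   # centers in [-1, 1]^3
quats = torch.nn.functional.normalize(torch.randn(N, 4, device=device), dim=-1)   # rotations as unit quaternions
scales = torch.rand(N, 3, device=device) * 0.02                                   # per-axis extents
opacities = torch.rand(N, device=device)
colors = torch.rand(N, 3, device=device)                                          # plain RGB, no spherical harmonics

# A single camera: world-to-camera extrinsics and pinhole intrinsics.
viewmat = torch.eye(4, device=device)
viewmat[2, 3] = 2.5  # push the camera back along +z
K = torch.tensor([[300.0, 0.0, 320.0],
                  [0.0, 300.0, 240.0],
                  [0.0, 0.0, 1.0]], device=device)

rgb, alpha, meta = rasterization(
    means, quats, scales, opacities, colors,
    viewmats=viewmat[None], Ks=K[None], width=640, height=480,
)
print(rgb.shape)  # expected (1, 480, 640, 3): one rendered view
```

The same call is what gsplat's training loop differentiates through, which is why the migration guide above focuses on mapping gaussian-splatting's rasterizer settings onto these arguments.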
Predictions on Future Development Trends in AI Technology
Sou Hu Cai Jing· 2025-09-21 13:31
Group 1: Technological Breakthroughs
- The emergence of native multimodal large models will replace piecemeal multimodal systems, achieving a 300% improvement in inference efficiency through deep integration of text, images, audio, and 3D data [1]
- The acceleration of world models will establish a core technology foundation for embodied intelligence by 2025 [1]
- The training paradigm will shift towards post-training scaling laws, with optimized reinforcement learning cutting compute consumption by 50% [4]

Group 2: Industry Restructuring Trends
- AI agents will provide hyper-personalized product customization, increasing customer satisfaction by 40% [6]
- Real-time decision systems will triple the speed of market response in logistics and marketing [6]
- Humanoid robots will penetrate industrial scenarios with millimeter-level control precision, with smart-factory coverage exceeding 80% and manufacturing R&D cycles shortened by 28.4% [6]

Group 3: Social Integration Challenges
- "Responsible AI" will become a mandatory standard, with non-compliant companies facing regulatory penalties and user-attrition risks [8]
- The automation rate of repetitive jobs will exceed 30%, while demand for creative and emotionally interactive roles will grow by 200% [8]
- New mechanisms for privacy and copyright will emerge, with blockchain-enabled AI data-rights technology addressing content ownership disputes [8]

Group 4: Future Milestones
- By 2027, general artificial intelligence (AGI) is expected to pass the Turing test in closed environments, and by 2030, neuromorphic chips will achieve a 1000-fold increase in energy efficiency [12]
- By 2035, AI is projected to contribute over 40% to global GDP growth [12]
Planning to recruit several experts to co-build the platform (world models / VLA and other directions)
自动驾驶之心· 2025-09-21 06:59
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2]
- The recruitment targets individuals with expertise in advanced technologies such as large models, multimodal models, and 3D object detection [3]
- Candidates from QS top-200 universities with a master's degree or higher are preferred, especially those with top-conference publications [4]

Group 2
- The compensation package includes resource sharing for job seeking, PhD recommendations, and study-abroad opportunities, along with substantial cash incentives [5]
- The company encourages potential partners to reach out via WeChat for collaboration inquiries, specifying the need to mention their organization or company [6]
A world model with no training required? Westlake University's WorldForge opens a new path for spatial intelligence, letting AI understand the 3D world
量子位· 2025-09-21 06:36
Core Viewpoint
- The article discusses the advancements in AI-generated video content, highlighting the challenges of controllability in video generation models and introducing WorldForge as a solution to enhance precision in video creation without altering the model's weights [1][2].

Group 1: Challenges in Video Generation
- AI-generated videos have gained significant attention due to their realistic visuals, but the lack of precise control over generated content remains a major limitation [1].
- Current models often require extensive retraining to improve controllability, which can be costly in terms of time and computational resources, potentially degrading the model's generalization ability [1].

Group 2: Introduction of WorldForge
- WorldForge offers an innovative approach by guiding existing video generation models during the inference phase, allowing for precise control without modifying the model's weights [2][14].
- The framework consists of three collaborative modules designed to enhance the generation process [4].

Group 3: Key Modules of WorldForge
- **Intra-step Recursive Refinement (IRR)**: This module sets boundaries for the AI's imagination by implementing a "predict-correct" micro-loop, allowing for timely corrections after each prediction to ensure adherence to a predefined trajectory [4][5] (a schematic sketch of this loop follows this summary).
- **Flow-Gated Latent Fusion (FLF)**: This module separates appearance and motion features, injecting motion signals only into relevant channels to maintain the quality of the generated content while controlling the perspective [6][7].
- **Dual-Path Self-Correcting Guidance (DSG)**: DSG addresses the imperfections in injected guidance signals by utilizing two parallel denoising paths to ensure high-quality output while adhering to trajectory constraints [7].

Group 4: Applications of WorldForge
- WorldForge demonstrates remarkable capabilities, such as reconstructing 3D static scenes from a single image and generating 360° surround videos, indicating its potential for efficient world model exploration [9][8].
- The system allows users to design new camera trajectories for existing videos, executing complex movements and intelligently filling in newly exposed areas, outperforming traditional models that require extensive training [11].
- Additionally, WorldForge supports video content editing, including subject replacement and object manipulation, enabling creative modifications [12].

Group 5: Future Implications
- WorldForge introduces a novel interactive and control approach in video generation, paving the way for the development of controllable world models without increasing training costs or losing prior knowledge [14].
- The potential for future advancements includes more natural interactions through language or gestures, allowing models to better understand and execute creative visions [14].
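To make Group 3's module descriptions more concrete, here is a schematic, inference-time loop in the spirit of the IRR and FLF modules. The function names (`denoise`, `warp_reference`, `flow_gate`) and the masked blending step are illustrative assumptions, not WorldForge's actual implementation, and the paper's DSG dual-path guidance is omitted for brevity.

```python
import torch

def worldforge_style_sampling(latent, num_steps, denoise, warp_reference, flow_gate):
    """Schematic predict-then-correct loop in the spirit of IRR + FLF.

    `denoise`, `warp_reference`, and `flow_gate` are hypothetical callables that a
    real system would back with a frozen video diffusion model, a camera-trajectory
    warp of the reference view, and an optical-flow-based channel mask; only the
    control flow below reflects the summary above, not the paper's implementation.
    """
    for t in reversed(range(num_steps)):
        # Predict: one ordinary denoising step from the frozen generator.
        latent = denoise(latent, t)

        # Correct (IRR): pull the prediction toward a trajectory-consistent reference
        # rendered for this step, instead of retraining or fine-tuning the model.
        reference = warp_reference(t)

        # Gate (FLF): inject guidance only where it concerns motion/perspective,
        # leaving the model's appearance channels untouched.
        mask = flow_gate(latent, reference)            # 1 where guidance applies
        latent = mask * reference + (1 - mask) * latent
    return latent

# Toy call with dummy stand-ins, just to show the interface.
x = torch.randn(1, 4, 8, 32, 32)                       # (batch, channels, frames, H, W)
out = worldforge_style_sampling(
    x, num_steps=10,
    denoise=lambda z, t: 0.9 * z,                      # pretend denoiser
    warp_reference=lambda t: torch.zeros(1, 4, 8, 32, 32),
    flow_gate=lambda z, r: (torch.rand_like(z) > 0.5).float(),
)
print(out.shape)  # torch.Size([1, 4, 8, 32, 32])
```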
Opening several autonomous driving technical discussion groups (world models / end-to-end / VLA)
自动驾驶之心· 2025-09-20 16:03
The 自动驾驶之心 technical discussion groups have been set up. For the back-to-school and autumn-recruitment season we have opened several technical discussion groups (world models / end-to-end / VLA and other directions). Everyone is welcome to join and exchange on these topics; interested readers can add the assistant on WeChat (AIDriver005) with the note "nickname + direction" to join a group. ...
Jensen Huang accompanies Trump on a UK visit: a $2.6 billion bet on British AI, with autonomous-driving company Wayve possibly getting an extra $500 million
Sou Hu Cai Jing· 2025-09-20 09:57
Core Insights
- NVIDIA's CEO Jensen Huang announced a £2 billion (approximately $2.6 billion) investment in the UK to catalyze the AI startup ecosystem and accelerate the creation of new companies and jobs in the AI sector [1]
- Wayve, a UK-based autonomous driving startup, is expected to secure one-fifth of this investment, with NVIDIA evaluating a $500 million investment in its upcoming funding round [1][2]
- Wayve's upcoming Gen 3 hardware platform will be built on NVIDIA's DRIVE AGX Thor in-vehicle computing platform [1]

Company Overview
- Wayve was founded in 2017 with the mission to reimagine autonomous mobility using embodied AI [3]
- The company has developed a unique technology path focused on embodied AI and end-to-end deep learning models, distinguishing itself from mainstream autonomous driving companies [3][8]
- Wayve is the first company in the world to deploy an end-to-end deep learning driving system on public roads [3]

Technology and Innovation
- Embodied AI allows an AI system to learn tasks through direct interaction with the physical environment, contrasting with traditional systems that rely on manually coded rules [8]
- Wayve's end-to-end model, referred to as AV2.0, integrates deep neural networks with reinforcement learning, processing raw sensor data to output vehicle control commands [8][10] (a toy sketch of this input-to-control mapping follows this summary)
- To address the challenges of explainability in end-to-end models, Wayve developed the LINGO-2 model, which uses visual and language inputs to predict driving behavior and explain actions [10][12]

Data and Training
- Wayve has created the GAIA-2 world model, a video generation model designed for autonomous driving, which generates realistic driving scenarios based on structured inputs [14][15]
- GAIA-2 is trained on a large dataset covering various geographical and driving conditions, allowing for effective training without extensive real-world driving data [16][17]
- The model's ability to simulate edge cases enhances training efficiency and scalability [18]

Strategic Partnerships
- Wayve's technology does not rely on high-definition maps and is hardware-agnostic, allowing compatibility with various sensor suites and vehicle platforms [20]
- The company has established partnerships with Nissan and Uber to test its autonomous driving technology [20]

Leadership and Team
- Wayve's leadership team includes experienced professionals from leading companies in the autonomous driving sector, enhancing its strategic direction and technological capabilities [25][26]
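To illustrate what "raw sensor data in, control commands out" means in the end-to-end setting described above, here is a toy policy network. It is a generic sketch, not Wayve's AV2.0 architecture: the layer sizes and the two-dimensional (steering, acceleration) output are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class EndToEndDrivingPolicy(nn.Module):
    """Toy end-to-end policy: raw camera frames in, control commands out.

    Illustrative stand-in only; the encoder depth and the (steering, acceleration)
    output head are assumptions, not any production architecture.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # compress the image into a feature vector
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(               # map features to control commands
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh(),         # [steering, acceleration] in [-1, 1]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(frames))

policy = EndToEndDrivingPolicy()
controls = policy(torch.rand(1, 3, 224, 224))    # one RGB frame -> one control pair
print(controls.shape)                            # torch.Size([1, 2])
```

The point of the sketch is the absence of hand-coded perception, prediction, and planning stages: the whole mapping is learned, which is what distinguishes this approach from modular autonomous-driving stacks.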
Ren Shaoqing joins USTC......
自动驾驶之心· 2025-09-20 05:35
Reference | 量子位 Ren Shaoqing has gone to USTC! The AI heavyweight Ren Shaoqing has started a research group and is recruiting students at his alma mater, the University of Science and Technology of China. Ren Shaoqing was a co-founder of Momenta and a vice president at NIO; he entered USTC in 2007 and completed his bachelor's, master's, and PhD there (jointly trained with Microsoft Research Asia), and is an author of ResNet and Faster R-CNN. With more than 440,000 citations, he is the most-cited Chinese scholar worldwide, and ResNet is the most-cited paper of the 21st century. He is a recipient of the Future Science Prize in Mathematics and Computer Science. The recruitment directions are AGI, world models, embodied intelligence, AI4S, and related areas. Both master's and PhD students are being recruited; students with recommendation-exemption (推免) qualifications will begin emergency interviews next Monday (the 22nd). ...
Ren Shaoqing is recruiting students at USTC! Both master's and PhD applicants welcome; recommendation-exemption students get emergency interviews next Monday
量子位· 2025-09-20 05:12
Core Viewpoint
- Ren Shaoqing, a prominent figure in AI and computer vision, is starting a recruitment program at his alma mater, the University of Science and Technology of China, focusing on advanced topics in AI such as AGI, world models, embodied intelligence, and AI for Science [1][2].

Group 1: Recruitment Details
- The recruitment is open for both master's and doctoral students, with emergency interviews starting on the upcoming Monday for students with recommendation qualifications [3].
- Interested students can send their resumes to Ren Shaoqing's email for inquiries regarding the application process and interview details [16].

Group 2: Background of Ren Shaoqing
- Ren Shaoqing is an expert in computer vision and autonomous driving, having graduated from the University of Science and Technology of China and obtained a joint PhD with Microsoft Research Asia [4][5].
- He has been recognized as one of the most influential scholars in AI, ranking 10th in the AI 2000 list, and received the Future Science Prize in Mathematics and Computer Science in 2023 [6].

Group 3: Contributions to AI
- Ren is a co-author of ResNet, a groundbreaking work in deep learning that addresses the vanishing gradient problem, significantly impacting fields requiring high perception capabilities like computer vision and autonomous driving [7] (a minimal residual block is sketched after this summary).
- ResNet has received over 290,000 citations and won the Best Paper Award at CVPR 2016 [8].
- He also contributed to Faster R-CNN, an efficient two-stage object detection algorithm that balances speed and accuracy [10].

Group 4: Role in NIO
- After completing his PhD, Ren co-founded Momenta and later joined NIO, where he played a key role in developing autonomous driving algorithms and leading the smart driving R&D team [13].
- At NIO, he developed the NIO World Model (NWM), which integrates spatiotemporal cognition and generative capabilities, allowing for high-fidelity scene reconstruction and long-term scenario simulation [14][15].
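Group 3's mention of ResNet and the vanishing-gradient problem can be made concrete with a minimal residual block: the input is added back to the output of the convolutional layers, giving gradients a shortcut through very deep networks. Channel counts and the absence of downsampling here are simplifications rather than the exact configuration of the original paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x.

    Illustrative only; fixed channel count and no downsampling are simplifying
    assumptions, not the full set of block variants from the ResNet paper.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the skip connection: gradients can bypass F(x)

block = ResidualBlock(64)
print(block(torch.rand(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```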
These directions in embodied intelligence make up the so-called "big brain / small brain" algorithm stack
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the evolution and current trends in embodied intelligence technology, emphasizing the integration of various models and techniques to enhance robotic capabilities in real-world environments [3][10].

Group 1: Technology Development Stages
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but faced challenges in generalization and error accumulation [7].
- The third stage, marked by the introduction of diffusion policy methods, improved stability and generalization by modeling action sequences [8] (a denoising-loop sketch follows this summary).
- The fourth stage, beginning in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance predictive capabilities and multi-modal perception [9][10].

Group 2: Key Technologies and Techniques
- Key technologies in embodied intelligence include VLA, diffusion policy, and reinforcement learning, which collectively enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- The integration of tactile sensing with VLA models expands the sensory capabilities of robots, allowing for more precise operations in unstructured environments [10].

Group 3: Industry Implications and Opportunities
- The advancements in embodied intelligence are leading to increased demand for engineering and system capabilities, transitioning from theoretical research to practical deployment [10][14].
- There is a growing interest in training and deploying various models, including diffusion policy and VLA, on platforms like Mujoco and IsaacGym [14].
- The industry is witnessing a surge in job opportunities and research interest, prompting many professionals to shift focus towards embodied intelligence [10].
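Since Group 1 singles out diffusion policy as the stage that models whole action sequences, a bare-bones denoising loop may make the idea concrete: the policy starts from Gaussian noise over an action sequence and iteratively denoises it, conditioned on the current observation. The noise-prediction network, the 50-step schedule, and the action/observation dimensions below are illustrative assumptions, not the settings of any particular paper.

```python
import torch
import torch.nn as nn

T = 50                       # number of denoising steps (assumption)
ACTION_DIM, HORIZON = 7, 16  # e.g. 7-DoF arm actions over a 16-step horizon
OBS_DIM = 64

# Noise-prediction network eps_theta(a_t, obs, t): a stand-in MLP.
eps_net = nn.Sequential(
    nn.Linear(ACTION_DIM * HORIZON + OBS_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM * HORIZON),
)

betas = torch.linspace(1e-4, 0.02, T)          # fixed variance schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_action_sequence(obs: torch.Tensor) -> torch.Tensor:
    """DDPM-style reverse process: noise -> action sequence, conditioned on obs."""
    a = torch.randn(1, ACTION_DIM * HORIZON)   # start from pure Gaussian noise
    for t in reversed(range(T)):
        t_feat = torch.full((1, 1), t / T)
        eps = eps_net(torch.cat([a, obs, t_feat], dim=-1))      # predict the noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        a = (a - coef * eps) / torch.sqrt(alphas[t])            # denoise one step
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)  # re-inject noise
    return a.view(1, HORIZON, ACTION_DIM)

actions = sample_action_sequence(torch.rand(1, OBS_DIM))
print(actions.shape)  # torch.Size([1, 16, 7])
```

Training such a policy amounts to teaching `eps_net` to predict the noise added to expert action sequences, which is why the approach tends to be more stable than naive behavior cloning on multi-modal demonstrations.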