COSMOS

LeCun steps in with a video world model to challenge NVIDIA's COSMOS
机器之心· 2025-07-29 09:58
Core Viewpoint
- The article covers DINO-world, a new video world model designed to predict future frames more efficiently and accurately across diverse environments [9][10].

Data Challenges
- Large-scale, high-quality video datasets are costly to acquire, especially when action annotations are required. Successful applications of world models are so far limited to specific fields such as autonomous driving and video games [5].
- Accurately modeling physical laws and behaviors in unconstrained, partially observable environments remains a significant challenge, even over short time scales. Advanced pixel-based generative models consume enormous computational resources, with training reaching up to 22 million GPU hours for models like COSMOS [6].

Model Development
- DINO-world uses a frozen visual encoder (DINOv2) to pre-train the video world model in latent space, then fine-tunes it with action data for planning and control [9].
- The architecture significantly reduces resource consumption during both training and inference compared with current state-of-the-art models [10].

Training and Evaluation
- DINO-world was trained on a large dataset of approximately 60 million uncurated web videos, enabling it to learn features that transfer across domains [11].
- In the VSPW segmentation prediction task, DINO-world improved mean Intersection over Union (mIoU) by 6.3% over the second-best model when predicting future frames [13].

Methodology
- The frame encoder does not model pixels directly. It uses latent representations of video patches, which significantly lowers the computational cost of training the predictor [19].
- The training objective is "next frame prediction," which allows efficient parallelization and focuses the loss calculation on the most relevant tokens [27].

Action-Conditioned Fine-Tuning
- DINO-world can be adapted for action-conditioned tasks by adding an action module that updates the query vector based on the corresponding actions. This module can be trained on a small dataset of action-conditioned trajectories [30][33].

Experimental Results
- DINO-world demonstrated superior performance in dense prediction tasks across datasets including Cityscapes, VSPW, and KITTI, validating the effectiveness of the proposed paradigm [37][38].
- In intuitive physics tests, the model showed a strong understanding of physical behaviors, comparable to larger models like V-JEPA [40][41].

Planning Evaluation
- The action-conditioned model was trained on offline trajectories and showed significant performance improvements over models trained from scratch, particularly in more complex environments [44].
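
For readers unfamiliar with the mIoU metric cited for the VSPW result, the following is a minimal sketch of the standard definition: per-class intersection over union, averaged over the classes that appear. It is not taken from the DINO-world evaluation code; the function name `mean_iou`, the `ignore_index` default of 255, and the toy label maps are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label; datasets differ
) -> float:
    """Mean Intersection over Union for two label maps of the same shape."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue  # unlabeled pixels do not count
            if p == t:
                # Pixel is in both the predicted and true set of class t.
                intersection[t] += 1
                union[t] += 1
            else:
                # Pixel is in the true set of t and the predicted set of p.
                union[t] += 1
                if 0 <= p < num_classes:
                    union[p] += 1
    # Average only over classes that occur in the prediction or the target.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 example with two classes.
predicted = [[0, 0], [1, 1]]
ground_truth = [[0, 1], [1, 1]]
score = mean_iou(predicted, ground_truth, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Benchmarks usually accumulate the intersection and union counts over the whole dataset before dividing, rather than averaging per-image scores.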
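Overnight US stock market recap (6.26) | NVIDIA rises more than 4% to a record high, reclaiming its place as the world's most valuable company; Jensen Huang calls robotics NVIDIA's next trillion-dollar growth opportunity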
Sou Hu Cai Jing· 2025-06-25 23:04
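01 Broad Market
The three major US stock indexes fluctuated lower overnight. The Dow fell 0.25%, the Nasdaq rose 0.31%, and the S&P 500 closed flat at 0%. The VIX volatility index fell 4.12% to 16.76. The US dollar index fell 0.28% to 97.7. The US 10-year Treasury yield fell 0.116% to close at 4.292%, a spread of 50.7 basis points over the 2-year yield. Spot gold rose 0.27% to $3,332.02 per ounce. Brent crude closed down 0.61% at 66.4.

02 Sectors & Individual Stocks
By sector, semiconductors, technology, and healthcare closed up 0.9%, 0.85%, and 0.09% respectively, while the other seven S&P sectors all closed lower: real estate, consumer staples, utilities, materials, industrials, energy, and communications fell 2.44%, 1.34%, 1.34%, 0.96%, 0.88%, 0.44%, and 0.02% respectively.

Chinese ADRs were mixed. TSMC rose 1.2%; a TSMC overseas subsidiary plans to issue $10 billion in new shares to strengthen its foreign-exchange hedging operations. Alibaba fell 2.1%, PDD fell 0.02%, JD.com rose 1.1%, Li Auto fell 1.47%, XPeng fell 3.27%, Futu rose 5.99%, NIO fell 0.86%, and Pony.ai fell 1.81%; Pony.ai was added to the Nasdaq Golden Dragon China Index.

Most large-cap technology stocks closed higher. NVIDIA rose 4.33% to a record high, once again becoming the world's most valuable company. ...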