Training Data
Bridging the Sim-to-Real Gap for Accelerated Robot Training
NVIDIA · 2025-08-12 02:07
Core Technology & Solution
- NVIDIA Cosmos is a world foundation model platform that lets developers generate training data at industrial scale [2]
- Cosmos Predict generates realistic training data from an initial observation, creating diverse action variations from text prompts or action triggers [2]
- Cosmos supports multi-view outputs, producing different perspectives from a single frame, which is especially useful for autonomous vehicles and multi-camera robots [3]
- Cosmos Transfer applies appearance variations to 3D renders or real-world video, adjusting materials, lighting, weather, and environments so that trained models generalize across domains while physical accuracy is preserved [3]
- Cosmos Reason, a vision-language model, filters out low-quality samples, annotates scenes, and supports policy training, enabling safe and efficient decision-making [4]
- Cosmos world foundation models are adaptable and can be post-trained for different sensors and perspectives [5]
Industry Application & Impact
- The fusion of AI and computer graphics, exemplified by Cosmos, enables robots and autonomous machines to operate safely in the real world [5]
- The technology addresses a key robotics bottleneck: capturing real-world training data, or creating synthetic data by hand, is expensive and time-consuming [1]
The Global Race for AI Adoption
Bloomberg Technology · 2025-07-28 19:45
AI Race & Adoption
- The US AI Action Plan aims to compete with China, focusing on both innovation and adoption of AI [1]
- Winning the AI race depends on which countries can best use AI for economic benefit [2]
- The US has an advantage in AI adoption, but the race is still open [3]
- AI adoption requires a focus on talent, infrastructure, data, and governance frameworks [5][6]
US AI Exports
- The US aims to be a net exporter of AI technology, including hardware and software [7]
- AI adoption relies on cutting-edge cloud services and software, much of which originates in the US [9]
Copyright & Training Data
- Access to training data is crucial for the US to stay ahead in the AI race [11][12]
- The US government acknowledges the importance of training data for AI development [11]
EU Competitiveness
- The EU has significant potential to benefit from AI if it focuses on adoption [13]
- Removing digital-sovereignty barriers and streamlining regulation are key to the EU adopting and using AI effectively [13][14]
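A Simple Fix for LLMs' Lopsided Skills: Adjust the Training-Set Composition. Here Is the "Secret Recipe" | Shanghai Jiao Tong University, Shanghai AI Lab, et al.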
QbitAI (量子位) · 2025-06-10 07:35
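Contributed by the IDEAL team | QbitAI (量子位), WeChat official account: QbitAI
Substantially easing an LLM's lopsided skills takes only an adjustment to the composition of the SFT training set. Llama 3.1-8B, originally weak at coding, shows a clear improvement in code ability.
A joint team from Shanghai Jiao Tong University and Shanghai AI Lab has proposed a new method, IDEAL, that significantly improves an LLM's overall performance across multiple domains. The study also reports several other important findings. Let's look at the details.
After SFT, some LLM abilities even degrade
With strong comprehension and logical reasoning, large language models (LLMs) have shown remarkable capabilities across many domains. Besides scaling up parameter counts, high-quality data is widely recognized as the most critical factor in improving LLM performance.
During supervised fine-tuning (SFT), the researchers found that LLMs in multi-task settings often develop "lopsided skills": some abilities stand out while others fail to improve or even regress. This imbalance leaves the model uneven across domains, which in turn hurts the user experience.
The SJTU and Shanghai AI Lab researchers quickly turned their attention to the SFT training set: could adjusting its composition ease an LLM's lopsided skills? Intuitively, simply doubling the training data for the model's weak subjects should change the final result. However, because of the coupling among training data, the researchers modeled and quantified each domain's data with respect to the final result's ...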
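The naive intuition described above, doubling the data for a weak domain, amounts to reweighting the SFT mixture by per-domain multipliers. The Python sketch below shows only that baseline, not the IDEAL method (whose details are cut off here). It assumes the dataset is a list of hypothetical `(domain, example)` pairs and that `reweight_mixture` and its `multipliers` argument are illustrative names, not part of any published code.

```python
import random
from collections import Counter


def reweight_mixture(examples, multipliers, seed=0):
    """Naive baseline: resample an SFT dataset so each domain scales by a factor.

    examples: list of (domain, example) pairs (hypothetical format).
    multipliers: hypothetical dict mapping domain -> factor; 2.0 doubles a
    domain, 0.5 keeps about half of it, unlisted domains keep factor 1.0.
    """
    rng = random.Random(seed)

    # Group examples by domain, preserving their original order.
    by_domain = {}
    for domain, ex in examples:
        by_domain.setdefault(domain, []).append((domain, ex))

    mixed = []
    for domain, items in by_domain.items():
        factor = multipliers.get(domain, 1.0)
        whole = int(factor)
        frac = factor - whole
        # The integer part repeats every example of the domain.
        mixed.extend(items * whole)
        # The fractional part adds a random subset, sampled without replacement.
        extra = round(frac * len(items))
        mixed.extend(rng.sample(items, extra))

    rng.shuffle(mixed)
    return mixed


# Usage: double a weak "code" domain relative to "math".
data = [("math", f"m{i}") for i in range(4)] + [("code", f"c{i}") for i in range(2)]
mixed = reweight_mixture(data, {"code": 2.0})
print(sorted(Counter(d for d, _ in mixed).items()))  # [('code', 4), ('math', 4)]
```

Uniform upsampling like this treats each domain in isolation. Because training data from different domains is coupled, as the article points out, changing one domain's share can also shift performance elsewhere, which is why the researchers instead model each domain's contribution to the final result.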