Training Data
X @Forbes
Forbes· 2026-04-19 21:00
AI’s New Training Data: Your Old Work Slacks And Emails. Defunct startups are being liquidated for their Slack archives, Jira tickets, and email threads: operational exhaust that AI labs now treat as premium training data. Learn more: https://t.co/DN1FgPGnU8 (Photo: SimpleClosure) ...
Amazon Discovered Child Sex Abuse Content in AI Training Data
Bloomberg Technology· 2026-01-29 20:46
Let's start with the very basics of what this entity that we reported on saw in the data, specifically as it relates to reports of child sexual abuse material. Just start with that, please. Yes. So there's an organization called the National Center for Missing and Exploited Children, and it effectively serves as a clearinghouse, fielding tips about child sexual abuse material from industry and passing them on to law enforcement. This is a really important part of the process because this is the connectivity that allow ...
Bridging the Sim-to-Real Gap for Accelerated Robot Training
NVIDIA· 2025-08-12 02:07
Core Technology & Solution
- NVIDIA Cosmos is a world foundation model platform designed for developers to generate training data at an industrial scale [2]
- Cosmos Predict generates realistic training data from an initial observation, creating diverse action variations using text prompts or action triggers [2]
- Cosmos supports multi-view outputs, providing different perspectives from a single frame, which is especially useful for autonomous vehicles and multi-camera robots [3]
- Cosmos Transfer applies appearance variations to 3D renders or real-world video, adjusting materials, lighting, weather, and environments to train models that generalize across domains while preserving physical accuracy [3]
- Cosmos Reason, a vision language model, filters low-quality samples, annotates scenes, and supports policy training, enabling safe, efficient decision-making [4]
- Cosmos World Foundation models are adaptable and can be post-trained to fit different sensors and perspectives [5]
Industry Application & Impact
- The fusion of AI and computer graphics, exemplified by Cosmos, enables robots and autonomous machines to safely operate in the real world [5]
- The technology addresses the challenge of expensive and time-consuming real-world training data capture or manual synthetic data creation for robotics [1]
The Global Race for AI Adoption
Bloomberg Technology· 2025-07-28 19:45
AI Race & Adoption
- The US AI action plan aims to compete with China, focusing on both innovation and adoption of AI [1]
- Winning the AI race depends on which countries can best utilize AI for economic benefit [2]
- The US has an advantage in AI adoption, but the race is still open [3]
- AI adoption requires focus on talent, infrastructure, data, and governance frameworks [5][6]
US AI Exportation
- The US aims to be a net exporter of AI technology, including hardware and software [7]
- AI adoption relies on cutting-edge cloud services and software, much of which originates in the US [9]
Copyright & Training Data
- Access to training data is crucial for the US to stay ahead in the AI race [11][12]
- The US government acknowledges the importance of training data for AI development [11]
EU Competitiveness
- The EU has significant potential to benefit from AI if it focuses on adoption [13]
- Addressing digital sovereignty barriers and streamlining regulations are important for the EU to effectively adopt and use AI [13][14]
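One Trick to Ease an LLM's "Lopsided Skills": Adjust the Training Set Composition. Here's the "Secret Recipe" | SJTU & Shanghai AI Lab et al.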
量子位 (QbitAI)· 2025-06-10 07:35 · AI Processing
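Contributed by the IDEAL team. QbitAI | WeChat official account: QbitAI. Substantially easing an LLM's lopsided skills takes nothing more than adjusting the composition of the SFT training set. Llama 3.1-8B, originally weak at coding, shows a clear improvement in its coding ability. A joint team from Shanghai Jiao Tong University and Shanghai AI Lab has proposed a new method, IDEAL, that significantly improves an LLM's overall performance across a range of domains. The study also reports some other important findings, for example: Here are the details. After SFT, some LLM abilities even degrade. With their strong comprehension and logical reasoning, large language models (LLMs) have shown remarkable capabilities across many domains. Besides scaling up parameter counts, high-quality data is widely recognized as the most important factor in improving LLM performance. During supervised fine-tuning (SFT), the researchers found that LLMs in multi-task settings often become "lopsided": some abilities stand out while others fail to improve or even degrade. This imbalance leaves large models with uneven abilities across domains, which in turn hurts the user experience. The SJTU and Shanghai AI Lab researchers quickly turned their attention to the SFT training set: could adjusting its composition ease the LLM's lopsidedness? Intuitively, simply doubling the training data for the LLM's weak subjects should be enough to change the final result. However, because of the coupling between training data, the researchers modeled and quantified each domain's data on the final result's ...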
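The "intuitive" fix described above, simply doubling the training data for a weak domain, amounts to resampling each domain of the SFT set by a multiplier. The sketch below illustrates only that naive baseline under our own assumptions: the function name `rebalance_mixture`, the `(domain, sample)` data layout, and the multiplier values are hypothetical. It is not the IDEAL method, which instead models how each domain's data affects the final result.

```python
import random
from collections import Counter


def rebalance_mixture(examples, multipliers, seed=0):
    """Naively rescale each domain's share of an SFT training set.

    Hypothetical helper for illustration only (not IDEAL).
    examples: list of (domain, sample) pairs.
    multipliers: dict mapping domain -> float; 2.0 doubles a domain,
    0.5 halves it. Domains missing from `multipliers` keep their count.
    """
    rng = random.Random(seed)

    # Group samples by domain.
    by_domain = {}
    for domain, sample in examples:
        by_domain.setdefault(domain, []).append(sample)

    mixed = []
    for domain, samples in by_domain.items():
        target = round(len(samples) * multipliers.get(domain, 1.0))
        # Repeat the whole domain `whole` times, then top up with a
        # random subset of `rest` distinct samples (rest < len(samples)).
        whole, rest = divmod(target, len(samples))
        chosen = samples * whole + rng.sample(samples, rest)
        mixed.extend((domain, s) for s in chosen)

    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    data = [("code", f"c{i}") for i in range(3)] + [("math", f"m{i}") for i in range(5)]
    mixed = rebalance_mixture(data, {"code": 2.0})
    counts = Counter(domain for domain, _ in mixed)
    print(sorted(counts.items()))  # → [('code', 6), ('math', 5)]
```

Note that upsampling one domain also lowers every other domain's relative share of the mix, so even this simple change touches all domains at once; the article cites this kind of coupling as the reason a blind 2x increase is not the whole story.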