Data Augmentation

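A 30,000-Character Deep Dive: The Data Collection Revolution That Will Decide Whether Robots Reach Large-Scale Deployment | Holiday Recharge
锦秋集 · 2025-10-03 04:03
The Holiday Recharge series continues. Today we recap the first "Making Sense of Data in Robotics" Workshop, held during CoRL 2025, and ask: as robotics advances rapidly and attention stays fixed on algorithms and models, are we overlooking data, the underlying variable that actually decides whether robots can leave the lab and be deployed at scale?

Data is not only the fuel for training foundation models; it is also the foundation for policy generalization, stable operation, and safe, controllable behavior. Without high-quality, scenario-matched data, even the most advanced model stays confined to papers and demos.

The Workshop was a collective reflection on this "underestimated core element". It focused on three themes (data composition, data curation, and data interpretability) and tried to answer the robotics industry's most pressing questions:

1. What kind of data do robots really need?
2. How can data that improves policy performance be distilled from massive amounts of raw information?
3. How should we understand the actual effect of data on robot decisions and behavior?

锦秋基金 (WeChat account: 锦秋集, ID: jqcapital) believes the value of this Workshop goes beyond academic exchange: it reveals a "key link" on physical intelligence's path to industrialization. Whether it is the data-efficient "task decomposition + module reuse" approach proposed by Joseph Lim's team, or Ke ...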
Medical Image Segmentation with Ultra-Low Annotation Requirements: UCSD Proposes the Three-Stage Framework GenSeg
36Ke · 2025-08-12 03:24
Core Insights
- GenSeg utilizes AI to generate high-quality medical images and corresponding segmentation labels, significantly reducing the manual labeling burden on medical professionals [1][20]
- The framework addresses the critical challenge of dependency on large amounts of high-quality annotated data in medical image semantic segmentation [1][20]

Summary by Sections

Technology Overview
- GenSeg is a three-stage framework that tightly couples data augmentation model optimization with semantic segmentation model training, ensuring that generated samples effectively enhance segmentation model performance [2][10]
- It can be applied to various segmentation models, such as UNet and DeepLab, improving their performance in both in-domain and out-of-domain scenarios [4][20]

Methodology
- The framework consists of two main components: a semantic segmentation model that predicts segmentation masks and a mask-to-image generation model that predicts corresponding images [9]
- The training process involves three phases: training the generation model with real image-mask pairs, augmenting real segmentation masks to create synthetic image-mask pairs, and evaluating the segmentation model on a validation set to update the generation model [9][10]

Experimental Results
- GenSeg demonstrates significant sample efficiency, achieving comparable or superior segmentation performance while drastically reducing the number of training samples required [11][20]
- In in-domain experiments, GenSeg-UNet requires only 50 images to achieve a Dice score of approximately 0.6, compared to 600 images for standard UNet, representing a 12-fold reduction in data [13]
- In out-of-domain tasks, GenSeg-DeepLab achieves a Jaccard index of 0.67 using only 40 images, while standard DeepLab fails to reach this level with 200 images [13]

Comparative Analysis
- The end-to-end data generation mechanism of GenSeg outperforms traditional separate training strategies, as evidenced by improved performance metrics in various segmentation tasks [15]
- Regardless of the type of generation model used, the end-to-end training strategy consistently outperforms the separate training strategy [17]

Generalization and Efficiency
- GenSeg exhibits strong generalization capabilities across 11 medical image segmentation tasks and 19 datasets, achieving absolute performance improvements of 10-20% while requiring only 1/8 to 1/20 of the training data compared to existing methods [20]
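
For readers unfamiliar with the two metrics quoted in the experimental results, the sketch below computes the Dice score and Jaccard index for a pair of binary masks using their standard definitions. It is an illustration only and not GenSeg's evaluation code: the function name `dice_and_jaccard`, the flat 0/1 list representation, and the toy masks are all assumptions made for this example.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; not taken from the GenSeg code base.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention chosen for this sketch, not something the article states.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    jaccard = intersection / union
    return dice, jaccard


# Toy 6-pixel masks (made up for the example): 2 pixels overlap, 4 in the union.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(pred, target)
print(f"Dice = {dice:.3f}, Jaccard = {jaccard:.3f}")
```

The two metrics are related by J = D / (2 - D), so a Dice score is never lower than the Jaccard index of the same prediction. This is worth keeping in mind when comparing the in-domain figure (reported as Dice) with the out-of-domain figure (reported as Jaccard).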
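The ERMV Framework: Data Augmentation for Manipulation Tasks That Significantly Improves VLA Models' Cross-Scenario Success Rates
具身智能之心 · 2025-07-28 13:19
Author: Chang Nie et al. Editor: 具身智能之心

Research Background

Robot imitation learning depends heavily on 4D multi-view sequential images (images spanning multiple viewpoints and the time dimension), but high-quality data is costly to collect and scarce, which severely limits the generalization and application of embodied-intelligence policies such as vision-language-action (VLA) models. Data augmentation is an effective way to ease data scarcity, yet there is currently no method for editing 4D multi-view sequential images that targets manipulation tasks.

Existing methods have clear limitations. Traditional data augmentation methods (such as CACTI and ROSIE) edit only single static images and cannot meet VLA models' need for spatiotemporally continuous 4D data. Multi-view editing methods rely on fixed camera positions and struggle with the dynamically changing multi-camera systems used in robot manipulation. Video generation models, because of their dense spatiotemporal attention, are constrained by compute cost, have small working windows, and struggle with error accumulation over long sequences.

Core Challenges and Solution

ERMV (Editing Robotic Multi-View 4D data) is a new data augmentation framework, based on single-frame ...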