Core Insights
- Apple has been perceived as lagging in the development and application of large models, particularly in the field of visual generation [1][2]
- The company has made significant strides in research, recently introducing the Pico-Banana-400K dataset, which consists of 400,000 images for instruction-based image editing [6][9]

Dataset Overview
- The Pico-Banana-400K dataset is built using Google's Nano-banana model and aims to provide a comprehensive resource for training and evaluating text-guided image editing models [6][9]
- The dataset includes a variety of subsets:
  - 258,000 single-turn editing examples covering 35 editing categories [12]
  - 72,000 multi-turn editing examples for studying sequential modifications [13]
  - 56,000 preference samples for alignment research [14]
  - Instruction pairing sets for developing instruction rewriting and summarization capabilities [15]

Quality Control and Methodology
- The dataset emphasizes quality and diversity through a systematic design, ensuring comprehensive coverage of editing types and balancing content consistency with instruction fidelity [9][16]
- Apple has implemented a self-editing and evaluation process where the Nano-banana model performs edits, and Gemini 2.5 Pro assesses the results, allowing for automatic retries until successful [17]

Editing Types and Success Rates
- The dataset categorizes editing instructions into 35 types, covering a wide range of operations from color adjustments to object manipulation [21][22]
- Success rates vary by editing type, with global appearance and style edits being the easiest, while precise geometric and text edits are the most challenging [31][32][34]

Contributions to the Field
- The release of Pico-Banana-400K represents a significant contribution to the field of multimodal learning, providing a large-scale, shareable dataset that supports various training objectives [40][41]
- The dataset not only facilitates the training of models but also demonstrates the capability of AI to generate and validate training data autonomously, without human supervision [41][42]
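
The edit-then-assess loop described under Quality Control and Methodology can be pictured with a small Python sketch. This is an illustration only, not Apple's pipeline: `edit_with_retries`, `EditResult`, the `edit_fn` and `judge_fn` callables, and the `max_attempts` cap are all hypothetical names and assumptions introduced here, and the stub editor and judge merely stand in for the Nano-banana and Gemini 2.5 Pro roles.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EditResult:
    edited_image: str  # placeholder for image data or a file path
    attempts: int      # edit attempts made, including the accepted one


def edit_with_retries(
    image: str,
    instruction: str,
    edit_fn: Callable[[str, str], str],          # hypothetical editor (the "edit" role)
    judge_fn: Callable[[str, str, str], bool],   # hypothetical judge (the "assess" role)
    max_attempts: int = 3,                       # assumed cap; the source gives no number
) -> Optional[EditResult]:
    """Edit, then judge; retry until the judge accepts or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        candidate = edit_fn(image, instruction)
        if judge_fn(image, instruction, candidate):
            return EditResult(edited_image=candidate, attempts=attempt)
    return None  # every attempt was rejected


def make_stub_editor() -> Callable[[str, str], str]:
    """Stub editor that labels each output with its attempt number."""
    calls = {"n": 0}

    def edit(image: str, instruction: str) -> str:
        calls["n"] += 1
        return f"{image}+edit{calls['n']}"

    return edit


def stub_judge(image: str, instruction: str, candidate: str) -> bool:
    """Stub judge that accepts only the second attempt."""
    return candidate.endswith("edit2")


result = edit_with_retries(
    "photo.png", "make the sky purple", make_stub_editor(), stub_judge
)
print(result)
```

With these stubs the first candidate is rejected and the second is accepted, so the loop stops after two attempts. Passing the editor and judge in as callables keeps the retry logic independent of whichever models fill those roles.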
Building the ImageNet of image editing? Apple open-sources a massive dataset built with Nano Banana