Self-Play (自博弈)

Vision-Zero: Zero-Data Self-Evolution for VLMs! Yiran Chen's (陈怡然) Team Proposes a New Zero-Supervision Training Paradigm
机器之心· 2025-10-11 03:29
Core Insights
- The article discusses Vision-Zero, a self-play framework for vision-language models (VLMs) designed to overcome the limitations of traditional training methods, which rely heavily on human-annotated data and reinforcement-learning rewards [6][7][26].

Background
- VLMs perform impressively on multimodal tasks, but they face data scarcity driven by high annotation costs and a knowledge ceiling that caps model capability [6].
- The Vision-Zero framework introduces a self-play strategy that lets VLMs generate complex reasoning data autonomously, eliminating the need for manual annotation [6].

Framework Characteristics
- Vision-Zero builds its self-play framework on social-reasoning games, enabling agents to generate high-complexity reasoning data during self-play [6].
- It accepts any form of image as input, improving the model's ability to generalize across domains [6].
- It incorporates an iterative self-play policy optimization algorithm that addresses the performance bottlenecks common in traditional self-play methods [7].

Game Design
- Inspired by social-reasoning games, Vision-Zero defines rules under which agents must deduce hidden roles from subtle differences between images, fostering complex reasoning chains [12][15].
- The game requires only two images with slight differences, so data construction is simple and inexpensive [17].

Training Methodology
- The framework uses a dual-phase alternating training scheme to avoid local equilibria and knowledge saturation, strengthening the model's ability to explore new reasoning paths [20]; a minimal sketch of this alternating loop appears after this summary.
- This approach has been shown to significantly outperform single-phase training across a range of tasks [20].

Experimental Results
- Vision-Zero shows strong task generalization, outperforming state-of-the-art methods that require annotated data across multiple benchmark datasets [22].
- Models trained under Vision-Zero effectively mitigate the negative-transfer issues commonly seen in VLMs, maintaining performance across different tasks [24].

Implications
- Vision-Zero demonstrates the feasibility and potential of self-play for moving from single-task to general-task applications, breaking free of the constraints of manual annotation and knowledge ceilings [26].
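The summary describes the training loop only at a high level. Below is a minimal sketch of how the two alternating phases (self-play data generation on image pairs, then policy optimization) might fit together, assuming a spot-the-difference-style deduction game. All names (`VLMAgent`, `play_clue_game`, `optimize_policy`) and the placeholder reward are hypothetical illustrations, not the Vision-Zero API or its actual game rules.

```python
import random


class VLMAgent:
    """Stand-in for a vision-language model policy (hypothetical, not the Vision-Zero API)."""

    def act(self, image, dialogue):
        # A real agent would return a clue, statement, or vote produced by the VLM.
        return f"statement about {image}"


def play_clue_game(agents, image_pair, rounds=3):
    """Self-play phase: each agent sees one of two nearly identical images and the
    group talks in rounds, trying to deduce who holds the odd image out.
    The game verdict provides the reward, so no human annotation is needed."""
    dialogue = []
    for _ in range(rounds):
        for agent, image in zip(agents, image_pair):
            dialogue.append(agent.act(image, dialogue))
    # Placeholder verdict: in the real game this would come from whether the
    # hidden role was identified correctly.
    reward = random.choice([0.0, 1.0])
    return [(list(dialogue), reward)]


def optimize_policy(agent, trajectories):
    """Policy-optimization phase: update the VLM on the self-generated dialogues
    (in practice an RL objective would be applied here)."""
    pass  # placeholder for the actual gradient update


def train(agent, image_pairs, iterations=10):
    """Dual-phase alternating loop: self-play to generate data, then policy
    optimization on that data, repeated so the agent keeps exploring."""
    for _ in range(iterations):
        data = []
        for pair in image_pairs:            # Phase 1: self-play data generation
            data.extend(play_clue_game([agent, agent], pair))
        optimize_policy(agent, data)        # Phase 2: policy optimization
```

The alternation is the point of the sketch: rather than optimizing continuously on a fixed pool of self-play data, fresh games are played after every update, which is how the summary describes avoiding local equilibria and knowledge saturation.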
OpenAI Wins IOI Gold, Behind Only the Top Five Human Contestants! The Competing Reasoning Model Had Just Won IMO Gold
创业邦· 2025-08-12 03:33
Core Viewpoint
- OpenAI's reasoning model achieved a gold-medal score at the 2025 International Olympiad in Informatics (IOI), ranking first among AI participants and demonstrating significant advances in general reasoning capability [2][9][16].

Group 1: Competition Performance
- OpenAI entered the online AI track of IOI 2025 and placed behind only five human contestants among 330 participants, taking the top position among AI entrants [6][8].
- The model OpenAI used was not specifically trained for the IOI; it was a general reasoning model that nonetheless performed exceptionally well [8][14].
- Compared with last year, OpenAI's result jumped from the 49th percentile to the 98th percentile, a dramatic leap in capability [9].

Group 2: Model and Strategy
- OpenAI used the same model that won gold at the 2025 International Mathematical Olympiad (IMO), with no modifications for the IOI competition [14][15].
- The strategy involved sampling answers from different models and using a heuristic to select which ones to submit, which contributed to the successful outcome [14]; a sketch of such a pipeline appears after this summary.

Group 3: Community Reaction and Future Implications
- The achievement has generated excitement in the community, underscoring how strong general reasoning abilities have become without specialized training [16].
- There is anticipation that OpenAI will release a public version of the technology behind the gold-medal performance, pointing to further advances in AI capability [18].
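The summary only names the strategy (sample answers from several models, pick submissions with a heuristic) without detail. The sketch below shows what such a pipeline could look like under stated assumptions: the scoring heuristic (compile + example tests), the `model.sample` call, and the submission cap are all illustrative guesses, not OpenAI's disclosed method.

```python
import subprocess
import tempfile
from pathlib import Path


def generate_candidates(models, problem_statement, n_per_model=50):
    """Sample candidate C++ solutions from several reasoning models.
    `model.sample` is a hypothetical API used only for illustration."""
    candidates = []
    for model in models:
        for _ in range(n_per_model):
            candidates.append(model.sample(problem_statement))
    return candidates


def heuristic_score(source_code, example_tests):
    """Assumed heuristic: reward a candidate for compiling and for each provided
    example test it passes. The real selection heuristic is not disclosed."""
    workdir = Path(tempfile.mkdtemp())
    src, binary = workdir / "sol.cpp", workdir / "sol"
    src.write_text(source_code)
    if subprocess.run(["g++", "-O2", "-o", str(binary), str(src)]).returncode != 0:
        return 0
    score = 1  # compiled successfully
    for stdin_data, expected in example_tests:
        try:
            run = subprocess.run([str(binary)], input=stdin_data,
                                 capture_output=True, text=True, timeout=5)
        except subprocess.TimeoutExpired:
            continue
        if run.stdout.strip() == expected.strip():
            score += 1
    return score


def select_submissions(candidates, example_tests, limit=50):
    """Keep the highest-scoring candidates within an assumed per-task submission cap."""
    ranked = sorted(candidates,
                    key=lambda c: heuristic_score(c, example_tests),
                    reverse=True)
    return ranked[:limit]
```

The design point this illustrates is that no IOI-specific training is involved: the competition-side machinery is limited to generating many candidates from general-purpose models and filtering them with a cheap, automatable check.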