Two-Stage Training Strategy
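An open-source reproduction of o3's thinking with images! Kuaishou stops AI from passively looking at images: the model autonomously generates code to call tools
量子位 (QbitAI) · 2025-08-21 04:23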
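Contributed by the Kwai Keye team | 量子位 (QbitAI official account)

After OpenAI released o3, its "think with image" capability drew wide attention from both industry and academia. The Kwai Keye team proposes a new paradigm, Thyme (Think Beyond Images), and builds a complete technical solution around it. The aim is to break through the limits of existing methods and give open-source models a stronger, more autonomous, and more fully featured ability to "think beyond images".

Its main contributions can be summarized as follows:

Proposes Thyme, a new multimodal interaction paradigm:
- Core idea: Multimodal large models are no longer limited to passively "looking at images". They can actively generate and execute code to call tools for complex image processing and mathematical computation.
- Rich functionality: The model can crop, rotate, scale, and enhance the contrast of images on the fly, and can also handle complex math problems.
- High autonomy: The model decides on its own when a tool is needed and which tool to use, and dynamically generates the code to execute, with no manual intervention for specific tasks.

Designs an efficient two-stage training strategy, SFT + RL:
- Supervised fine-tuning (SFT) stage: A carefully constructed dataset of about 500,000 high-quality samples quickly teaches the model to generate code that performs these operations. This stage takes only about 200 GPU hours, which makes it highly cost-effective.
- Reinforcement learning ...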
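To make the image operations named above concrete, here is a minimal sketch of the kind of snippet a model might generate for cropping, rotating, and contrast enhancement. It is not taken from Thyme: the function names (`crop`, `rotate90_clockwise`, `stretch_contrast`) and the representation of an image as a nested list of grayscale values are assumptions chosen so the example runs with the standard library alone. A real system would more likely call an imaging library.

```python
# Illustrative only: hypothetical helpers, not Thyme's actual generated code.
# An "image" here is a nested list of grayscale values in the range 0-255.

def crop(image, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]


def stretch_contrast(image):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]

region = crop(image, top=1, left=1, height=2, width=2)
rotated = rotate90_clockwise(region)
enhanced = stretch_contrast(rotated)
print(region)    # [[60, 70], [100, 110]]
print(rotated)   # [[100, 60], [110, 70]]
print(enhanced)  # [[204, 0], [255, 51]]
```

Chaining the three calls mirrors the idea of zooming in on a region of interest before reasoning about it. Which operations to apply, and in what order, is what the article says the model decides for itself.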
Chart reasoning with chain-of-thought supervision and reinforcement: a 7B model rivals large closed-source models
机器之心 (Synced) · 2025-08-01 04:23
Core Viewpoint
- The article discusses the emergence of the Chart-R1 model developed by the DocTron team, which utilizes a chain-of-thought supervision and reinforcement learning approach to enhance chart reasoning capabilities, particularly in complex multi-step numerical reasoning tasks [2][20].

Innovation and Technical Breakthroughs
- The Chart-R1 model introduces a novel procedural data synthesis technique that generates high-quality reasoning data, resulting in the creation of the ChartRQA dataset containing 258,000 multi-step reasoning samples, ensuring data diversity and authenticity [7][22].
- The model employs a unique two-stage training strategy that utilizes different datasets for each stage, preventing the degradation of the model's exploratory capabilities during reinforcement learning [10][22].

Experimental Results and Performance
- Chart-R1 demonstrates superior performance across various public benchmark tests and the self-constructed ChartRQA dataset, outperforming existing chart domain methods and rivaling large closed-source models like GPT-4o and Claude-3.5 in multiple tasks [16][20].
- In complex chart reasoning tasks, while existing visual language models show significant performance drops, Chart-R1 maintains a consistently high level of performance, highlighting its effectiveness in complex reasoning scenarios [17][20].

Research Significance and Application Prospects
- The research not only achieves technical breakthroughs but also opens new avenues for chart understanding and reasoning, with potential applications in business intelligence analysis, scientific research data interpretation, and financial report analysis, significantly enhancing automated analysis efficiency [19][20].
- The success of Chart-R1 indicates that even models with relatively smaller parameter scales can achieve performance comparable to large closed-source models in specific domains, providing valuable insights for building efficient, domain-specific AI models and guiding future multi-modal reasoning research [20][21].
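
The summary states that Chart-R1's two training stages use different datasets, but it does not say how those datasets are chosen. As a rough sketch of one way to keep the stages disjoint, the snippet below shuffles a pool of sample IDs and cuts it in two. The function name `split_for_two_stages`, the random split, the 60/40 ratio, and the seed are all assumptions made for illustration, not details from the article.

```python
import random


def split_for_two_stages(samples, sft_fraction, seed=0):
    """Split samples into two disjoint sets, one per training stage.

    Hypothetical helper: the article only says the stages use different data.
    """
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * sft_fraction)
    return shuffled[:cut], shuffled[cut:]


sample_ids = list(range(10))
sft_set, rl_set = split_for_two_stages(sample_ids, sft_fraction=0.6)

# No sample is seen in both stages, and none is dropped.
assert set(sft_set).isdisjoint(rl_set)
assert sorted(sft_set + rl_set) == sample_ids
```

Keeping the reinforcement learning data unseen during supervised fine-tuning is one plausible reading of how the strategy avoids degrading exploration, since the model cannot simply replay memorized answers in the second stage.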