Large Language Model Fine-tuning
Developers rejoice: Thinking Machines releases its first product, Tinker, which takes the post-training hassle off your hands
机器之心 · 2025-10-02 03:12
机器之心 report, 机器之心 editorial department

For large-model developers and researchers, today is an important day: Thinking Machines, the company founded by former OpenAI CTO Mira Murati, has just released its first product, Tinker.

In short, Tinker is an API that helps developers and researchers fine-tune language models. Crucially, you only need to focus on your training data and algorithms; the infrastructure work you are less suited to handle (scheduling, tuning, resource management, and infra reliability) is all taken care of by Tinker, which greatly simplifies LLM post-training.

| You focus on | You write | We handle |
| --- | --- | --- |
| Datasets and RL environments | A simple Python script | Efficient distributed training of large models |

Your cus ...
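To illustrate the "simple Python script" claim, here is a hypothetical sketch of what a Tinker-style fine-tuning loop could look like. Every name in it (the `tinker` module, `ServiceClient`, `create_lora_training_client`, `forward_backward`, `optim_step`, `save_state`) is an assumption made for illustration based on the announcement's description of the workflow, not the verified Tinker client API.

```python
# Hypothetical sketch of a Tinker-style fine-tuning script.
# All names below are assumptions for illustration, not a verified API.
import tinker  # assumed client package name

service = tinker.ServiceClient()
trainer = service.create_lora_training_client(
    base_model="Qwen/Qwen2.5-VL-7B-Instruct",  # any supported open-weights model
)

# Your part: the data and the algorithm. Tinker's part: distributed execution.
training_batches = load_my_batches()  # placeholder for your own data pipeline

for batch in training_batches:
    trainer.forward_backward(batch)  # compute loss and gradients on managed infra
    trainer.optim_step()             # apply the optimizer update

trainer.save_state("my-post-trained-adapter")  # persist the result for sampling
```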
Fine-tuning a VLM for autonomous driving with the open-source Qwen2.5-VL
自动驾驶之心 · 2025-09-29 23:33
Core Viewpoint
- The article walks through LLaMA Factory, an open-source low-code framework for fine-tuning large models, and applies it to autonomous driving with a visual-language model (VLM) [1][2].

Group 1: LLaMA Factory Overview
- LLaMA Factory integrates widely used fine-tuning techniques and has become one of the most popular frameworks in the open-source community, with over 40,000 stars on GitHub [1].
- Here the framework is used to train Qwen2.5-VL-7B-Instruct so that it can deliver traffic-situation assessments through natural-language interaction [1].

Group 2: Qwen2.5-VL Model
- Qwen2.5-VL is the flagship model of the Qwen visual-language series, with significant advances in visual recognition, object localization, document parsing, and long-video understanding [2].
- The model supports dynamic-resolution processing and absolute time encoding, letting it handle images of varying sizes and videos lasting several hours [2].
- It comes in three sizes; the flagship Qwen2.5-VL-72B performs comparably to advanced models such as GPT-4o and Claude 3.5 Sonnet [2]. A minimal inference sketch appears after this summary.

Group 3: CoVLA Dataset
- CoVLA (Comprehensive Vision-Language-Action) is a dataset built for autonomous driving, containing 10,000 real driving scenes and over 80 hours of video [3].
- It uses scalable automated methods to generate precise driving trajectories from raw sensor data, each paired with a detailed natural-language description [3].
- CoVLA surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for training and evaluating vision-language-action models [3]; a sketch of converting such records into LLaMA Factory's data format follows below.

Group 4: Model and Dataset Installation
- The article provides instructions for downloading and installing LLaMA Factory and the Qwen2.5-VL model, including commands for cloning the repository and installing the required packages [4][5][6]; see the command sketch below.
- It stresses configuring local paths for images and datasets so that everything resolves correctly at training time [7][13].

Group 5: Fine-tuning Process
- The fine-tuning run is tracked with SwanLab, an open-source tool for visualizing AI model training [11]; an example training configuration appears below.
- After fine-tuning, the model is evaluated through a web UI, where users interact with it and assess its responses to autonomous-driving queries [20][21].
- The fine-tuned model gives noticeably more relevant answers than the original model, whose responses tend to be less focused [22].
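To make Group 2 concrete, here is a minimal single-image inference sketch following the pattern published on the Qwen2.5-VL Hugging Face model card; the image path and the prompt are placeholders, and `qwen_vl_utils` is the helper package distributed alongside the Qwen repo.

```python
# Minimal Qwen2.5-VL inference sketch, following the public model-card pattern.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper from the Qwen repo

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///path/to/frame.jpg"},  # placeholder
        {"type": "text", "text": "Assess the current traffic situation."},
    ],
}]

# Build the chat prompt and pack image tensors alongside it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the answer.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)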
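For the CoVLA data in Group 3, LLaMA Factory consumes multimodal SFT data as sharegpt-style JSON, the same shape as its bundled `data/mllm_demo.json`. The conversion sketch below assumes the raw records expose `frame_path` and `caption` fields; those names are guesses about the CoVLA schema, not part of it.

```python
# Sketch: convert CoVLA-style (frame, description) records into LLaMA Factory's
# sharegpt-style multimodal JSON. Input field names "frame_path" and "caption"
# are assumptions about the raw data; adapt them to the actual schema.
import json

def to_llamafactory(records, out_path="data/covla_sft.json"):
    samples = []
    for rec in records:
        samples.append({
            "messages": [
                {"role": "user",
                 "content": "<image>Describe the driving scene and assess the risk."},
                {"role": "assistant",
                 "content": rec["caption"]},
            ],
            "images": [rec["frame_path"]],  # local path, as the article stresses
        })
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)

# Usage with a dummy record:
to_llamafactory([
    {"frame_path": "covla/frames/0001.jpg", "caption": "Clear road, low risk."},
])
```

The resulting file still has to be registered in `data/dataset_info.json` with `"formatting": "sharegpt"` before LLaMA Factory can reference it by name in a training config.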
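The installation in Group 4 follows LLaMA Factory's README; the extras selected here (`torch,metrics`) are the README's defaults and may differ by version:

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```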
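For the fine-tuning run in Group 5, a condensed LoRA SFT configuration in the style of the YAMLs under the repo's `examples/train_lora/` might look like the following. The dataset name `covla_sft`, the hyperparameters, and the SwanLab keys are assumptions about this particular setup and about your LLaMA Factory version:

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset (covla_sft must be registered in data/dataset_info.json)
dataset: covla_sft
template: qwen2_vl
cutoff_len: 2048

### output
output_dir: saves/qwen2_5vl-7b/lora/covla
logging_steps: 10
save_steps: 500

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true

### experiment tracking (SwanLab integration; availability depends on version)
use_swanlab: true
swanlab_project: qwen2_5vl-covla
```

Training then launches with `llamafactory-cli train <config>.yaml`, and the web UI used for the before/after comparison starts with `llamafactory-cli webui`.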