Multi-modal Training

A Changing of the Guard in Silicon Valley
Hu Xiu · 2025-08-05 01:40
Two or three years later, like many of my peers, I packed my bags and came to this city, the one farthest from Silicon Valley yet the most obsessed with it.

Beijing back then was a place overflowing with ambition. The cafes around Zhongguancun were often packed; venture capitalists and founders sat shoulder to shoulder in hushed conversation, now opening a laptop to demo a product prototype, now sketching a cap table on a napkin. The air mixed the scent of freshly compiled code with the anxiety of overheated valuations.

It was an era when a reporter could dial a founder's cell phone directly. Big names who are now impossible to reach would, back then, invite reporters to dinner and talk endlessly about their dreams. Sometimes they talked a little too much.

You never knew whether, at the next day's launch event, this new company would become a runaway success like Sina or Sohu, nor whether a single lunch might let you glimpse the beginning of a product or technology wave.

More than twenty years ago, I graduated with a journalism degree from a university in western China and, like most of my classmates, joined a local newspaper: writing, editing, pulling all-nighters, quickly learning the rhythms of the lowest rungs of the Chinese news industry.

But Beijing's pull never faded.

Even the bursting of the internet bubble in 2000 did not extinguish the faith. We watched the layoffs and shutdowns the crash brought, while also living through the flame rekindled by the mobile internet.

There was a time when an adolescent Facebook and Google were displacing old-guard companies like Sun Microsystems as the coolest places to work. Morning meetings were held on rainbow-colored bean bags, and lunch was free ...
A big speedup for DeepSeek-style GRPO training! ModelScope (魔搭) open-sources a full-pipeline solution covering multi-modal training, training acceleration, and end-to-end evaluation
量子位 (QbitAI) · 2025-03-09 04:45
Core Viewpoint
- The article covers ModelScope's advances in GRPO training tooling, highlighting the SWIFT framework and its optimizations for faster, more stable reinforcement learning training [1][10].

Group 1: GRPO Training Enhancements
- GRPO builds on the PPO algorithm but derives advantages from groups of sampled completions rather than a learned value model, which improves training stability and maintainability [1] (see the advantage sketch below).
- The SWIFT framework has been optimized for GRPO training, addressing challenges such as slow training and complex cluster configuration [3][10].
- Asynchronous sampling lets sampling and training run simultaneously, cutting wall-clock time significantly compared with synchronous methods [4][5] (a producer/consumer sketch follows below).

Group 2: Sampling Efficiency
- Sampling time is a critical factor in GRPO training, and a single inference instance is often insufficient for larger models [3].
- Allowing multiple instances to sample in data-parallel fashion lets SWIFT allocate resources effectively and raise sampling throughput [3].
- Experiments show asynchronous sampling reduces training time to about two-thirds of the synchronous baseline [5].

Group 3: Multi-Round Updates
- Multi-round updates reuse each sampled batch across several optimizer iterations, balancing resources between sampling and training [11][12] (see the batch-reuse sketch below).
- Choosing a suitable number of update iterations can significantly raise training speed without hurting model performance [11][14].

Group 4: Performance Comparison
- In comparative tests, SWIFT trained at roughly 120 seconds per step, outperforming frameworks such as veRL and TRL [18].
- The acceleration techniques integrated into SWIFT bring significant GRPO efficiency gains on small and medium clusters [18].

Group 5: Multi-Modal GRPO Training
- SWIFT supports multi-modal GRPO training across data types such as images, video, and audio [20].
- The framework was validated on the CLEVR-70k-Counting dataset, reaching high accuracy on the multi-modal counting task [20][22] (a counting-reward sketch appears below).

Group 6: Evaluation Framework
- EvalScope is introduced as a comprehensive evaluation tool for large models, providing performance assessment and visualization [23] (a usage sketch appears below).
- It also diagnoses underthinking and overthinking in reasoning models, improving their efficiency at reaching correct answers [23][27].

Group 7: Conclusion and Future Directions
- SWIFT aims to give developers a differentiated technical route for RL training, with continued support across training domains [26][27].
- Future exploration will focus on reasoning models' thinking efficiency and the emerging paradigm of multi-modal reasoning [27].
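To ground the Group 1 point about dropping the value model, here is a minimal sketch (my illustration, not the article's or SWIFT's code) of the group-relative advantage at the heart of GRPO: each completion's reward is normalized against the mean and standard deviation of its own group, so no learned critic is required.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: one row of rewards per prompt,
    one column per sampled completion. Each completion is scored
    against its own group's statistics instead of a value model."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```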
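The asynchronous sampling described in Groups 1 and 2 can be pictured as a producer/consumer pipeline: a sampler thread keeps generating the next batch of rollouts while the trainer consumes the previous one. This is a simplified single-machine sketch of the overlap, not SWIFT's multi-instance implementation, and it glosses over the fact that the sampling policy then lags one step behind the trained policy.

```python
import queue
import threading

def sampler(batch_queue: queue.Queue, num_batches: int) -> None:
    """Producer: generates rollouts while training consumes earlier ones."""
    for step in range(num_batches):
        batch = f"rollouts_for_step_{step}"  # stand-in for model.generate(...)
        batch_queue.put(batch)               # blocks while the queue is full
    batch_queue.put(None)                    # sentinel: no more batches

def trainer(batch_queue: queue.Queue) -> None:
    """Consumer: trains on batch k while batch k+1 is being sampled."""
    while (batch := batch_queue.get()) is not None:
        print(f"training on {batch}")        # stand-in for optimizer step(s)

q: queue.Queue = queue.Queue(maxsize=1)      # at most one batch "in flight"
t = threading.Thread(target=sampler, args=(q, 5))
t.start()
trainer(q)
t.join()
```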
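Group 3's multi-round updates boil down to taking several clipped policy-gradient steps on one sampled batch. The sketch below is generic PyTorch under assumed names (`policy` as a callable returning log-probs, the `batch` dictionary keys), not SWIFT's internals; the PPO-style importance ratio against the sampling-time log-probs is what keeps the repeated reuse stable.

```python
import torch

def multi_round_update(policy, optimizer, batch,
                       num_iterations: int = 4, clip_eps: float = 0.2) -> None:
    """Reuse one sampled batch for several optimizer steps. `batch`
    holds token data, group-relative advantages, and the log-probs
    recorded at sampling time (the "old" policy)."""
    for _ in range(num_iterations):
        new_logp = policy(batch["tokens"])               # current log-probs
        ratio = torch.exp(new_logp - batch["old_logp"])  # vs. sampling-time policy
        adv = batch["advantages"]
        # PPO-style clipping keeps repeated reuse of the same batch stable.
        loss = -torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy demo: a one-parameter "policy" whose log-prob is scale * tokens.
scale = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.SGD([scale], lr=0.1)
batch = {"tokens": torch.tensor([0.5, 1.0]),
         "old_logp": torch.tensor([0.5, 1.0]),
         "advantages": torch.tensor([1.0, -1.0])}
multi_round_update(lambda toks: scale * toks, opt, batch)
```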
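For the CLEVR-70k-Counting experiment mentioned in Group 5, GRPO setups of this kind typically score completions with a rule-based accuracy reward rather than a learned reward model. A hypothetical minimal version for a counting task:

```python
import re

def counting_reward(completion: str, answer: str) -> float:
    """Rule-based reward for a counting task: 1.0 if the last number
    appearing in the model's completion matches the reference count,
    else 0.0. (Illustrative; not the article's exact reward.)"""
    numbers = re.findall(r"\d+", completion)
    return 1.0 if numbers and numbers[-1] == str(int(answer)) else 0.0

print(counting_reward("I count the cubes and spheres: there are 3 objects.", "3"))  # 1.0
print(counting_reward("There are four objects.", "4"))                              # 0.0 (no digit found)
```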
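Group 6's EvalScope can be driven from Python. The sketch below follows the pattern in EvalScope's documentation, but the model id and dataset name are placeholders, and exact argument names may vary across versions, so check the current docs before running.

```python
# Assumes: pip install evalscope. The import path and TaskConfig fields
# follow EvalScope's documented entry point; treat them as assumptions.
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    datasets=["gsm8k"],                  # placeholder benchmark
    limit=5,                             # small smoke-test run
)
run_task(task_cfg=task_cfg)
```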