Apple swiftly withdraws RLAX paper: it used Google TPUs and Alibaba's Qwen, and Ruoming Pang is among the authors
Apple (US:AAPL) · 机器之心 · 2025-12-13 01:13

Core Viewpoint
- The article covers Apple's quickly withdrawn paper on RLAX, a scalable reinforcement learning framework that runs on Google's TPUs and other external cloud services, highlighting the company's engineering strength in AI infrastructure despite recent personnel departures [1][35].

Group 1: Paper Overview
- The paper, "RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs", was submitted on December 6 and withdrawn shortly after going public [1][7].
- RLAX is designed to execute advanced reinforcement learning algorithms efficiently on large-scale distributed TPU clusters [12].

Group 2: Technical Contributions
- RLAX adopts a parameter-server architecture that logically separates the training, inference, and validation components, giving the system more flexibility in allocating resources (see the first sketch below) [14].
- The framework supports preemptive scheduling: resources can be reclaimed immediately for higher-priority tasks without crashing the training job (second sketch below) [15].
- RLAX tackles key challenges in post-training reinforcement learning and exposes programmable configuration options for managing on-policy versus off-policy RL (third sketch below) [16].

Group 3: Experimental Results
- In experiments, RLAX raised the pass@8 accuracy of the QwQ-32B model by 12.8% in 12 hours and 48 minutes on 1,024 TPU v5p chips (the pass@k metric is sketched below) [24].
- Development of the framework drew on Google's TPUs, Amazon's AWS Lambda for testing, and Alibaba's open-source QwQ-32B model, a notably cross-vendor mix of technologies [26].

Group 4: Author Background
- The author list includes Kelvin Zou, who has since moved to Meta, and Cheng Leong, a long-time Apple employee, reflecting the ongoing movement of AI talent between companies [8][9].
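To make the parameter-server claim in Group 2 concrete, here is a minimal, hypothetical sketch in JAX-flavored Python: a versioned parameter store decouples the trainer from rollout (inference) workers, so each role can be scheduled and scaled independently. The names `ParameterServer` and `sgd_step` are assumptions for illustration, not RLAX's actual API.

```python
# Hypothetical sketch of a parameter-server pattern: training, inference,
# and evaluation roles share no state except versioned parameters.
import threading

import jax
import jax.numpy as jnp


class ParameterServer:
    """Versioned parameter store decoupling the trainer from rollout workers."""

    def __init__(self, params):
        self._lock = threading.Lock()
        self._params = params
        self._version = 0

    def publish(self, params):
        with self._lock:
            self._params = params
            self._version += 1

    def fetch(self):
        with self._lock:
            return self._params, self._version


def sgd_step(params, grads, lr=1e-2):
    # Trainer-side update; only the resulting params are shared.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


server = ParameterServer({"w": jnp.zeros(4)})

# Trainer side: compute an update (gradient stubbed here) and publish it.
params, _ = server.fetch()
grads = {"w": jnp.ones(4)}  # placeholder gradient
server.publish(sgd_step(params, grads))

# Rollout worker side: pull whatever version is current and decode with it.
rollout_params, version = server.fetch()
print("rollout worker using parameter version", version)  # -> 1
```

Because workers only ever pull a consistent (params, version) pair, the trainer and inference fleet never need to run in lockstep, which is what gives a design like this its resource-allocation flexibility.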
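The preemptive-scheduling claim can be illustrated with a preemption-tolerant training loop: when a scheduler hook reclaims resources mid-step, the loop rolls back to its last checkpoint and retries rather than terminating. The `Preempted` exception and the scheduler hook below are invented for illustration; the summary does not describe RLAX's actual mechanism at this level.

```python
# Hypothetical sketch: training survives resource reclamation for
# higher-priority jobs by restoring from a checkpoint instead of crashing.
import jax.numpy as jnp


class Preempted(Exception):
    """Raised when the scheduler reclaims this worker's resources."""


def train(num_steps, maybe_preempt):
    checkpoint = {"params": jnp.zeros(4), "step": 0}
    step = 0
    while step < num_steps:
        try:
            maybe_preempt(step)  # scheduler hook; may raise Preempted
            params = checkpoint["params"] + 0.1  # stand-in for a real update
            checkpoint = {"params": params, "step": step + 1}  # persist state
            step += 1
        except Preempted:
            # Resources were reclaimed mid-step; roll back to the last
            # checkpoint and retry instead of terminating the run.
            step = checkpoint["step"]
    return checkpoint


def make_flaky_scheduler(preempt_at):
    # Simulate one-off preemptions at the given steps.
    pending = set(preempt_at)

    def hook(step):
        if step in pending:
            pending.discard(step)
            raise Preempted

    return hook


print(train(10, make_flaky_scheduler({3, 7})))  # finishes despite preemptions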
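One plausible reading of the "programmable configuration for on-policy and off-policy RL" bullet is a staleness knob: rollouts are tagged with the parameter version that generated them, and a maximum allowed staleness of 0 forces strictly on-policy training while larger values admit off-policy data. The `RLConfig` and `RolloutBatch` shapes below are assumptions about what such a configuration surface might look like, not RLAX's documented interface.

```python
# Hypothetical sketch: a single staleness threshold interpolating between
# strictly on-policy (0) and increasingly off-policy training.
from dataclasses import dataclass


@dataclass
class RLConfig:
    # 0 => only rollouts from the current parameter version are trained on.
    max_staleness: int = 0


@dataclass
class RolloutBatch:
    param_version: int  # parameter version the rollout was sampled with
    rewards: list


def usable(batch, current_version, cfg):
    return current_version - batch.param_version <= cfg.max_staleness


cfg = RLConfig(max_staleness=2)
batches = [RolloutBatch(5, [1.0]), RolloutBatch(2, [0.5])]
kept = [b for b in batches if usable(b, current_version=5, cfg=cfg)]
print(len(kept), "of", len(batches), "batches kept")  # 1 of 2
```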
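For context on the metric in Group 3: pass@k is the probability that at least one of k sampled completions solves a problem, and the standard unbiased estimator (Chen et al., 2021) for n samples per problem with c correct is pass@k = 1 - C(n-c, k) / C(n, k). The snippet below computes it; the n and c values in the usage line are arbitrary examples, not numbers from the paper.

```python
# Standard pass@k estimator; background on the metric, not code from RLAX.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled completions is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=16, c=4, k=8))  # e.g. 16 samples, 4 correct -> ~0.96
```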