Reinforcement Learning
ChatGPT Architect Just Published His Latest Research
QbitAI (量子位) · 2025-09-30 12:22
Core Insights - The article discusses the latest research from Thinking Machines on LoRA, a parameter-efficient fine-tuning method; the work is co-authored by John Schulman, a co-founder of OpenAI [1][3][27]. Group 1: Research Findings - The research, titled "LoRA Without Regret", explores the conditions under which LoRA can match the efficiency of full fine-tuning (FullFT) and offers a simplified approach that makes hyperparameter tuning easier [3][7]. - Current large models often have trillions of parameters and are trained on vast datasets, but downstream tasks typically require only small datasets focused on specific domains [6]. - LoRA, as a parameter-efficient fine-tuning method, captures fine-tuning information through low-rank matrices, and the research confirms that LoRA can achieve performance similar to FullFT provided a few key details are handled correctly [7][12]. Group 2: Performance Comparisons - The optimal learning rate for LoRA is found to be ten times that of FullFT, and with it LoRA competes effectively with FullFT on small and medium-sized datasets [9][12]. - Experiments using Llama 3 and Qwen3 models on specific datasets showed that the learning curves of high-rank LoRA closely align with those of FullFT, with loss decreasing logarithmically during training in both cases [10][11]. - In mathematical reasoning tasks, LoRA's performance remains comparable to FullFT even at rank 1, highlighting how efficiently it absorbs information during training [13][14]. Group 3: Application Insights - The research emphasizes that applying LoRA across all layers of a model, rather than only the attention layers, is crucial for maximizing its performance [15][19]. - Previous studies often limited LoRA to the attention matrices, but this research indicates that broader application leads to significant performance improvements [16][19]. 
- The findings suggest that the dominant gradient control lies with layers that have more parameters, necessitating full-layer coverage for LoRA to approach FullFT performance [21]. Group 4: Hyperparameter Tuning - The research team proposes a simplified approach to reduce the complexity of tuning LoRA's hyperparameters, identifying that the optimal learning rate consistently follows a specific pattern [22][25]. - Out of four potential hyperparameters, two are deemed redundant, allowing users to focus on "initial update scale" and "steps of deviation from initial state" to streamline the tuning process [25][26]. - This simplification effectively reduces the tuning difficulty of LoRA by half, making it more accessible for users [26].
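The low-rank idea summarized above can be illustrated with a minimal sketch. It assumes the standard LoRA formulation (a frozen weight W plus a scaled product B·A, with B initialized to zero), which the summary itself does not spell out; the helper names `matvec`, `lora_forward`, and `trainable_params` are hypothetical and are not taken from the research.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be trained. The alpha / r scaling is the conventional choice
    and is an assumption here.
    """
    r = len(A)  # rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one weight matrix."""
    return {"full_ft": d_out * d_in, "lora": r * (d_in + d_out)}


W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]      # rank 1, d_in = 2
B_zero = [[0.0], [0.0]]  # zero init: the adapter starts as a no-op

# With B = 0 the adapted layer reproduces the frozen layer exactly.
y_init = lora_forward(W, A, B_zero, [1.0, 1.0], alpha=1.0)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[1.0], [2.0]]
y_adapted = lora_forward(W, A, B_trained, [1.0, 0.0], alpha=2.0)

# Illustrative layer size: a rank-1 adapter on a 4096 x 4096 matrix.
counts = trainable_params(4096, 4096, r=1)
```

For the illustrative 4096 x 4096 layer, a rank-1 adapter trains 8,192 values instead of roughly 16.8 million, which is why per-layer coverage matters more than rank in the findings above. The reported tenfold learning-rate rule would apply to the optimizer settings, not to this forward pass.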
Yin Qi's Long Road in Intelligent Driving: Romance Is Fine, but Don't Get Reckless
Guan Cha Zhe Wang· 2025-09-30 09:49
Core Insights - Chongqing has welcomed a new local intelligent driving supplier, Qianli Technology, which aims to establish itself in the smart driving sector and has garnered significant attention from local government and industry leaders [1][3][6]. Group 1: Company Overview - Qianli Technology held a brand launch event on September 28, where it unveiled its brand identity and future plans, indicating strong local government support for its initiatives [3][6]. - The company has ambitious goals, as expressed by its CEO, who stated a desire to capture a significant market share in the intelligent driving sector [3][6]. - Qianli Technology's "Afari Plan" envisions a platform-level AI ecosystem integrating AI, vehicles, and robotics, aiming to expand into both household and industrial AI applications [3][7]. Group 2: Strategic Partnerships and Investments - Qianli Technology has attracted investment from Mercedes-Benz, which invested 1.3 billion RMB, marking a significant step in its international expansion efforts [6][22]. - The company is targeting overseas automotive manufacturers, having previously sought partnerships in Germany, indicating a strategic focus on global markets [6][22]. Group 3: Technological Development - Qianli Technology's current focus includes developing intelligent driving algorithms ranging from L2 to L4, with plans to release L3 by the end of 2025 and L4 by mid-2026 [9][12]. - The company is also working on a new generation of intelligent cockpit systems and aims to establish a comprehensive Robotaxi service within the next 18 months, targeting deployment in over 10 cities globally [9][12]. - The company emphasizes a pragmatic approach to technology, focusing on high "model content" in its intelligent driving solutions, with a goal to enhance this metric significantly in the coming months [14][18]. 
Group 4: Market Position and Competition - The intelligent driving market is becoming increasingly competitive, with Qianli Technology positioning itself as a strong contender despite being a newer player [6][22]. - The company recognizes the importance of both AI model development and engineering capabilities, suggesting a dual focus on innovation and practical application [22][27]. - With the penetration of L2+ level intelligent driving systems in new car sales exceeding 50% in China, there remains substantial market potential for suppliers to explore [27].
Renowned Roboticist: The Future of Humanoid Robots Is Not Looking Human
36Kr · 2025-09-30 08:43
Group 1 - The article discusses the challenges faced by humanoid robots in achieving dexterity despite significant investments from venture capital firms and large tech companies [2][3][5] - Humanoid robots are designed to mimic human body structures and perform tasks in human environments, with the goal of creating versatile robots capable of handling various jobs [5][6] - Companies like Tesla and Figure are optimistic about the economic potential of humanoid robots, with predictions of generating trillions in revenue, but the timeline for achieving human-level dexterity remains uncertain [6][7] Group 2 - The history of humanoid robot development spans over six decades, with significant contributions from various researchers and institutions, including early models from Waseda University and Honda [8][9] - Despite advancements, no humanoid robot has demonstrated significant dexterity comparable to human capabilities, and existing designs have not been successfully applied in practical industrial settings [20][21] - The article emphasizes the importance of tactile feedback and dexterity in humanoid robots, arguing that current training methods relying on visual data are insufficient for achieving the desired level of skill [23][24][44] Group 3 - The article critiques the reliance on "learning from demonstration" methods, highlighting the limitations of current approaches that do not incorporate tactile or force feedback [23][24][25] - Companies like Figure and Tesla are shifting towards training humanoid robots using first-person videos of humans performing tasks, betting on the effectiveness of visual learning [26][27] - The article concludes that achieving true dexterity in humanoid robots will require a deeper understanding of tactile perception and the integration of such feedback into training methodologies [44][45]
Renowned Roboticist: The Future of Humanoid Robots Is Not Looking Human
Alpha Factory Research Institute (阿尔法工场研究院) · 2025-09-30 07:18
Core Viewpoint - Despite significant investments from venture capital firms and large tech companies, humanoid robots still struggle to achieve dexterity, which is essential for performing tasks in human environments [2][3][4]. Group 1: Historical Context of Humanoid Robots - The concept of humanoid robots has been explored for over 65 years, with early developments including a computer-controlled robotic arm capable of stacking blocks in 1961 [3]. - The evolution of humanoid robots has seen contributions from various institutions, including WABOT-1 from Waseda University in the 1970s and Honda's ASIMO in 2000 [11][12]. Group 2: Current State and Future Predictions - Humanoid robots are currently in the early stages of development, with Gartner indicating they have not yet reached their peak hype [4]. - Companies like Tesla and Figure are optimistic about the economic potential of humanoid robots, with predictions of creating trillions in revenue [9][10]. Group 3: Challenges in Dexterity - Achieving human-level dexterity in humanoid robots remains a significant challenge, as current robotic hands lack the necessary finesse and adaptability for a wide range of tasks [23][24]. - Existing methods for training robots often rely on visual demonstrations, which do not adequately capture the tactile feedback necessary for dexterous manipulation [27][28]. Group 4: Learning Approaches - The industry has seen a shift towards end-to-end learning methods, where robots learn from observing human actions, but this approach has limitations due to the lack of tactile feedback and precision [30][31]. - Successful applications of end-to-end learning in other fields, such as speech recognition and image labeling, highlight the importance of pre-processing and human-like structures in achieving effective learning outcomes [49][50]. 
Group 5: Importance of Tactile Feedback - Human dexterity is heavily reliant on rich tactile feedback, which current humanoid robots do not possess, leading to challenges in replicating human-like manipulation [51][52]. - The complexity of human touch perception and the integration of multiple body parts in dexterous tasks further complicate the development of humanoid robots capable of similar actions [52].
DeepSeek Cuts Prices with New Model: Optimized Inference Efficiency, API Prices Down Over 50%
YOUNG Finance (漾财经) · 2025-09-30 06:25
Core Insights - DeepSeek has launched the new DeepSeek-V3.2-Exp model and cut its API prices by 50% or more [2][3][4] Group 1: Model Release and Features - DeepSeek-V3.2-Exp is an experimental version that builds on the previous V3.1-Terminus and introduces the DeepSeek Sparse Attention mechanism to improve training and inference efficiency on long texts [3][4] - Despite the new sparse attention mechanism, the model performs on par with V3.1-Terminus across public evaluation datasets [4] Group 2: Cost Reduction and Pricing - The new model substantially lowers service costs, and API prices fall by 50% or more. Per million tokens, input with a cache hit drops from 0.5 yuan to 0.2 yuan (60%), input with a cache miss from 4 yuan to 2 yuan (50%), and output from 12 yuan to 3 yuan (75%) [4] Group 3: Research and Development - Developing DeepSeek-V3.2-Exp involved designing new GPU operators and using the TileLang programming language for rapid prototyping, which supports deeper exploration of model capabilities [4] - DeepSeek's research on the DeepSeek-R1 model, which focuses on incentivizing reasoning capabilities in large language models through reinforcement learning, was featured on the cover of the journal Nature [7]
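As a quick check on the pricing figures above, the following sketch computes the percentage reduction for each price tier from the old and new prices quoted in the summary. The dictionary layout and the `percent_reduction` helper are illustrative assumptions, not part of any DeepSeek API.

```python
# Old and new prices in yuan per million tokens, as quoted in the summary.
PRICES = {
    "input (cache hit)": (0.5, 0.2),
    "input (cache miss)": (4.0, 2.0),
    "output": (12.0, 3.0),
}


def percent_reduction(old, new):
    """Return the price cut as a percentage of the old price."""
    return (old - new) / old * 100


reductions = {
    tier: round(percent_reduction(old, new), 1)
    for tier, (old, new) in PRICES.items()
}

for tier, pct in reductions.items():
    print(f"{tier}: {pct}% lower")
```

The cache-miss input tier is cut by exactly half, while the other two tiers fall further, so "50% or more" describes the change more precisely than "over 50%" for every tier.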
Li Auto May Release an i6 Sales Report, or May Not
Li Auto TOP2 (理想TOP2) · 2025-09-30 05:01
Core Viewpoint - The company may or may not release an i6 sales report (a "battle report" announcing order results); based on recent industry developments, the probability leans slightly toward a release [1][3]. Group 1: Company Strategy and Market Position - The publication aims to attract readers who value its analysis rather than those seeking non-public information [4]. - The company's actual operating strategy is driven by the principle of challenging growth limits, which may change its product definitions and market approach over time [4]. - The definition of "family car" is broadening beyond the earlier narrow focus on vehicles suited to transporting children under 12 years old [4]. Group 2: Product Expectations and Market Dynamics - The i6 is expected to post significantly better figures than the L6, but direct comparisons may be inappropriate because market conditions and expectations differ [5]. - The company is inclined not to release order or large-order reports, mainly because its direct sales model and high level of honesty leave little room for presenting inflated data [4]. - If the i6 data proves exceptionally strong, the company may release it to capitalize on the positive market response [4].
A Pure VLA Survey Is Here! From VLMs to Diffusion to Reinforcement Learning Approaches
Heart of Embodied Intelligence (具身智能之心) · 2025-09-30 04:00
Core Insights - The article discusses the evolution and potential of Vision-Language-Action (VLA) models in robotics, emphasizing how they integrate perception, language understanding, and action generation to enhance robotic capabilities [11][17]. Group 1: Introduction and Background - Robots have traditionally relied on pre-programmed instructions and control strategies, which limits their adaptability in dynamic environments [2][11]. - The emergence of VLA models marks a significant advance in embodied intelligence, combining visual perception, language understanding, and executable actions in a unified framework [11][12]. Group 2: VLA Methodologies - VLA methods are categorized into four paradigms: autoregressive, diffusion, reinforcement learning, and hybrid/specialized methods, each with its own strategies and mechanisms [8][10]. - The article highlights the importance of high-quality datasets and realistic simulation platforms for developing and evaluating VLA models [16][18]. Group 3: Challenges and Future Directions - Key challenges include data limitations, inference speed, and safety concerns, all of which must be addressed to advance VLA models and general-purpose robotics [10][17]. - Future research directions focus on improving the robustness and generalization of VLA models in real-world applications, with emphasis on efficient training paradigms and safety assessments [44][47].
Z Event | SF Tech Week, Oct. 8 Silicon Valley In-Person Meetup: Why Now? RL's Turning Point and Future
Z Potentials· 2025-09-30 03:59
Core Insights - Reinforcement Learning (RL) is transitioning from a niche area to a critical component in advancing reasoning, decision-making, and complex scene interactions, especially as developments in Large Language Models (LLMs) reach a bottleneck [3] - The current moment is pivotal for the cross-disciplinary integration of RL, with academia, industry, and startups collaborating to move RL from research to practical applications [3] Event Details - An event is scheduled for October 8th at 6:30 PM in San Francisco, featuring top-tier guests from academia, industry, and entrepreneurship to discuss the future of RL [4] - Notable speakers include Zeng Dong from UCSB, Qifei Wang from DeepMind, Bill Zhu from Pokee AI, and others who are shaping the next generation of RL [6][7] Organizers and Community - The event is presented by Z Potentials in collaboration with HatTrick Capital and Future Builderz, focusing on supporting early-stage technology entrepreneurs and bridging the gap between research and industry [8][9] - HatTrick Capital is a Silicon Valley fund dedicated to backing new generation technology entrepreneurs, particularly in the AI sector [9] Networking Opportunities - The event will provide a relaxed networking atmosphere, allowing attendees from leading labs like OpenAI, Anthropic, DeepMind, and Meta to engage in deep discussions [10]
Buick Zhijing L7 Officially Launched at a Limited-Time Price of 169,900 to 215,900 Yuan
Zhong Guo Qi Che Bao Wang· 2025-09-30 02:38
Core Insights - SAIC-GM Buick has officially launched its flagship sedan, the Zhijing L7, at a limited-time price of 169,900 to 215,900 yuan, targeting the high-end new energy vehicle market [1][2] Pricing and Models - The Zhijing L7 is available in five variants, with the official guide price and limited-time price of each detailed in a table [2] - Buyers who place orders before the specified deadlines can receive up to 53,000 yuan in launch benefits plus additional cash and upgrade gifts [2] Technology and Performance - The Zhijing L7 features the "Zhenlong" range extender system, which provides long range and low energy consumption, addressing common industry pain points [4][6] - It has a 252 kW single electric drive, equivalent in power to a 3.0T V6 engine, and achieves a combined fuel consumption as low as 0.5L per 100 km [6] - The vehicle accelerates from 0 to 100 km/h in 5.9 seconds and delivers consistent performance whether the battery is charged or depleted [6] Battery and Safety - The L7 uses the newly developed high-performance Auton 2.0 hybrid battery, which offers enhanced safety features and a lifespan of 640,000 km [8] - The battery has undergone rigorous testing that exceeds national standards and has a track record of 1.6 billion kilometers without spontaneous combustion [8] Intelligent Driving Features - The vehicle is equipped with the "Xiaoyao Zhixing" driver assistance system, featuring the world's first Momenta R6 flywheel model based on end-to-end reinforcement learning [9][12] - It includes advanced urban navigation and parking assistance features that significantly enhance the user experience [12][14] Interior and Comfort - The Zhijing L7 offers a luxurious interior with high-quality materials and advanced technology, including a 50-inch AR-HUD and a dual-screen design for the driver [19][21] - The vehicle is designed for comfort, featuring multi-functional seats and a high-end sound system with 27 speakers [24][26]
Testing and Quality Assurance - The L7 has undergone extensive testing, including more than 60 collision tests and nearly 6.5 million kilometers of durability testing, to ensure high quality standards [30]
Buick Zhijing L7 Officially Launched at a Limited-Time Price of 169,900 to 215,900 Yuan
Cai Jing Wang· 2025-09-29 23:00
Core Viewpoint - SAIC-GM's Buick brand has officially launched the Zhijing L7 in five variants priced between 169,900 and 215,900 yuan, featuring advanced hybrid technology and impressive performance metrics [1][3]. Group 1: Vehicle Specifications - The Zhijing L7 is equipped with the "Zhenlong" range extender system, featuring a 252 kW single electric drive that provides power equivalent to a 3.0T V6 engine [1]. - It includes what is described as the industry's strongest 1.5T dedicated hybrid engine, paired with a generator that has a peak power of 100 kW, achieving a combined fuel consumption as low as 0.5L per 100 km [1]. - The vehicle accelerates from 0 to 100 km/h in 5.9 seconds and has a pure electric range of 302 km and a total range of 1,420 km, meeting the urban commuting needs of over 90% of users [3]. Group 2: Charging and Driving Assistance - The "Zhenlong" system supports 130 kW charging, the fastest in its class, taking the battery from 30% to 80% in 18 minutes [3]. - The Zhijing L7 features the Buick "Xiaoyao Zhixing" driver assistance system, which integrates the Momenta R6 flywheel model to provide advanced driving assistance capabilities [3][4]. - The Momenta R6 model uses reinforcement learning, enabling the vehicle to handle complex driving scenarios smoothly and safely [3]. Group 3: Parking and Technology - The vehicle offers parking assistance for a range of scenarios, including standard, narrow, mechanical, and vertical/horizontal parking spaces, alleviating parking anxiety [4]. - It is equipped with Qualcomm's latest SA8775P chip, which provides 72 TOPS of AI computing power for an immersive and natural interaction experience [4]. - The Zhijing L7 has a spacious body measuring 5032 mm x 1952 mm x 1500 mm, giving it the presence of a luxury C-class sedan [6]. 
Group 4: Comfort and Design - The vehicle incorporates advanced NVH technology, frameless doors, and high-end lighting features, enhancing its luxurious appeal [6][7]. - It utilizes a sophisticated suspension system with a front double wishbone and rear five-link structure, significantly improving ride comfort and stability [7].