Generalization Capability

After comparison, VLA's maturity is far higher than that of world models...
自动驾驶之心· 2025-09-26 16:03
Author: Zhou Yanwu | Source: Zuosi Auto Research (佐思汽车研究)

First, it should be noted that both VLA and world models are varieties of end-to-end autonomous driving. Although many people believe one-stage end-to-end is superior to multi-stage, more than 90% of systems in both industry and academia are multi-stage end-to-end; pure VLA and pure world-model systems are very rare. Representing the VLA camp are a model from Amap (高德地图), Horizon Robotics' SENNA, and UCLA's AutoVLA. Representing the world-model camp are Shanghai AI Lab's GenAD (close in approach to Tesla China's FSD), the GenAD model from Zhongke Huituo (中科慧拓), which works on autonomous heavy trucks, Drive-OccWorld, a collaboration between Huawei and Zhejiang University, and Li Auto's World4Drive. Although Li Auto champions VLA, its world-model research is also of a very high standard. The benchmark comparison (truncated in the source) is below; a sketch of how the L2 metric is typically computed follows the table.

| Model | Avg. L2 distance (m) | Avg. 3 s collision rate | Notes |
| --- | --- | --- | --- |
| AutoDrive-R2 | 0.19 | | 7B-parameter version |
| AutoDrive-R2 | 0.49 | | 3B-parameter version |
| SENNA | 0.22 | 0.08% | With ego-vehicle state ... |
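The L2 metric in the table measures how far the planned trajectory strays from the ground-truth trajectory. As a minimal illustration (not any benchmark's official evaluation code; the array shapes, 2 Hz sampling, and waypoint values are all assumptions), the average L2 distance over a 3-second horizon can be computed as:

```python
import numpy as np

def average_l2_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth
    waypoints, averaged over the planning horizon.

    pred, gt: arrays of shape (T, 2) holding (x, y) waypoints in meters.
    """
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical 3-second horizon sampled at 2 Hz (6 waypoints).
pred = np.array([[0.0, 1.9], [0.0, 3.8], [0.1, 5.9],
                 [0.1, 8.1], [0.2, 10.2], [0.2, 12.4]])
gt   = np.array([[0.0, 2.0], [0.0, 4.0], [0.0, 6.0],
                 [0.0, 8.0], [0.0, 10.0], [0.0, 12.0]])
print(f"avg L2: {average_l2_distance(pred, gt):.3f} m")
```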
From SEAL adaptive learning to DFT reward correction: how much substantive improvement in LLM generalization has there really been?
机器之心· 2025-09-07 01:30
Core Insights
- The article discusses the challenges and advancements in the generalization capabilities of large language models (LLMs), highlighting various strategies to improve these capabilities, such as adaptive fine-tuning and dynamic gradient adjustment [7][11].

Group 1: Generalization in LLMs
- Generalization in AI refers to a model's ability to apply learned knowledge to new, unseen scenarios, distinguishing it from mere memorization of training data [8].
- Recent studies indicate that as the complexity and scale of models increase, the understanding of "generalization" is being questioned, with some suggesting it may be a form of data memorization rather than true abstraction [9][10].
- Research shows that while increasing model size can enhance performance on reasoning tasks, it may also lead to stronger memorization of factual knowledge, raising concerns about the true nature of generalization [9][10].

Group 2: CoT Reasoning and Its Limitations
- Chain-of-Thought (CoT) reasoning has been criticized for its fragility, as performance drops significantly when tested outside the training distribution, suggesting reliance on memory rather than genuine logical reasoning [10].
- Some experts argue that what is perceived as generalization may simply be the result of training data sufficiently covering the test scenarios, challenging the notion of true generalization [10].

Group 3: Research Trends and Focus Areas
- The volume of research related to LLMs has surged, with a nearly sixfold increase in relevant studies from 2022 to 2025, particularly focusing on reasoning, generalization, and model safety [11].
- Recent research has shifted from merely examining data distribution and model size to exploring training strategies, model update mechanisms, and data design to enhance generalization capabilities [11].
In Depth | OpenAI co-founder: GPT-5's breakthrough is that intelligence is starting to reach genuinely deep cognitive domains; the ideal state is to default to our automatic selection rather than configure manually
Z Potentials· 2025-09-06 04:40
Core Insights
- OpenAI has released GPT-5 and GPT-OSS, marking significant advancements in AI technology and accessibility [4][3].
- GPT-5 is the first hybrid model, designed to enhance user experience by automatically selecting model architectures [5][6] (a hypothetical routing sketch follows this summary).
- The evolution of OpenAI's reasoning capabilities has transitioned from simple next-token prediction to more complex reasoning paradigms [9][10].

Group 1: OpenAI's Technological Advancements
- The release of GPT-5 and GPT-OSS has seen millions of downloads within days, showcasing the demand for these technologies [4].
- GPT-5's breakthrough lies in its ability to engage in deep cognitive tasks, surpassing the limitations of its predecessor, GPT-4 [24][25].
- The model's training has shifted from a one-time training approach to a more iterative reasoning-training cycle, enhancing its learning efficiency [9][10].

Group 2: Learning Mechanisms and Challenges
- OpenAI emphasizes the importance of real-world experience for models to develop generalization capabilities, highlighting the limitations of purely theoretical training [6][15].
- The company is exploring the potential of real-time online learning, aiming to allow models to adapt continuously during operation [10][11].
- Current bottlenecks in AI development are primarily related to computational power, which is essential for enhancing model capabilities [11][12].

Group 3: Future Directions and Applications
- OpenAI is focused on creating models that can assist in complex problem-solving, with applications in various fields, including mathematics and biology [25][22].
- The company aims to improve the integration of AI into real-world applications, ensuring that models can handle the complexities of diverse environments [27][30].
- OpenAI's vision includes making AI technology accessible to a broader audience, with plans for aggressive pricing strategies to enhance adoption [39][40].
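The "hybrid model" idea described above is that a lightweight router decides per request whether a fast model or a slower reasoning model should answer, instead of the user picking one manually. OpenAI has not published this mechanism; the following is a purely hypothetical sketch, with the heuristic, model names, and thresholds all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reasoning_effort: str

def route_request(prompt: str) -> Route:
    """Toy router: send requests that look like multi-step problems to a
    reasoning model, everything else to a fast model. A real router would
    likely use a trained classifier, not keyword heuristics."""
    reasoning_markers = ("prove", "step by step", "derive", "debug", "why")
    long_prompt = len(prompt.split()) > 200
    if long_prompt or any(m in prompt.lower() for m in reasoning_markers):
        return Route(model="reasoning-model", reasoning_effort="high")
    return Route(model="fast-model", reasoning_effort="low")

print(route_request("What's the capital of France?"))
print(route_request("Prove that the sum of two even numbers is even."))
```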
Investigating why VLA models generalize poorly...
具身智能之心· 2025-08-20 00:03
Core Insights
- The article discusses the limitations of generalist robot policies in terms of their generalization capabilities, particularly focusing on the issue of shortcut learning [2][5].
- It identifies shortcut learning as a key factor hindering generalization, stemming from the reliance on task-irrelevant features [2].
- The research highlights two main reasons for shortcut learning: limited diversity within individual sub-datasets and significant distribution differences between sub-datasets, leading to data fragmentation [2].

Dataset Analysis
- The study specifically examines the Open X-Embodiment (OXE) dataset, which is composed of multiple sub-datasets collected independently under different environments and robot forms [2][5].
- The inherent structure of large-scale datasets like OXE contributes to the challenges in generalization due to the aforementioned issues of diversity and fragmentation [2].

Recommendations
- The findings provide important insights for improving data collection strategies for robots, aiming to reduce shortcut learning and enhance the generalization capabilities of generalist robot policies [2].
- In scenarios where acquiring new large-scale data is impractical, the article confirms that carefully selected data augmentation strategies can effectively mitigate shortcut learning in existing offline datasets [2] (a minimal augmentation sketch follows below).
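One way to read the recommendation above: if a policy latches onto task-irrelevant features (background color, lighting, camera framing), perturbing those factors in the existing offline data breaks the shortcut. A minimal sketch using torchvision; the specific transforms and their parameters are assumptions, not the paper's recipe:

```python
import torch
from torchvision import transforms

# Perturb task-irrelevant visual factors (color, crop/viewpoint, tilt) so
# the policy cannot rely on them as shortcuts for predicting actions.
shortcut_breaking_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=5),
])

def augment_batch(frames: torch.Tensor) -> torch.Tensor:
    """frames: (B, C, H, W) image observations from an offline dataset.
    Actions are left untouched; only the observations are perturbed."""
    return torch.stack([shortcut_breaking_aug(f) for f in frames])

frames = torch.rand(4, 3, 256, 256)  # dummy batch of camera frames
print(augment_batch(frames).shape)   # torch.Size([4, 3, 224, 224])
```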
Is chain-of-thought an illusion? Re-examining large-model reasoning from a data-distribution perspective; Musk replied, and Grok was rattled
机器之心· 2025-08-14 09:11
Core Viewpoint
- The research suggests that Chain-of-Thought (CoT) reasoning in large language models (LLMs) may not represent true reasoning but rather a replication of patterns learned from training data, leading to fragility when faced with out-of-distribution tasks [2][10][37].

Data Distribution Perspective on CoT
- The effectiveness of CoT is attributed to the "structured inductive bias" learned within the training distribution, indicating that the reasoning chains are merely reproductions of common patterns rather than genuine logical deductions [13][37].
- A theoretical framework is introduced to quantify the relationship between training and testing distributions, highlighting how distribution shifts can impact reasoning performance [15] (a toy illustration of this protocol follows this summary).

Experimental Findings on Generalization
- In "task generalization," the model shows nearly 100% accuracy within the training distribution, but accuracy drops to 0.01% with slight distribution shifts, indicating a lack of true generalization [23].
- Supervised fine-tuning on a small amount of new data can restore performance, but this only expands the existing distribution boundaries without enhancing abstract generalization capabilities [24].
- In "length generalization," even minor changes in input sequence length significantly affect model performance, demonstrating a tendency to generate reasoning chains consistent with training lengths [26].
- The model is highly sensitive to format changes, with even minor alterations in input prompts leading to complete reasoning failures [28].

Universal Sensitivity to Distribution Shifts
- The study finds that sensitivity to distribution shifts is a common phenomenon across different sampling temperatures and model sizes, indicating that this issue is not isolated to specific models [31].

Practical Implications
- In high-risk fields such as healthcare and finance, reliance on CoT for robust reasoning is cautioned against, as misleading reasoning chains can be more dangerous than outright incorrect answers [34].
- Current evaluation methods that depend on validation sets closely aligned with training distributions may overestimate model robustness, necessitating stricter out-of-distribution testing [35].
- While supervised fine-tuning can quickly enhance performance on specific tasks, it does not equip models with true abstract reasoning capabilities [36].
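As a deliberately toy illustration of the pattern the paper reports, not its actual experiments: a model that has only memorized its training distribution looks near-perfect in-distribution and collapses under a mild length shift. Everything here (the arithmetic task, the lookup "model") is invented for illustration:

```python
import random

random.seed(0)

def make_problem(steps):
    """A toy reasoning task: sum a chain of `steps` single-digit integers."""
    nums = [random.randint(0, 9) for _ in range(steps)]
    return nums, sum(nums)

# "Training": the model memorizes answers to 3-step chains.
train_set = [make_problem(3) for _ in range(5000)]
memory = {tuple(nums): ans for nums, ans in train_set}

def memorizing_model(nums):
    """Answers from memory; guesses randomly on anything unseen."""
    return memory.get(tuple(nums), random.randint(0, 9 * len(nums)))

def accuracy(examples):
    return sum(memorizing_model(n) == a for n, a in examples) / len(examples)

in_dist = [make_problem(3) for _ in range(1000)]  # same length as training
shifted = [make_problem(4) for _ in range(1000)]  # one step longer

print(f"in-distribution accuracy: {accuracy(in_dist):.2f}")  # high
print(f"length-shifted accuracy:  {accuracy(shifted):.2f}")  # near chance
```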
ByteDance releases an all-new VLA model, with a companion robot that becomes a handy household helper
Sou Hu Cai Jing· 2025-07-23 16:51
Core Insights
- ByteDance's Seed team has launched a new VLA model, GR-3, which supports high generalization, long-horizon tasks, and flexible object manipulation with dual-arm operations [2][4].
- The GR-3 model is designed to understand abstract language instructions and can efficiently adapt to new tasks with minimal human data, contrasting with previous models that required extensive training [2][7].
- The accompanying robot, ByteMini, is a versatile dual-arm mobile robot specifically designed to work with the GR-3 model, featuring 22 degrees of freedom and advanced sensory capabilities [4][5].

Model Features
- GR-3 is characterized by its ability to perform complex tasks with high robustness and success rates, effectively following step-by-step human instructions [4][5].
- The model utilizes a unique training method that combines data from remote-operated robots, human VR trajectory data, and publicly available vision-language data, enhancing its learning capabilities [7].
- GR-3's architecture is a 4-billion-parameter end-to-end model that integrates vision-language and action-generation modules [7] (a generic structural sketch follows this summary).

Performance Highlights
- In tasks such as table organization, GR-3 demonstrates high success rates and can accurately interpret and respond to complex instructions, even when faced with invalid commands [4][5].
- The model excels in collaborative dual-arm operations, effectively manipulating deformable objects and recognizing various clothing arrangements [5].
- GR-3's generalization ability allows it to handle previously unseen objects and comprehend abstract concepts during tasks, showcasing its adaptability [5][7].

Future Plans
- The Seed team plans to expand the model's scale and training data while incorporating reinforcement learning methods to further enhance generalization capabilities [7].
- Generalization is identified as a key metric for evaluating VLA models, crucial for enabling robots to adapt quickly to dynamic real-world scenarios [7].
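The summary describes GR-3 as an end-to-end model coupling a vision-language backbone with an action-generation module. ByteDance has not released the architecture in this form; the PyTorch sketch below only illustrates the generic VLA pattern, and every module size, token shape, and name is an assumption (the 22-dimensional action vector mirrors ByteMini's stated 22 degrees of freedom):

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Generic VLA pattern: fuse vision and language tokens, then decode a
    short chunk of continuous robot actions. Purely illustrative."""

    def __init__(self, d_model=256, action_dim=22, chunk=8):
        super().__init__()
        self.vision_proj = nn.Linear(512, d_model)   # e.g. ViT patch features
        self.text_proj = nn.Linear(768, d_model)     # e.g. LM token features
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, action_dim * chunk)
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, vision_tokens, text_tokens):
        x = torch.cat([self.vision_proj(vision_tokens),
                       self.text_proj(text_tokens)], dim=1)
        fused = self.fusion(x).mean(dim=1)           # pool fused tokens
        return self.action_head(fused).view(-1, self.chunk, self.action_dim)

model = ToyVLA()
actions = model(torch.rand(2, 196, 512), torch.rand(2, 16, 768))
print(actions.shape)  # torch.Size([2, 8, 22]): 8 steps x 22 DoF
```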
Qwen & Tsinghua team overturn conventional wisdom: large-model RL using only the 20% of key tokens beats training on all tokens
量子位· 2025-06-05 10:28
Core Insights
- The article discusses a recent breakthrough by the LeapLab team from Tsinghua University, revealing that training on only the 20% of tokens with the highest entropy can significantly enhance reinforcement learning for large models, outperforming training on all tokens [1][6].

Group 1: Research Findings
- The team achieved new state-of-the-art (SOTA) records with the Qwen3-32B model, scoring 63.5 on AIME'24 and 56.7 on AIME'25, the highest scores for models under 600 billion parameters trained directly from the base model [2].
- Extending the maximum response length from 20k to 29k raised the AIME'24 score further, to 68.1 [4].
- The research challenges the classic Pareto principle: in large-model reinforcement learning, the 80% of tokens with low entropy can be discarded without hurting training, and keeping them may even be harmful [5][6].

Group 2: Token Analysis
- The study reveals a distinctive entropy distribution during chain-of-thought reasoning: over 50% of tokens have an entropy below 0.01, while only 20% exceed 0.672 [9][10].
- High-entropy tokens serve as "logical connectors" in reasoning, while low-entropy tokens are often deterministic components, such as affixes or mathematical expressions [11].
- Experiments show that raising the sampling temperature of high-entropy tokens improves reasoning performance, while lowering it degrades performance, underscoring the importance of maintaining high entropy at critical positions [13].

Group 3: Training Methodology
- By restricting the reinforcement learning loss to the top 20% of high-entropy tokens, the Qwen3-32B model saw significant gains: AIME'24 up by 7.71 points and AIME'25 up by 11.04 points, with average response length growing by roughly 1,378 tokens [15][17] (a sketch of this token-selection step follows this summary).
- Similar improvements were observed on Qwen3-14B, while Qwen3-8B maintained stable performance [16].
- Conversely, training on the 80% of low-entropy tokens led to a sharp decline in model performance, indicating their minimal contribution to reasoning capabilities [18].

Group 4: Implications and Generalization
- The findings suggest that high-entropy tokens enable exploration of different reasoning paths, while low-entropy tokens may restrict this exploration due to their determinism [20].
- The advantages of training on high-entropy tokens grow with model scale, with the 32B model showing the largest improvements [22].
- Models trained on high-entropy tokens also performed exceptionally well on out-of-domain tasks, indicating a potential link between high-entropy tokens and generalization [22].

Group 5: Reinforcement Learning Insights
- Reinforcement learning with verifiable rewards (RLVR) does not completely overhaul the base model but rather fine-tunes it, with high-entropy token positions overlapping 86.67% even after extensive training [24][25].
- Tokens with higher initial entropy see larger entropy increases during RLVR training, while low-entropy tokens remain largely unchanged [25].
- The article suggests that high-entropy tokens may explain why reinforcement learning generalizes better than supervised fine-tuning, which tends toward memorization and overfitting [26][27].
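The core training trick above is to compute each generated token's predictive entropy and let the policy-gradient loss flow only through the highest-entropy 20%. The paper's full loss is GRPO-based; the sketch below shows only the token-selection step, and the tensor shapes, advantage values, and quantile-based thresholding are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def top_entropy_pg_loss(logits, actions, advantages, keep_frac=0.2):
    """Policy-gradient loss restricted to the highest-entropy tokens.

    logits:     (B, T, V) model outputs over the generated sequence
    actions:    (B, T)    sampled token ids
    advantages: (B, T)    per-token advantage estimates (e.g. from GRPO)
    """
    logp = F.log_softmax(logits, dim=-1)
    probs = logp.exp()
    entropy = -(probs * logp).sum(dim=-1)              # (B, T)

    # Keep only the top `keep_frac` highest-entropy tokens in the batch.
    threshold = torch.quantile(entropy.detach().flatten(), 1.0 - keep_frac)
    mask = (entropy >= threshold).float()              # (B, T)

    token_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return -(mask * advantages * token_logp).sum() / mask.sum().clamp(min=1)

logits = torch.randn(4, 32, 1000, requires_grad=True)
actions = torch.randint(0, 1000, (4, 32))
advantages = torch.randn(4, 32)
print(top_entropy_pg_loss(logits, actions, advantages))
```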
Robot "filial children" to ease the elderly-care dilemma: the technical path is clear, and non-humanoid forms come first
Zhong Guo Jing Ying Bao· 2025-05-29 12:07
Core Viewpoint
- The article discusses the potential of humanoid robots in addressing growing elderly care needs in the context of an aging population, highlighting advancements in technology and the evolving landscape of the robotics industry [1][3][20].

Industry Overview
- The aging population in China is rapidly increasing, with projections indicating that by the end of 2024 there will be 310 million people aged 60 and above, accounting for 22% of the total population [3][20].
- The concept of "elderly care robots" encompasses various forms, including exoskeletons and humanoid robots, with popular perception focusing particularly on humanoid robots [4][21].

Technological Advancements
- Recent breakthroughs in robotics include improvements in bionic joints, motion control algorithms, and cognitive decision-making frameworks, all essential for the development of humanoid robots [1][6].
- The introduction of international standards for elderly care robots aims to guide design, manufacturing, testing, and certification, promoting healthy industry development [7][9].

Market Dynamics
- The market for humanoid robots is expected to grow significantly, with estimates suggesting that by 2035 the global market could reach $38 billion and the Chinese market could expand to 500 billion yuan [20][24].
- Current humanoid robot prices range from roughly 99,000 yuan to 199,000 yuan, with prices expected to fall as the technology matures [14][17].

Future Outlook
- Experts predict that humanoid robots capable of providing companionship and care for the elderly may enter households within the next three to ten years, though some believe it could take longer [18][21].
- The industry is shifting toward consumer markets, with companies exploring home care and rehabilitation, indicating growth potential in the elderly care robotics sector [22][23].
Generalization surges 47%! The first reward paradigm for intent detection, a new solution for intent recognition in the era of exploding AI tools
机器之心· 2025-05-16 04:39
Core Viewpoint
- The rapid development of large language models (LLMs) and the explosion of integrable tools have significantly enhanced the convenience of AI assistants in daily life, but the challenges of intent detection and generalization remain critical issues [1][2].

Group 1: Research and Methodology
- Tencent's PCG social line research team has innovatively applied reinforcement learning (RL), specifically the Group Relative Policy Optimization (GRPO) algorithm combined with Reward-based Curriculum Sampling (RCS), to intent detection tasks [2] (a sketch of the group-relative advantage step follows this summary).
- The research demonstrated that models trained with RL exhibit significantly better generalization than those trained with supervised fine-tuning (SFT), particularly on unseen intents and cross-lingual tasks [4].
- Introducing a thought process during RL training has been shown to enhance the model's generalization on complex intent detection tasks [5].

Group 2: Experimental Results
- The experiments revealed that the GRPO method outperformed SFT in generalization across various datasets, including MultiWOZ2.2 and a self-built Chinese dataset, TODAssistant [17].
- The GRPO method achieved performance comparable to SFT on the MultiWOZ2.2 dataset, indicating its effectiveness for intent detection tasks [14].
- Combining GRPO with RCS further improved the model's accuracy, especially in the second phase of curriculum learning [19].

Group 3: Future Directions
- The research team plans to explore more efficient online data filtering methods for the RCS approach [24].
- Multi-intent recognition is a planned direction, as current experiments focus primarily on single-intent scenarios [25].
- The team aims to extend the research to more complex task-oriented dialogue tasks beyond intent recognition [26].
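GRPO, mentioned above, scores a group of sampled responses per prompt and uses each response's reward relative to the group mean, normalized by the group's standard deviation, as its advantage, avoiding a learned value network. A minimal sketch of that advantage computation (the reward values are invented, and the reward-based curriculum-sampling part is omitted):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards: (G,) scalar rewards for G responses sampled from one prompt,
             e.g. 1.0 if the predicted intent label is correct, else 0.0.
    Returns a (G,) advantage per response: how much better or worse it did
    than its own group, in units of the group's standard deviation.
    """
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# Eight sampled responses to one utterance; three got the intent right.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
print(grpo_advantages(rewards))
```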