Supervised Learning
The change in the number of Li Auto's intelligent-driving second-level departments from 3 to 11 is a secondary contradiction
理想TOP2· 2025-09-22 16:56
Disclaimer: this is a piece of reasoning, premised on the following two inferences holding.

Inference 1: Li Xiang's role in Li Auto's assisted driving can be closely analogized to Musk's role in Tesla's assisted driving. (Three core functions: 1. marshaling large-scale resources; 2. guaranteeing sustained resource investment; 3. on the premise of understanding AI fundamentals and being able to participate directly in the company's AI technical discussions, making and executing the key think-different judgments on the company's long-term direction and technical route.)

Inference 2: the principal contradiction in the development of Li Auto's intelligent driving is the global AI industry's stage of development / the degree of match among Li Auto's various factors of production / Li Xiang himself (in essence: timing, place, and people).

From first principles these two inferences obviously do not necessarily hold, so readers are strongly encouraged to view them critically and to assume by default that they may be false.

If the two inferences do hold, three views follow:

View 1: the change in the number of Li Auto's intelligent-driving second-level departments from 3 to 11 is a secondary contradiction under the sub-category of matching Li Auto's factors of production.

View 2: no matter how the second-level departments change in detail, because the direction of iteration is so clear, it is highly probable that Li Auto's intelligent driving will go through multiple rounds of high-quality, rapid iteration in the next 1-12 months.

All bosses have the first two functions; most technical leads can understand AI fundamentals and participate directly in the company's AI technical discussions; very few technical leads are capable of the key think-different judgments on the AI technical route, and bosses with that capability are also rare ...
A generative perspective reshapes supervised learning: labels are not just answers, but a guide for learning | ICML 2025
量子位· 2025-06-24 13:36
Core Viewpoint
- A new paradigm in supervised learning called Predictive Consistency Learning (PCL) is introduced, which redefines the role of labels as auxiliary references rather than just standard answers for comparison [1][5].

Group 1: Training Process Overview
- PCL aims to capture complex label representations by progressively decomposing label information, allowing the model to predict complete labels with partial label hints [5][6].
- The training process involves mapping noisy labels back to true labels, with noise levels controlled by time steps, ensuring predictions remain consistent across different noise levels [7][8].

Group 2: Noise Process
- The noise process for discrete labels is modeled using a categorical distribution, while continuous labels follow a Gaussian diffusion model, introducing noise progressively [9][11].
- In cases where labels are too complex, PCL introduces Gaussian noise directly into the latent embedding space, aligning with the continuous label noise process [11].

Group 3: Testing Process Overview
- After training, the model can efficiently predict by sampling from a random noise distribution, achieving results that surpass traditional supervised learning even without label hints [14][28].
- A multi-step inference strategy is employed to refine predictions, where previous predictions are perturbed with noise to serve as hints for subsequent predictions [14][28] (see the sketch after this summary).

Group 4: Information Theory Perspective
- PCL proposes a structured learning process that gradually captures information, allowing the model to learn from noisy labels while minimizing dependency on them [15][18].
- The model's goal is to minimize noise condition dependence, ensuring predictions remain consistent across varying noise levels [19].

Group 5: Experimental Results
- PCL demonstrates significant improvements in prediction accuracy across various tasks, including image segmentation, graph-based predictions, and language modeling, compared to traditional supervised learning [20][25][30].
- In image segmentation, PCL outperforms traditional methods in single-step predictions and continues to improve with additional prediction steps [22][28].
- The results indicate that while more inference steps can enhance detail capture, they also risk error accumulation, necessitating a balance in the number of steps [26][28].
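To make the training and inference loop summarized above more concrete, here is a minimal PyTorch sketch of the PCL idea for the discrete-label case. The `model(x, hint, t)` interface, the corruption schedule, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal sketch of the PCL idea (hypothetical names; not the authors' code).
# A model f(x, noisy_label_hint, t) predicts the clean label given the input and
# a noise-corrupted label hint whose corruption level is controlled by time step t.
import torch
import torch.nn.functional as F

def corrupt_label(label_dist, t, num_classes):
    """Categorical noise process for discrete labels: with probability t,
    replace the label distribution with a uniform one (t in [0, 1])."""
    keep = (torch.rand(label_dist.shape[0], 1) > t).float()
    uniform = torch.full_like(label_dist, 1.0 / num_classes)
    return keep * label_dist + (1.0 - keep) * uniform

def pcl_training_step(model, x, y, num_classes, optimizer):
    """One training step: map a noisy label hint back to the true label,
    at a randomly sampled noise level, so predictions stay consistent
    across noise levels."""
    y_onehot = F.one_hot(y, num_classes).float()
    t = torch.rand(())                                 # sample a noise level
    y_noisy = corrupt_label(y_onehot, t.item(), num_classes)
    logits = model(x, y_noisy, t)                      # predict full label from a partial hint
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

@torch.no_grad()
def pcl_multistep_predict(model, x, num_classes, steps=3):
    """Multi-step inference: start from a pure-noise hint, then perturb the
    previous prediction and feed it back as the hint for the next step."""
    hint = torch.full((x.shape[0], num_classes), 1.0 / num_classes)
    for i in range(steps):
        t = torch.tensor(1.0 - i / steps)              # anneal the declared noise level
        probs = model(x, hint, t).softmax(-1)
        hint = corrupt_label(probs, t.item() * 0.5, num_classes)  # re-noise as next hint
    return probs.argmax(-1)
```

The point the sketch tries to capture is that the same network is trained to recover the complete label from hints at every noise level, which is what lets it start from a pure-noise hint at test time and refine its prediction over a small number of steps, at the cost of possible error accumulation if too many steps are used.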
Microsoft VP "holds class" on X with a continuously updated series on everything RL; a must-read for LLM practitioners
机器之心· 2025-05-26 01:28
Core Viewpoint
- The article discusses the educational series on artificial intelligence initiated by Nando de Freitas, focusing on reinforcement learning (RL) and its applications in large language models (LLMs) [1][2].

Summary by Sections

Introduction to AI Education
- Nando de Freitas aims to educate readers on AI through a series of posts on X, starting with reinforcement learning and gradually covering diffusion and flow matching technologies [1][2].

Learning Types
- The article highlights that there is no ultimate conclusion on unsupervised learning, supervised learning, and reinforcement learning [8][19].
- Supervised learning is described as basic imitation, requiring high-quality expert data for effective learning [9].
- Reinforcement learning focuses on selective imitation, allowing agents to learn from suboptimal experiences and improve their performance [10][11].

Distributed Reinforcement Learning Systems
- Modern distributed RL systems consist of two main components: Actors and Learners, where Actors interact with the environment and collect data, while Learners update the policy network based on this data [23][24].
- The importance of measuring operational durations and communication bandwidth in such systems is emphasized [24][27].

Offline Reinforcement Learning
- Offline RL has unique value in scenarios like post-training LLMs, where it can leverage historical data for learning [28][29].

Single-step and Multi-step RL
- The article differentiates between single-step and multi-step RL problems, with single-step focusing on immediate actions and multi-step involving planning over a series of interactions [35][39].
- The complexity of multi-step RL is noted, particularly in credit assignment issues where multiple decisions affect outcomes [40][41].

Policy Gradient and Techniques
- Policy gradient methods are discussed, including the use of baseline subtraction to reduce variance in reward signals [49][56].
- The article also covers the significance of KL divergence in maintaining proximity to supervised fine-tuning strategies during post-training [69].

Importance Sampling and PPO
- Importance sampling is introduced as a method to correct off-policy sample bias, with Proximal Policy Optimization (PPO) being a key technique to manage policy updates [73][78] (see the sketch after this summary).
- The integration of various techniques in training models like DeepSeek-R1 is highlighted, showcasing the complexity of modern RL systems [81].

Future Directions
- Freitas plans to expand the discussion from single-step to multi-step RL, indicating ongoing developments in the field [82].
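As a rough illustration of the baseline-subtraction, importance-sampling/PPO, and KL-proximity ideas listed above, here is a short PyTorch sketch. The function names, the batch-mean baseline, and the per-token KL approximation are illustrative assumptions, not code from de Freitas's posts.

```python
# Hedged sketch of three standard ingredients of LLM post-training with RL:
# baseline subtraction in the policy gradient, PPO's clipped importance-sampling
# objective, and a KL-style penalty toward the supervised fine-tuned reference.
import torch

def pg_loss_with_baseline(logprobs, rewards):
    """REINFORCE-style loss where a baseline (here, the batch-mean reward) is
    subtracted from the reward to reduce the variance of the gradient."""
    advantages = rewards - rewards.mean()              # baseline subtraction
    return -(logprobs * advantages.detach()).mean()

def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO objective: the probability ratio is an importance weight that
    corrects for off-policy samples, and clipping keeps each update small."""
    ratio = torch.exp(new_logprobs - old_logprobs)     # importance sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def kl_penalty(new_logprobs, ref_logprobs, beta=0.1):
    """Per-token KL-style penalty that keeps the post-trained policy close to
    the supervised fine-tuned reference policy."""
    return beta * (new_logprobs - ref_logprobs).mean()
```

In practice these terms are combined into one loss, which is one reason training pipelines like the one described for DeepSeek-R1 end up integrating several techniques at once.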
Rejected 11 years ago, now winning a Test of Time Award; DSN author Saining Xie: rejection ≠ academic death sentence
量子位· 2025-05-06 04:24
"恭喜!""当之无愧!" AISTATS官宣其获奖的推文下面,业界大佬齐聚,一片祝贺之声。 当初,这篇论文被AISTATS接收。 然而在谢赛宁本人的转发推文中,我们知道另一重内幕—— 衡宇 发自 凹非寺 量子位 | 公众号 QbitAI 谢赛宁十年前被NeurIPS (当时还叫NIPS) 拒收的论文,刚在今年获得了AISTATS 2025年度时间检验奖。 这篇论文就是《Deeply-Supervised Nets》 (DSN,深度监督网络) ,2014年9月挂上arXiv。 时间匆匆,十一年过去,属于是真·时间检验了。 它提出的中间层监督思想被谢赛宁后续作品REPA (Representation Alignment) 和U-REPA (U-Net Representation Alignment) 等继 承并发展,展示出从单一模型优化到跨模型知识迁移的演进。 而后两者在深度学习、扩散模型深化发展的这两年间,影响颇深。 这篇论文最初投稿给NeurIPS。虽然拿下8/8/7高分,但仍然被该顶会拒绝了。 他表示: 那次挫折一直萦绕在我心头,困扰着我…… 十一年前,拿到8/8/7高分却被拒 补充下背景信息—— 《D ...