Training

Marching Across China (行进中国) | Robot "Xiao Tao" Goes to School
Ren Min Wang· 2025-06-05 10:11
Here, the biggest subjects the robot has to learn are spatial understanding and interaction. For example: how much force does it take to open a cabinet? From what angle can a toy be dropped into a basket?

"These operations look utterly unremarkable in our daily lives, but for a robot they are hard, because the movements demand complex spatial perception and fine motor control," says Tang Rui (唐睿), chief scientist at Qunhe Technology (群核科技).

Traditional robot training, however, runs into huge difficulties. In the past, training each action required building a real physical scene, and sometimes researchers had to wear sensor rigs to teach the robot "hand over hand." This approach is not only expensive but also very inefficient: a single simple action often takes months of repeated practice.

"Virtual training is the best solution," Tang says. As one of the world's largest spatial-design platform companies, Qunhe has accumulated more than 362 million 3D models on its platform, and this mass of data makes an excellent "textbook" for robot learning.

A robot practices grasping an apple in a virtual scene. (Photo courtesy of the interviewee)

The reporter watched as "textbook author" and robot-brain engineer Zhou Chuhan (周楚寒) set problems for the robot on a computer: trash of various shapes scattered at random across the floor, door handles configured in hundreds of different styles, cabinet doors opened at random angles...

"For a single scene of a cup sitting on a table, we generated more than 2 million variants, in under a day," Zhou explains.

When "Xiao Tao" (小陶) first enrolled it was clumsy, unable even to steer around a trash can on the floor. After systematic training, it now faces all kinds of ...
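The scene-variant generation Zhou describes is a form of domain randomization: sample object poses and articulation states so the policy cannot overfit to one layout. Below is a minimal illustrative sketch; the class, function, and value ranges are all assumptions for illustration, not Qunhe's actual tooling.

```python
import random
from dataclasses import dataclass

@dataclass
class SceneVariant:
    """One randomized training scene (all fields hypothetical)."""
    cup_position: tuple        # (x, y) on the table, in metres
    cup_rotation_deg: float    # yaw of the cup
    door_angle_deg: float      # how far the cabinet door is open
    handle_style: int          # which of N handle meshes to use

def generate_variants(n, n_handle_styles=300, seed=0):
    """Yield n scene variants by randomizing pose and articulation."""
    rng = random.Random(seed)
    for _ in range(n):
        yield SceneVariant(
            cup_position=(rng.uniform(0.0, 1.2), rng.uniform(0.0, 0.8)),
            cup_rotation_deg=rng.uniform(0.0, 360.0),
            door_angle_deg=rng.uniform(0.0, 110.0),
            handle_style=rng.randrange(n_handle_styles),
        )

# Toy usage; the article reports generating 2M+ variants of one scene.
sample = list(generate_variants(5))
print(sample[0])
```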
RL Post-Training Enters the Supernode Era! Huawei's "Black Tech" Squeezes the Compute Dry, One Card Does Two Jobs
雷峰网· 2025-06-05 09:17
RL post-training has become the "killer move" for pushing large-model performance further, but wasted compute and poor cluster efficiency remain major obstacles. This time the Huawei team broke through with two pieces of "black technology": on the CloudMatrix 384 supernode it colocates MoE training and inference on the same cards, doubling resource utilization, and it also breaks the lockstep of synchronous algorithms, lifting training speed by another 50%.

By Li Xi (李希)

With the large-model race at white heat, reinforcement-learning post-training has become the core route past the LLM performance ceiling. The breakout models OpenAI o1 and DeepSeek-R1 both relied on RL post-training to turn stone into gold.

Compared with the "cast a wide net" knowledge acquisition of pre-training, RL post-training drives the model to interact dynamically with an external environment, directly shaping the LLM's reasoning performance on complex tasks.

Today the RL post-training stage already consumes 20% of the compute of the full training pipeline, and that share is headed toward 50%, directly affecting model performance and cost.

In traditional RL post-training, training and inference take turns on the hardware, which means much of the compute sits idle. In response, the Huawei team introduced two technologies, "RL Fusion" (training-inference colocation) and "StaleSync" (quasi-asynchronous parallelism), to push training efficiency and resource utilization to the limit.

· RL Fusion: one card handles both training and inference, doubling resource utilization and throughput.
· S ...
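The excerpt's description of StaleSync is truncated above. As a rough illustration of the quasi-asynchronous idea it names, here is a minimal stale-synchronous aggregation sketch; every name and the staleness rule are assumptions for illustration, not Huawei's implementation.

```python
from dataclasses import dataclass

@dataclass
class GradientPacket:
    worker_id: int
    version: int   # parameter version the gradient was computed against
    grad: list     # stand-in for a real gradient tensor

class StaleSyncAggregator:
    """Accept gradients whose staleness is within a bound instead of
    forcing all workers to synchronize at every step."""

    def __init__(self, max_staleness=2):
        self.version = 0
        self.max_staleness = max_staleness

    def submit(self, pkt):
        staleness = self.version - pkt.version
        if staleness > self.max_staleness:
            return False          # too stale: reject, worker must refresh
        self._apply(pkt.grad)     # apply immediately; no global barrier
        self.version += 1
        return True

    def _apply(self, grad):
        pass  # placeholder for optimizer.step() on the real parameters

agg = StaleSyncAggregator(max_staleness=2)
print(agg.submit(GradientPacket(worker_id=0, version=0, grad=[0.1])))  # True
```

The point of the bounded-staleness rule is that fast workers never wait for slow ones, yet no update is ever computed against parameters more than a few versions old, which keeps convergence close to the synchronous baseline.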
The Key Figures Behind Gemini 2.5's Come-From-Behind Surge
Hu Xiu· 2025-06-05 03:14
Hong Jun (泓君), founder of Silicon Valley 101 (《硅谷101》), invited Kimi Kong, co-founder of Energent.ai, and Shaun Wei, founder of HeyRevia, both former Google technical experts, to discuss the underlying logic behind the Gemini model's climb to the top.

Below is a selection from the conversation.

1. The logic behind the rise of Gemini 2.5

Hong Jun: The Gemini 2.5 Pro that Google just released posts the best numbers of any large model on current benchmarks. Kimi, can you analyze how it pulled that off? From being "precision-sniped" by OpenAI's 4o release on the eve of last year's conference, to Gemini 2.5 Pro sweeping the leaderboards this year: in barely a year, how did Gemini flip from chaser to front-runner?

Kimi: I left DeepMind almost a year ago, so I can't speak to what new innovations my former colleagues have made since. But the fundamental recipe for training a large language model hasn't changed. It has three steps: pre-training, SFT (supervised fine-tuning), and alignment using RLHF (reinforcement learning from human feedback).

Around last year's NeurIPS (Conference on Neural Information Processing Systems), the industry had broadly accepted that the public web had essentially been scraped dry, like fossil fuels that have already ...
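Of Kimi's three steps, the middle one, SFT, is the simplest to make concrete: ordinary next-token cross-entropy on curated (prompt, response) pairs. A toy sketch follows; the model, sizes, and random data are stand-ins for illustration, not anything Gemini-specific.

```python
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
# Toy stand-in for a language model: embedding + output projection.
model = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

tokens = torch.randint(0, vocab, (8, 33))   # toy batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one position

logits = model(inputs)                      # (batch, seq, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), targets.reshape(-1)
)
opt.zero_grad()
loss.backward()
opt.step()                                  # one supervised fine-tuning step
```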
Why Has Strength Training Become Popular Again?
36Ke· 2025-06-05 02:54
Core Insights
- Strength training is experiencing a resurgence in popularity, becoming a mainstream fitness trend rather than a niche activity [1][4][21]
- The shift in fitness culture is evident as more individuals, including women, are engaging in strength training, with a notable increase in participation and acceptance [5][16][21]

Industry Trends
- Social media platforms are driving the popularity of strength training, with significant growth in related content, such as "female strength training" notes on Xiaohongshu increasing by over 150% in a year [1]
- Fitness facilities are adapting to this trend, with more training studios opening and traditional gyms incorporating strength training equipment into their offerings [1][5]

Brand Strategies
- Major brands like lululemon, adidas, and Nike are expanding their product lines to include strength training apparel and equipment, indicating a shift towards making strength training a part of everyday life [4][10][19]
- The introduction of specialized training programs and equipment, such as Nike's "Strength Training Studio" and adidas's redesigned footwear, reflects the growing emphasis on strength training as a key market segment [10][19]

Consumer Behavior
- The perception of strength training is evolving, with more individuals recognizing its benefits for overall health, emotional stability, and body control, moving beyond traditional views of it being solely for bodybuilders [14][16][21]
- Home fitness is also adapting, with consumers increasingly investing in strength training equipment for personal use, such as dumbbells and kettlebells, as part of their home workout routines [7][8]

Market Opportunities
- The rise of strength training presents new opportunities for the fitness industry, with a focus on light strength training becoming a standard practice, particularly for beginners and women [22][26]
- The global fitness trend rankings indicate that traditional strength training remains a core component, highlighting its sustained relevance in the market [22]
Too Strong an "AI Flavor"? A New Kind of Training Is Fixing That
36Ke· 2025-06-04 12:52
Core Insights
- The article discusses the evolving role of AI trainers, particularly in the context of enhancing AI's ability to understand and express human emotions and values, moving beyond mere factual accuracy to a more nuanced interaction with users [1][10][12]

Group 1: AI Training and Human Interaction
- AI models are currently focused on improving their intelligence by mastering standard answers, but many real-world questions lack definitive answers, necessitating a deeper understanding of human preferences and emotions [2][5]
- The emergence of AI trainers, particularly those with humanities backgrounds, signifies a shift towards training AI to better perceive and respond to complex human emotions and ethical dilemmas [6][10]
- The role of AI trainers is evolving from basic data labeling to creating ethical guidelines and human-like responses, indicating a growing recognition of the importance of human values in AI development [8][10][13]

Group 2: Challenges in AI Responses
- AI struggles with sensitive topics, such as health issues, where responses can feel mechanical and lack empathy, highlighting the need for more human-like interaction [5][17]
- Ethical dilemmas, such as the classic trolley problem, illustrate the complexity of programming AI to navigate moral boundaries, as there are no universally correct answers [4][16]
- The challenge of using appropriate pronouns in AI responses reflects broader issues of inclusivity and sensitivity in AI communication, which are still under discussion [3][17]

Group 3: The Future of AI Training
- The demand for AI trainers with strong humanities backgrounds is increasing, as companies seek to bridge the gap between machine logic and human emotional understanding [10][11]
- The concept of "post-training" is gaining traction, where AI is continuously improved through the integration of high-quality data and alignment with human values [9][10]
- The emergence of specialized roles, such as "human-AI interaction trainers," indicates a trend towards creating more engaging and responsible AI systems [10][11]
Ascend + Kunpeng Join Forces for a Big Move! Huawei Overhauls MoE Training: Throughput Up Another 20%, Memory Down 70%
华尔街见闻· 2025-06-04 11:01
Recently, Huawei presented a new operator and memory optimization scheme for MoE training systems: all three core operators sped up across the board, system throughput rose another 20%, and Selective R/S delivered 70% memory savings.

On the road to more powerful AI, MoE has become another path of choice for the tech giants. As long as the Scaling Law holds, large-model parameter counts will keep growing, and only then can AI capability keep climbing. With its distinctive architecture, MoE is reaching unprecedented parameter scales and has become one of the key routes past the compute bottleneck of large-scale model training.

Yet turning MoE's potential into efficient training practice has long been an open problem for the industry.

Previously, Huawei's Adaptive Pipe & EDPB framework delivered efficient cluster-level distributed computing, overlapping communication and computation to raise training-cluster efficiency. This time, through deep coordination of Ascend and Kunpeng compute, Huawei further raised operator compute efficiency and memory utilization.

Starting from the single-node view, they drilled into the NPU and CPU internals, dissecting operator computation, operator dispatch, and training memory usage at fine granularity. The pleasant surprise: on top of the previous gains, MoE training throughput rose another 20% while memory usage fell 70%.

First, the hardware's core compute units, such as the Cube, were underutilized, with redundant operations and room for optimiz ...
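"Selective R/S" appears to name a per-activation choice between recomputing in the backward pass and swapping to host memory. The excerpt does not spell out Huawei's policy, so the cost model below is purely an assumed illustration of the recompute-vs-offload trade-off.

```python
from dataclasses import dataclass

@dataclass
class Activation:
    name: str
    nbytes: int            # memory footprint on the NPU
    recompute_cost: float  # relative cost to recompute in backward

def plan(acts, cheap_cost=1.0, big_bytes=64 << 20):
    """Return a {name: strategy} plan: free device memory by recomputing
    cheap activations and offloading large, costly-to-recompute ones."""
    strategy = {}
    for a in acts:
        if a.recompute_cost <= cheap_cost:
            strategy[a.name] = "recompute"  # e.g. layernorm, activation fns
        elif a.nbytes >= big_bytes:
            strategy[a.name] = "swap"       # offload to CPU (Kunpeng) memory
        else:
            strategy[a.name] = "keep"       # small and costly: keep on NPU
    return strategy

acts = [
    Activation("layernorm_out", 8 << 20, 0.1),
    Activation("expert_ffn_out", 256 << 20, 8.0),
    Activation("router_logits", 2 << 20, 3.0),
]
print(plan(acts))
# {'layernorm_out': 'recompute', 'expert_ffn_out': 'swap', 'router_logits': 'keep'}
```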
Ascend + Kunpeng Dual-Core Strike! Huawei Unblocks MoE Training's "Meridians": Another 20% Faster, 70% Memory Saved
雷峰网· 2025-06-04 09:31
The pleasant surprise: the results show that, on top of earlier gains, MoE training throughput rose another 20% while memory usage dropped 70%. This is more than a technical breakthrough; it sets the course for MoE training. "Every breakthrough in Pangu Ultra MoE reflects Huawei's leading strength in foundational AI technology and its engineering deployment."

By Li Xi (李希)

Recently, Huawei presented a new operator and memory optimization scheme for MoE training systems: all three core operators sped up across the board, system throughput rose another 20%, and Selective R/S delivered 70% memory savings.

On the road to more powerful AI, MoE has become another path of choice for the tech giants. As long as the Scaling Law holds, large-model parameter counts will keep growing, and only then can AI capability keep climbing. With its distinctive architecture, MoE is reaching unprecedented parameter scales and has become one of the key routes past the compute bottleneck of large-scale model training.

Yet turning MoE's potential into efficient training practice has long been an open problem for the industry. Previously, Huawei's Adaptive Pipe & EDPB framework delivered efficient cluster-level distributed computing, overlapping communication and computation to raise training-cluster efficiency. This time, through deep coordination of Ascend and Kunpeng compute, Huawei further improved training operator compu ...
"Replicating" High-Flyer Quant's Creation of DeepSeek: Quant Private Fund NianKong Breaks Through in Foundational Large-Model R&D
经济观察报· 2025-06-03 11:17
As AI large models iterate and upgrade, quant private funds' R&D on foundational large-model technology is increasingly focused on algorithm optimization. In this process, industry-academia collaboration will be their "shortcut" to breakthroughs in foundational large-model technology.

By Chen Zhi (陈植). Cover image: 图虫创意

Since May, the quiet contest among global large-model developers over semantic understanding, multimodality, and more has escalated. China's DeepSeek announced that its DeepSeek R1 model had completed a minor version upgrade, significantly improving the model's depth of thought and reasoning ability.

Domestic quant private fund NianKong Technology (念空科技), working with the School of Computer Science at Shanghai Jiao Tong University, proposed a brand-new large-model training framework (SASR) and submitted a paper to NIPS (NeurIPS), a top global AI conference.

In an exclusive interview with this paper on June 3, NianKong founder Wang Xiao (王啸) said that on the GSM8K task the new framework reached over 80% accuracy using only a 1.5B model, close to GPT-4o's performance, while on the KK logical-reasoning task its accuracy beat GPT-4o by about 9 percentage points. SASR makes general-purpose large models "smarter."

He told the reporter that current large-model training frameworks mainly revolve around supervised fine-tuning (SFT) and reinforcement learning (RL). SFT means continuously feeding the model material and worked examples for supervised training, akin to "grinding through problem sets"; ...
"Replicating" High-Flyer Quant's Creation of DeepSeek: Quant Private Fund NianKong Breaks Through in Foundational Large-Model R&D
Jing Ji Guan Cha Wang· 2025-06-03 06:57
Core Insights
- The competition among global large model development companies has intensified, particularly in semantic understanding and multimodal capabilities since May [2]
- Domestic quantitative private equity funds are also entering the race, achieving breakthroughs in AI large model foundational technology [2][5]
- A new training framework (SASR) proposed by NianKong Technology in collaboration with Shanghai Jiao Tong University has shown promising results, achieving over 80% accuracy on the GSM8K task with a 1.5B model, nearing GPT-4o's performance [2][4]

Group 1: Training Framework and Algorithm Optimization
- The current training frameworks for large models primarily focus on Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), with the challenge being to optimize the balance between these two methods [3][8]
- The new training framework aims to dynamically adjust the relationship between SFT and RL, allowing the model to become "smarter" without increasing data volume [3][9]
- The innovative training framework has been applied in quantitative investment strategy development, achieving approximately 80% market prediction accuracy compared to traditional models [4][13]

Group 2: Industry Trends and Collaborations
- Many quantitative private equity firms are establishing AI Labs to focus on foundational technology research for large models, emphasizing algorithm optimization [6][11]
- The integration of academic research and private equity expertise is seen as a shortcut to breakthroughs in large model foundational technology [5][11]
- The emergence of smarter large models with lower parameter counts but superior overall capabilities is attributed to innovations in training frameworks and algorithm optimization [10]

Group 3: Future Directions and Challenges
- The ability of large models to become "smarter" in various vertical fields depends on high-quality data and effective training modes [12]
- NianKong Technology aims to empower large models to excel in more vertical fields, enhancing China's competitiveness in the global AI landscape [14]
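Both versions of the article describe SASR as dynamically adjusting the balance between SFT and RL rather than fixing their ratio. The paper's actual criterion is not given in these excerpts, so the linear annealing rule below is only an assumed stand-in to make the idea of a shifting SFT/RL mix concrete.

```python
def mixed_loss(sft_loss, rl_loss, step, total):
    """Blend supervised and RL objectives with a weight that shifts
    over training: start SFT-heavy, end RL-heavy (assumed schedule)."""
    alpha = max(0.0, 1.0 - step / total)   # anneals 1 -> 0
    return alpha * sft_loss + (1.0 - alpha) * rl_loss

# Toy usage: early steps lean on "problem grinding" (SFT),
# late steps lean on exploration (RL).
print(mixed_loss(sft_loss=2.0, rl_loss=1.0, step=10, total=100))  # SFT-heavy
print(mixed_loss(sft_loss=2.0, rl_loss=1.0, step=90, total=100))  # RL-heavy
```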
Challenging the Dominance of RL Post-Training! A New Unsupervised Method Needs Only 1 Example and 10 Optimization Steps
量子位· 2025-06-01 03:40
Contributed by the Ubiquant team
量子位 | Official account QbitAI

No labeled data, no fiddly reward design, and visible gains in just 10 steps: "entropy minimization" may suit rapid LLM upgrades better than reinforcement learning.

Reinforcement learning (RL) has been hugely successful in fine-tuning large language models (LLMs) in recent years, but steep annotation costs, complex reward design, and long training cycles have become bottlenecks to its wider application.

The Ubiquant research team proposes an extremely simple and effective unsupervised method, One-Shot Entropy Minimization (EM): with just one unlabeled example, it significantly improves LLM performance within 10 training steps, even surpassing RL methods that use thousands of examples.

1. From RL to EM: the dilemma of LLM fine-tuning, and a new idea

Today's LLMs, pre-trained on massive data, show astonishing general capabilities. But to reach top-tier performance on specific, complex reasoning tasks (such as math, physics, or programming), the mainstream post-training approach is RL, in particular reinforcement learning with verifiable rewards (RLVR). Although RL-based fine-tuning has made clear gains in model performance, the process suffers a series of obvious drawbacks that make it costly and cumbersome. By contrast, entropy minimization (EM) offers ...
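The method as described is unusually easy to sketch: take a single unlabeled prompt and, for about 10 steps, minimize the mean token-level entropy of the model's own output distribution, pushing it to commit to confident answers. The toy model below is a stand-in; only the loss term mirrors the described objective.

```python
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
# Toy stand-in for an LLM: embedding + output projection.
model = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

prompt = torch.randint(0, vocab, (1, 32))        # one unlabeled example

for step in range(10):                           # "within 10 steps"
    logits = model(prompt)                       # (1, seq, vocab)
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1).mean()  # mean token entropy
    opt.zero_grad()
    entropy.backward()      # lower entropy = more confident predictions
    opt.step()
    print(step, float(entropy))
```

Note there is no label and no reward anywhere in the loop: the model's own distribution supplies the training signal, which is exactly what removes the annotation and reward-design costs the article attributes to RL.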