Reinforcement Learning
Large Model Capabilities Technical Training: Making Data Intelligence as Simple as Water and Electricity
数巅科技· 2026-02-28 01:20
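Language model development history
Large language model (LLM): a language model with 10 billion or more parameters
Reference: https://arxiv.org/abs/2303.18223
• 1990s: language models emerge, based on statistical methods that use the preceding words to predict the next word
• 2003: Bengio's "A Neural Probabilistic Language Model" brings deep learning ideas into language modeling for the first time
• 2017: Google proposes the Transformer neural network architecture; models built on it learn the rules and patterns of language by training on large amounts of text
• International: GPT-3 (175B), GPT-4, PaLM (540B), Galactica, LLaMA, and others
• China: ChatGLM, ERNIE Bot (文心一言), Tongyi Qianwen (通义千问), iFlytek Spark (讯飞星火), and others
• Large and small language models (such as GPT-2) share similar architectures and pre-training tasks, yet their capabilities differ sharply (emergent abilities)
• Emergent abilities let large language models handle entirely new tasks from only a few examples
Impact on technology
Impact on business
Reference: https://arxiv.org/abs/2303.18223
• Natural language processing: understanding and generating text, intent understanding, writing articles, answering ...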
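To make the 1990s bullet concrete, here is a minimal sketch of a statistical language model that predicts the next word from the single preceding word (a bigram model). The toy corpus and the function names `train_bigram` and `predict_next` are illustrative assumptions, not material from the training slides.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, used only for illustration.
corpus = "the model reads the text and the model predicts the next word".split()
bigram_counts = train_bigram(corpus)
```

Calling `predict_next(bigram_counts, "the")` returns `"model"`, because "model" follows "the" twice in the toy corpus while "text" and "next" each follow it once. Neural language models replace these raw counts with learned parameters, but the prediction task is the same.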
Chinese-Born Prodigy Leaves xAI: The Compute Race Is Dead, and $30 Unlocks AI Self-Evolution
36 Ke · 2026-02-27 09:54
Core Insights
- The departure of key Grok team members, including Jiayi Pan and Toby Pohlen, raises questions about internal dynamics at xAI [1][3]
- Jiayi Pan's path from novice to core contributor on Grok 4 reflects a marked evolution in his expertise and in his approach to AI technology [4][7]
Group 1: Jiayi Pan's Contributions
- Jiayi Pan began his AI journey in 2019, studying computer science and electrical engineering at the University of Michigan, and graduated in 2023 [4]
- During his early projects at UC Berkeley, he developed SWE-Gym, an environment that brings reinforcement learning (RL) into software engineering [6]
- At xAI, Pan optimized the RL module for Grok 4, advancing the model from simple prediction to self-verification [7]
Group 2: TinyZero Project
- In 2025, Jiayi Pan announced the open-source project TinyZero, a model trained for only $30 that acquires self-verification and reasoning capabilities through pure reinforcement learning [8][10]
- TinyZero showed large gains in task accuracy: performance on the Countdown task rose from 0% to over 80% after RL training [9]
- The project challenges the notion that advanced reasoning requires massive infrastructure investment; the article cites the stalled Stargate project led by Sam Altman as evidence [10]
Group 3: Implications of TinyZero
- TinyZero's self-correcting behavior, including generating intermediate reasoning steps while solving a task, points to a frontier in AI development that does not depend on large-scale resources [12][15]
- Taken together, Jiayi Pan's projects suggest that AI could not only correct itself but also optimize its own training process, hinting at a form of "self-evolution" [16]
- Affordable AI models capable of self-correction raise ethical and stability concerns as the technology becomes accessible to a much broader range of developers [17]
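The summary above says TinyZero learns the Countdown task through pure reinforcement learning. RL on such a task needs a reward that can be checked automatically, and the following is a minimal sketch of what a rule-based Countdown reward could look like. It assumes the common Countdown rule (combine the given numbers with +, -, *, / so that each number is used exactly once and the result equals the target); the function name `countdown_reward` and the 0/1 reward scale are assumptions for illustration and are not taken from the TinyZero code.

```python
import ast
import operator
from collections import Counter

# Binary operators allowed in a Countdown answer.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed arithmetic expression, recording each integer literal in `used`."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Return 1.0 if the expression uses each number exactly once and hits the target, else 0.0."""
    used = []
    try:
        value = _evaluate(ast.parse(expression, mode="eval").body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0  # malformed or disallowed answers earn no reward
    if Counter(used) != Counter(numbers):
        return 0.0  # a number was skipped, repeated, or invented
    return 1.0 if abs(value - target) < 1e-9 else 0.0
```

For example, `countdown_reward("(25 - 5) * 3", [3, 5, 25], 60)` returns `1.0`, while `countdown_reward("25 * 3 - 5", [3, 5, 25], 60)` returns `0.0` because the expression evaluates to 70. A reward of this kind needs no human labels, which is one reason small-budget RL experiments are feasible on such tasks.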
Large/Small/Micro Models Empowering Advanced Manufacturing: Practice and Reflections
School of Mechanical Engineering, Dalian University of Technology · 2026-02-26 05:15
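Large/Small/Micro AI Models for Manufacturing (AI4M): Applications and Insights
Song Xueguan (宋学官), School of Mechanical Engineering, Dalian University of Technology
Outline
1. Background and significance of AI4M
2. Fundamentals of AI4M
3. Research progress in AI4M
4. AI4M case studies
5. Bottlenecks of AI4M
6. Scientific questions in AI4M
7. Future directions for AI4M
8. Reflections and summary
Background and significance of AI4M
• Advanced manufacturing is the umbrella term for using high technology and advanced equipment to improve manufacturing processes and production efficiency. It is an important indicator of a country's level of scientific and technological development, and it bears on national economic development and national defense.
• "Made in China 2025": accelerate the transformation and upgrading of manufacturing, reaching the middle tier of the world's manufacturing powers overall by 2035
• In October 2022, the United States released its "National Strategy for Advanced Manufacturing", which treats advanced manufacturing as an engine of the US economy and national security
United States: Ind ...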
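The "Distillation" Uproar That Rattled Anthropic: A Leading US AI Researcher Pours Cold Water, Saying China's AI Success Does Not Come From Shortcuts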
Xin Lang Cai Jing· 2026-02-26 02:15
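Yesterday Anthropic named three Chinese AI labs, DeepSeek, Moonshot AI (月之暗面), and MiniMax, for "distilling" its Claude models, and the internet erupted.
Nathan Lambert, one of the best-known researchers in RLHF (reinforcement learning from human feedback) and author of the book "RLHF", argues that the matter is less serious than people imagine, but also less simple.
In his view, Chinese AI companies have excellent infrastructure, have produced many innovations, and are working through hard technical problems, and they did not get these results by "taking shortcuts".
Before getting into distillation, it is worth seeing why Lambert's view deserves attention.
Nathan Lambert is a scientist at the Allen Institute for AI. He earned his PhD at the University of California, Berkeley, under Pieter Abbeel, a prominent scholar in robotics. He did not invent RLHF, but his open-source book "RLHF" is now one of the standard references AI practitioners use to understand the large-model training pipeline.
Unlike the AI influencers who seem to be everywhere, he has trained large models hands-on.
On the day Anthropic's blog post went out, Lambert published a detailed analysis titled "How Much Does Distillation Really Matter for Chinese LLMs?" His core argument runs in a very different direction from the mainstream media's reading, ...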
HUST PhD Teams Up With Tsinghua Professor to Build a 10-Billion-Yuan Robotics Dark Horse
21 Shi Ji Jing Ji Bao Dao · 2026-02-24 16:12
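At the start of the Year of the Horse, the embodied-intelligence race has gained another player valued at 10 billion yuan.
On February 24, Qianxun Intelligence (千寻智能) announced that it had recently closed two consecutive funding rounds totaling nearly 2 billion yuan, pushing its valuation past the 10-billion-yuan mark.
The round was backed by institutions including Yunfeng Capital (云锋基金) and Sequoia China (红杉中国), as well as industrial and state-owned capital such as TCL Ventures (TCL创投) and the Chongqing Industrial Fund of Funds (重庆产业母基金). Existing shareholders, including Shunwei Capital (顺为资本) and Fortune Capital (达晨财智), made large follow-on investments.
Founder and CEO Han Fengtao (韩峰涛), 42, received his bachelor's degree from the School of Automation at Huazhong University of Science and Technology and did his PhD under Academician Ding Han (丁汉), a leading figure in robotics research. He has worked in industrial robotics for more than ten years.
In 2014 he left a state-owned enterprise and co-founded Rokae Robotics (珞石机器人) with colleagues, where he led the delivery of more than 20,000 industrial robots across more than 20 industry scenarios.
For his second venture, Han set the goal of "giving 10% of the world's people a robot of their own within ten years".
In 2024 he founded Qianxun Intelligence together with Gao Yang (高阳), an assistant professor at Tsinghua University, and Zheng Lingyin (郑灵茵), a pioneer in taking industrial robots overseas.
Core team members come mostly from top universities such as UC Berkeley, Tsinghua University, and Peking University, with an average age under 30. They have deep academic and engineering roots in the core areas of embodied models, including multimodal large models, robotics, and reinforcement learning.
The team develops its own embodied "brain" model and has built the "Mozi" (墨子) robot, which combines a wheeled chassis with a humanoid upper body.
At the end of 2025, "Xiao Mo" (小墨) officially went to work on the new-energy power battery PACK production line at CATL's Zhongzhou base, handling complex operations such as inserting battery connectors in stable volume production.
According to RoboChallen ...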
Technology Is Advancing Exponentially, and the Frightening Part Is That the World Has Not Noticed
虎嗅APP· 2026-02-18 09:47
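This article comes from the WeChat official account Tencent Technology (腾讯科技). Author: Xiao Jing (晓静); editor: Xu Qingyang (徐青阳). Original title: "Technology Is Advancing Exponentially, and the Frightening Part Is That the World Has Not Noticed | Latest Interview With Anthropic's CEO". Header image: Visual China (视觉中国).
"I am 90% sure that before 2035 humanity will see a 'country of geniuses in a datacenter', possibly within just one or two years."
When Anthropic CEO Dario Amodei said this, his tone was as calm as if he were forecasting tomorrow's weather.
What really drives him to distraction, though, is not that the technology is moving too fast but that the whole world has failed to notice. In a nearly 150-minute in-depth interview with the well-known American podcast host Dwarkesh Patel, Amodei kept returning to one point: we are closer to the endpoint of AGI than anyone imagines, while the public is still debating the same tired political topics.
Patel: What exactly is the "scaling" hypothesis now? Everyone understands the scaling laws for pre-training, but there seems to be no publicly known law for scaling reinforcement learning.
Amodei: My hypothesis now is the same one I had in 2017 when I wrote "The Big Blob of Compute Hypothesis", and it is in line with "The Bitter Lesson" by Rich Sutton, the Turing Award winner and father of reinforcement learning (The Bitter Lesso ...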
The B-Side of the "Robot Spring Festival Gala": Amid the Laughter, We Accepted a New Kind of Human-Machine Relationship
36 Ke · 2026-02-17 08:28
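In 1996, a huge orange box was carried onto the Spring Festival Gala stage.
It belonged to the sketch "Robot Anecdotes" (《机器人趣话》), written by Feng Xiaogang and performed by Cai Ming and Guo Da. In it, Guo Da plays a middle-aged bachelor who buys a humanoid robot named "Caihua" (菜花) to ease his loneliness. Remote control in hand, he switches the robot between "understanding" and "passionate" modes, and the stiff interaction between man and machine had the whole audience roaring with laughter.
[Image: the 1996 sketch "Robot Anecdotes" | Source: Spring Festival Gala]
In the thirty years that followed, however, the gala never again produced a breakout act with a robot in the true leading role.
That changed last year, when the yangko-dancing robot number "Yang Bot" (《秧 Bot》) brought robots into public view, and embodied intelligence became the hottest topic and keyword of the past year.
That act was a technology debut; this year's looks more like a full system demonstration.
At the 2026 gala, robots were close to being the stars of the show. From sketches to backup dancing, from martial arts performances to product placement, they reached into almost every segment of the evening, and their presence has never been stronger.
01 From "humans playing robots" to "robots playing humans"
If "Robot Anecdotes" thirty years ago showed us humans playing robots, then this year's "Grandma's Favorite" (《奶奶的最爱》) shows us robots playing humans.
The plot is simple. Cai Ming, who was thirty years ahead of her time, is now an elderly woman whose grandson rarely comes home, so she buys a row of robots to provide herself with a cyber retirement.
When the real grandson finally returns, he finds that the robots can not only serve Grandma tea and ...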
Doubao Large Model 2.0 Debuts: Upgraded Multi-Scenario Adaptability and Lower Costs Drive New Breakthroughs on Complex Tasks
Sou Hu Cai Jing· 2026-02-14 14:33
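On multimodal capabilities, Doubao 2.0 makes across-the-board advances. The model reaches an internationally leading level in visual reasoning, spatial perception, and dynamic scene understanding, and it shows a clear advantage when handling time-series data. Test results show that Doubao 2.0 Pro outperforms comparable models on TVBench and even exceeds the human average on the EgoTempo benchmark, accurately capturing changes in the rhythm of actions in video. For long videos, the model supports real-time question answering and environment perception, and it can carry out interactive tasks such as fitness coaching and outfit suggestions on its own, shifting from passive response to proactive service.
To meet the demands of complex tasks, the new release offers a differentiated model lineup. The flagship Doubao 2.0 Pro has a deeply optimized reasoning engine: it scores higher than GPT 5.2 on the SuperGPQA knowledge test and tops the HealthBench medical benchmark. The model achieved gold-medal results in authoritative evaluations such as the IMO mathematics olympiad and the ICPC programming contest, and its tool-calling accuracy is 40% higher than the previous generation's. For cost-sensitive scenarios, the Lite version keeps overall performance above the 1.8 generation while cutting inference cost to one tenth of the industry average, which makes it especially suitable for large-scale deployment. The Mini version is optimized for low latency and supports thousands of concurrent requests per second.
Programming sees an efficiency overhaul as well: Doubao 2.0 Code is deeply integrated with the TRAE development platform. The model has stronger codebase analysis and can automatically ...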
Pony.ai-W (小马智行-W): Hong Kong Listing Opens a New Chapter in Globalization-20260214
HTSC· 2026-02-14 05:45
Investment Rating
- The report assigns a "Buy" rating to the company with a target price of HKD 195 [5][9].
Core Insights
- The company has reached a significant milestone by realizing single-vehicle unit economics (UE) in Guangzhou, a turning point for the commercialization of Robotaxi services. Average daily revenue per vehicle is approximately HKD 299, which indicates that the business model is feasible [5][15].
- The company is positioned as a global leader in Level 4 (L4) autonomous driving, built on a technology foundation that includes multi-sensor fusion, world models, and automotive-grade hardware. This technological edge strengthens its competitive position in the L4 sector [5][16].
- The company has built a diversified collaboration ecosystem that supports its global expansion. Partnerships with major automotive manufacturers and technology providers facilitate the development and commercialization of its Robotaxi and Robotruck services [5][18].
Financial Projections
- Revenue is projected to grow from USD 75.03 million in 2024 to USD 327.18 million by 2027, with year-over-year growth of 183.12% from 2026 to 2027 [4][11].
- The company is expected to reach single-vehicle breakeven by 2026 and company-wide breakeven by 2029, driven by operational efficiencies and fleet scaling [6][14].
- The report anticipates that the Robotaxi fleet will expand to approximately 100,000 vehicles by 2030, with a potential market penetration rate of 14-17% in first-tier cities [6][14].
Business Model and Market Position
- The company operates a clear business model covering autonomous driving services, technology licensing, and application services. It is the only company in China to have received regulatory approval for full-scene autonomous driving services in major cities [25][31].
- The company has established a presence in eight countries, with a fleet of 1,159 Robotaxi vehicles and more than 170 Robotruck vehicles as of the end of 2025 [5][14][26].
- The report attributes the company's competitive advantages to its clear commercialization path, strong technical capabilities, and a well-structured ecosystem that supports growth [5][14][19].
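As a quick arithmetic check on the revenue projection above, the sketch below computes the compound annual growth rate implied by the two endpoints the summary quotes (USD 75.03 million in 2024 and USD 327.18 million in 2027). The helper name `cagr` is an illustrative assumption; the calculation uses only those two endpoint figures and says nothing about the intermediate years, which the summary does not give.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints quoted in the summary; 2024 -> 2027 spans three yearly steps.
implied_rate = cagr(75.03, 327.18, 3)
print(f"Implied 2024-2027 CAGR: {implied_rate:.1%}")
```

The implied rate over the full 2024 to 2027 window is roughly 63% per year. The 183.12% figure in the summary covers only the single step from 2026 to 2027, so the two numbers measure different things and should not be compared directly.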
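Talk | Sutton, the Father of Reinforcement Learning, Responds to Hinton From Afar: Today's AI Has "Too Little Understanding, Too Much Parameter Tuning"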
AI科技大本营· 2026-02-13 08:15
Core Viewpoint
- The article argues that AI should not be feared, because it is a natural extension of human intelligence and evolution, and it advocates a decentralized approach to AI governance rather than one driven by fear [1][3].
Group 1: Current State of AI
- The prevailing consensus is that AI is advancing rapidly, but the speaker says this deserves scrutiny: the field may be progressing less than it appears [6][8].
- Current capabilities such as language processing and image generation are widely seen as breakthroughs, yet they do not capture the essence of intelligence, which has more to do with understanding and adaptability [7][8].
- The speaker argues that current AI models are "weak minds": despite their vast knowledge, they lack true understanding and reliability [8][9].
Group 2: Definition of Intelligence
- Intelligence is defined as the ability to acquire and apply knowledge and skills, which puts learning at the center [12][13].
- The article criticizes mainstream AI's focus on computation and human imitation and calls for a deeper understanding of intelligence [14].
Group 3: Integrated Science of Mind
- The speaker proposes an Integrated Science of Mind that applies to humans, animals, and machines alike, highlighting what different forms of intelligence have in common [15][16].
- Reinforcement Learning (RL) is presented as a foundation for this new science, since it focuses on learning through interaction with the environment [18][20].
Group 4: Transition from Data to Experience
- The article describes a shift from the "Era of Human Data", in which AI learns from existing human knowledge, to the "Era of Experience", in which AI learns dynamically from its interactions with the world [25][27].
- This transition is needed for AI to create new knowledge rather than merely summarize existing information [26].
Group 5: Principles of Experiential AI
- Experiential AI rests on the exchange of signals (experience) between the agent and the world, which forms the foundation of intelligence [36][38].
- The goal of an intelligent agent is to maximize its reward signal, which defines both truth and objectives [39][40].
Group 6: Future of AI and Society
- The speaker predicts that the future will bring superintelligent AI and enhanced humans, leading to profound societal change [44].
- The talk calls for decentralized cooperation in AI governance, in contrast to centralized control driven by fear [46].
- Philosophically, AI is framed as a natural step in the universe's evolution, and humanity's role is to embrace this development with courage and pride [47][48].
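The principles listed under Group 5 (an agent and a world exchanging signals, with the agent trying to maximize reward) can be illustrated with a very small interaction loop. The sketch below is an assumption-laden toy, not anything shown in the talk: the two-action world, the win probabilities, the class names, and the epsilon-greedy rule with sample-average value estimates are all chosen here for illustration.

```python
import random


class TwoArmedWorld:
    """Toy world: each action pays reward 1.0 with a fixed, hidden probability."""

    def __init__(self, win_probs, rng):
        self.win_probs = win_probs
        self.rng = rng

    def step(self, action):
        return 1.0 if self.rng.random() < self.win_probs[action] else 0.0


class EpsilonGreedyAgent:
    """Agent that estimates each action's value purely from its own experience."""

    def __init__(self, n_actions, epsilon, rng):
        self.values = [0.0] * n_actions  # running average reward per action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon
        self.rng = rng

    def act(self):
        # Explore with probability epsilon, otherwise pick the best-looking action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental sample-average update toward the observed reward.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(steps=2000, seed=0):
    rng = random.Random(seed)
    world = TwoArmedWorld([0.2, 0.8], rng)   # hypothetical hidden probabilities
    agent = EpsilonGreedyAgent(2, 0.1, rng)
    total_reward = 0.0
    for _ in range(steps):
        action = agent.act()                 # agent -> world: action signal
        reward = world.step(action)          # world -> agent: reward signal
        agent.learn(action, reward)
        total_reward += reward
    return agent, total_reward
```

After `run()`, the agent's value estimate for the second action should end up higher than for the first, even though it was never told the underlying probabilities. Everything it knows comes from the stream of actions and rewards, which is the sense of "experience" the summary describes.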