Reinforcement Learning
How did a Tsinghua graduate born in 1998 lead a team of underdogs to a comeback at the robot marathon?
Hundun Academy · 2025-05-08 11:08
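This is a cognitive revolution about "possibility." When Jiang Zheyuan, founder of Songyan Power and a member of Hundun Academy's 6th cohort, led his grassroots team into the humanoid-robot race, nobody believed these "three-nothings founders" (no elite degrees, no star résumés, no industry connections) would last more than six months. In the harsh winter of 2023 they scraped together their own money to build robots. On the eve of the 2024 Spring Festival the whole team went all in on reinforcement learning algorithms. In 2025 they staked everything on a robot marathon. With these three textbook comebacks, Songyan Power shattered the industry prejudice that a hard-tech startup needs a "dream team plus enormous funding." It went from a pitch deck nobody would look at to stealing the show at the World Robot Conference, and from debugging algorithms in a basement to turning out viral Douyin hits. Along the way the company worked out, from hard-won experience, a survival formula for tech startups: use demos to crush doubt, use passion to sift out true gold, and use extreme cost-effectiveness to cut through the market fog. What follows is Dr. Jiang's speech at Li Shanyou's 2025 New Year lecture. Hello everyone, I'm Jiang Zheyuan, founder of Songyan Power. Since we are a young tech startup, I'd like to share our founding story with you: the pitfalls we fell into and the hurdles we cleared. Fundraising life-or-death: the grassroots team's "villa lab" breakout. The first hurdle came in September 2023, when I wanted to leave school to start a humanoid-robot company. Back then the sector was still in its infancy, and the market ...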
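Guotai Haitong: Embodied intelligence drives the commercialization of humanoid robots; algorithmic breakthroughs and other factors become catalysts for a sector rally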
Zhitong Finance · 2025-05-08 07:56
Group 1
- The core viewpoint is that embodied intelligence is the key to commercializing humanoid robots, with a potential market exceeding one trillion yuan, and the intelligence level of China's humanoid robots is expected to evolve significantly by 2045 [1]
- Humanoid robots have human-like perception, body structure, and movement, which makes them highly adaptable to human society, with potential applications in manufacturing, social services, and hazardous operations [1]
- The humanoid-robot market is currently below ten billion yuan, but as intelligence levels progress toward embodied intelligence, the market is expected to expand significantly [1]
Group 2
- Multimodal large models and reinforcement learning are improving motion-control performance, supported by significant advances in communication and computing power for real-time control [2]
- Major companies such as NVIDIA and Tesla are integrating multimodal perception to improve robot interaction and decision-making accuracy, and the development of embodied reasoning models is expected to improve performance in complex environments [2]
- Pure-vision solutions and advanced sensors are expected to lower hardware costs and improve perception sensitivity, while EtherCAT is emerging as a mainstream communication protocol thanks to its strong real-time performance [2]
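Breaking the multimodal reward bottleneck! CAS, Tsinghua and Kuaishou jointly propose R1-Reward, using reinforcement learning to give models long-term reasoning ability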
QbitAI · 2025-05-08 06:58
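Contributed by the R1-Reward team | QbitAI (WeChat official account: QbitAI). Multimodal reward models (MRMs) play a crucial role in improving the performance of multimodal large language models (MLLMs): …… In theory, reinforcement learning (RL) can give an MRM long-term reasoning ability and make it more effective. But applying existing RL algorithms such as Reinforce++ directly to MRM training causes many problems: training becomes very unstable and can even collapse outright. Now a research team from the CAS Institute of Automation, Tsinghua University, Kuaishou and Nanjing University has made new progress on using reinforcement learning to stably and effectively improve the long-term reasoning ability of multimodal reward models. Building on MM-RLHF (ICML 2025), a work on multimodal reinforcement learning, they have released the R1-Reward model. On existing multimodal reward-model benchmarks it outperforms the current state-of-the-art models by 5%-15%, and its scores rise further as the number of inference samples increases! Main contributions: during training, it provides a stable reward; during evaluation, it can pick the better sampled results; used on its own, it can serve directly as an evaluator. 1. Redefining the problem: the authors take the problem of training a reward model ...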
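The excerpt says R1-Reward keeps improving as more inference samples are drawn. One common way to turn several sampled judgments into a single verdict is majority voting. The sketch below is a hypothetical toy, not R1-Reward's actual procedure. It assumes an independent noisy judge that prefers the truly better response with a fixed probability `p_correct`. All names here (`majority_vote`, `noisy_judge`, `accuracy`) are illustrative.

```python
import random
from collections import Counter
from typing import Callable

# A "judge" takes an RNG and returns "A" or "B": which of two responses it prefers.
Judge = Callable[[random.Random], str]


def majority_vote(judge: Judge, k: int, rng: random.Random) -> str:
    """Sample the judge k times and return the preference with the most votes."""
    votes = Counter(judge(rng) for _ in range(k))
    # With odd k there are no ties; for even k, ties go to "A" (an arbitrary choice).
    return "A" if votes["A"] >= votes["B"] else "B"


def noisy_judge(p_correct: float, truth: str = "A") -> Judge:
    """Hypothetical judge that picks the truly better response with probability p_correct."""
    other = "B" if truth == "A" else "A"

    def judge(rng: random.Random) -> str:
        return truth if rng.random() < p_correct else other

    return judge


def accuracy(k: int, p_correct: float = 0.7, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the k-sample majority vote picks the better response ("A")."""
    rng = random.Random(seed)
    judge = noisy_judge(p_correct)
    hits = sum(majority_vote(judge, k, rng) == "A" for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 15):  # odd k avoids ties
        print(f"k={k:2d}  accuracy={accuracy(k):.3f}")
```

With `p_correct=0.7`, accuracy should climb from roughly 0.7 at k=1 toward about 0.95 at k=15. The gain relies on the sampled judgments being at least partly independent, and real model samples only approximate that.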
Copying human movements just by watching videos: Unitree G1 masters 100+ in no time as UC Berkeley proposes a new way to train robots
QbitAI · 2025-05-08 04:04
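Kexi, reporting from Aofei Temple | QbitAI (WeChat official account: QbitAI). A UC Berkeley team has developed a new robot-training system that transfers motions from videos onto real robots. The system, called VideoMimic, has already enabled the Unitree G1 robot to imitate more than 100 human motions. VideoMimic works by extracting pose and point-cloud data from video, training in a simulated environment, and finally transferring the result to the physical robot. The work drew a chorus of "wow" from netizens, and some were reminded of Jack Sparrow from Pirates of the Caribbean, joking that it was like plugging in a Jack API. Adapting to all kinds of terrain, even climbing stairs: To train the robot's policy, the team collected a dataset of 123 video clips. The clips were shot with handheld devices in everyday environments and cover a range of human motor skills and scenes. Trained with VideoMimic, the Unitree G1 has learned to adapt to various terrains. There is no motion capture: a single video is enough to teach the robot a human motion, and here's what it looks like: It learned to step over curbs: It also learned to climb stairs, throwing in some fancy footwork along the way: And if it can go up, it can of course come down: While the robot was descending stairs, the authors found that even when its feet slipped significantly, the trained policy reacted quickly and recovered its balance, avoiding a fall. Besides these locomotion skills, it can also stand up and sit down ...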
Liang Wenfeng and Yang Zhilin "collide" again
Chuangyejia · 2025-05-07 09:57
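The following article is from China Entrepreneur magazine, by Yan Junwen. Being chased and overtaken is a challenge founders face all the time. Source: China Entrepreneur magazine. Reporter: Yan Junwen. Editor: Zhang Xiaodi. After their papers "collided" in February, Liang Wenfeng and Yang Zhilin have met again on another large-model track. On April 30, DeepSeek released DeepSeek-Prover-V2, a model dedicated to mathematical theorem proving. Prover-V2 scales up to 671B (671 billion) parameters, nearly a hundred times the 7B of the previous V1.5. This makes it more efficient and accurate on math benchmarks: for example, it reaches an 88.9% pass rate on miniF2F and solves 49 problems from PutnamBench. For Liang Wenfeng, more than three months after the launch of R1, the outside world's fascination with DeepSeek's "magic" is fading. Alibaba's open-source models are rapidly catching up with and even surpassing DeepSeek, and many are eagerly waiting for an R2 or V4 model to extend its lead. For Yang Zhilin and Moonshot AI, Kimi is being challenged by ByteDance's Doubao and Tencent's Yuanbao, and it too needs to keep innovating. 01 Programming and mathematics: two paths to AGI. On the path to AGI, in 2024, Dee ...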
300 billion yuan in special funding arrives, giving tech fresh momentum!
Xin Lang Cai Jing· 2025-05-07 02:00
Group 1
- The People's Bank of China announced a 0.5-percentage-point cut in the reserve requirement ratio, expected to release about 1 trillion yuan of long-term liquidity into the market, along with a 0.1-percentage-point cut in policy interest rates [1]
- The AI sector is undergoing a major transformation from quantitative to qualitative change, with general-purpose large models demonstrating near-human capabilities in a range of cognitive tasks [1]
- AI technology is reshaping modes of social production and human life, with profound effects across industries [1]
Group 2
- Alibaba's release of multiple AI models and the latest financial results from major US tech companies highlight the competitive landscape in the AI sector [2]
- The upcoming 2025 Lenovo Tech World and other major industry events point to a growing focus on AI and related technologies [2]
- The emergence of new job roles, such as prompt engineers, reflects how AI is changing the employment landscape [4]
Group 3
- AI applications are diversifying, with digital-human technology marking a shift toward multi-dimensional penetration of fields such as education and healthcare [5]
- The digital-human market is projected to grow significantly, with estimates putting its size above 640 billion yuan by 2025 [5]
- The integration of AI into public services and commercial sectors shows the expanding boundaries of technology applications [5]
Group 4
- Competition in the AI industry is shifting toward breakthroughs in underlying technologies and cost control, with advances in embodied intelligence and multimodal models [7]
- The technology sector is expected to regain momentum as concerns over earlier performance and tariff disruptions fade, with attention turning to long-term industry trends [8]
- The coming months are critical for the tech sector, with numerous industry conferences and events expected to catalyze new growth opportunities [8]
Group 5
- The TMT sector is showing signs of recovery, with a notable rise in net profit growth rates, particularly in the AI industry [9]
- Institutional investors have significant room to increase allocations to the TMT sector, particularly the computer and media segments [9]
- The AI ETF, which tracks the innovation board's AI index, holds major companies across the AI value chain, indicating a strategic investment opportunity [9][10]
No. 1 on both VDC and VBench! A Chinese video generation model refined with reinforcement learning surpasses Sora and Pika
Synced (Jiqizhixin) · 2025-05-06 04:11
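Published by Synced | Synced Editorial Team. With the success of strong reasoning models such as DeepSeek, reinforcement learning has become increasingly important in training large language models, but it remains little explored in video generation. Fudan University and other institutions have brought reinforcement learning into video generation. After RL optimization, their video generation model produces more natural, fluid and plausible results, and it took first place on two authoritative international leaderboards, VDC (Video Detailed Captioning) [1] and VBench [2]. Detailed video captioning: Detailed video captioning models provide the labels that video generation models are trained on, so they are a foundation of video generation. Fudan University and other institutions proposed the Cockatiel method [3], which ranks first on the authoritative VDC (Video Detailed Captioning) benchmark, ahead of several mainstream multimodal video-understanding models including Qwen2-VL, VILA1.5, LLaVA-OneVision and Gemini-1.5. Paper title: Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption ...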
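OpenAI abandons for-profit conversion! Altman: the nonprofit will stay in control; Temu halts direct shipments of Chinese goods to the US under heavy tariffs; Nvidia launches another China-specific AI chip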
Leiphone · 2025-05-06 00:29
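NEWS REMIND 1. Under 130% tariffs, Temu announces it will stop exporting Chinese products to the US; fully managed listings taken down en masse 2. Neta Auto's app and official website suffer widespread outages; an insider says the cause was unpaid bandwidth bills, with no one minding them over the holiday 3. Xiaomi, Huawei, Li Auto, NIO and XPeng all change their wording! Several automakers rename "smart driving" to "assisted driving" 4. Report: Xiaohongshu to open up external partnerships, with ads redirecting to Tmall 5. Liang Wenfeng and Yang Zhilin "collide" again, entering another large-model track at the same time 6. Lei Jun's role adjusted; Xiaomi executive Xu Fei becomes general manager of the international marketing department 7. Nvidia launches another China-specific AI chip; samples reportedly due in June, and three Chinese companies have been notified. HEADLINE NEWS Under 130% tariffs, Temu announces it will stop exporting Chinese products to the US; fully managed listings taken down en masse. According to foreign media, Chinese e-commerce platform Temu recently announced that it will stop selling goods imported from China directly to US customers through the platform, and that US sales will be handled by local sellers from now on. The platform's website and app have recently been overhauled to show only semi-managed product listings shipped from US warehouses, while fully managed products previously shipped directly from China are generally marked "out of stock." A Temu spokesperson confirmed that US sales are now handled entirely by local sellers, with goods shipped from within the US, and stressed that local products carry "no import fees" and "no extra charges at delivery." Previously, consumers who bought goods shipped directly from China ...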
Liang Wenfeng and Yang Zhilin "collide" again
Wallstreetcn · 2025-05-05 12:26
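The following article is from China Entrepreneur magazine, by Yan Junwen. For Liang Wenfeng, more than three months after the launch of R1, the outside world's fascination with DeepSeek's "magic" is fading. Alibaba's open-source models are rapidly catching up with and even surpassing DeepSeek, and many are eagerly waiting for an R2 or V4 model to extend its lead. For Yang Zhilin and Moonshot AI, Kimi is being challenged by ByteDance's Doubao and Tencent's Yuanbao, and it too needs to keep innovating. Reporter: Yan Junwen. Editor: Zhang Xiaodi. After their papers "collided" in February, Liang Wenfeng and Yang Zhilin have met again on another large-model track. On April 30, DeepSeek released DeepSeek-Prover-V2, a model dedicated to mathematical theorem proving. Prover-V2 scales up to 671B (671 billion) parameters, nearly a hundred times the 7B of the previous V1.5. This makes it more efficient and accurate on math benchmarks: for example, it reaches an 88.9% pass rate on miniF2F and solves 49 problems from PutnamBench. Coincidentally, in mid-April Moonshot AI also released Kimina-Prover, a large model for formal theorem proving co-developed by the Kimi team and Numina; the ...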
Google DeepMind: Large models are stubborn too, knowing the best path yet insisting on running into a wall
Synced (Jiqizhixin) · 2025-05-05 03:40
Core Insights
- The article examines common failure modes of large language models (LLMs) in decision-making scenarios, focusing on greediness, frequency bias, and the knowing-doing gap [2][15].
- It proposes reinforcement learning fine-tuning (RLFT) to strengthen LLMs' decision-making by addressing these shortcomings [2][8].
Group 1: Failure Modes
- LLMs explore suboptimally and show a knowing-doing gap, which keeps them from turning knowledge into effective action [2][15].
- The three identified failure modes are:
  1. Greediness: LLMs overly favor actions that have performed best so far [15].
  2. Frequency bias: LLMs tend to repeat frequently seen actions regardless of differences in their rewards [5][18].
  3. Knowing-doing gap: LLMs understand the task requirements but fail to execute the optimal action because they prefer greedy choices [7][20].
Group 2: Model Performance
- Small LLMs (2B) are strongly affected by frequency bias and explore too little, leaving up to 55% of actions unexplored [4][18].
- Large LLMs (27B) show less frequency bias but still behave greedily, which limits their overall performance [6][18].
- The largest models averaged only 45% action coverage, far short of optimal strategies [17].
Group 3: Reinforcement Learning Fine-Tuning
- RLFT adjusts the LLM's reasoning based on rewards obtained from interacting with the environment, steering it toward actions that yield higher rewards [8][22].
- Results show that RLFT significantly reduces regret across environments and improves LLM performance relative to random baselines [22].
- RLFT mitigates greediness by encouraging exploration, which improves decision-making [22].
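Regret, greediness, exploration and action coverage are standard multi-armed bandit quantities. The sketch below runs a plain epsilon-greedy agent on a hypothetical three-armed Bernoulli bandit to show what they measure. It is not the paper's RLFT method and involves no LLM, and the arm probabilities are made up. Cumulative regret here is $\sum_t (\mu^* - \mu_{a_t})$, the expected reward lost versus always playing the best arm. With `epsilon=0` and zero-initialized estimates, the agent never leaves the first arm it tries, a caricature of the greediness failure mode. A little random exploration raises coverage and cuts regret.

```python
import random
from typing import List, Tuple


def run_bandit(means: List[float], steps: int, epsilon: float, seed: int = 0) -> Tuple[float, float]:
    """Run epsilon-greedy on a Bernoulli multi-armed bandit.

    Returns (cumulative regret, action coverage), where coverage is the
    fraction of arms pulled at least once.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n        # pulls per arm
    estimates = [0.0] * n   # running mean reward per arm
    best = max(means)
    regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: uniformly random arm
        else:
            # exploit: highest estimate; max() breaks ties toward the lowest index
            arm = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # regret uses the true means (expected loss vs. always playing the best arm)
        regret += best - means[arm]
    coverage = sum(c > 0 for c in counts) / n
    return regret, coverage


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    for eps in (0.0, 0.1):
        r, cov = run_bandit(arms, steps=1000, epsilon=eps)
        print(f"epsilon={eps}: regret={r:.1f}, coverage={cov:.2f}")
```

Because the purely greedy run locks onto the worst arm, its regret is exactly 0.6 per step and its coverage is one arm in three. RLFT tackles the same exploration problem differently, by fine-tuning the model on environment rewards, which this toy does not model.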