Reinforcement Learning
New skill unlocked! How many steps does it take a humanoid robot to learn consecutive backflips? Revealed →
Zhong Guo Jing Ji Wang· 2025-03-15 08:36
Core Insights
- The humanoid robot has recently demonstrated the ability to perform consecutive backflips, showcasing significant advancements in its capabilities [1][2]
- The development team utilized innovative hardware design and advanced algorithms to enhance the robot's performance and stability during complex movements [1][3]
Group 1: Robot Capabilities
- A humanoid robot named N2 has successfully completed multiple consecutive backflips, a feat that is more challenging than front flips due to the mechanics involved [1]
- The robot's design includes concentrated weight distribution in the hip area and the use of powerful motors and lightweight materials to improve agility and explosiveness [1]
Group 2: Learning Process
- The team achieved the robot's backflip capability in just three weeks through a structured learning process that involved dynamic calculations and virtual simulations [2][3]
- The training incorporated reinforcement learning, allowing the robot to learn from trial and error, mimicking human learning processes [3]
Group 3: Challenges in Training
- Training robots in real environments poses risks of damage due to potential errors in movement, necessitating the use of virtual environments for initial training [4]
- There are challenges in ensuring that the virtual training accurately reflects real-world conditions to avoid discrepancies during the transition to physical execution [4]
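In Depth | MiniMax accelerates its restructuring: it acquires an AI video startup, officially renames Hailuo AI, and may be the "Six Little Tigers" member least affected by DeepSeek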
Z Finance· 2025-03-14 11:39
Core Viewpoint
- MiniMax is set to acquire Shenzhen-based AI video generation startup Lu Ying Technology (Avolution.ai), aiming for technology complementarity and market expansion in the competitive AI landscape [1][2].
Summary by Sections
Acquisition Details
- Lu Ying Technology, founded in September 2023, specializes in AI video generation with its core product, YoYo, targeting the anime creator market [1].
- The company has developed the LCM (Latent Consistency Model) visual model, which enhances video generation efficiency and content consistency [2].
- The acquisition is seen as a strategic move for MiniMax to enhance its capabilities in video generation and to compete against larger firms like Baidu and Alibaba [2].
Company Background
- Lu Ying Technology's CEO, Huang Zhaoyang, has a strong academic background and previously worked at SenseTime and NVIDIA [1].
- The company raised approximately 100 million RMB in its angel round financing but faced challenges in securing further funding in 2024 [1].
Market Context
- The AI industry in China is experiencing accelerated consolidation, with many startups opting for acquisition due to funding difficulties and commercialization challenges [3].
- Examples include Bian Sai Technology, which was acquired by Ant Group after facing commercialization bottlenecks, and BoFeng Intelligent, which was acquired by OPPO [3][4].
Internal Adjustments at MiniMax
- MiniMax is undergoing internal changes, including the departure of key executives and a rebranding of its core product from "Hailuo AI" to "MiniMax" [5][6].
- The company aims to streamline its brand recognition and enhance its global positioning through these adjustments [6].
Competitive Positioning
- MiniMax is noted for its advanced multi-modal model technology, which has achieved breakthroughs in text, visual, and video generation, positioning it favorably in the market [6][7].
- The company has also seen success in international markets, with its product "Talkie" reportedly generating revenue approaching the tens of millions of dollars last year [7].
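喝点VC | Sequoia in conversation with OpenAI's Deep Research team: AI agents will be this year's most groundbreaking technology, and reinforcement learning returns to the mainstream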
Z Potentials· 2025-03-10 03:07
Core Viewpoint
- The article discusses the launch and capabilities of OpenAI's "Deep Research," an AI agent that utilizes end-to-end reinforcement learning to enhance efficiency in complex information retrieval and reasoning tasks, significantly reducing the time required for knowledge work from hours to minutes [2][10][24].
Group 1: Product Overview
- "Deep Research" is designed to retrieve information from multiple online sources and generate detailed reports, completing tasks in 5 to 30 minutes compared to hours for humans [6][10].
- The product is part of OpenAI's agent series, following the "Operator" agent, with plans for further expansions including a "Shards Seeker" agent [4][6].
- The development of "Deep Research" was inspired by breakthroughs in reasoning paradigms and aims to tackle complex tasks requiring extensive online research and creativity [7][10].
Group 2: Target Users and Applications
- The primary users of "Deep Research" include knowledge workers in various fields such as market analysis, medical research, and personal planning [11][12].
- The product has shown significant utility in scientific research, helping users find relevant literature and data [12].
- It is also beneficial for personal tasks like shopping and travel planning, allowing users to save time and make informed decisions [18][19].
Group 3: Technical Mechanism and Innovations
- "Deep Research" employs a fine-tuned version of OpenAI's advanced reasoning model, o3, specifically trained for complex web browsing and reasoning tasks [24][25].
- The model's ability to dynamically adjust its search strategies during information retrieval sets it apart from traditional search engines [25][26].
- The integration of a "Chain-of-Thought" summary allows users to understand the reasoning process behind the model's search strategies, enhancing transparency [25][26].
Group 4: Future Developments and Impact
- Future plans for "Deep Research" include expanding its capabilities to access private data and improving its analytical functions for more complex tasks [37][38].
- The potential impact of "Deep Research" on various professions, particularly in consulting and healthcare, is significant, as it can drastically reduce the time spent on research tasks [39][40].
- The technology is expected to empower knowledge workers rather than replace them, enhancing their efficiency and decision-making capabilities [39][40].
GPT-5 takes shape; lessons from OpenAI and Manus on building agents; China's big companies expand compute investment | AI Monthly Report
晚点LatePost· 2025-03-08 12:17
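Key global AI trends of February 2025.
By | He Qianming
In the February 2025 AI Monthly Report, you will find:
- Silicon Valley giants' new consensus: reasoning is part of the large model
- OpenAI's and Manus's experience developing agents
- DeepSeek pushes China's big companies to increase compute spending; Alibaba and ByteDance together will spend more than 200 billion this year
- 3 AI companies that sold for more than 100 million, and 23 AI companies that raised more than US$50 million
- OpenAI hires experts at US$100 an hour to produce data that improves model capability
Starting with this issue, we invite researchers, founders, and investors to offer first-hand commentary and insight on each month's AI trends and landmark events.
The 晚点 AI Monthly Report selects the AI signals most worth knowing each month. This is our 4th AI Monthly Report; readers are welcome to add important trends we missed in the comments.
Technology | A prototype of GPT-5 appears, and a new industry consensus is born
The shock wave from DeepSeek keeps spreading, and large-model companies worldwide are locked in a melee: whether it is Grok 3, which Musk trained with more than 100,000 GPUs, GPT-4.5, on whose training OpenAI may have spent US$1 billion, or Claude 3 ..., Anthropic's latest model with integrated reasoning ability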
The first buyers of Unitree robots are raking it in
投资界· 2025-03-07 07:15
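The following article is from 科技狐, by Lao Hu. Earning more than 10,000 yuan a day.
Author | Lao Hu  Source | 科技狐 (ID: kejihutv)
科技狐: a new-media outlet focused on technology and the internet, sharing daily coverage of technology, digital products, cars, business, TMT, and AI.
Unitree Robotics' robot effect is spreading from the Spring Festival Gala stage to the commercial market. The first people to snap up Unitree robots are already making money.
On February 12, Unitree's H1 and G1 humanoid robots went on sale online for the first time on JD.com. The G1 starts at 99,000 yuan and the H1 at 650,000 yuan, but both are currently sold out.
However, because stock is scarce, even orders placed directly with Unitree generally take two months to deliver.
Buyers who got hold of a robot quickly sensed a business opportunity and turned to the second-hand market.
On social platforms and second-hand trading sites, large numbers of Unitree robot rental businesses have sprung up. Daily rent per unit runs from 5,000 to 15,000 yuan, schedules are tight, and at times "a robot is hard to come by."
This price usually covers the local merchant's transport to the venue and an operator who accompanies the robot all day, with no deposit charged. If no operator is needed, some merchants require a large deposit.
At a daily rent of 10,000 yuan, the base-model G1 would indeed pay for itself in about 10 days. No wonder some people exclaim: "This really is a good business."
After Unitree's H1 robot danced in the Spring Festival Gala number 《秧 Bot》 ...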
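The "about 10 days to pay for itself" figure can be checked with simple arithmetic. The sketch below uses only the prices quoted in the article (a 99,000 yuan G1 and daily rents of 5,000 to 15,000 yuan). The function name `payback_days` is hypothetical, and the calculation assumes the robot is rented out every single day and ignores operator wages, transport, maintenance, and idle days, so it is a best-case estimate rather than a business model.

```python
def payback_days(purchase_price_yuan: float, daily_rent_yuan: float) -> float:
    """Days of continuous rental needed to recover the purchase price.

    Assumption (not from the article): every day is booked and there are
    no operating costs, so this is a lower bound on the real payback time.
    """
    if daily_rent_yuan <= 0:
        raise ValueError("daily rent must be positive")
    return purchase_price_yuan / daily_rent_yuan


G1_PRICE_YUAN = 99_000  # starting price quoted in the article

if __name__ == "__main__":
    # Daily rents quoted in the article: low end, the 10,000 yuan example, high end.
    for rent in (5_000, 10_000, 15_000):
        days = payback_days(G1_PRICE_YUAN, rent)
        print(f"daily rent {rent:>6} yuan -> payback in {days:.1f} days")
```

At the 10,000 yuan rate the quotient is 9.9 days, which matches the article's rounding; at the low end of the quoted range the same purchase takes roughly twice as long to recover.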
An interpretation of the development of DeepSeek-R1, Kimi1.5, and similar strong-reasoning models
Peking University· 2025-03-05 10:54
Investment Rating
- The report does not explicitly state an investment rating for the industry or company
Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance in complex tasks, marking a milestone in the open-source community's competition with closed-source models like OpenAI's o1 series [7]
- The report highlights the potential of RL-driven models to enhance reasoning abilities without relying on human-annotated supervised fine-tuning [21][56]
Summary by Sections
Technical Comparison
- The report compares STaR-based methods with RL-based methods, emphasizing the advantages of RL in reasoning tasks [3]
- It details the innovative RL algorithms used, such as GRPO, which optimize training efficiency and reduce computational costs [49][50]
DeepSeek-R1 Analysis
- DeepSeek-R1-Zero is built entirely on RL without supervised fine-tuning, showcasing its ability to develop reasoning capabilities autonomously [13][21]
- The model's performance metrics indicate strong results in various benchmarks, including AIME 2024 and MATH-500, where it achieved 79.8% and 97.3% respectively, comparable to OpenAI's models [7][15]
Insights and Takeaways
- The report emphasizes the importance of a robust base model, DeepSeek-V3, which has 671 billion parameters and was trained on 14.8 trillion high-quality tokens, enabling significant reasoning capabilities [45][56]
- The use of rule-based rewards in training helps avoid reward hacking issues, allowing for automated verification and annotation of reasoning tasks [17][22]
Future Directions
- The report discusses the potential for further advancements in RL-driven models, suggesting that future training will increasingly focus on RL while still incorporating some supervised fine-tuning [56]
- It highlights the need for models to maintain high reasoning performance while ensuring safety and usability in diverse applications [59]
Economic and Social Benefits
- The exploration of low-cost, high-quality language models is expected to reshape industry dynamics, leading to increased competition and innovation [59]
- The report notes that the capital market's volatility is a short-term phenomenon driven by rapid advancements in AI technology, which will lead to a long-term arms race in computational resources [59]
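The two training ingredients named in this summary, rule-based rewards and GRPO, can be illustrated with a short sketch. It is not DeepSeek's implementation. It shows only the core idea of GRPO's advantage estimate: sample a group of answers to the same prompt, score each one, and normalize every reward against the group's mean and standard deviation, so no separate value model is needed. The function names, the `\boxed{...}` answer format, the exact-match check, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
import statistics


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the last-stage answer matches.

    Assumes the model is asked to put its final answer in \\boxed{...};
    the check is a plain string comparison, so no learned reward model
    is involved and the result can be verified automatically.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable answer, no reward
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group (GRPO-style).

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    The population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt whose reference answer is "42".
    group = [
        "... so the result is \\boxed{42}",
        "... therefore \\boxed{41}",
        "I am not sure.",
        "\\boxed{ 42 }",
    ]
    rewards = [rule_based_reward(c, "42") for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Answers that beat the group average get a positive advantage and the rest a negative one; when every answer in a group scores the same, all advantages are zero and that prompt contributes no learning signal.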
China's AI schools: Wang Jun and his students
投资界· 2025-03-04 07:41
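The following article is from 雷峰网, by Lai Wenxin.
雷峰网: insight into the intelligent future, in step with industrial change.
Half of China's reinforcement learning research.
Author | Lai Wenxin  Editor | Chen Caixian  Source | 雷峰网 (ID: leiphone-sz)
As a research branch with decades of history in AI, reinforcement learning keeps renewing itself.
From recommender systems to reinforcement learning
One afternoon in the summer of 2006, Wang Jun boarded a train from the small Dutch city of Delft to the capital, Amsterdam, where he would change to a flight to Seattle to attend the 29th ACM SIGIR conference on information retrieval.
Information retrieval was then at its peak, and search was also the core business of the three giants Microsoft, Yahoo, and Google, so every year ACM SIGIR gathered the top talent of academia and industry for the information retrieval community's "annual meeting."
In the venue at the University of Washington, Wang Jun received the Best Doctoral Consortium award to applause, taking the highest honor for doctoral students in information retrieval a year before finishing his PhD.
The high-spirited young man did not imagine then that 15 years later he would also receive an honorable mention for the Test of Time Award. By 2021, Wang Jun had been working on reinforcement learning (RL) for several years and, as one of its initiators, had founded RL China, the Chinese reinforcement learning community, training a group of outstanding young researchers for the field in China and becoming its "grandmaster."
Wang Jun ...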
喝点VC | Greylock on DeepSeek-R1: igniting an AI revolution and reshaping the economic order
Z Potentials· 2025-03-04 05:33
Core Insights
- The introduction of DeepSeek-R1 marks a pivotal moment in the AI landscape, bridging the gap between open-source and proprietary models, with significant implications for AI infrastructure and generative AI economics [1][2][8]
Open Source vs. Proprietary Models
- DeepSeek-R1 has significantly narrowed the performance gap with proprietary models such as OpenAI's, achieving parity in key reasoning benchmarks despite being smaller in scale [2]
- The emergence of DeepSeek is seen as a watershed moment for open-source AI, with models like Llama, Qwen, and Mistral expected to catch up quickly [2][3]
- The competitive landscape is shifting, with a vibrant and competitive LLM market anticipated, driven by the open-source model's advancements [2][3]
AI Infrastructure and Developer Utilization
- DeepSeek-R1 utilizes reinforcement learning (RL) to enhance reasoning capabilities, marking the first successful large-scale implementation of this approach in an open-source model [3][4]
- The model's success is expected to democratize access to high-performance AI, allowing enterprises to customize solutions based on their specific needs [3][4]
- The shift in AI infrastructure is characterized by a move away from closed models, enabling more control and flexibility for developers [4]
New Applications: Large-Scale AI Reasoning
- Enhanced reasoning capabilities of DeepSeek open up new application possibilities, including autonomous AI agents and specialized planning systems across various industries [5][6]
- The demand for GPU computing is expected to increase due to the accelerated adoption of agent applications driven by DeepSeek [6]
- Companies in highly regulated industries will benefit from the ability to experiment and innovate while maintaining control over data usage [6]
Generative AI Economics: Changing Cost Dynamics
- DeepSeek is driving a trend towards lower costs and higher efficiency in reasoning and training, fundamentally altering the economics of generative AI deployment [7][8]
- Models like R1 can be up to seven times cheaper than using proprietary APIs, unlocking previously unfeasible use cases for many enterprises [7]
- The economic advantages of open-source models are expected to lead to a broader adoption of AI technologies across various sectors [7][8]
Conclusion
- DeepSeek represents a significant milestone in the AI industry, enabling open-source models to compete effectively with proprietary alternatives, while emphasizing the importance of high-quality, domain-specific data and labeling for future advancements [8]
Earning more than 10,000 yuan a day: the first buyers of Unitree robots are raking it in
36氪· 2025-03-04 00:11
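The following article is from 科技狐, by Lao Hu.
科技狐: a new-media outlet focused on technology and the internet, sharing daily coverage of technology, digital products, cars, business, TMT, and AI.
The first people to snap up Unitree robots are already making money.
Text | Lao Hu  Editor | 不吃麦芽糖  Source | 科技狐 (ID: kejihutv)  Cover image | IC photo
Unitree Robotics' robot effect is spreading from the Spring Festival Gala stage to the commercial market. The first people to snap up Unitree robots are already making money.
After Unitree's H1 robot danced its way to "cyber superstar" status in the Spring Festival Gala number 《秧Bot》, its "little brother," the G1, set the internet alight again with silky dance moves powered by an algorithm upgrade.
On February 12, Unitree's H1 and G1 humanoid robots went on sale online for the first time on JD.com. The G1 starts at 99,000 yuan and the H1 at 650,000 yuan, but both are currently sold out.
However, because stock is scarce, even orders placed directly with Unitree generally take two months to deliver.
Buyers who got hold of a robot quickly sensed a business opportunity and turned to the second-hand market.
On social platforms and second-hand trading sites, large numbers of Unitree robot rental businesses have sprung up. Daily rent per unit runs from 5,000 to 15,000 yuan, schedules are tight, and at times "a robot is hard to come by."
This price usually covers the local merchant's transport to the venue and an operator who accompanies the robot all day, with no deposit charged. If no operator is needed, some merchants require a large deposit.
At a daily rent of 10,000 yuan, ...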
The UCL reinforcement learning school: Wang Jun and his students
雷峰网· 2025-02-27 10:15
Core Viewpoint
- The article discusses the evolution and significance of reinforcement learning (RL) in China, highlighting key figures and their contributions to the field, particularly focusing on Wang Jun and his influence on the development of RL research and education in China [2][46].
Group 1: Historical Context and Development
- Wang Jun's journey in AI began with information retrieval and recommendation systems, where he achieved significant academic recognition [4][8].
- His transition to reinforcement learning was influenced by his experiences in advertising, where he recognized the parallels between decision-making in advertising and RL principles [12][14].
- The establishment of the RL China community marked a pivotal moment in promoting RL research and education in China, addressing the lack of resources and formal education in the field [49][50].
Group 2: Contributions and Innovations
- Wang Jun and his students have made substantial contributions to RL, including the development of SeqGAN and IRGAN, which integrate RL with generative adversarial networks for improved performance in various applications [23][24].
- The introduction of multi-agent systems in RL research has been a significant focus, with applications in complex environments such as advertising and gaming [27][28].
- The establishment of MediaGamma allowed for practical applications of RL in real-time advertising, showcasing the commercial viability of RL algorithms [17][18].
Group 3: Educational Initiatives and Community Building
- The formation of RL China has facilitated knowledge sharing and collaboration among researchers and students, significantly enhancing the learning environment for RL in China [49][52].
- The publication of "Hands-On Reinforcement Learning" has provided accessible educational resources, bridging the gap between theory and practice for students [53].
- Wang Jun's mentorship has fostered a new generation of RL researchers, emphasizing the importance of exploration and innovation in academic pursuits [26][43].
Group 4: Future Directions and Challenges
- The integration of RL with large models and embodied intelligence represents a promising frontier for future research, aiming to address the challenges of generalization across different tasks and environments [56][62].
- The ongoing exploration of RL applications in real-world scenarios, such as robotics and automated decision-making, highlights the potential for RL to impact various industries significantly [61][62].
- Despite setbacks in some projects, the commitment to advancing RL research and its applications remains strong among Wang Jun and his students, indicating a resilient and forward-looking approach to the field [56][62].