Reinforcement Learning
The First Buyers of Unitree Robots Are Making a Fortune
投资界· 2025-03-07 07:15
This article originally appeared on 科技狐, by 老狐. Earning over 10,000 yuan a day. Author | 老狐 Source | 科技狐 (ID: kejihutv). The Unitree robot effect is spreading from the Spring Festival Gala stage into the commercial market. The first buyers who managed to snap up Unitree robots are already making money. On February 12, Unitree's H1 and G1 humanoid robots went on sale online via JD.com, with the G1 starting at 99,000 yuan and the H1 at 650,000 yuan; both are currently sold out. 科技狐 is a new-media outlet focused on tech and the internet, sharing daily coverage of technology, gadgets, automotive, business, TMT, and AI. With stock scarce, even ordering directly from Unitree typically means a two-month delivery wait. Buyers who secured robots quickly spotted a business opportunity and turned to the secondary market. Social platforms and second-hand trading sites are now full of Unitree robot rental vendors, with daily rates as high as 5,000 to 15,000 yuan per unit, schedules booked solid, and even a "one robot is hard to find" situation. That price typically covers local delivery and a full-day on-site operator, with no deposit required. If no operator is needed, some vendors instead demand a hefty deposit. At a daily rate of 10,000 yuan, the base-model G1 can indeed pay for itself in roughly 10 days. No wonder some remark: "This is truly a good business." After the Unitree H1 danced its way to fame in the Spring Festival Gala's "Yangge Bot" performance ...
An Interpretation of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Model Development
Peking University· 2025-03-05 10:54
Investment Rating
- The report does not explicitly state an investment rating for the industry or company

Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models like OpenAI's o1 series [7]
- The report highlights the potential of RL-driven models to enhance reasoning abilities without relying on human-annotated supervised fine-tuning [21][56]

Summary by Sections

Technical Comparison
- The report compares STaR-based methods with RL-based methods, emphasizing the advantages of RL in reasoning tasks [3]
- It details the innovative RL algorithms used, such as GRPO, which optimizes training efficiency and reduces computational costs [49][50]

DeepSeek-R1 Analysis
- DeepSeek-R1 Zero is built entirely on RL without supervised fine-tuning, showcasing its ability to develop reasoning capabilities autonomously [13][21]
- The model posts strong benchmark results, including 79.8% on AIME 2024 and 97.3% on MATH-500, comparable to OpenAI's models [7][15]

Insights and Takeaways
- The report emphasizes the importance of a robust base model: DeepSeek-V3, a 671-billion-parameter model trained on 14.8 trillion high-quality tokens, enables significant reasoning capability [45][56]
- Rule-based rewards in training help avoid reward-hacking issues, allowing automated verification and annotation of reasoning tasks [17][22]

Future Directions
- The report suggests that future training will increasingly focus on RL while still incorporating some supervised fine-tuning [56]
- It highlights the need for models to maintain high reasoning performance while ensuring safety and usability in diverse applications [59]

Economic and Social Benefits
- The exploration of low-cost, high-quality language models is expected to reshape industry dynamics, driving competition and innovation [59]
- The report views current capital-market volatility as a short-term phenomenon driven by rapid AI advances, which will lead to a long-term arms race in computational resources [59]
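The GRPO idea summarized above, estimating the policy-gradient baseline from group scores instead of training a separate critic, can be sketched minimally as follows. This is an illustrative sketch, not the report's or DeepSeek's implementation; the function name and group size are assumptions.

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards):
    """GRPO-style advantages: score each sampled response relative to the
    mean and standard deviation of its own group, which replaces the
    separate critic model and cuts training cost."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + 1e-8) for r in group_rewards]

# For one prompt, sample a group of responses and score each with a
# rule-based reward (e.g. 1.0 if the final answer is correct, else 0.0).
rewards = [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in grpo_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Correct responses receive a positive advantage and incorrect ones a negative advantage, so the policy update pushes probability toward the better members of each group without a learned value function.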
China's AI Schools: Wang Jun and His Students
投资界· 2025-03-04 07:41
This article originally appeared on 雷峰网, by 赖文昕. 雷峰网: insight into the intelligent future, witnessing industry transformation. Half of China's reinforcement learning research community. Author | 赖文昕 Editor | 陈彩娴 Source | 雷峰网 (ID: leiphone-sz). As a research branch with decades of history in AI, reinforcement learning continues to renew itself. From recommender systems to reinforcement learning: one summer afternoon in 2006, Wang Jun boarded a train from the small Dutch city of Delft to Amsterdam, where he would catch a flight to Seattle for the 29th ACM SIGIR conference on information retrieval. Information retrieval was then at its peak, and with search at the core of Microsoft, Yahoo, and Google's businesses, ACM SIGIR drew the top talent of academia and industry each year for the field's "annual gathering." At the University of Washington venue, Wang Jun received the best doctoral consortium award to a round of applause, claiming the field's highest doctoral honor a year before completing his PhD. The spirited young researcher could not then have imagined that 15 years later he would be nominated for a Test of Time award. By 2021, Wang Jun had already spent years in reinforcement learning (RL) and, as one of its initiators, had founded RL China, a community that has trained a cohort of outstanding young RL researchers in China and made him a "grandmaster" of the field. Wang Jun ...
喝点VC | Greylock Interprets DeepSeek-R1: Sparking an AI Revolution and Restructuring the Economic Order
Z Potentials· 2025-03-04 05:33
Core Insights
- The introduction of DeepSeek-R1 marks a pivotal moment in the AI landscape, bridging the gap between open-source and proprietary models, with significant implications for AI infrastructure and generative AI economics [1][2][8]

Open Source vs. Proprietary Models
- DeepSeek-R1 has significantly narrowed the performance gap with proprietary models like OpenAI's, achieving parity on key reasoning benchmarks despite being smaller in scale [2]
- The emergence of DeepSeek is seen as a watershed moment for open-source AI, with models like Llama, Qwen, and Mistral expected to catch up quickly [2][3]
- The competitive landscape is shifting, with a vibrant and competitive LLM market anticipated, driven by the open-source models' advancements [2][3]

AI Infrastructure and Developer Utilization
- DeepSeek-R1 utilizes reinforcement learning (RL) to enhance reasoning capabilities, marking the first successful large-scale implementation of this approach in an open-source model [3][4]
- The model's success is expected to democratize access to high-performance AI, allowing enterprises to customize solutions to their specific needs [3][4]
- AI infrastructure is shifting away from closed models, enabling more control and flexibility for developers [4]

New Applications: Large-Scale AI Reasoning
- DeepSeek's enhanced reasoning capabilities open up new application possibilities, including autonomous AI agents and specialized planning systems across various industries [5][6]
- Demand for GPU computing is expected to increase as DeepSeek accelerates the adoption of agent applications [6]
- Companies in highly regulated industries will benefit from the ability to experiment and innovate while maintaining control over data usage [6]

Generative AI Economics: Changing Cost Dynamics
- DeepSeek is driving a trend toward lower costs and higher efficiency in reasoning and training, fundamentally altering the economics of generative AI deployment [7][8]
- Models like R1 can be up to seven times cheaper than using proprietary APIs, unlocking previously unfeasible use cases for many enterprises [7]
- The economic advantages of open-source models are expected to lead to broader adoption of AI technologies across various sectors [7][8]

Conclusion
- DeepSeek represents a significant milestone for the AI industry, enabling open-source models to compete effectively with proprietary alternatives, while emphasizing the importance of high-quality, domain-specific data and labeling for future advancements [8]
AI Applications and Extensions in New-Type Power Systems
上海交大· 2025-03-04 05:24
Investment Rating
- The report does not explicitly provide an investment rating for the industry

Core Insights
- The new generation of artificial intelligence (AI) is built on big data, high-performance computing, and machine learning, significantly advancing AI technology [13][160]
- AI applications in power systems include load forecasting, renewable-energy output prediction, fault diagnosis, and scenario generation, indicating a strong trend toward digitalization and intelligent management in the energy sector [61][160]
- The integration of AI with blockchain and digital-twin technologies is expected to enhance operational efficiency and decision-making in power systems [94][160]

Summary by Sections

Artificial Intelligence Overview
- AI is defined as a system combining theories, technologies, and methods inspired by neuroscience, with a focus on high-performance computing, big data, and machine learning [13][12]

AI Models
- Various machine-learning algorithms, including Support Vector Machines (SVM) and Decision Trees (DT), are widely used for predictive analytics across applications [23][28]

AI Applications in Power Systems
- AI is used for load forecasting, renewable-energy output prediction, and fault diagnosis, employing models such as LSTM and GAN for enhanced accuracy and efficiency [61][65][74]
- The report highlights deep-learning techniques for diagnosing faults in power distribution networks, particularly in complex scenarios such as single-phase grounding faults [69][148]

AI Extensions
- The report discusses the potential of federated learning to address data-privacy issues in power systems, allowing collaborative model training without compromising sensitive information [44][55]
- The application of blockchain in virtual power plants is explored, emphasizing the need for transparency and efficiency in energy trading [94][96]

Digital Twin Technology
- Digital-twin technology creates a virtual representation of physical systems, facilitating real-time monitoring and predictive maintenance in power systems [101][108]

Conclusion
- Advances in AI, combined with emerging technologies like blockchain and digital twins, will play a crucial role in the future development of intelligent power systems, enhancing their operational capabilities and resilience [160]
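The federated learning approach the report describes, collaborative training without sharing raw data, reduces in its simplest form to federated averaging: each site trains locally and only parameter vectors are aggregated. A minimal sketch follows; the function name, toy parameter vectors, and dataset sizes are illustrative assumptions, not from the report.

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg: a weighted average of client model parameters, so that raw
    local data (e.g. per-substation load measurements) never leaves each
    site; only trained weights are exchanged."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three grid operators contribute only their locally trained parameter
# vectors; larger local datasets get proportionally more weight.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
print(fed_avg(local_models, sizes))  # [3.5, 4.5]
```

In practice this averaging step runs once per communication round, with each client resuming local training from the returned global model.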
2025 Report Interpreting the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
Peking University· 2025-03-04 01:35
Investment Rating
- The report does not explicitly provide an investment rating for the industry or company discussed

Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models like OpenAI's o1 series [7]
- The report emphasizes the importance of RL in enhancing model capabilities, particularly in mathematical reasoning and coding tasks, with DeepSeek-R1 achieving notable scores on various benchmarks [7][59]

Summary by Sections

Technical Comparison
- The report discusses DeepSeek-R1's technical advancements, including its architecture and the innovative RL algorithms employed, such as GRPO [3][4]
- Performance metrics are compared against other models, highlighting DeepSeek-R1's superior capabilities in various reasoning tasks [6]

Insights and Takeaways
- The model's ability to self-iterate and enhance its reasoning through RL is emphasized, showcasing its potential for autonomous learning without reliance on supervised fine-tuning [21][56]
- Rule-based rewards in the training process help avoid the reward-hacking issues commonly faced in traditional RL setups [16][23]

Future Directions
- The report suggests future exploration of model safety and usability, particularly in generating coherent and clear reasoning outputs [30][59]
- It highlights the potential of multi-modal reasoning and of integrating synthetic data to overcome data-reproduction challenges [30][59]

Economic and Social Benefits
- The exploration of low-cost, high-quality language models is discussed, emphasizing a shift from raw model size toward computational resources and synthetic data as the levers for expanding capabilities [59]
- Accessible AI technologies are expected to spur market activity and innovation, leading to a more diverse application landscape [59]
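The rule-based rewards mentioned above are automatically verifiable checks rather than a learned reward model, which is what sidesteps reward hacking. A minimal sketch of such a reward function follows; the `<think>` tag format, the weights, and the function name are illustrative assumptions, not the report's specification.

```python
import re

def rule_based_reward(response, reference_answer):
    """A rule-based reward of the kind the report describes: every
    component is checkable by a fixed rule, so no learned reward model
    (and no model to be hacked) is involved."""
    reward = 0.0
    # Format reward: reasoning should be wrapped in <think>...</think>.
    if re.search(r"<think>.*?</think>", response, flags=re.DOTALL):
        reward += 0.1
    # Accuracy reward: the final answer after the tags must match exactly.
    final_answer = response.split("</think>")[-1].strip()
    if final_answer == reference_answer:
        reward += 1.0
    return reward

resp = "<think>17 + 25 = 42</think> 42"
print(rule_based_reward(resp, "42"))  # 1.1
```

Because both checks are deterministic, reasoning traces can be scored and annotated at scale with no human labeling in the loop.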
Earning Over 10,000 Yuan a Day: The First Buyers of Unitree Robots Are Making a Fortune
36氪· 2025-03-04 00:11
This article originally appeared on 科技狐, by 老狐. 科技狐 is a new-media outlet focused on tech and the internet, sharing daily coverage of technology, gadgets, automotive, business, TMT, and AI. The first buyers who snapped up Unitree robots are already making money. By | 老狐 Editor | 不吃麦芽糖 Source | 科技狐 (ID: kejihutv) Cover image | IC photo. The Unitree robot effect is spreading from the Spring Festival Gala stage into the commercial market. The first buyers who snapped up Unitree robots are already making money. After the Unitree H1 danced its way to "cyber top-streamer" status in the Gala's "Yangge Bot" performance, its "younger brother" G1 went viral again with silky dance moves powered by algorithm upgrades. On February 12, Unitree's H1 and G1 humanoid robots went on sale online via JD.com, with the G1 starting at 99,000 yuan and the H1 at 650,000 yuan; both are currently sold out. With stock scarce, even ordering directly from Unitree typically means a two-month delivery wait. Buyers who secured robots quickly spotted a business opportunity and turned to the secondary market. Social platforms and second-hand trading sites are now full of Unitree robot rental vendors, with daily rates as high as 5,000 to 15,000 yuan per unit, schedules booked solid, and even a "one robot is hard to find" situation. That price typically covers local delivery and a full-day on-site operator, with no deposit required. If no operator is needed, some vendors instead demand a hefty deposit. At a daily rate of 10,000 yuan, ...
The UCL Reinforcement Learning School: Wang Jun and His Students
雷峰网· 2025-02-27 10:15
Core Viewpoint
- The article discusses the evolution and significance of reinforcement learning (RL) in China, highlighting key figures and their contributions to the field, particularly Wang Jun and his influence on the development of RL research and education in China [2][46]

Group 1: Historical Context and Development
- Wang Jun's journey in AI began with information retrieval and recommender systems, where he achieved significant academic recognition [4][8]
- His transition to reinforcement learning was influenced by his experiences in advertising, where he recognized the parallels between decision-making in advertising and RL principles [12][14]
- The establishment of the RL China community marked a pivotal moment in promoting RL research and education in China, addressing the lack of resources and formal education in the field [49][50]

Group 2: Contributions and Innovations
- Wang Jun and his students have made substantial contributions to RL, including the development of SeqGAN and IRGAN, which integrate RL with generative adversarial networks for improved performance in various applications [23][24]
- Multi-agent systems have been a significant research focus, with applications in complex environments such as advertising and gaming [27][28]
- The establishment of MediaGamma allowed for practical applications of RL in real-time advertising, showcasing the commercial viability of RL algorithms [17][18]

Group 3: Educational Initiatives and Community Building
- The formation of RL China has facilitated knowledge sharing and collaboration among researchers and students, significantly enhancing the learning environment for RL in China [49][52]
- The publication of "Hands-On Reinforcement Learning" has provided accessible educational resources, bridging the gap between theory and practice for students [53]
- Wang Jun's mentorship has fostered a new generation of RL researchers, emphasizing the importance of exploration and innovation in academic pursuits [26][43]

Group 4: Future Directions and Challenges
- The integration of RL with large models and embodied intelligence represents a promising frontier, aiming to address the challenge of generalization across different tasks and environments [56][62]
- Ongoing exploration of RL applications in real-world scenarios, such as robotics and automated decision-making, highlights RL's potential to significantly impact various industries [61][62]
- Despite setbacks in some projects, Wang Jun and his students remain committed to advancing RL research and its applications, indicating a resilient and forward-looking approach to the field [56][62]
A Background Overview of DeepSeek and a Preliminary Look at Application Scenarios in Finance
China Post Securities· 2025-02-26 11:07
Quantitative Models and Construction Methods

Model Name: DeepSeek-R1
- **Model Construction Idea**: DeepSeek-R1 leverages a mixture-of-experts (MoE) architecture and dynamic routing to reduce inference cost while maintaining high performance [16]
- **Model Construction Process**:
  - **Mixture of Experts (MoE)**: Integrates multiple "expert" models to enhance overall performance; a gating network determines which expert(s) handle a given input [27]
  - **Group Relative Policy Optimization (GRPO)**: Eliminates the need for a separate critic model in reinforcement learning, reducing training cost by using group scores to estimate the baseline [31]
  - **Self-Evolution Process**: The model improves its reasoning through reinforcement learning, exhibiting complex behaviors such as reflection and exploration of alternative approaches [39][41]
  - **Cold Start**: Introduces high-quality long chain-of-thought (CoT) data to stabilize the model during the initial training phase [42]
- **Model Evaluation**: The model demonstrates significant cost efficiency and high performance, making it a groundbreaking development in AI applications [16][43]

Model Name: DeepSeek-V2
- **Model Construction Idea**: DeepSeek-V2 is a powerful MoE language model built on innovative architectures such as Multi-head Latent Attention (MLA) [23]
- **Model Construction Process**:
  - **Multi-head Latent Attention (MLA)**: Improves on traditional multi-head attention (MHA) by compressing the KV cache, enhancing inference efficiency [25]
  - **Mixture of Experts (MoE)**: As in DeepSeek-R1, a gating network activates specific experts per input, optimizing resource usage and performance [27]
- **Model Evaluation**: The model shows advantages in performance, training cost, and inference efficiency, making it a strong, economical, and efficient language model [23][27]

Model Name: DeepSeek-V3
- **Model Construction Idea**: DeepSeek-V3 aims to raise open-source model performance and push toward general artificial intelligence [33]
- **Model Construction Process**:
  - **Multi-Token Prediction (MTP)**: Enhances performance by predicting multiple future tokens at each position, increasing training-signal density [34]
  - **FP8 Mixed-Precision Training**: Improves computational efficiency and reduces memory usage while preserving accuracy by using lower-precision data types [36]
- **Model Evaluation**: The model effectively balances computational efficiency and performance, making it suitable for large-scale model training [33][36]

Model Backtesting Results
- **DeepSeek-R1**: Demonstrates significant cost efficiency, achieving performance comparable to OpenAI's o1 at much lower training cost [43]
- **DeepSeek-V2**: Shows superior performance and efficiency in training and inference compared with traditional models [23][27]
- **DeepSeek-V3**: Achieves high computational efficiency while maintaining accuracy, making it effective for large-scale training [33][36]

Quantitative Factors and Construction Methods

Factor Name: Scaling Laws
- **Factor Construction Idea**: Describes the predictable relationship between model performance and the scale of model parameters, training data, and computational resources [21]
- **Factor Construction Process**:
  - **Scaling Laws**: As model parameters, training data, and computational resources increase, model performance improves in a predictable manner [21]
  - **Data Quality**: Higher-quality data shifts the optimal allocation strategy toward model expansion [22]
- **Factor Evaluation**: Provides a strong guideline for resource planning and model-performance optimization [21][22]

Factor Backtesting Results
- **Scaling Laws**: Performance improves predictably with increased resources, validating the factor's effectiveness in guiding model development [21][22]
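The MoE gating described above, a gating network that routes each input to only a few experts, can be sketched as simple top-k softmax routing. This is a pure-Python illustration of the idea under stated assumptions (per-example rather than per-token routing, illustrative logits), not DeepSeek's implementation.

```python
import math

def top_k_gating(logits, k=2):
    """Softmax gate over expert logits, keeping only the top-k experts:
    each input activates a few experts, so compute scales with k rather
    than with the total number of experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}  # expert index -> routing weight

# Four experts, but only the two with the highest gate logits fire.
weights = top_k_gating([2.0, 1.0, 0.5, -1.0], k=2)
print({i: round(w, 3) for i, w in sorted(weights.items())})  # {0: 0.731, 1: 0.269}
```

The expert outputs are then combined with these weights; renormalizing the softmax over only the selected experts keeps the routing weights summing to one.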
晚点播客 | How Silicon Valley Views DeepSeek: Open Source, Agents, and What Lies Beyond AI, with FusionFund's Zhang Lu
晚点LatePost· 2025-02-13 13:01
The power of technology, of open source, and of the startup ecosystem. Compiled by | 刘倩. Scan the QR code in the image above to listen to the podcast: episode #100 of 晚点聊 LateTalk, available on Xiaoyuzhou, Ximalaya, Apple Podcasts, and other platforms. 晚点聊 LateTalk is the podcast of 晚点 LatePost: "first-hand business and tech interviews, the most authentic practitioner reflections." In January 2025, not even the Lunar New Year slowed the model race. DeepSeek released its open-source reasoning model R1, which at relatively low cost matched, and on some benchmarks even surpassed, o1, sparking discussion worldwide. For this episode we invited Zhang Lu, the investor who founded Fusion Fund in Silicon Valley in 2015, to discuss how DeepSeek and similar models are currently being talked about in U.S. tech circles and the Silicon Valley context. We also explored the agent application space opened up by reasoning models like DeepSeek-R1 and o1, and what, beyond AI, is drawing attention in American tech investing. Fusion Fund has invested in Grubmarket, the AI meeting company Otter.ai, and Subtle Medical, which combines AI with healthcare ...