Reinforcement Learning - filings, earnings calls, financial reports, news - Reportify

Reinforcement Learning

From the Lab to Everyday Life: An Aesthetic Story of Technology Put into Practice

Bei Jing Ri Bao Ke Hu Duan· 2025-03-30 03:51

Group 1
- The 2025 Zhongguancun Forum showcased advanced robotics and AI technologies, emphasizing the integration of technology and art through performances [3][4][8]
- The performance involved a team of engineers and robots, particularly the "Kua Fu" robot, which demonstrated complex movements requiring precise dynamic balance and control [4][6][8]
- Collaboration between organizations such as the Beijing General Artificial Intelligence Research Institute and Leju Robotics led to significant advances in multi-robot coordination and dynamic balance control [7][8]
Group 2
- The AI simultaneous interpretation service provided by Huoshan Doubao demonstrated high translation quality and low latency, improving communication during the forum [11][16]
- The bionic interactive robot "Niya" showcased advanced human-like interactions, significantly improving user experience and engagement at the event [13][16]
- Collaboration between different tech companies, such as Good Drink Technology and Galaxy General, highlighted the potential of cross-industry partnerships to improve service delivery [14][16]
Group 3
- The forum served as a platform for testing and showcasing innovative technologies, with a focus on practical applications in real-world scenarios [15][16]
- Future plans for robotics and AI development include enhancing movement capabilities, improving translation accuracy, and fostering continuous innovation [15][16]
- The event illustrated the evolving role of technology from mere tool to partner in human interaction, reflecting a shift toward more integrated and empathetic technological solutions [17][18][19]

New Quality Productive Forces

Kua Fu Robot

Talking Agents with ZhenFund's Dai Yusen: Every Industry Will Face a "Lee Sedol Moment"; Attention Is Not All You Need

LatePost· 2025-03-28 12:12

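"Experiencing the future for the price of two bottles of Moutai is a real bargain." Guest | Dai Yusen. Edited by | Liu Qian, Cheng Manqi. In this podcast episode, LatePost Chat (《晚点聊》) has a long conversation with Dai Yusen, managing partner of ZhenFund, about AI agents and AI trends. On March 6, Manus, an agent product released by ZhenFund portfolio company Monica, drew a great deal of attention even though it was still in closed beta. During the recording, Yusen mentioned that Monica would soon release an agent product; at the time, we did not yet know that Manus would sweep social media. When we handed a task to Manus and received the finished result a dozen or so minutes later, it felt like a small taste of an "Attention is not all you need" future. The new changes in the AI industry, agents included, trace back to two key milestones from last year to now: o1 and R1. Dai Yusen shared in detail his current observations on agent opportunities, as well as the new moves and adjustments of large and small AI companies amid the shifts in the open-source ecosystem brought about by DeepSeek. The O series unlocks agent applications; the DeepSeek R series is a victory for open source, a victory for focus, and ... o1 introduced reinforcement learning into large language models, opening up a Pos ... beyond the Pretraining Scaling Law ...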

Artificial Intelligence

Lee Sedol Moment

After Dropping OpenAI, Figure's Robot "Evolves": Walking Like a Human!

AI Tech Base Camp (AI科技大本营)· 2025-03-28 03:41

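"The second half of AI is about real-world deployment, and embodied intelligence will be its best vehicle." Soon after, Figure announced that its industrial robot Figure 02 has achieved natural, fluid, human-like walking using purely reinforcement learning algorithms. Driven by reinforcement learning: cracking the Sim-to-Real problem. Editor | Meng Yidan. Produced by | CSDN (ID: CSDNnews). Since announcing in February that it was ending its partnership with OpenAI in favor of a fully in-house R&D path, Figure has been busy. In late February it officially released Helix, the robot operating system it had poured its efforts into, which is seen as the cornerstone of Figure's push toward "true autonomy." Beyond that, Figure 02 robots running this model have already been deployed in logistics facilities, taking on parcel sorting and showing early commercial potential. However, training in simulation alone is not enough. Transferring what is learned in simulation to a real robot is a major challenge known as the "Sim-to-Real" problem. To overcome it, the Figure team adopted two key strategies: by combining domain randomization with high-frequency torque feedback control, Figure achieved zero-shot transfer, meaning that policies trained in simulation can be applied directly to the real Fi ... without any additional fine-tuning.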

Humanoid Robot Strolls Gracefully in a New Reinforcement Learning Result! Unicorn Figure's Founder: People Roasted Us Hard Before

QbitAI (量子位)· 2025-03-26 10:29

Core Viewpoint: The article highlights advances in humanoid robots, focusing on Figure's new model, which uses reinforcement learning to achieve a more natural walking pattern that closely resembles human movement [3][4][22].
Group 1: Technological Advancements
- Figure's new humanoid robot, Figure 02, shows significant improvements in walking, with a lighter, more human-like gait and faster speed [4][6].
- The walking controller is trained with reinforcement learning, which lets the robot learn to walk like a human through simulated trials [9][14].
- Training relies on high-fidelity physics simulation, making it possible to collect years' worth of data in just a few hours [10][14].
Group 2: Simulation Techniques
- Training combines domain randomization with high-frequency torque feedback to bridge the gap between simulation and the real world, so the learned policies can be applied directly to physical robots without additional tuning [11][18].
- During training the robots are exposed to varied scenarios, learning to handle different terrains and respond to disturbances [15][18].
Group 3: Future Plans and Industry Context
- Figure plans to roll this technology out to thousands of Figure robots, indicating a significant scale-up of its operations [21].
- The article notes a broader industry trend, with many companies, including Vivo, launching their own robotics initiatives, reflecting growing interest in humanoid robots [24][25].
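
The domain-randomization idea summarized above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Figure's method: a 1-D point mass stands in for the robot, every episode samples new physics parameters from hypothetical ranges (mass, friction, motor strength, sensing delay), and a crude search over a single controller gain stands in for reinforcement-learning policy optimization. Torque feedback is not modeled.

```python
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical randomization ranges (illustrative only, not Figure's values).
RANGES = {
    "mass_scale": (0.8, 1.2),      # multiplier on nominal mass
    "friction": (0.5, 1.5),        # viscous damping coefficient
    "motor_strength": (0.9, 1.1),  # actuator gain error
    "latency_steps": (0, 3),       # sensing delay in control ticks
}


@dataclass
class PhysicsParams:
    mass_scale: float
    friction: float
    motor_strength: float
    latency_steps: int


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one randomized 'world' for a training episode."""
    lo, hi = RANGES["latency_steps"]
    return PhysicsParams(
        mass_scale=rng.uniform(*RANGES["mass_scale"]),
        friction=rng.uniform(*RANGES["friction"]),
        motor_strength=rng.uniform(*RANGES["motor_strength"]),
        latency_steps=rng.randint(lo, hi),
    )


def rollout(gain: float, p: PhysicsParams, steps: int = 300, dt: float = 0.01) -> float:
    """Drive a 1-D point mass toward x = 1.0 with a delayed proportional controller.

    Returns the absolute tracking error at the end of the episode.
    """
    x, v = 0.0, 0.0
    # Buffer of past position readings; obs[0] is `latency_steps` ticks old.
    obs = deque([0.0] * (p.latency_steps + 1), maxlen=p.latency_steps + 1)
    for _ in range(steps):
        obs.append(x)
        force = p.motor_strength * gain * (1.0 - obs[0])
        accel = (force - p.friction * v) / p.mass_scale
        v += accel * dt  # semi-implicit Euler integration
        x += v * dt
    return abs(1.0 - x)


def robust_score(gain: float, n_envs: int = 64, seed: int = 0) -> float:
    """Average error over many randomized worlds (same seed for every gain)."""
    rng = random.Random(seed)
    return sum(rollout(gain, sample_params(rng)) for _ in range(n_envs)) / n_envs


def pick_gain(candidates):
    """Crude stand-in for policy optimization: keep the most robust gain."""
    return min(candidates, key=robust_score)


if __name__ == "__main__":
    for g in (1.0, 2.0, 4.0, 8.0):
        print(f"gain={g}: mean error {robust_score(g):.4f}")
    print("most robust gain:", pick_gain([1.0, 2.0, 4.0, 8.0]))
```

Reusing the same seed in `robust_score` means every candidate is scored on the same set of randomized worlds, so differences reflect the controller rather than sampling noise. The point of the technique is that the selected behavior must work across the whole range of dynamics, not just one nominal simulator.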

kHz-Rate Torque Feedback Control

These Junior-College Graduates Are Teaching Humanoid Robots

Yan Finance (盐财经)· 2025-03-25 10:39

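Text | Zhu Qiuyu, Lai Dingmeng (intern). Editor | Xiang You. Duty editor | Bao Zhu. Visuals | Gu Xiang. China's humanoid robot sector has had a steady stream of "good news" lately. First, Shenzhen-based EngineAI (众擎机器人) completed the world's first humanoid-robot front flip; then a robot from Hangzhou's Unitree Robotics (宇树科技) pulled off a 720-degree spinning kick. On March 11, AgiBot (智元机器人), founded by former Huawei "Genius Youth" Zhihui Jun, released the Lingxi X2 humanoid robot. In the video, the robot can not only walk and run like a person but also ride a scooter and a bicycle. As people move toward the appealing vision of "robots caring for the elderly," a new type of job has emerged alongside the boom in embodied robots. On job-search apps such as Boss Zhipin and Shixiseng, some companies are recruiting for a position called "robot data collector" that requires at least a junior-college degree. The main duties include carrying out robot data collection, controlling the robot so that it moves correctly, and keeping the robot in a safe state. In addition, many postings list requirements for applicants' physical appearance: some specify "no glasses, no severe myopia"; some require "men 170-175 cm tall and under 65 kg; women 160-168 cm tall and under 55 kg"; and others require "no belly, good physical coordination, careful, agile, and in control." These postings have drawn wide attention, and people can't help but wonder: the robot's data ...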

Humanoid Robots

Drink Some VC (喝点VC) | a16z's Internal Review of DeepSeek: Reasoning-Model Innovation and the New AI Model Landscape Under a 20x Compute Challenge

Z Potentials· 2025-03-23 05:10

Core Insights: The article discusses the emergence and significance of DeepSeek, a new high-performance reasoning model from China, highlighting its open-source nature and the implications for the AI landscape [3][4][12].
Group 1: DeepSeek Overview
- DeepSeek has gained attention for its performance on AI model rankings, raising both interest and concerns [3].
- The open-source release of the model's weights and technical details provides valuable insight into reasoning models and their future development [4][12].
Group 2: Training Process
- Training involves three main steps: pre-training on vast datasets, supervised fine-tuning (SFT) on human-generated examples, and reinforcement learning from human feedback (RLHF) [6][9][10].
- The process is designed to make the model's answers accurate and contextually relevant, moving beyond simple question answering to more complex reasoning [11][12].
Group 3: Innovations and Techniques
- DeepSeek R1 is the culmination of various innovations, including self-learning capabilities and multi-stage training that improves reasoning ability [11][13][14].
- The model uses a mixture-of-experts (MoE) architecture, which enables efficient training and high performance on reasoning tasks [15][30].
Group 4: Performance and Cost
- Training DeepSeek V3 cost approximately $5.5 million, and the step to R1 was cheaper because it focused on reasoning and used smaller-scale SFT [27][29].
- The article notes that reasoning-model performance has improved significantly, with DeepSeek R1 showing capabilities comparable to leading models in the industry [31][35].
Group 5: Future Implications
- The rise of reasoning models like DeepSeek signals a shift in the AI landscape that requires more computational resources for inference and testing [31][34].
- The open-source nature of these models fosters innovation and collaboration within the AI community, potentially accelerating progress in the field [36][39].
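
The mixture-of-experts (MoE) idea mentioned above can be sketched in a few lines. This is a minimal, generic top-k softmax router in pure Python, assuming tiny dimensions and single-matrix "experts"; it is not DeepSeek's exact design (DeepSeek's MoE layers use fine-grained routed experts plus shared experts and their own gating function). The class name `TopKMoE` and all sizes are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TopKMoE:
    """Toy MoE layer: a router scores every expert, but only the top-k experts run."""

    def __init__(self, d_model, n_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Router: one score vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
        # Each expert here is a single linear map (real experts are small MLPs).
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]
            for _ in range(n_experts)
        ]

    def forward(self, x):
        logits = matvec(self.router, x)
        # Indices of the k highest-scoring experts.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        # Renormalize gate weights over the selected experts only.
        weights = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for w, i in zip(weights, top):
            y = matvec(self.experts[i], x)  # only k expert computations per token
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top


if __name__ == "__main__":
    layer = TopKMoE(d_model=4, n_experts=8, k=2)
    out, used = layer.forward([1.0, 0.5, -0.3, 2.0])
    print("experts used:", used)
    print("output:", [round(v, 3) for v in out])
```

The compute saving comes from the fact that each token multiplies through only k of the n expert matrices, while the total parameter count grows with n.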

Supervised Fine-Tuning

Artificial Intelligence

有监督微调

Artificial Intelligence

New Skill Unlocked! How Many Steps Does It Take for a Humanoid Robot to Learn Consecutive Backflips? The Answer Revealed

Zhong Guo Jing Ji Wang· 2025-03-15 08:36

Core Insights: A humanoid robot has recently demonstrated the ability to perform consecutive backflips, showcasing significant advances in its capabilities [1][2]. The development team used innovative hardware design and advanced algorithms to improve the robot's performance and stability during complex movements [1][3].
Group 1: Robot Capabilities
- A humanoid robot named N2 has completed multiple consecutive backflips, a feat more challenging than front flips because of the mechanics involved [1].
- The robot's design concentrates weight in the hip area and uses powerful motors and lightweight materials to improve agility and explosive power [1].
Group 2: Learning Process
- The team achieved the backflip capability in just three weeks through a structured process involving dynamics calculations and virtual simulations [2][3].
- Training incorporated reinforcement learning, letting the robot learn by trial and error in a way that mimics human learning [3].
Group 3: Challenges in Training
- Training robots in real environments risks damage from movement errors, so initial training is done in virtual environments [4].
- The virtual training must accurately reflect real-world conditions to avoid discrepancies when moving to physical execution [4].
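
The trial-and-error learning described above can be illustrated with textbook tabular Q-learning. This is a minimal sketch on a hypothetical six-state "chain" world, not the N2 team's method (learning a backflip would use deep reinforcement learning in a physics simulator). The agent starts at state 0, can step left or right, pays a small cost per step, and is rewarded on reaching the goal; epsilon-greedy exploration supplies the "trial," and the Q-value update supplies the "learning from error."

```python
import random

N_STATES = 6          # states 0..5; state 5 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Hypothetical toy environment: move along a chain, clipped at both ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else -0.01  # small cost for every non-goal step
    return nxt, reward, nxt == N_STATES - 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore: random trial
            else:
                a = max(range(2), key=lambda i: q[s][i])  # exploit current estimate
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])         # learn from the outcome
            s = s2
    return q


def greedy_policy(q):
    """Best action in each non-goal state after training."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Running everything in a simulated environment mirrors the article's point: early episodes are full of useless moves, which would be costly or damaging on a physical robot but are free in simulation.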

SIASUN(SZ:300024)

Humanoid Robot N2

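In Depth | MiniMax Accelerates Its Restructuring: It Acquires an AI Video Startup and Officially Renames Hailuo AI, and May Be the "Six Little Tigers" Member Least Affected by DeepSeek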

Z Finance· 2025-03-14 11:39

Core Viewpoint: MiniMax is set to acquire Shenzhen-based AI video generation startup Lu Ying Technology (Avolution.ai), aiming for technological complementarity and market expansion in a competitive AI landscape [1][2].
Summary by Sections
Acquisition Details
- Lu Ying Technology, founded in September 2023, specializes in AI video generation; its core product, YoYo, targets the anime creator market [1].
- The company has developed the LCM (Latent Consistency Model) visual model, which improves video generation efficiency and content consistency [2].
- The acquisition is seen as a strategic move for MiniMax to strengthen its video generation capabilities and compete against larger firms such as Baidu and Alibaba [2].
Company Background
- Lu Ying Technology's CEO, Huang Zhaoyang, has a strong academic background and previously worked at SenseTime and NVIDIA [1].
- The company raised approximately 100 million RMB in its angel round but struggled to secure further funding in 2024 [1].
Market Context
- China's AI industry is consolidating rapidly, with many startups opting to be acquired because of funding difficulties and commercialization challenges [3].
- Examples include Bian Sai Technology, acquired by Ant Group after hitting commercialization bottlenecks, and BoFeng Intelligent, acquired by OPPO [3][4].
Internal Adjustments at MiniMax
- MiniMax is undergoing internal changes, including the departure of key executives and the rebranding of its core product from "Hai Luo AI" to "MiniMax" [5][6].
- Through these adjustments, the company aims to simplify brand recognition and strengthen its global positioning [6].
Competitive Positioning
- MiniMax is noted for its advanced multimodal model technology, with breakthroughs in text, visual, and video generation that position it favorably in the market [6][7].
- The company has also succeeded in international markets, with its product "Talkie" reportedly generating revenue approaching tens of millions of US dollars last year [7].

Large Language Models

Multimodal Technology

Artificial Intelligence

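Drink Some VC (喝点VC) | Sequoia in Conversation with OpenAI's Deep Research Team: AI Agents Will Be This Year's Most Breakthrough Technology, and Reinforcement Learning Returns to the Mainstream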

Z Potentials· 2025-03-10 03:07

Core Viewpoint: The article discusses the launch and capabilities of OpenAI's "Deep Research," an AI agent trained with end-to-end reinforcement learning to handle complex information retrieval and reasoning tasks, cutting the time required for such knowledge work from hours to minutes [2][10][24].
Group 1: Product Overview
- "Deep Research" retrieves information from multiple online sources and generates detailed reports, completing tasks in 5 to 30 minutes that would take a human hours [6][10].
- The product is part of OpenAI's agent series, following the "Operator" agent, with further expansions planned, including a "Shards Seeker" agent [4][6].
- Its development was inspired by breakthroughs in the reasoning paradigm and targets complex tasks that require extensive online research and creativity [7][10].
Group 2: Target Users and Applications
- Its primary users are knowledge workers in fields such as market analysis, medical research, and personal planning [11][12].
- The product has proven especially useful in scientific research, helping users find relevant literature and data [12].
- It also helps with personal tasks such as shopping and travel planning, saving users time and supporting informed decisions [18][19].
Group 3: Technical Mechanism and Innovations
- "Deep Research" uses a fine-tuned version of OpenAI's advanced reasoning model o3, trained specifically for complex web browsing and reasoning tasks [24][25].
- Its ability to adjust its search strategy dynamically during retrieval sets it apart from traditional search engines [25][26].
- A "Chain-of-Thought" summary lets users follow the reasoning behind the model's search strategy, improving transparency [25][26].
Group 4: Future Developments and Impact
- Future plans include extending access to private data and improving analytical functions for more complex tasks [37][38].
- The potential impact on professions such as consulting and healthcare is significant, since it can drastically reduce time spent on research [39][40].
- The technology is expected to empower knowledge workers rather than replace them, improving their efficiency and decision-making [39][40].

End-to-End Reinforcement Learning

GPT-5 Takes Shape; OpenAI's and Manus's Lessons from Building Agents; China's Big Tech Expands Compute Investment | AI Monthly Report

LatePost· 2025-03-08 12:17

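Key global AI trends of February 2025. Text | He Qianming
In the February 2025 AI Monthly Report, you will find:
- A new consensus among Silicon Valley giants: reasoning ability is part of the large model
- OpenAI's and Manus's experience developing agents
- DeepSeek is pushing China's big companies to invest more in compute; Alibaba and ByteDance alone will spend more than 200 billion combined this year
- 3 AI companies acquired for over 100 million each, and 23 AI companies that raised more than US$50 million
- OpenAI is paying experts US$100 an hour to produce data that improves model capabilities
Starting with this issue, we are inviting researchers, founders, and investors to offer first-hand commentary and insights on each month's AI trends and landmark events. The LatePost AI Monthly Report picks the AI signals most worth knowing each month. Below is our 4th AI Monthly Report; you are welcome to add important trends we missed in the comments.
Technology | GPT-5 takes shape, and a new industry consensus emerges
The shockwave from DeepSeek continues to spread, and the world's large-model companies are locked in a melee: whether it is Grok 3, which Musk trained on more than 100,000 GPUs, GPT-4.5, which OpenAI may have spent US$1 billion to train, or Anthropic's latest model Claude 3 ..., which integrates reasoning capabilities ...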

Artificial Intelligence

Unsupervised Learning