Reinforcement Learning
"Innovation has become difficult" at OpenAI: a departed executive tells all
36Kr · 2026-01-23 13:12
Group 1
- OpenAI is facing an innovation dilemma due to rising costs and growth pressures, which have affected its appetite for risk and hindered cross-team collaboration [3][8]
- The rise of Google is attributed to OpenAI's failure to maintain its competitive edge, suggesting that OpenAI should have continued to lead the market [3][4]
- The AI industry is experiencing a convergence among top companies, making it difficult for researchers to pursue innovative paths outside mainstream machine learning paradigms [3][4]

Group 2
- The talent war in the AI sector has become dramatic, with frequent job changes among researchers, leading to less time spent on actual work [4][42]
- Innovation is not solely driven by star researchers; the company's ability to foster a sense of personal responsibility and an environment that allows exploration is crucial [4][5]
- The lack of focus, rather than a shortage of computing power, is identified as a key barrier to innovation within AI labs [5][19]

Group 3
- The timeline for achieving Artificial General Intelligence (AGI) is projected around 2029, with critical areas of focus being architectural innovation and continuous learning [5][30]
- Reinforcement learning is making a comeback, as historical patterns show that good ideas often resurface, but the challenge lies in determining the right timing for their importance [5][24]

Group 4
- OpenAI's organizational structure is limiting its ability to support certain research directions, leading to a realization that some desired research cannot be pursued within the current framework [9][10]
- The industry is witnessing a lack of diversity in approaches, with many companies following similar technological paths, which is seen as a regrettable trend [15][17]

Group 5
- The current competitive landscape is characterized by a few major AI companies using similar technological foundations, resulting in minimal differentiation among their products [15][17]
- The pressure to deliver results and maintain competitiveness is causing organizations to shy away from risk-taking, which is essential for genuine innovation [18][19]

Group 6
- The significant resource barriers in AI research are hindering innovative attempts, as many promising ideas lack the necessary funding for large-scale experimentation [20][21]
- The balance between exploration and exploitation is a critical issue in optimizing AI agents and should also be reflected in organizational decision-making [21][22]

Group 7
- The importance of world models in AI training is emphasized, suggesting that integrating world understanding with reinforcement learning could lead to significant advancements [27][30]
- Continuous learning and the integration of training and operational phases are identified as essential capabilities that are currently lacking in AI models [30][31]

Group 8
- The rapid evolution of AI technology necessitates a cautious approach to its deployment, as the implications of new advancements can have far-reaching effects on society [37][38]
- The ongoing discourse around AI technologies is marked by a mix of excitement and concern, highlighting the need for responsible discussions about their impact [40][41]
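The exploration/exploitation balance mentioned under Group 6 can be illustrated with a toy multi-armed bandit. The sketch below is not from the interview; it is a textbook epsilon-greedy agent, and the function name `run_epsilon_greedy`, the arm reward probabilities, and the step count are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    true_means: hypothetical success probability of each arm.
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample-average estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Illustrative arms: the third arm is the best one.
counts, estimates = run_epsilon_greedy([0.2, 0.5, 0.8], epsilon=0.1)
greedy_counts, _ = run_epsilon_greedy([0.2, 0.5, 0.8], epsilon=0.0)
print("with exploration:   ", counts)
print("without exploration:", greedy_counts)
```

With `epsilon=0.0` the agent never leaves the first arm it tries, because no other arm's estimate ever rises above zero. That is the toy analogue of an organization that stops taking risks.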
Nine urgent job postings on the official website suggest Li Auto is building a humanoid robot
理想TOP2· 2026-01-22 12:16
Core Viewpoint
- The company is actively recruiting for various engineering positions related to humanoid robotics, indicating a strategic focus on developing advanced robotic technologies.

Group 1: Job Positions and Responsibilities
- The company is hiring for roles including dexterous robotic hand design, algorithm development, embedded software engineering, and joint module engineering, suggesting a comprehensive approach to humanoid robot development [1][5]
- Specific roles call for experience in bipedal walking algorithms, whole-body coordination, and the design of multi-joint systems, indicating a focus on complex motion control [1][8][9]

Group 2: Technical Specifications
- The robotic hand will use a direct-drive structure rather than a cable-driven system, with a stated preference for motor technologies such as coreless ("hollow cup") and brushless DC motors [2]
- The core drive will rely on high-power joint modules with currents in the hundreds of amperes, and on a hybrid architecture combining reinforcement learning and model predictive control, which underlines the role of advanced algorithms in motion control [3]
- The system architecture will use STM32 or TI C2000 series microcontrollers with real-time operating systems such as FreeRTOS and RT-Linux, indicating a focus on real-time control and processing [4]

Group 3: Design Challenges
- The dexterous hand requires independent control of each finger and must handle multimodal data from visual and tactile sensors, which raises significant engineering challenges in bandwidth and latency [4]
- Integrating high-torque direct-drive motors into compact spaces while managing heat dissipation and wiring is a critical design challenge that requires collaboration between mechanical and electronic engineering teams [4]
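Li Hongyang's team presents PlannerRFT: a new diffusion trajectory planning approach that improves performance in complex driving scenarios (Tongji & HKU)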
自动驾驶之心· 2026-01-21 09:16
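Paper authors | Hongchen Li et al. Editor | 自动驾驶之心

Work from Tongji University, the Shanghai Innovation Institute, HKU OpenDriveLab, and other teams: PlannerRFT, a Diffusion Planner built on closed-loop reinforcement learning and efficient fine-tuning. A few key points:

Diffusion-based planners have become a highly promising approach to generating human-like trajectories in autonomous driving. Recent studies integrate reinforcement fine-tuning into diffusion planners through reward-guided optimization in a generate-evaluate loop, in order to improve robustness. However, these methods struggle to generate multimodal, scene-adaptive trajectories, which limits how efficiently informative rewards can be used during fine-tuning.

To address this, HKU OpenDriveLab, together with Tongji University and other research teams, proposes PlannerRFT, a sample-efficient reinforcement fine-tuning framework for diffusion-based planners. PlannerRFT adopts a dual-branch optimization strategy that, without changing the original inference pipeline, simultaneously optimizes the trajectory distribution and adaptively guides the denoising process toward more promising exploration directions. To support large-scale parallel learning, the paper develops the nuMax simulator, whose trajectory rollout is 10 times faster than native nuPlan. Extensive experiments show that Pla ...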
DeepSeek's new model "MODEL1" revealed
第一财经· 2026-01-21 08:56
Core Insights
- The article discusses the emergence of a new model named "MODEL1" from DeepSeek, which is expected to be distinct from the existing "V32" model, potentially indicating advancements in architecture and performance [4][5].

Group 1: Model Development
- "MODEL1" is likely to represent a new model architecture, differing from "V32" in key technical aspects such as KV cache layout, sparsity handling, and support for FP8 data format decoding [4].
- The new model is nearing completion, with indications that it is in the final stages of training or inference deployment, awaiting weight freezing and testing validation [4].

Group 2: Industry Impact
- The anticipation surrounding DeepSeek's new flagship model, expected to be released in February, suggests it may surpass current top models in programming capabilities [5].
- The release of DeepSeek-R1 has significantly influenced the open-source community, leading to increased contributions from major Chinese companies and startups, with downloads of Chinese models on Hugging Face surpassing those from the U.S. [8].

Group 3: Research and Innovation
- Recent technical papers from DeepSeek introduce new training methods and an AI memory module, hinting at the integration of these innovations into the upcoming model [6].
- The previous flagship model, V3, established a strong performance foundation, and the subsequent R1 model excelled in complex reasoning tasks, setting high expectations for future releases [6].
One year after the R1 release, DeepSeek's new model "MODEL1" revealed
Xin Lang Cai Jing· 2026-01-21 04:05
Core Insights
- A new model architecture named "MODEL1" has surfaced in the code of DeepSeek's FlashMLA software, which is designed to optimize large-model inference on NVIDIA GPUs [1][2]
- MODEL1 is expected to be a highly efficient inference model with lower memory usage than the existing V3.2 model, making it suitable for edge devices and cost-sensitive applications [2]
- The company is set to launch its next flagship AI model, DeepSeek V4, in mid-February 2026, which is anticipated to enhance coding capabilities [3]

Group 1
- Across the 114 code files of FlashMLA, the MODEL1 architecture is mentioned 31 times [1]
- MODEL1 supports multiple GPU architectures, including specific implementations for NVIDIA H100/H200 and B200, indicating optimization tailored to the latest GPU technology [2]
- DeepSeek's existing models represent two technical routes: the V series focusing on comprehensive performance and the R series targeting complex reasoning tasks [2]

Group 2
- The V3 model, launched in December 2024, established a strong performance foundation with its efficient MoE architecture, followed by rapid iterations leading to V3.2 [3]
- The R1 model, released in January 2025, excels in complex reasoning tasks through reinforcement learning and introduces a "deep thinking" mode [3]
- Recent technical papers from DeepSeek suggest ongoing development of new models that may integrate innovative training methods and AI memory modules [3]
Didi and Tsinghua sign phase-two cooperation agreement for the Joint Research Center for Future Mobility
Bei Jing Shang Bao· 2026-01-20 12:45
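Since Didi and Tsinghua University established the Joint Research Center for Future Mobility in 2019, the two sides have jointly carried out nearly one hundred research collaborations in areas such as shared mobility, big data and artificial intelligence, and autonomous driving, and multiple technologies have moved from laboratory innovation to industrial application. In 2025, the project "Key Technologies for Intelligent Vehicle Driving Safety and Their Industrial Application", a joint effort by Didi and Professor Wang Jianqiang's team, won the first prize of the Beijing Science and Technology Progress Award. Over this period, the two sides also trained outstanding talent for the industry together, through joint course development and the creation of internship and practice bases.

Bei Jing Shang Bao (reporter Wei Wei): On January 20, Didi announced that it has signed the phase-two cooperation agreement with Tsinghua University for the Joint Research Center for Future Mobility, and has also launched a dedicated cooperation program on deep industry-academia-research integration in reinforcement learning. The two sides will further strengthen joint research on frontier technologies and the cultivation of innovative talent, and will continue to deepen cooperation in areas such as building the future mobility ecosystem. ...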
The humanoid robotics and reinforcement learning discussion group is here
具身智能之心· 2026-01-20 09:30
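具身智能之心 has set up a technical discussion group on humanoid robots and reinforcement learning, and anyone working on RL or humanoid robotics is welcome to join. If you are interested, add the assistant on WeChat (AIDriver005) with the note "research direction + institution + name/nickname". ...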
Using DiffusionDriveV2 as an example to explain how reinforcement learning is used in autonomous driving
自动驾驶之心· 2026-01-20 09:03
Core Viewpoint
- The rapid development of large models has pushed reinforcement learning (RL) to unprecedented prominence, and it has become an essential part of post-training in the autonomous driving sector. The shift to end-to-end (E2E) learning makes RL necessary for challenges that imitation learning cannot solve, such as the centering problem in driving behavior [1].

Understanding Reinforcement Learning Algorithms in Autonomous Driving
- Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) are currently the most prevalent algorithms in the field. The article emphasizes the importance of understanding reward optimization through classic algorithms [2].

PPO and GRPO Algorithm Insights
- The classic PPO algorithm, particularly the PPO-CLIP variant, is discussed with a focus on its application in autonomous driving. The formula for the algorithm is provided, highlighting the interaction between the system and the environment over multiple steps [3].
- In trajectory generation, actions are evaluated by the quality of the whole trajectory rather than of individual points, which is crucial for effective RL training [3].

RL Loss and DiffusionDriveV2 Architecture
- The RL loss function is composed of three parts: anchor design, group design from GRPO, and the denoising process of diffusion. Each component plays a critical role in trajectory generation and optimization [9].
- The denoising process is framed as a Markov Decision Process (MDP), where each denoising step is one decision step of the MDP [10].

Intra-Anchor and Inter-Anchor GRPO
- Intra-Anchor GRPO modifies the group concept so that each anchor has its own group, which is essential for distinguishing different driving behaviors. This prevents straight-driving data from dominating other behaviors [12].
- Inter-Anchor GRPO addresses the risk of lacking global constraints between different anchors, optimizing the advantage calculation further [13].

Additional Improvements
- The article discusses improvements such as trajectory noise management and the introduction of a model selector, which are crucial for ensuring the reliability and effectiveness of the RL approach in autonomous driving [15].

Conclusion
- The article uses DiffusionDriveV2 to explain the application of reinforcement learning in autonomous driving, and indicates that the current state of RL in this field is still evolving. It expects advances in closed-loop simulation and deeper applications of RL [15].
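The per-anchor grouping and the PPO-CLIP term described above can be sketched in a few lines. This is a generic illustration of group-relative advantages and the clipped surrogate, not DiffusionDriveV2's actual loss; the anchor names, the reward values, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def intra_anchor_advantages(samples):
    """samples: list of (anchor_id, whole-trajectory reward).

    Each anchor forms its own group, so a trajectory is only compared
    with other trajectories sampled from the same anchor.
    """
    groups = defaultdict(list)
    for idx, (anchor_id, reward) in enumerate(samples):
        groups[anchor_id].append((idx, reward))
    advantages = [0.0] * len(samples)
    for members in groups.values():
        group_advs = group_relative_advantages([r for _, r in members])
        for (idx, _), adv in zip(members, group_advs):
            advantages[idx] = adv
    return advantages


def ppo_clip_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO-CLIP surrogate (a quantity to maximize).

    ratio is pi_new(action) / pi_old(action).
    """
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical rewards: straight driving scores higher overall.
samples = [
    ("straight", 0.9), ("straight", 0.7),
    ("left_turn", 0.3), ("left_turn", 0.1),
]
per_anchor = intra_anchor_advantages(samples)
pooled = group_relative_advantages([r for _, r in samples])
print("per-anchor groups:", per_anchor)
print("single pooled group:", pooled)
```

In the pooled group, both left-turn samples get negative advantages simply because straight driving scores higher. With one group per anchor, the better left-turn sample gets a positive advantage, which is the dominance problem that the intra-anchor grouping is described as avoiding.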
Sequoia Capital: This is AGI
36Kr · 2026-01-20 08:20
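Sequoia Capital's view: 2026 is the first year of AGI, and its defining marker is the maturity of "long-horizon agents". AI has evolved from a simple conversationalist into an executor capable of autonomous reasoning and iteration, able to solve complex problems in ambiguous environments the way humans do. The business paradigm will shift from "selling software" to "selling work outcomes", and agents are becoming "digital employees" that can work around the clock. Driven by reinforcement learning and agentic architectures, their capability doubles every 7 months and is fundamentally reshaping the boundaries of productivity.

This shift will have far-reaching effects on business and investment. Sequoia Capital's analysis holds that, as agent capability grows exponentially, the logic by which founders build products will change fundamentally: from selling software to directly "selling work outcomes". Future AI applications will no longer be merely assistive tools but entities that work in parallel around the clock as "colleagues", and users will shift from individual contributors to managers of agent teams.

As Claude Code and other coding agents have recently crossed a key capability threshold, the market's understanding of AGI has been reshaped. The article stresses that, through the optimization of reinforcement learning and agentic architectures, agents' ability to handle complex tasks is doubling every 7 months, which will thoroughly change companies' talent structure and productivity boundaries.

01 A functional definition: AGI is the ability to "figure things out on your own"

Sequoia Capital says that, as investors, they have no intention of entering the debate over the technical definition of AGI, and instead propose a pragmatic functional definition: AGI is ...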
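To make the "doubles every 7 months" figure concrete, the arithmetic can be sketched as below. It assumes clean exponential growth with a fixed 7-month doubling period, which is a simplification of the claim and not a forecast; the function names are illustrative.

```python
import math


def capability_multiplier(months, doubling_period_months=7.0):
    """Growth factor after `months`, assuming a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


def months_to_reach(factor, doubling_period_months=7.0):
    """Months needed to grow by `factor` under the same assumption."""
    return doubling_period_months * math.log2(factor)


for months in (7, 14, 28, 84):
    print(f"{months:>3} months -> x{capability_multiplier(months):g}")
```

Under this assumption, one year corresponds to a factor of about 3.3, and seven years (84 months, or twelve doublings) to a factor of 4096.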
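Unknown institution: Hongze Research (Technology): AI applications at home and abroad are worlds of ice and fire, and the conflict between models and applications is intensifying (published 2026) - 20260120
Unknown institution · 2026-01-20 02:40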
Summary of Key Points from the Conference Call

Industry Overview
- The report focuses on the structural changes in the global AI industry as of early 2026, particularly the divergence in AI application markets between China and the United States [1].

Macro Trends and Market Divergence
- The AI application markets in China and the U.S. are in stark contrast, described as "ice and fire" [1].
- U.S. software stocks have declined significantly since January 2026, primarily because of concerns raised by Anthropic's release of an Agent product capable of fully automated workflows, which has disrupted market perceptions of software development costs and value [1].

AI Application Ecosystem
- The Chinese AI application ecosystem leans towards "closed-loop integration", with leading companies using their own traffic and ecosystems to rapidly implement Agent functionality [2].
- Since August 2025, upstream computing power (chips, devices, storage) has performed strongly, while downstream application sectors (internet and software companies) have been weak [2].

Technology Evolution and Model Landscape
- Foundation models are entering a linear growth phase, with the first tier consisting of Anthropic, OpenAI, and Google (Gemini), while the second tier includes Grok, Zhipu, and Kimi [3].
- Domestic models like Tongyi Qianwen are lagging, while DeepSeek V4 is expected to challenge the first tier [3].
- There has been no breakthrough leap in capabilities, but overall abilities are steadily improving [4].
- Multimodal capabilities are becoming critical, with models like Google's Nano Banana enhancing Agent performance in various applications [4].
- Vertical models are shifting towards a "post-training + reinforcement learning" approach, internalizing expert reasoning rather than relying on external retrieval systems [4].

Comparison of Domestic and International AI Applications
- In China, companies like ByteDance, Tencent, and Alibaba are integrating AI into their ecosystems effectively, with Alibaba's Tongyi Qianwen being recognized as the first true consumer-facing Agent [5].
- In contrast, international players like Anthropic focus on programming workflows, while OpenAI and Google are still primarily chatbot-oriented and lack task planning capabilities [5].

Investment Logic and Recommendations
- Upstream sectors such as storage (DRAM/HBM/SSD), semiconductor equipment, and power equipment are expected to benefit from the shift in AI inference demand and from TSMC's planned capital expenditure increase of 30%-40% in 2026 [6].
- Platform companies that integrate ecosystems, models, and traffic are highlighted, with Alibaba and Tencent being key players in China [6].
- Recommended end-user scenario companies include Meitu, Roblox, and Reddit, while ToB tool companies like Adobe and Figma are noted for their collaborations with large model companies [7].

Core Judgments and Outlook
- The year 2026 is termed the "third year of the Agent", with high market premiums but uncertain outcomes [7].
- The core competitiveness of Agents is shifting from "general dialogue" to "automated workflow execution", particularly in vertical fields like programming and healthcare [7].
- Domestic AI applications are advancing rapidly in consumer markets thanks to closed ecosystems, while international markets are more disruptive in B2B workflow automation [7].
- Storage demand is shifting from training to inference, with SSDs expected to become the foundational infrastructure for the next generation of Agents [7].
- The document emphasizes a critical turning point in the AI industry from "model competition" to "application implementation", with clearly diverging paths between China and the U.S. [7].