Reinforcement Learning (强化学习) - filings, earnings calls, financial reports, news - Reportify

Reinforcement Learning

Robotera (星动纪元) is hiring! Openings in embodied multimodal AI, reinforcement learning, and more

具身智能之心· 2025-09-17 00:02

Core Viewpoint - The article outlines various job descriptions and requirements for positions related to multi-modal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning technologies [6][14][15].

Group 1: Job Descriptions
- Responsibilities include research, design, and implementation of cutting-edge multi-modal reinforcement learning algorithms to address complex real-world problems [6].
- Involvement in the collection, processing, cleaning, and analysis of multi-modal data to create high-quality training datasets [14].
- Development and optimization of multi-modal models, including training, fine-tuning, and enhancing performance across different tasks [6][15].

Group 2: Job Requirements
- Candidates should hold a master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13].
- Proficiency in programming languages such as Python and deep learning frameworks such as PyTorch is essential, along with strong engineering implementation skills [13].
- Experience publishing papers at top academic conferences (e.g., CVPR, NeurIPS) and contributing to open-source projects is preferred [13][19].

Group 3: Additional Qualifications
- Familiarity with multi-modal data cleaning, labeling, and loading, as well as an understanding of data optimization techniques, is required [14].
- Candidates should have experience with large language models and multi-modal models, including knowledge of their capabilities and applicable scenarios [14].
- High standards for data quality and attention to detail are necessary, along with proficiency in data processing tools such as Pandas and NumPy [14].

Multimodal large models

Embodied intelligence systems

Targeting range-extender buyers' pain points, Buick's new-energy luxury sedan Zhijing L7 makes its national debut

Nan Fang Du Shi Bao· 2025-09-16 11:07

Core Insights
- SAIC-GM's new luxury new-energy sedan, the Zhijing L7, was officially unveiled on September 15, featuring the "Zhenlong" range-extender system and advanced AI technology [1][3].
- The vehicle is positioned in the competitive 200,000-300,000 RMB segment, aiming to give consumers a balanced choice between traditional fuel vehicles and electric cars [1][3].

Product Features
- The Zhijing L7's range-extender system delivers a maximum power output of 252 kW, equivalent to a 3.0T V6 engine, with 0-100 km/h acceleration in just 5.9 seconds and combined fuel consumption of only 0.5 L per 100 km [4][6].
- The vehicle offers a pure-electric range of 302 km and a total range of 1420 km, addressing common consumer concerns about range anxiety [4][6].

Market Positioning
- Luxury and joint-venture brands have faced significant challenges in the electric vehicle market; the Zhijing L7 aims to fill the gap for range-extended vehicles in the sedan segment [3][4].
- Range-extended vehicles are seen as a growing segment, particularly as consumer preferences shift toward intelligent and electric solutions [6][8].

Technological Advancements
- The Zhijing L7 is equipped with the Momenta R6 Flywheel large model, which enhances its intelligent driving capabilities, including features such as "no-stop" city navigation and automated parking [6][8].
- The vehicle uses Qualcomm's latest SA8775P chip, providing high computing power for its intelligent cabin and driving systems [8][10].

Strategic Vision
- The company emphasizes a long-term commitment to luxury, comfort, and quietness, aiming to balance all-round performance rather than focusing solely on standout features [10].

New energy vehicles

Zhenlong range-extender system

Momenta R6 Flywheel large model

Buick Zhijing L7 range-extended sedan makes its national debut

Huan Qiu Wang· 2025-09-16 11:03

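On September 15, 2025, the Zhijing L7, an intelligent luxury new-energy sedan, made its first public appearance. As the first flagship sedan of Zhijing (至境), Buick's premium new-energy sub-brand, the L7 uses top-tier "Zhenlong" (True Dragon) range-extender technology and is the first car to carry the "Xiaoyao Zhixing" driver-assistance system. It is also the world's first production car to ship with the Momenta R6 Flywheel large model, which is based on end-to-end "reinforcement learning", together with Qualcomm's latest-generation SA8775P chip. In addition, the L7 offers a luxury chassis, a luxurious and comfortable cabin, and equipment benchmarked against the high-end market. The Zhijing L7 has already arrived in Buick dealer showrooms nationwide, and an early-bird program has opened.

Design and comfort: luxury features and chassis technology. The Zhijing L7 measures 5032 mm × 1952 mm × 1500 mm and has a relatively long 3000 mm wheelbase. Drawing inspiration from nature, the designers created a flowing yet taut "Starry Sky Spread-Wing" exterior and a poised luxury fastback silhouette, with frameless doors all round engineered for ultra-quiet NVH, hidden door handles, and 20-inch "Starlight Turbofan" wheels. The "Galaxy Starry Sky Spread-Wing" headlights, "Star-Trail Floating-Light Spread-Wing" taillights, roof-mounted lidar, and the small blue light that signals Xiaoyao Zhixing blend technology with elegance. The cabin adopts a new "pure floating island" design aesthetic, creating a clean, elegant split-level space with a sense of flowing momentum. Interior materials provide 270° wrap-around leather trim. A "lake-island" overhead console, a "stone-in-water" crystal dome light, and "Galaxy gold-sand" trim strips on the door panels and dashboard convey a refined, understated Eastern aesthetic and create an upscale, elegant atmosphere. The Zhijing L7 has a spacious cabin ...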

New energy vehicles

GPT-5's trump card explained: the hidden weapon that will decide the future of AI

36Kr· 2025-09-16 10:43

Core Insights
- The article discusses the significance of the "Universal Verifier" in the evolution of AI models, particularly in the context of GPT-5 and its performance gains [2][3].
- It highlights the limitations of earlier reinforcement learning methods, in particular Reinforcement Learning with Verifiable Rewards (RLVR), in complex real-world scenarios where answers are not binary [2][4].
- It outlines two main approaches to building a Universal Verifier: enriching the evaluation criteria and letting models assess their own outputs [36][44].

Group 1: Universal Verifier and Its Importance
- The Universal Verifier is seen as a potential breakthrough, addressing RLVR's shortcomings by enabling models to evaluate answers in a more nuanced way [2][10].
- A more sophisticated evaluation system is needed because real-world problems, especially in fields such as healthcare and education, rarely have answers that are simply right or wrong [2][11].
- The article argues that understanding the Universal Verifier is crucial for grasping the future of AI technology and competition [3].

Group 2: Approaches to Developing the Universal Verifier
- The first approach uses large language models (LLMs) as judges to apply richer evaluation standards, an idea explored in several research papers [4][5][6].
- The second approach focuses on self-assessment: models evaluate their own outputs based on internal confidence, reducing reliance on external validation [44][45].
- The RaR (Rubrics as Rewards) framework is introduced as a way to build detailed scoring criteria for model outputs, leading to significant performance improvements in specific domains [19][21][22].

Group 3: Performance Improvements and Results
- Models trained with the RaR framework achieved substantial gains, with scores on medical evaluations increasing nearly fourfold [21][22].
- Comparisons with other evaluation methods indicate that RaR outperformed traditional approaches on complex reasoning tasks [22][24].
- The Rubicon framework extends the scoring system to more than 10,000 evaluation criteria, improving performance in subjective areas such as creative writing [27][28].

Group 4: Future Directions and Challenges
- While RaR and Rubicon show promise, they still rely on expert-defined criteria, which may limit scalability [69][70].
- The INTUITOR method shifts toward internal feedback, letting models learn without reference answers, but it also faces challenges in generalizability [59][60].
- The OaK architecture is proposed as a long-term vision: a system that learns and evolves through interaction with its environment, though it remains a distant goal [70][77].
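
To make the contrast between a binary verifiable reward and a rubric-based reward concrete, here is a minimal Python sketch. It is not the RaR or Rubicon implementation: the `Criterion` type, the keyword checks standing in for an LLM judge, the example rubric and its weights, and the weighted-average aggregation are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric item: a description, a weight, and a pass/fail check."""
    description: str
    weight: float
    # Hypothetical stand-in for an LLM judge deciding whether the criterion is met.
    check: Callable[[str], bool]


def binary_verifiable_reward(answer: str, reference: str) -> float:
    """RLVR-style reward: 1.0 only when the answer matches the reference exactly."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def rubric_reward(answer: str, rubric: List[Criterion]) -> float:
    """Rubric-style reward: weighted fraction of satisfied criteria, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(answer))
    return earned / total if total > 0 else 0.0


# Illustrative rubric for a medical-advice prompt (weights are made up).
rubric = [
    Criterion("Recommends seeing a doctor", 3.0, lambda a: "doctor" in a.lower()),
    Criterion("Mentions staying hydrated", 1.0, lambda a: "fluids" in a.lower()),
    Criterion("Does not prescribe antibiotics", 2.0, lambda a: "antibiotic" not in a.lower()),
]

reference = "See a doctor if the fever persists."
for answer in (
    "Drink plenty of fluids and see a doctor if the fever lasts.",
    "See a doctor.",
    "Take antibiotics.",
):
    print(f"{answer!r}: binary={binary_verifiable_reward(answer, reference)}, "
          f"rubric={rubric_reward(answer, rubric):.2f}")
```

The binary reward gives zero credit to every answer that is not an exact match, while the rubric reward separates a complete answer (1.0) from a partial one (5/6) and a harmful one (0.0). In a real system each `check` would be a judged call rather than a keyword test, which this sketch does not model.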

Universal verifier

SAIC-GM's "Zhijing L7" makes its public debut

Zhong Zheng Wang· 2025-09-16 06:13

Core Viewpoint - SAIC-GM's Buick brand has launched its flagship new-energy sedan, the Buick Zhijing L7, which aims to compete in the high-end new-energy vehicle market with advanced technology and features [1].

Group 1: Product Launch
- The Buick Zhijing L7 made its national debut on September 15 in Shanghai [1].
- The vehicle is now available at Buick dealerships nationwide, and an early-bird program offers lifetime free maintenance for orders placed before September 28 [1].

Group 2: Technology and Features
- The Zhijing L7 uses "True Dragon" (Zhenlong) range-extension technology and is equipped with the "Xiaoyao Zhixing" driver-assistance system [1].
- It features the Momenta R6 Flywheel large model, based on end-to-end "reinforcement learning", and Qualcomm's latest SA8775P chip, providing a top-tier intelligent electric experience [1].
- The vehicle has a pure-electric range of 302 km and a combined range of 1420 km [1].

Group 3: Market Positioning
- The Zhijing L7 combines global automotive expertise with local innovation, aiming to enter the first tier of the new-energy vehicle market [1].
- The vehicle is expected to create new opportunities for the Buick brand in the new era, leveraging industry-leading range-extension technology and a luxury experience [1].

SAIC MOTOR(SH:600104)

New energy vehicles

Zhenlong range-extension technology

Xiaoyao Zhixing driver-assistance system

Momenta R6 Flywheel large model

Ant Group is hiring large-model data intelligence algorithm engineers (internal referrals available)

自动驾驶之心· 2025-09-15 23:33

Core Viewpoint - The article describes the responsibilities and requirements for a position developing advanced algorithms for large-model data production, emphasizing data knowledge systems, automatic classification, authoritative evaluation sets, quality assessment, and innovative solutions in artificial intelligence and deep learning [1][2][3].

Group 1: Responsibilities
- The role involves designing and developing algorithms for key problems in large-model data production, including data knowledge system generation, automatic corpus classification, authoritative evaluation set construction, and quality assessment of training data [1][5].
- Specific tasks include researching LLM-based automatic knowledge graph generation, developing classification algorithms, and building standardized evaluation sets to assess model performance [1][5].
- The position also requires establishing a data-driven quality assessment system, identifying low-quality data, and synthesizing training data to improve model performance [1][5].

Group 2: Requirements
- Candidates should hold a master's degree or higher in computer science, artificial intelligence, deep learning, or a related field, and be proficient in deep learning frameworks such as PyTorch and TensorFlow [2][6].
- Strong problem-solving skills, self-motivation, and the ability to analyze and address issues are essential, along with effective communication and coordination abilities [2][6].
- Preference is given to candidates with practical experience in large-model data system design, corpus classification, evaluation set construction, and data annotation algorithms [3][4][6].

Ant Group large models

Paper walkthrough: HKUST's PLUTO, the first to outperform rule-based planners!

自动驾驶之心· 2025-09-15 23:33

Core Viewpoint - The article discusses the development and features of the PLUTO model in the end-to-end autonomous driving domain, emphasizing its two-stage architecture and its direct encoding of structured perception outputs for downstream control tasks [1][2].

Summary by Sections

Overview of PLUTO
- PLUTO is trained with three main losses, a regression loss, a classification loss, and an imitation learning loss, which together drive the model's performance [7].
- Additional auxiliary losses are included to aid model convergence [9].

Course Introduction
- The article introduces a new course, "End-to-End and VLA Autonomous Driving," developed with top algorithm experts from leading domestic manufacturers and aimed at the challenges learners face in this rapidly evolving field [12][15].

Learning Challenges
- The course addresses the difficulties created by fast-moving technology and knowledge fragmented across domains, which make it hard for beginners to grasp the necessary concepts [13].

Course Features
- The course is designed to offer a quick entry into the field, build a framework for research capabilities, and combine theory with practice [15][16][17].

Course Outline
- The chapters cover the history and evolution of end-to-end algorithms, background knowledge on the relevant technologies, and detailed discussions of both one-stage and two-stage end-to-end methods [20][21][22][29].

Practical Application
- The course includes practical assignments, such as RLHF fine-tuning, that let students apply their theoretical knowledge in realistic scenarios [31].

Instructor Background
- The instructor, Jason, has a strong academic and practical background in cutting-edge end-to-end and large-model algorithms, which lends the course credibility [32].

Target Audience and Expected Outcomes
- The course targets people with a foundational understanding of autonomous driving and related technologies, with the goal of raising their skills to the level of an end-to-end autonomous driving algorithm engineer within a year [36].
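
The summary names PLUTO's three main training losses without giving formulas. As a rough illustration of how such terms are commonly combined in a planner that outputs several candidate trajectories, the Python sketch below scores K candidates against a ground-truth trajectory. It is not PLUTO's code: the winner-take-all matching by endpoint distance, the smooth-L1 and cross-entropy choices, the weights, and passing imitation or auxiliary terms in as precomputed numbers are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 (Huber-style) penalty on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def regression_loss(pred: Trajectory, gt: Trajectory) -> float:
    """Mean smooth-L1 over every x and y coordinate of the trajectory."""
    terms = [
        smooth_l1(p - g)
        for (px, py), (gx, gy) in zip(pred, gt)
        for p, g in ((px, gx), (py, gy))
    ]
    return sum(terms) / len(terms)


def classification_loss(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of the candidate scores against the target candidate index."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def planning_loss(
    candidates: List[Trajectory],
    logits: Sequence[float],
    gt: Trajectory,
    extra_losses: Sequence[float] = (),
    w_reg: float = 1.0,
    w_cls: float = 1.0,
) -> Tuple[float, Dict[str, float]]:
    """Weighted regression + classification, plus precomputed extra terms
    (e.g. imitation or auxiliary losses, which this sketch does not compute)."""
    # Winner-take-all: the candidate whose endpoint is closest to the ground truth
    # is the one that gets regressed and the one the classifier should select.
    best = min(range(len(candidates)), key=lambda k: math.dist(candidates[k][-1], gt[-1]))
    l_reg = regression_loss(candidates[best], gt)
    l_cls = classification_loss(logits, best)
    total = w_reg * l_reg + w_cls * l_cls + sum(extra_losses)
    return total, {"best": best, "reg": l_reg, "cls": l_cls}


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
candidates = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.5)],  # stays close to the ground truth
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],  # veers off to the side
]
total, parts = planning_loss(candidates, logits=[2.0, 0.0], gt=gt, extra_losses=[0.1])
print(parts, round(total, 4))
```

Regressing only the best-matching candidate keeps distinct driving modes (for example, keeping the lane versus changing lanes) from being averaged into one blurred trajectory, while the classification term teaches the model which candidate to trust.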

End-to-end autonomous driving

Multimodal large models

This ByteDance paper is useful for Li Auto

理想TOP2· 2025-09-15 15:32

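On September 11, 2025, ByteDance published "Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents". It matters for Li Auto because Li Auto wants to build agents and will very likely draw on this work: it will hit the same problem, namely an inherent, harmful coupling between the strength of the learning signal (the gradient magnitude) and the model's uncertainty when it makes a decision (its entropy).

This is actually quite similar to how people learn. As long as the outcome is right, we tend to over-reinforce every step that led there (by analogy: when sales are high, everything you did looks right). When we are very confident on a path that turns out to be wrong, we tend not to reflect and so fail to correct the mistake. When we hit errors while exploring in confusion, we become timid and afraid to keep exploring. In training terms: confident, correct steps that should be strongly reinforced receive only small updates; confident, wrong steps that should be severely penalized also receive only small updates; and uncertain exploratory steps that should be handled cautiously receive the largest rewards and penalties, which makes training very unstable. The ByteDance paper offers a way to address this class of problems.

In more detail: at its core, the paper tackles a central dilemma in current LLM agent training, namely how to tell which decision step to reward or penalize in long tasks whose final outcome is all-or-nothing (i.e., sparse rewards). In traditional reinforcement learning, the agent ...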
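
The coupling described above follows from how softmax policy gradients work: the gradient of log π(a) with respect to the logits equals onehot(a) - π, which is small when the policy is already confident in a and large when it is uncertain. The Python sketch below demonstrates this and then applies one possible entropy-based rescaling of per-step advantages. The function `modulated_advantages`, its `exp(-k * H)` weighting, and the value of `k` are illustrative assumptions in the spirit of the paper's idea, not EMPG's published formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def logprob_grad_norm(logits: Sequence[float], action: int) -> float:
    """L2 norm of d log softmax(logits)[action] / d logits.

    For a softmax policy this gradient is onehot(action) - probs, so its size
    shrinks as the policy becomes confident in the sampled action.
    """
    probs = softmax(logits)
    return math.sqrt(sum(((1.0 if i == action else 0.0) - p) ** 2 for i, p in enumerate(probs)))


def modulated_advantages(
    advantages: Sequence[float], entropies: Sequence[float], k: float = 1.0
) -> List[float]:
    """Hypothetical entropy-based rescaling (not the paper's exact formula):
    weight each step by exp(-k * H), normalized so the weights average to 1.
    Confident steps get amplified (rewards and penalties alike); uncertain ones damped."""
    raw = [math.exp(-k * h) for h in entropies]
    mean = sum(raw) / len(raw)
    return [a * w / mean for a, w in zip(advantages, raw)]


confident = [5.0, 0.0, 0.0]  # policy is almost sure of action 0
uncertain = [0.0, 0.0, 0.0]  # uniform over three actions
for name, logits in (("confident", confident), ("uncertain", uncertain)):
    h = entropy(softmax(logits))
    g = logprob_grad_norm(logits, action=0)
    print(f"{name}: entropy={h:.3f} nats, grad norm={g:.3f}")

steps_entropy = [entropy(softmax(confident)), entropy(softmax(uncertain))]
print(modulated_advantages([1.0, 1.0], steps_entropy))
```

Without rescaling, the uncertain step's gradient is roughly fifty times larger than the confident step's for the same advantage, which is the imbalance the text describes. The normalized weighting shifts update strength back toward confident steps while keeping the average scale of the advantages unchanged.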

LLM agent training

Artificial Intelligence

Entropy-Modulated Policy Gradients (EMPG)

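Pushing into the new-energy first tier: Buick Zhijing L7, the "new benchmark for range-extended luxury sedans", makes its national debut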

Yang Zi Wan Bao Wang· 2025-09-15 13:57

Core Viewpoint - The Buick Zhijing L7, a luxury new-energy sedan, has been unveiled as the flagship model of Buick's high-end new-energy sub-brand, showcasing advanced technology and luxury features aimed at redefining the range-extended vehicle segment [1][3].

Group 1: Vehicle Features
- The Zhijing L7 is built on Buick's new "Xiaoyao" super-fusion architecture, integrating top technologies in driving, assisted driving, and luxury comfort [3].
- Its "Zhenlong" range-extender system delivers a maximum power output of 252 kW, equivalent to a 3.0T V6 engine, reaching 0-100 km/h in just 5.9 seconds with combined fuel consumption of only 0.5 L per 100 km [5][7].
- The vehicle offers a pure-electric range of 302 km and a total range of 1420 km, addressing common concerns about range anxiety [5][7].

Group 2: Intelligent Driving and Experience
- The Zhijing L7 introduces the "Xiaoyao Zhixing" driver-assistance system, featuring the Momenta R6 Flywheel large model based on end-to-end reinforcement learning, which is designed to handle complex driving scenarios [8].
- The underlying assisted-driving technology has accumulated over 1 billion kilometers of safe driving, placing it among the top tier of intelligent driving experiences [8].

Group 3: Interior and Comfort
- The interior emphasizes luxury with a spacious cabin and the industry's first dual 120° zero-gravity seats for enhanced comfort [18][20].
- A 27-speaker Buick Sound theater-grade audio system provides an immersive experience likened to sitting in a top-tier concert hall [18].

Group 4: Design and Aesthetics
- The exterior design draws its inspiration from nature, with a luxurious silhouette and features such as roof-mounted lidar and high-end lighting [14][16].
- The interior uses a new "pure floating island" design aesthetic, creating a sophisticated and elegant atmosphere [16].

Group 5: Market Positioning
- As a representative of Buick's redefined brand value in the new-energy era, the Zhijing L7 aims to compete in the first tier of the new-energy vehicle market, leveraging its advanced range-extension technology and luxury experience [20].

Xiaoyao Zhixing driver-assistance system

Momenta R6 Flywheel large model

Qualcomm SA8775P chip

Zhang Xiaojun in conversation with OpenAI's Yao Shunyu: systems that generate new worlds

Founder Park· 2025-09-15 05:59

Core Insights - The article discusses the evolution of AI, focusing on the transition to the "second half" of AI development and emphasizing the importance of language and reasoning in creating more generalizable AI systems [4][62].

Group 1: AI Evolution and Language
- AI has evolved from rule-based systems to deep reinforcement learning, and now to language models that can reason and generalize across tasks [41][43].
- Language is highlighted as a fundamental tool for generalization, allowing AI to tackle a variety of tasks by leveraging reasoning capabilities [77][79].

Group 2: Agent Systems
- The definition of an "Agent" has expanded to include systems that interact with their environment and make decisions based on reasoning, rather than just following predefined rules [33][36].
- Language agents represent a significant shift, as they can perform tasks in more complex environments, such as coding and internet navigation, that were previously challenging for AI [43][54].

Group 3: Task Design and Reward Mechanisms
- The article emphasizes the importance of defining effective tasks and environments for AI training, suggesting that the current bottleneck lies in task design rather than model training [62][64].
- Intrinsic rewards, which are based on outcomes rather than processes, are proposed as a key factor for successful reinforcement learning applications [66][88].

Group 4: Future Directions
- Future AI development is seen as a combination of stronger agent capabilities, through better memory systems and intrinsic rewards, and the exploration of multi-agent systems [88][89].
- The potential for AI to generalize across tasks is highlighted, with coding and mathematical tasks as prime examples of areas where AI can excel [80][82].

Language agents
