Dynamic Programming
Ctrip blunder: a mistakenly sent notice "lays off" the entire staff.
猿大侠· 2026-01-18 04:11
Group 1
- The core incident involves a miscommunication at Ctrip, where employees received unexpected layoff notifications due to a system error during a software shutdown [2].
- Ctrip clarified that the incident was a mistake related to a system test and that there was no actual layoff plan, and it apologized to the affected employees [2].
- The incident sparked various reactions on social media, with some users mocking the situation and others suggesting it was a clever marketing move by Ctrip [2].
Took a shower and the offer got washed away.
猿大侠· 2025-12-20 04:11
Core Viewpoint
- The article discusses a recent incident in which an HR representative rescinded a job offer just 40 minutes after extending it, and suggests that the company may not be in urgent need of hiring [2].

Group 1: Job Offer Incident
- A candidate received a job offer but missed the notification while showering, and the offer was withdrawn after 40 minutes [2].
- The HR representative deleted the candidate's contact information, making it impossible for the candidate to contest the decision [2].
- This behavior indicates that the company may not be genuinely desperate for hires, since a real need would likely mean more patience with the candidate's response time [2].

Group 2: Algorithm Problem
- The article presents LeetCode Problem 1186, which asks for the maximum sum of a subarray after optionally deleting one element [5].
- The answer must be the maximum sum of a subarray that is still non-empty after performing at most one deletion [5].
- Two dynamic programming states are defined: `dp[i][0]` is the maximum sum of a subarray ending at index `i` without a deletion, and `dp[i][1]` is the maximum sum of a subarray ending at index `i` with one deletion [9].

Group 3: Dynamic Programming Approach
- The approach defines recurrence relations that compute each state's maximum sum from the states at the previous index [9].
- The base case initializes both states for the first element: `dp[0][0]` is set to the first element, while `dp[0][1]` is initialized to zero [10].
- The algorithm iterates through the array, updating both states and keeping track of the maximum sum found so far [11][12].
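The two-state recurrence summarized above can be sketched in Python as follows. This is a minimal sketch, not the article's own code: the function name `maximum_sum` and the rolling variables `keep` and `deleted` (standing in for `dp[i][0]` and `dp[i][1]`) are assumptions chosen for illustration, and the running answer is seeded with the first element so that the zero base case of `dp[0][1]` is never reported as a result by itself.

```python
from typing import List


def maximum_sum(arr: List[int]) -> int:
    """Maximum sum of a non-empty subarray after at most one deletion."""
    keep = arr[0]    # dp[i][0]: best sum ending at i, nothing deleted
    deleted = 0      # dp[i][1]: best sum ending at i, one element deleted
    best = arr[0]    # seeded with arr[0]; the zero base case is never an answer

    for x in arr[1:]:
        # Either extend a subarray that already used its deletion,
        # or delete x itself and keep the best no-deletion sum ending before it.
        deleted = max(deleted + x, keep)
        # Either extend the no-deletion subarray or restart at x.
        keep = max(keep + x, x)
        best = max(best, keep, deleted)

    return best


if __name__ == "__main__":
    print(maximum_sum([1, -2, 0, 3]))
```

Note that `deleted` is updated before `keep` so that it reads the previous index's no-deletion value; swapping the two lines would silently use the current index instead.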
GPT-5 humiliated with a zero score, top AI models wiped out, and Altman's myth of PhD-level AI shattered
36Ke· 2025-09-16 00:39
Group 1
- The FormulaOne benchmark test reveals the limitations of top AI models, with GPT-5 achieving only about 4% accuracy on advanced questions and scoring zero on the most difficult problems [1][6][19].
- The benchmark, developed by AAI, aims to measure algorithmic reasoning depth beyond competitive programming, focusing on real-world optimization problems [8][15].
- The test consists of 220 novel graph-based dynamic programming problems categorized into three levels of difficulty: shallow, deeper, and deepest [16][18].

Group 2
- AAI was founded by Amnon Shashua, co-founder of Mobileye, and focuses on AI research and development [10][11].
- The benchmark's problems are designed to be easy to understand but require significant creativity and deep reasoning to solve [19][22].
- The challenges in the deepest level of the benchmark highlight the gap between current AI capabilities and the reasoning required for complex real-world problems [25][30].
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint
- The article discusses the advancements and potential of reinforcement learning (RL) in the field of autonomous driving, highlighting its evolution and comparing it with other learning paradigms such as supervised learning and imitation learning [4][7][8].

Summary by Sections

Background
- The article notes the recent industry focus on new technological paradigms like VLA and reinforcement learning, emphasizing the growing interest in RL following significant milestones in AI, such as AlphaZero and ChatGPT [4].

Supervised Learning
- In autonomous driving, perception tasks like object detection are framed as supervised learning tasks, where a model is trained to map inputs to outputs using labeled data [5].

Imitation Learning
- Imitation learning involves training models to replicate actions based on observed behaviors, akin to how a child learns from adults. This is a primary learning objective in end-to-end autonomous driving [6].

Reinforcement Learning
- Reinforcement learning differs from imitation learning by focusing on learning through interaction with the environment, using feedback from task outcomes to optimize the model. It is particularly relevant for sequential decision-making tasks in autonomous driving [7].

Inverse Reinforcement Learning
- Inverse reinforcement learning addresses the challenge of defining reward functions in complex tasks by learning from user feedback to create a reward model, which can then guide the main model's training [8].

Basic Concepts of Reinforcement Learning
- Key concepts include policies, rewards, and value functions, which are essential for understanding how RL operates in autonomous driving contexts [14][15][16].

Markov Decision Process
- The article explains the Markov decision process as a framework for modeling sequential tasks, which is applicable to various autonomous driving scenarios [10].

Common Algorithms
- Various algorithms are discussed, including dynamic programming, Monte Carlo methods, and temporal difference learning, which are foundational to reinforcement learning [26][30].

Policy Optimization
- The article differentiates between on-policy and off-policy algorithms, highlighting their respective advantages and challenges in training stability and data utilization [27][28].

Advanced Reinforcement Learning Techniques
- Techniques such as DQN, TRPO, and PPO are introduced, showcasing their roles in enhancing training stability and efficiency in reinforcement learning applications [41][55].

Application in Autonomous Driving
- The article emphasizes the importance of reward design and closed-loop training in autonomous driving, where the vehicle's actions influence the environment, necessitating sophisticated modeling techniques [60][61].

Conclusion
- The rapid development of reinforcement learning algorithms and their application in autonomous driving is underscored, encouraging practical engagement with the technology [62].
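To make the dynamic programming entry under Common Algorithms concrete, here is a minimal value-iteration sketch on a toy Markov decision process. Everything in it is a hypothetical illustration rather than anything taken from the article: the four-state chain, the reward of 1.0 for entering the last state, the discount factor `GAMMA = 0.9`, and the names `step`, `value_iteration`, and `greedy_policy` are all assumptions.

```python
from typing import List, Tuple

N_STATES = 4        # hypothetical chain of states 0..3
TERMINAL = 3        # episode ends on reaching this state
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.9         # assumed discount factor


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward


def value_iteration(tol: float = 1e-9) -> List[float]:
    """Repeat the Bellman optimality backup until values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[n] for n, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update
        if delta < tol:
            return values


def greedy_policy(values: List[float]) -> List[int]:
    """For each non-terminal state, pick the action with the best backup."""
    policy = []
    for s in range(N_STATES - 1):
        scored = [(step(s, a)[1] + GAMMA * values[step(s, a)[0]], a)
                  for a in ACTIONS]
        policy.append(max(scored)[1])
    return policy


if __name__ == "__main__":
    v = value_iteration()
    print(v, greedy_policy(v))
```

The sketch relies on a fully known transition model, which is what distinguishes dynamic programming from the Monte Carlo and temporal difference methods named alongside it; those estimate values from sampled experience instead.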