Reinforcement Learning
Saining Xie Recalls His OpenAI Interview Seven Years Ago: Whiteboard Coding and a Five-Hour Session That Ended After Dark
机器之心· 2025-08-29 09:53
Core Insights
- The article discusses the unique interview experiences of AI researchers at major tech companies, highlighting the differences in interview styles and the focus areas of these companies [1][9][20].

Group 1: Interview Experiences
- Lucas Beyer, a researcher with extensive experience at top AI firms, initiated a poll about memorable interview experiences at companies like Google, Meta, and OpenAI [2][20].
- Saining Xie shared that his interviews at various AI companies were unforgettable, particularly noting the rigorous two-hour marathon interview at DeepMind, which involved solving over 100 math and machine learning problems [5][6].
- The interview process at Meta was described as more academic, focusing on discussions with prominent researchers rather than just coding [6][7].

Group 2: Company-Specific Insights
- The interview style at Google Research was likened to an academic job interview, with significant emphasis on research discussions rather than coding challenges alone [7].
- OpenAI's interview process involved a lengthy session focused on a reinforcement learning problem, showcasing the company's commitment to deep research engagement [8][9].
- The interview questions reflect the research priorities of these companies, such as Meta's focus on computer vision and OpenAI's emphasis on reinforcement learning [9][20].

Group 3: Notable Interviewers and Candidates
- Notable figures like John Schulman and Noam Shazeer served as interviewers, indicating the high caliber of talent involved in hiring at these firms [7][9].
- Candidates shared memorable moments from their interviews, such as solving complex problems on napkins or engaging in deep discussions about research topics [19][20].
Quadruped Robot Dog + Single Arm: A Low-Cost Start to Your Embodied-Learning Journey
具身智能之心· 2025-08-29 04:00
Core Viewpoint
- Xdog is a low-cost, multifunctional quadruped robotic dog and robotic arm development platform designed for embodied developers, featuring a comprehensive curriculum for research and learning in robotics [1][2].

Group 1: Hardware Overview
- Xdog integrates a robotic dog and a robotic arm, with advanced functionality such as voice control, sim2real, real2sim, target recognition and tracking, autonomous grasping, and reinforcement-learning gait control [2][5].
- The robotic dog measures 25 cm x 20 cm x 30 cm, weighs 7.0 kg, and reaches a maximum speed of 7.2 km/h with a maximum rotation speed of 450 degrees per second [3][11].
- The main control chip is the Allwinner H616, featuring a quad-core 1.6 GHz CPU, 4 GB RAM, and 32 GB storage [4][5].

Group 2: Technical Specifications
- The battery capacity is 93.24 Wh, providing approximately 120 minutes of operation and about 6 hours of standby time [5][11].
- The robotic arm reaches a maximum height of 0.85 m and has a grasping range of 0.4 m around its base [7].
- The depth camera uses active dual infrared and structured-light technology, with a depth output resolution of 1280 x 800 @ 30 fps and a working distance of 0.2 m - 10 m [14].

Group 3: Software and Functionality
- The system supports multiple control methods, including voice control, keyboard control, visual control, and reinforcement learning for autonomous movement [15][17].
- Development is based on ROS1, with Python as the primary programming language; a GPU of at least a 2080 Ti is recommended for inference [16][24].
- The platform supports advanced functionality such as collaborative control of the robotic arm and dog for target following, as well as autonomous grasping [19][20].

Group 4: Educational Curriculum
- The curriculum includes hands-on training in ROS project creation, Mujoco simulation, and reinforcement-learning principles, among other topics [22][23].
- Courses cover setup and usage of the Xdog system, including network configuration, camera parameter adjustment, and advanced algorithms for object recognition and tracking [22][23].
- The teaching team consists of experienced instructors responsible for project management, technical support, and algorithm training [22].

Group 5: Delivery and Support
- Delivery is completed within three weeks of payment, with a one-year after-sales warranty [25][26].
- The product includes hardware and accompanying courses; no returns or exchanges are allowed for non-quality issues [26].
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint
- The article discusses the advancements and potential of reinforcement learning (RL) in autonomous driving, tracing its evolution and comparing it with other learning paradigms such as supervised learning and imitation learning [4][7][8].

Summary by Sections

Background
- The article notes the recent industry focus on new technological paradigms like VLA and reinforcement learning, with interest in RL growing after significant AI milestones such as AlphaZero and ChatGPT [4].

Supervised Learning
- In autonomous driving, perception tasks like object detection are framed as supervised learning problems, where a model is trained on labeled data to map inputs to outputs [5].

Imitation Learning
- Imitation learning trains models to replicate actions from observed behavior, akin to how a child learns from adults; it is a primary learning objective in end-to-end autonomous driving [6].

Reinforcement Learning
- Reinforcement learning differs from imitation learning by learning through interaction with the environment, using feedback from task outcomes to optimize the model. It is particularly relevant for sequential decision-making tasks in autonomous driving [7].

Inverse Reinforcement Learning
- Inverse reinforcement learning addresses the difficulty of defining reward functions for complex tasks by learning a reward model from user feedback, which can then guide training of the main model [8].

Basic Concepts of Reinforcement Learning
- Key concepts include policies, rewards, and value functions, which are essential for understanding how RL operates in autonomous driving contexts [14][15][16].

Markov Decision Process
- The Markov decision process provides a framework for modeling sequential tasks and applies to a wide range of autonomous driving scenarios [10].

Common Algorithms
- Foundational algorithms are discussed, including dynamic programming, Monte Carlo methods, and temporal-difference learning [26][30].

Policy Optimization
- The article differentiates between on-policy and off-policy algorithms, highlighting their respective trade-offs in training stability and data utilization [27][28].

Advanced Reinforcement Learning Techniques
- Techniques such as DQN, TRPO, and PPO are introduced, showing how they improve training stability and efficiency in reinforcement learning applications [41][55].

Application in Autonomous Driving
- The article emphasizes the importance of reward design and closed-loop training in autonomous driving, where the vehicle's actions influence the environment, necessitating sophisticated modeling techniques [60][61].

Conclusion
- The rapid development of RL algorithms and their application to autonomous driving is underscored, with readers encouraged to engage with the technology hands-on [62].
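The basic concepts summarized above (policies, value functions, and temporal-difference updates) can be illustrated with a minimal tabular Q-learning sketch. This is a generic toy example on a five-state corridor, not code from the article; the environment, hyperparameters, and function names are illustrative assumptions.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: start at state 0; reward 1.0
    for reaching the rightmost state. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection (random on ties).
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Temporal-difference update: move Q(s, a) toward
            # the bootstrapped target r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

After training, "right" dominates "left" in every non-terminal state, which is the optimal policy for this toy task. The DQN, TRPO, and PPO methods mentioned above replace the table with a neural network and add stabilizing machinery, but the underlying bootstrapped target is the same.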
Li Auto Executives Interpret Q2 Earnings: Product Competitiveness to Be Strengthened Through a Deep Refresh of Assisted Driving
Sina Tech· 2025-08-28 14:46
Core Insights
- Li Auto's management discussed strategies to address declining sales of the L series and emphasized strengthening product competitiveness through intelligent driving features [1]
- The company will upgrade its entire range of extended-range AD Max models with the VLA intelligent driving system, which has shown significant improvements in driving performance [2]
- Li Auto's pure electric lineup is expected to grow with the introduction of the i6 model, targeting younger consumers and aiming for substantial sales contributions [3]

Sales Strategy
- The company aims to meet its overall sales targets by focusing on intelligent features and regional marketing strategies tailored to local market conditions [1][3]
- The sales organization has been adjusted so that headquarters directly manages 23 regions, allowing for localized policy implementation [3]

Product Development
- The VLA intelligent driving system is iterating rapidly, supported by a simulation environment, strengthening the company's competitive position in autonomous driving technology [2]
- The i8 model has received positive feedback, and the company plans to ramp up production to deliver 8,000 to 10,000 units by the end of September [3]

Marketing and Channel Optimization
- The marketing strategy emphasizes "regionalization," with different selling points for northern and southern markets [3]
- The company is optimizing its store locations and formats to improve customer acquisition and conversion rates, particularly in first- to third-tier cities [4]
- Expansion into lower-tier cities will be facilitated through a lightweight store model, increasing brand visibility and tapping underdeveloped markets [4][5]
The Embodied Intelligence Heart Technology Exchange Group Has Been Established!
具身智能之心· 2025-08-28 08:36
Group 1
- The newly established Embodied Intelligence Heart Technology Exchange Group focuses on advanced technologies including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested individuals can add the assistant's WeChat AIDriver005 to join the community [2]
- To expedite the group-entry process, include a note with your institution/school, name, and research direction [3]
Boston Dynamics' Robot Dog Stuns with a Side Flip: It Can Flip Even While Wearing Roller Skates
量子位· 2025-08-28 06:46
Core Viewpoint
- Boston Dynamics' Spot robot has demonstrated advanced capabilities, including performing flips, which serve as a rigorous test of its hardware and algorithms and help ensure reliability in real-world operations [18][20][21].

Group 1: Robot Capabilities
- Spot can perform various tasks beyond acrobatics, such as climbing stairs, surveying, and opening doors, showcasing its practical applications [10][12][14][16].
- The ability to perform flips is not just for show; it indicates the robustness of Spot's hardware and software systems [20][21].

Group 2: Training and Development
- Spot's training uses reinforcement learning in simulated environments before real-world testing, allowing for iterative improvements in stability and performance [22].
- The robot has 12 degrees of freedom and is equipped with five pairs of stereo cameras, enhancing its operational capabilities [22].

Group 3: Historical Context and Popularity
- Spot has been well known since its introduction in 2016, gaining fame through various performances, including dancing to popular songs [27][30].
- The acquisition of Boston Dynamics by Hyundai in 2020 has positioned the company for further growth and innovation in robotics [31].
Embodied Intelligence Heart Is Recruiting Instructors for B-End and C-End Training
具身智能之心· 2025-08-28 01:20
Group 1
- The article announces the recruitment of instructors for embodied intelligence training, covering both B-end (business) and C-end (consumer) training services, with compensation above industry standards [1]
- The training covers advanced topics including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, sim2real, multimodal large models, simulation, motion control, and target navigation [2]
- B-end training targets enterprises, universities, and research institutions, while C-end training targets students and job seekers; responsibilities include curriculum design and material preparation [3]

Group 2
- Candidates must hold a doctoral degree or higher (including those currently enrolled), with preference for applicants who have published two papers in A-level or Q1 journals/conferences or have two years of industry experience [3]
- Interested individuals can add the specified WeChat contact for further inquiries [4]
Stanford Proposes the RTR Framework: A Robotic Arm Assists Real-World Training of Humanoid Robots
具身智能之心· 2025-08-28 01:20
Core Insights
- The article discusses the emerging focus on motion control of humanoid robots as a key application area for reinforcement learning (RL) algorithms, emphasizing the "Sim-to-Real" paradigm and the challenges of transferring learned behaviors from simulation to real-world environments [1][2].

Group 1: Current Challenges and Innovations
- Current methods primarily use domain randomization to train general control models across diverse simulated environments, aiming for zero-shot transfer to real-world dynamics [1][2].
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions like NVIDIA and CMU [2].
- The inherent instability of humanoid robots poses significant risks during real-world training, making direct reinforcement learning in such environments a longstanding challenge [2].

Group 2: Proposed Solutions
- The article introduces an approach inspired by human learning, in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning [3][5].
- The teacher arm serves multiple roles: providing safety, assisting with resets after failures, collecting training data, and structuring the learning process through curriculum learning [5][7].

Group 3: RTR System Overview
- The proposed system, named RTR (Robot-Trains-Robot), highlights the importance of physical assistance from the teacher robot for effective real-world learning [7][9].
- To address the high cost of real-world data collection, a novel RL algorithm optimizes a low-dimensional latent variable related to environmental dynamics, significantly improving sample efficiency [7][9].

Group 4: Methodology and Experimental Validation
- The RTR system comprises hardware and algorithmic components, with a UR5 robotic arm as the teacher and a ToddlerBot humanoid as the student [9][10].
- The Sim-to-Real process is divided into three stages: training adaptable policies in simulation, optimizing a general latent variable, and performing online fine-tuning in the real world [10][12].
- Experiments on tasks such as walking and swinging demonstrate significant improvements in learning efficiency and performance over traditional methods [14][18].

Group 5: Future Implications
- The RTR framework not only addresses current limitations in humanoid robot training but also introduces a physical-assistance paradigm that could extend to larger humanoid robots and other complex robotic systems [16][19].
- The findings suggest that integrating teacher robots makes the learning process more efficient and stable, which is crucial for advancing real-world applications of humanoid robotics [16][17].
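The middle stage described above, optimizing a low-dimensional latent variable against real-world rollouts, can be sketched with a generic derivative-free search. The following is an illustrative stand-in, not the paper's actual algorithm: the scalar latent, the quadratic "return," and the cross-entropy-method search are all assumptions made for the sake of a runnable example.

```python
import random
import statistics

TRUE_DYNAMICS = 0.7  # stand-in for an unknown real-world parameter (e.g., friction)

def real_world_return(z, rng):
    """Hypothetical rollout return: highest when the policy's latent z matches
    the true dynamics, with noise mimicking real-rollout variance."""
    return -(z - TRUE_DYNAMICS) ** 2 + rng.gauss(0.0, 0.01)

def optimize_latent(iters=20, pop=16, elite=4, seed=0):
    """Cross-entropy search over the scalar latent: sample candidates, keep
    the top performers, refit the sampling distribution, and repeat."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(iters):
        candidates = [rng.gauss(mu, sigma) for _ in range(pop)]
        candidates.sort(key=lambda z: real_world_return(z, rng), reverse=True)
        best = candidates[:elite]
        mu = statistics.mean(best)
        sigma = statistics.stdev(best) + 1e-3  # floor keeps exploration alive
    return mu

z = optimize_latent()
```

Because only a low-dimensional latent (rather than all policy weights) is being fit, each expensive real-world rollout carries more signal per sample, which is the sample-efficiency argument the summary attributes to RTR.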
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release the Reasoning Retrieval Framework BGE-Reasoner
机器之心· 2025-08-27 08:36
Core Viewpoint
- The article discusses BGE-Reasoner, an innovative end-to-end solution for reasoning-intensive information retrieval (IR), developed by a collaborative team from several Chinese institutions. It addresses a critical bottleneck in the development of RAG and AI agents, significantly improving their performance on complex reasoning tasks [2][3].

Group 1: BGE-Reasoner Overview
- BGE-Reasoner achieved a score of 45.2 on the BRIGHT benchmark, surpassing previous records and demonstrating its effectiveness on reasoning-intensive retrieval tasks [2][12].
- The model represents a significant milestone in the BGE series, providing a new paradigm for tackling industry challenges around reasoning-intensive retrieval [3].

Group 2: Technical Innovations
- A replicable framework of three modular components, Rewriter, Embedder, and Reranker, is proposed to handle complex queries efficiently [3].
- The research team demonstrated the feasibility of synthesizing high-quality, multi-domain reasoning training data with large models, addressing the field's critical data-scarcity problem [4].
- Reinforcement learning was successfully applied to Reranker training, strengthening the model's reasoning and generalization on challenging samples [5].

Group 3: Performance Comparison
- BGE-Reasoner outperformed submissions from major institutions such as Ant Group, Baidu, and ByteDance, leading the BRIGHT leaderboard by a margin of 3.6 points [12][14].
- The embedding model, BGE-Reasoner-Embed, also outperformed other leading baseline models, confirming the effectiveness of the synthesized training data [12][22].

Group 4: System Workflow
- BGE-Reasoner follows a classic three-module structure: the original query is rewritten, candidates are retrieved by the Embedder, and final results are ranked by the Reranker [19][24].
- The query-understanding module uses synthesized data to generate reasoning paths, significantly improving the model's query understanding and rewriting capabilities [21].
- The embedding model and the Reranker are fine-tuned on high-quality synthetic training data, enhancing their performance on reasoning-intensive retrieval tasks [22][24].

Group 5: Future Directions
- The research team aims to continue advancing vector models and retrieval-augmentation technologies, collaborating with more research institutions and industry partners to promote the development of retrieval and artificial intelligence [25].
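The three-module Rewriter, Embedder, Reranker flow described above can be sketched as a plain retrieval pipeline. Everything below is a toy stand-in assumed for illustration only (bag-of-words cosine retrieval, keyword-overlap reranking, a hand-written expansion table); the actual BGE-Reasoner components are learned models.

```python
import math
from collections import Counter

def rewrite(query):
    """Stand-in for the reasoning-based rewriter: expand the query with
    hand-picked related terms (a real system would generate these)."""
    expansions = {"gradient": ["derivative", "slope"]}
    terms = query.lower().split()
    for t in list(terms):
        terms += expansions.get(t, [])
    return " ".join(terms)

def embed(text):
    """Stand-in embedder: a bag-of-words vector (Counter returns 0 for
    missing tokens, which keeps the cosine below simple)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Embedder stage: rank the corpus against the *rewritten* query."""
    q = embed(rewrite(query))
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query, docs):
    """Stand-in reranker: score candidates by overlap with the original query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)

corpus = [
    "the derivative measures the slope of a function",
    "stochastic gradient descent updates parameters",
    "recipes for sourdough bread",
]
hits = rerank("gradient", retrieve("gradient", corpus))
```

The rewriter broadens recall (the expanded query also matches documents that never mention "gradient"), while the reranker restores precision by scoring against the original query, mirroring the division of labor among the three modules in the summary.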
Meta's 10,000-Citation Reinforcement Learning Heavyweight Departs, Quoting Zuckerberg's Own Words as a Farewell
36Kr· 2025-08-27 06:48
Core Viewpoint
- The departure of Rishabh Agarwal from Meta has raised concerns about employee retention and morale within the company, especially as he was a key figure in reinforcement learning and made significant contributions during his tenure [1][3][15].

Group 1: Rishabh Agarwal's Background and Contributions
- Rishabh Agarwal has a strong academic and professional background in reinforcement learning, with over 10,000 citations of his work and an h-index of 34 [5][6].
- He was involved in the development of significant models such as Gemini 1.5 and Gemma 2 during his time at Google and later at Meta [3][11].
- His paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" won a NeurIPS Outstanding Paper Award in 2021, highlighting his expertise in the field [11][13].

Group 2: Implications of His Departure
- Agarwal's exit is seen as part of a broader trend of experienced employees leaving Meta, possibly linked to internal conflict over compensation disparities between new hires and long-term staff [15][17].
- The departure of Agarwal and other senior employees could affect Meta's research capabilities and innovation in artificial intelligence [1][15].
- There is speculation that Agarwal may pursue entrepreneurial ventures, signaling a potential shift in the competitive landscape of AI research [14].

Group 3: Company Culture and Employee Morale
- Meta's recruitment drive has reportedly created friction among employees, with some researchers threatening to resign [17].
- The situation reflects the challenge Meta faces in balancing the attraction of new talent with the retention of its existing workforce [17].