具身智能之心 Is Recruiting B-End and C-End Training Instructors
具身智能之心· 2025-08-28 01:20
Group 1
- The article announces the recruitment of teachers for embodied intelligence training, targeting both B-end (business) and C-end (consumer) training services, with compensation above industry standards [1]
- The training covers advanced topics including VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, sim2real, multimodal large models, simulation, motion control, and target navigation [2]
- B-end training is aimed at enterprises, universities, and research institutions, while C-end training focuses on students and job seekers, with responsibilities including curriculum design and material preparation [3]

Group 2
- Candidates are required to have a doctoral degree or higher (including those currently enrolled), with a preference for those who have published two papers in A-level or Q1 journals/conferences, or have two years of industry experience [3]
- Interested individuals can add a specified WeChat contact for further inquiries [4]
Stanford Proposes the RTR Framework: Robotic Arms Assist Real-World Training of Humanoid Robots
具身智能之心· 2025-08-28 01:20
Core Insights
- The article discusses the emerging focus on motion control of humanoid robots as a key application area for reinforcement learning (RL) algorithms, emphasizing the "Sim-to-Real" paradigm and the challenges of transferring learned behaviors from simulation to real-world environments [1][2]

Group 1: Current Challenges and Innovations
- Current methods primarily utilize domain randomization to train general control models in diverse simulated environments, aiming for zero-shot transfer to real-world dynamics [1][2]
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions like NVIDIA and CMU [2]
- The inherent instability of humanoid robots poses significant risks during real-world training, making direct reinforcement learning in these environments a longstanding challenge [2]

Group 2: Proposed Solutions
- The article introduces an innovative approach inspired by human learning, in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning [3][5]
- The teacher arm serves multiple roles: providing safety, assisting in resets after failures, collecting training data, and structuring the learning process through curriculum learning [5][7]

Group 3: RTR System Overview
- The proposed system, named RTR (Robot-Trains-Robot), highlights the importance of physical assistance from the teacher robot for effective real-world learning [7][9]
- To address the high cost of real-world data collection, a novel RL algorithm is introduced that optimizes a low-dimensional latent variable related to environmental dynamics, significantly enhancing sample efficiency [7][9]

Group 4: Methodology and Experimental Validation
- The RTR system comprises hardware and algorithmic components, featuring a UR5 robotic arm as the teacher and a ToddlerBot humanoid as the student [9][10]
- The Sim-to-Real process is divided into three stages: training adaptable policies in simulation, optimizing a general latent variable, and performing online fine-tuning in the real world [10][12]
- Experimental results demonstrate the effectiveness of the RTR system in tasks such as walking and swinging, showing significant improvements in learning efficiency and performance compared to traditional methods [14][18]

Group 5: Future Implications
- The RTR framework not only addresses current limitations in humanoid robot training but also introduces a new paradigm of physical assistance that could be applied to larger humanoid robots and other complex robotic systems [16][19]
- The findings suggest that integrating teacher robots makes the learning process more efficient and stable, which is crucial for advancing real-world applications of humanoid robotics [16][17]
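The summary does not spell out the fine-tuning algorithm, only that it searches a low-dimensional dynamics latent rather than updating full policy weights. As a hedged sketch under stated assumptions, such a search could use a derivative-free cross-entropy method; `evaluate_policy`, the latent dimension, and every hyperparameter below are illustrative, not taken from the paper:

```python
import numpy as np

def optimize_latent(evaluate_policy, latent_dim=8, iters=10,
                    pop_size=16, elite_frac=0.25, seed=0):
    """Cross-entropy search over a low-dimensional dynamics latent.

    `evaluate_policy(z)` is assumed to roll out the simulation-trained
    policy conditioned on latent `z` and return an episode return.
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(latent_dim), np.ones(latent_dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # Sample candidate latents around the current search distribution.
        pop = rng.normal(mean, std, size=(pop_size, latent_dim))
        returns = np.array([evaluate_policy(z) for z in pop])
        # Refit the distribution to the best-performing candidates.
        elite = pop[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean
```

Because only a handful of latent dimensions are optimized, each real-world rollout carries far more signal per parameter than full-policy RL would, which is the sample-efficiency argument the summary makes.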
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release the Reasoning Retrieval Framework BGE-Reasoner
机器之心· 2025-08-27 08:36
Core Viewpoint
- The article discusses the emergence of BGE-Reasoner, an innovative end-to-end solution for Reasoning-Intensive Information Retrieval (IR), developed by a collaborative team from several Chinese institutions. This solution addresses a critical bottleneck in the development of RAG and AI agents, significantly enhancing their performance in complex reasoning tasks [2][3]

Group 1: BGE-Reasoner Overview
- BGE-Reasoner achieved a score of 45.2 on the BRIGHT benchmark, surpassing previous records and demonstrating its effectiveness in reasoning-intensive retrieval tasks [2][12]
- The model represents a significant milestone in the BGE series, providing a new paradigm for tackling industry challenges related to reasoning-intensive retrieval [3]

Group 2: Technical Innovations
- A replicable framework consisting of three modular components (Rewriter, Embedder, and Reranker) was proposed to efficiently handle complex queries [3]
- The research team demonstrated the feasibility of synthesizing high-quality, multi-domain reasoning training data using large models, addressing the critical issue of data scarcity in this field [4]
- Reinforcement learning was successfully applied to Reranker training, enhancing the model's reasoning and generalization capabilities on challenging samples [5]

Group 3: Performance Comparison
- BGE-Reasoner outperformed submissions from major institutions such as Ant Group, Baidu, and ByteDance, leading the BRIGHT leaderboard by a margin of 3.6 points [12][14]
- The embedding model, BGE-Reasoner-Embed, also demonstrated superior performance compared to other leading baseline models, confirming the effectiveness of the synthesized training data [12][22]

Group 4: System Workflow
- The BGE-Reasoner system follows a classic three-module structure: the original query is rewritten, candidates are retrieved by the Embedder, and final results are ranked by the Reranker [19][24]
- The query understanding module uses synthesized data to generate reasoning paths, significantly improving the model's query understanding and rewriting capabilities [21]
- The embedding model and the Reranker are fine-tuned on high-quality synthetic training data, enhancing their performance in reasoning-intensive retrieval tasks [22][24]

Group 5: Future Directions
- The research team aims to continue advancing vector models and retrieval-augmentation technologies, collaborating with more research institutions and industry partners to promote the development of retrieval and artificial intelligence [25]
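The three-module workflow described above can be sketched end to end. The functions below are toy stand-ins, not the real components: the actual system would call an LLM-based rewriter, BGE-Reasoner-Embed, and the RL-trained reranker where this sketch uses hash-seeded dummy vectors.

```python
import numpy as np

# Hypothetical stand-ins for the three trained components.
def rewrite(query):
    # A real Rewriter would expand the query with a reasoning path.
    return query

def embed(text):
    # Deterministic dummy embedding; a real Embedder is a trained model.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def rerank_score(query, doc):
    # A real Reranker scores (query, doc) jointly; this reuses embeddings.
    return float(embed(query) @ embed(doc))

def retrieve(query, corpus, k=10, top_n=3):
    """Rewrite -> embed-retrieve top-k -> rerank down to top-n."""
    q = embed(rewrite(query))
    sims = sorted(((float(q @ embed(d)), d) for d in corpus), reverse=True)
    candidates = [d for _, d in sims[:k]]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d),
                      reverse=True)
    return reranked[:top_n]
```

The design point is the staging: a cheap dense retrieval pass narrows the corpus to `k` candidates so the expensive reranker only scores a short list.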
Meta's 10,000-Citation Reinforcement Learning Heavyweight Departs, Quoting Zuckerberg's Own Words as a Farewell Message
36Kr· 2025-08-27 06:48
Core Viewpoint
- The departure of Rishabh Agarwal from Meta has raised concerns about employee retention and morale within the company, especially as he was a key figure in the reinforcement learning domain and had made significant contributions during his tenure [1][3][15]

Group 1: Rishabh Agarwal's Background and Contributions
- Rishabh Agarwal has a strong academic and professional background in reinforcement learning, with over 10,000 citations of his work and an h-index of 34 [5][6]
- He was involved in the development of significant models such as Gemini 1.5 and Gemma 2 during his time at Google and later at Meta [3][11]
- His paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" won a NeurIPS Outstanding Paper Award in 2021, highlighting his expertise in the field [11][13]

Group 2: Implications of His Departure
- Agarwal's exit is seen as part of a broader trend of experienced employees leaving Meta, which may be linked to internal conflicts over compensation disparities between new hires and long-term staff [15][17]
- The departure of Agarwal and other senior employees could impact Meta's research capabilities and innovation in artificial intelligence [1][15]
- There is speculation that Agarwal may pursue entrepreneurial ventures, indicating a potential shift in the competitive landscape of AI research [14]

Group 3: Company Culture and Employee Morale
- The recruitment drive at Meta has reportedly created friction among employees, leading to threats of resignation from some researchers [17]
- The situation reflects a challenging environment for Meta as it attempts to balance attracting new talent with retaining its existing workforce [17]
Polished for Seven Years: Li Hang's New Book "Machine Learning Methods (2nd Edition)" Is Released with Reinforcement Learning Added, Plus a Giveaway of 20 Copies
机器之心· 2025-08-27 03:18
Core Viewpoint
- The article discusses the release of the second edition of "Machine Learning Methods" by Li Hang, which expands beyond traditional machine learning to include deep learning and reinforcement learning, addressing the growing interest in these areas within the AI community [4][5][22]

Summary by Sections

Overview of the Book
- The new edition of "Machine Learning Methods" includes significant updates and additions, particularly in reinforcement learning, which has been gaining attention in AI applications [4][5]
- The book is structured into four main parts: supervised learning, unsupervised learning, deep learning, and reinforcement learning, providing a comprehensive framework for readers [5][22]

Supervised Learning
- The first part covers key supervised learning methods such as linear regression, the perceptron, support vector machines, maximum entropy models, logistic regression, boosting methods, hidden Markov models, and conditional random fields [7]

Unsupervised Learning
- The second part focuses on unsupervised learning techniques, including clustering, singular value decomposition, principal component analysis, Markov chain Monte Carlo methods, the EM algorithm, latent semantic analysis, and latent Dirichlet allocation [8]

Deep Learning
- The third part introduces major deep learning methods, such as feedforward neural networks, convolutional neural networks, recurrent neural networks, Transformers, diffusion models, and generative adversarial networks [9]

Reinforcement Learning
- The fourth part details reinforcement learning methods, including Markov decision processes, multi-armed bandit problems, proximal policy optimization, and deep Q-networks [10]
- The book aims to provide a systematic introduction to reinforcement learning, which has been less covered in previous textbooks [4][10]

Learning Approach
- Each chapter presents one or two machine learning methods, explaining models, strategies, and algorithms clearly, supported by mathematical derivations to deepen understanding [12][19]
- The book is designed for university students and professionals, assuming a background in calculus, linear algebra, probability and statistics, and computer science [22]

Author Background
- Li Hang, the author, is a recognized expert in the field, with a background in natural language processing, information retrieval, machine learning, and data mining [24]
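As a one-screen illustration of the kind of method the reinforcement-learning part lists, here is a minimal epsilon-greedy solver for the multi-armed bandit problem. The Gaussian reward model and all hyperparameters are assumptions for the sketch; the book's own presentation may differ.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy on a stationary multi-armed bandit with
    unit-variance Gaussian rewards; returns the value estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k        # incremental value estimate per arm
    n = [0] * k          # pull count per arm
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=q.__getitem__)   # exploit best estimate
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # running-mean update
    return q
```

With enough steps, the estimate for the best arm dominates and the greedy choice concentrates on it, which is the explore/exploit trade-off the bandit chapter formalizes.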
Teaching Robots Hand in Hand: Stanford Proposes the RTR Framework, with Robotic Arms Assisting Real-World Training of Humanoid Robots
机器之心· 2025-08-27 00:46
Core Viewpoint
- The application of reinforcement learning (RL) algorithms to humanoid robot motion control is emerging as a key research area, with a focus on the "Sim-to-Real" paradigm, which aims to train general control models in diverse simulated environments that adapt to the real world [2][3]

Group 1: Current Challenges and Innovations
- Existing methods primarily utilize domain randomization to train models in simulation, achieving impressive results on various tasks but often sacrificing performance in specific real-world environments [2][3]
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions like NVIDIA and CMU [3]
- Conducting RL training in real environments has been a significant barrier due to the instability of humanoid robots, where minor errors can lead to hardware damage [3]

Group 2: Proposed Solution (the RTR System)
- The RTR (Robot-Trains-Robot) system introduces a novel approach in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning, inspired by how human parents teach infants to walk [4][6]
- The teacher arm plays multiple roles: it provides safety support, assists in resetting the student after failures, collects valuable training data, and sets a curriculum to enhance learning efficiency [5][6]

Group 3: Hardware and Algorithm Design
- The RTR system's hardware setup pairs a teacher and a student robot: the teacher is a UR5 robotic arm equipped with force-torque sensors, and the student is based on the open-source ToddlerBot [8][9]
- The system's algorithm involves a three-stage Sim-to-Real process: training adaptable policies in simulation, optimizing a general initial latent variable, and performing online fine-tuning in the real world with minimal data [9][11]

Group 4: Experimental Validation
- Experiments demonstrated the effectiveness of the RTR system in tasks like walking and swinging, showing that the teacher's flexible assistance significantly improves learning outcomes compared to fixed supports [15][19]
- The proposed latent-variable fine-tuning method outperformed traditional methods in data efficiency and final performance, achieving a twofold speed increase in walking policies with just 20 minutes of real-world training [15][18]

Group 5: Future Prospects
- The RTR framework not only addresses current challenges in deploying humanoid robots but also introduces a new paradigm of physical assistance for real-world learning, with potential applications in larger humanoid robots and other complex robotic systems [17]
Within a Single Day, Meta Loses Two Key Researchers: Is Zuckerberg's "Money Power" Losing Its Effect?
机器之心· 2025-08-26 08:53
Core Viewpoint
- Meta is experiencing significant talent attrition, particularly among top AI researchers, due to internal management issues and a lack of alignment with the company's vision and culture [1][9][39]

Group 1: Talent Departure
- Two senior researchers, Rishabh Agarwal and Bert Maher, recently announced their departure from Meta; Agarwal's destination is unspecified, while Maher is joining Anthropic [3][24]
- Agarwal's exit underscores the argument that even high salaries cannot retain top talent, as he follows Zuckerberg's advice on taking risks in a rapidly changing world [14][39]
- Maher, who worked at Meta for 12 years, contributed to significant projects like PyTorch and HHVM, marking the loss of valuable expertise [25][27]

Group 2: Internal Management Issues
- Meta's internal management culture is cited as a reason for its low employee retention rate of 64%, compared to Anthropic's 80% [30][33]
- Previous complaints from former employees, including John Carmack and Tijmen Blankevoort, point to issues such as poor resource utilization, performance-evaluation pressure, and internal competition [33][34]
- The lack of a strong CTO to balance the power of the CEO is seen as a potential risk to the company's future stability [11]

Group 3: Cultural Misalignment
- Many top researchers are leaving Meta due to a misalignment with the company's focus on speed and profitability, which contrasts with their values of safety, independence, and long-term research [39][40]
- The absence of a compelling mission at Meta makes it difficult for some employees to justify staying, as exemplified by Tesla engineer Yun-Ta Tsai's decision to remain with his current employer for its meaningful goals [40][42]
- The perception that Meta's culture prioritizes financial gain over meaningful work is making potential recruits reluctant to join the company [39][42]
Meta's 10,000-Citation Reinforcement Learning Heavyweight Departs! Quoting Zuckerberg's Own Words as a Farewell Message
量子位· 2025-08-26 04:36
Core Viewpoint
- The departure of Rishabh Agarwal from Meta highlights a potential trend of employee attrition within the company, raising concerns about internal conflicts and employee satisfaction amid a hiring spree [1][22][24]

Group 1: Rishabh Agarwal's Departure
- Rishabh Agarwal, a prominent figure in reinforcement learning at Meta, is leaving the company after 7.5 years, expressing a desire to explore a completely different path [1][17]
- His contributions include significant work on models like Gemini 1.5 and Gemma 2, and he received an Outstanding Paper Award at NeurIPS 2021 for his research on statistical instability in deep reinforcement learning [4][14][13]
- Agarwal's next steps remain uncertain, but speculation suggests he may venture into entrepreneurship [17]

Group 2: Employee Turnover at Meta
- Agarwal's exit is part of a broader trend, as another employee with 12 years at Meta also announced their departure, joining the competing firm Anthropic [18][19]
- Reports indicate that tensions between new and old employees over salary disparities have led to dissatisfaction, prompting some researchers to threaten resignation [23][24]
- The current hiring surge at Meta may be exacerbating internal conflicts, contributing to the trend of experienced employees leaving the company [22][24]
A New Agent Automatically Operates Phones and Computers, Taking Open-Source SOTA on All 10 Leaderboards | Tongyi Lab
量子位· 2025-08-25 23:05
Core Viewpoint
- The article discusses the launch of the Mobile-Agent-v3 framework by Tongyi Lab, which achieves state-of-the-art (SOTA) performance in automating tasks on mobile and desktop platforms, showcasing its ability to perform complex tasks through a multi-agent system [2][9]

Group 1: Framework and Capabilities
- The Mobile-Agent-v3 framework can independently execute complex tasks from a single command and seamlessly switch roles within a multi-agent framework [3][9]
- It has achieved SOTA performance across ten major GUI benchmarks, demonstrating both foundational capabilities and reasoning generalization [9][11]

Group 2: Data Production and Model Training
- The framework relies on a robust cloud infrastructure built on Alibaba Cloud, enabling large-scale parallel task execution and data collection [11][13]
- A self-evolving data production chain automates data collection and model optimization, creating a feedback loop for continuous improvement [13][15]
- The model is trained on high-quality trajectory data, generated through a combination of historical task data and large-scale pre-trained language models [22][23]

Group 3: Task Execution and Understanding
- The framework emphasizes precise interface-element localization, allowing the AI to understand the graphical interface effectively [18][19]
- It incorporates complex task planning, enabling the AI to strategize before executing tasks and improving its handling of long-horizon and cross-application tasks [21][22]
- The model understands the causal relationship between actions and interface changes, which is crucial for effective task execution [24][25]

Group 4: Reinforcement Learning and Performance
- The Mobile-Agent team employs reinforcement learning (RL) to enhance the model's decision-making capabilities through real-time interactions [28][29]
- An innovative TRPO algorithm addresses the sparse and delayed reward signals of GUI tasks, significantly improving learning efficiency [31][36]
- The framework has shown a performance increase of nearly 8 percentage points in dynamic environments, indicating its self-evolution potential [36][40]

Group 5: Multi-Agent Collaboration
- The Mobile-Agent-v3 framework supports multi-agent collaboration, with different agents handling task execution, planning, reflection, and memory [33][34]
- This collaborative approach creates a closed-loop enhancement pipeline, improving the overall efficiency and effectiveness of task execution [34][35]
- The framework's design enables the AI to act with purpose, adjust based on feedback, and retain critical information for future tasks [35][36]
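The multi-agent division of labor summarized above (planning, execution, reflection, and memory) can be sketched as a simple loop. Everything here is a hypothetical stand-in: the role names, the callback signatures, and the single-retry policy are illustrative assumptions, not the framework's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Shared memory that retains what happened at each step."""
    notes: list = field(default_factory=list)

    def record(self, item):
        self.notes.append(item)

def run_task(goal, plan_fn, act_fn, reflect_fn, max_steps=10):
    """Plan -> act -> reflect loop mirroring the planner, executor,
    reflector, and memory roles described above."""
    memory = Memory()
    steps = plan_fn(goal)                 # planner: decompose the goal
    for step in steps[:max_steps]:
        observation = act_fn(step)        # executor: perform one GUI action
        verdict = reflect_fn(step, observation)  # reflector: judge outcome
        memory.record((step, observation, verdict))
        if verdict == "retry":
            act_fn(step)                  # one retry guided by reflection
    return memory.notes
```

The closed loop is the point: the reflector's verdict feeds back into execution, and the memory of (step, observation, verdict) triples is what later steps, or later tasks, can draw on.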
1-on-1 Paper Coaching in VLA / Reinforcement Learning / VLN
具身智能之心· 2025-08-25 06:00
Group 1
- The article announces the availability of 1-on-1 paper guidance in the field of embodied intelligence, focusing on three areas: VLA, reinforcement learning, and sim2real [1]
- The guidance is primarily aimed at those targeting major conferences such as CVPR, ICCV, ECCV, ICLR, CoRL, ICML, and ICRA [1]
- The instructors are actively engaged in embodied-intelligence research and bring innovative ideas [1]

Group 2
- Interested individuals are encouraged to add a specified WeChat contact or scan a QR code to inquire about the paper guidance [2]