Reinforcement Learning (RL)
ByteDance: Technical Report on the 2025 Reasoning Model Seed-Thinking-v1.5
Sou Hu Cai Jing· 2025-08-22 09:20
Core Insights
- ByteDance has introduced Seed1.5-Thinking, a state-of-the-art reasoning model with 20 billion activated parameters out of 200 billion total parameters, demonstrating strong reasoning capabilities across a range of benchmarks [1][5][60]
- The model achieved scores of 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA, showcasing its strengths in STEM and coding tasks while also exhibiting strong generalization in non-reasoning tasks [1][5][49]

Model Performance
- Seed1.5-Thinking matches OpenAI's o3-mini-high on AIME 2024 but still lags behind on AIME 2025 and the BeyondAIME challenge [2][49]
- On GPQA, Seed1.5-Thinking performs close to o3 level, reaching 77.3% [49]
- The model outperforms DeepSeek R1 by 8% in overall user preference on non-reasoning tasks, indicating broader applicability [1][5][51]

Development Aspects
- The development of Seed1.5-Thinking focuses on three areas: training data, reinforcement learning (RL) algorithms, and RL infrastructure [10][12][60]
- The training data mixes STEM problems, coding tasks, and logic reasoning, with a strong emphasis on chain-of-thought data for supervised fine-tuning [10][15][23]
- RL training employs the VAPO and DAPO frameworks to address instability issues and keep training trajectories robust (see the reward-checking sketch after this summary) [12][10]

Infrastructure and Efficiency
- The model relies on a hybrid engine architecture and a Streaming Rollout System (SRS) to improve training efficiency and scalability [2][42][44]
- The SRS architecture allows dynamic adjustment of sample ratios and optimizes memory usage, significantly improving training speed [43][44]

Future Directions
- The team plans to explore more efficient RL methods and tackle more complex tasks, aiming to push the boundaries of the model's intelligence [2][60]
- Upcoming releases will include internal benchmarks such as BeyondAIME and Codeforces-style evaluation sets to support further research in the field [2][5]
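The summary describes RL on STEM and coding problems whose answers can typically be checked automatically. As a rough illustration of what such a rule-based "verifiable reward" might look like in a pipeline of this kind, here is a minimal Python sketch; the function names, the regex, and the normalization rules are illustrative assumptions, not the report's actual verifier.

```python
# Minimal sketch of a rule-based "verifiable reward" for RL on math problems.
# Function names and the normalization scheme are illustrative assumptions,
# not ByteDance's actual verifier.
import re

def extract_final_answer(response: str) -> str:
    """Take the last \\boxed{...} expression, or the last number, as the answer."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", response)
    if boxed:
        return boxed[-1].strip()
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else ""

def normalize(ans: str) -> str:
    """Crude normalization: strip whitespace and trailing zeros of decimals."""
    ans = ans.strip().replace(" ", "")
    try:
        return str(float(ans)).rstrip("0").rstrip(".")
    except ValueError:
        return ans

def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 if the model's final answer matches the reference."""
    return 1.0 if normalize(extract_final_answer(response)) == normalize(reference) else 0.0

# Example: score one rollout against a known AIME-style integer answer.
print(verifiable_reward("... so the result is \\boxed{204}.", "204"))  # 1.0
```

A binary, automatically checkable reward like this is what makes large-scale RL on math and code tractable compared with tasks that need human judgment.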
Science Robotics: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
机器人圈· 2025-08-22 09:02
Core Insights
- The article discusses the challenges and advances in robotic manipulation, focusing on the potential of reinforcement learning (RL) to improve robotic skills and performance in real-world applications [2][3][4]

Group 1: Challenges in Robotic Manipulation
- Robotic manipulation remains a significant challenge: traditional methods require extensive manual design and large-scale data collection, limiting deployment in real-world scenarios [2]
- RL offers a promising alternative by enabling robots to acquire complex skills autonomously through interaction, but sample-efficiency and safety issues have held back its full potential in real environments [3]

Group 2: HIL-SERL Framework
- UC Berkeley's BAIR lab introduced Human-in-the-Loop Sample-Efficient Robotic Reinforcement Learning (HIL-SERL), an RL framework that integrates multiple components to train vision-based RL policies for general robotic manipulation [4]
- HIL-SERL reaches a 100% success rate across the reported tasks with only 1-2.5 hours of training, significantly outperforming baseline methods that average below 50% success [4][12]

Group 3: Methodology and Results
- To stabilize optimization, a pre-trained visual backbone is used for policy learning, while a sample-efficient off-policy RL algorithm manages sample complexity by combining human demonstrations and corrections (see the buffer-mixing sketch after this summary) [5]
- The system's ability to learn from human corrections is crucial for improving performance, especially on challenging tasks that are difficult to learn from scratch [5][12]
- The tasks include complex operations such as assembling furniture and flipping objects in a pan, and the system remains robust under external disturbances [7][11]

Group 4: Performance Metrics
- The trained RL policies achieve a 101% higher average success rate and 1.8x faster cycle times than imitation-learning baselines, indicating RL's superior capability when trained directly in the real world [12][21]
- The system's design supports dual-arm coordination and intricate tasks, showcasing its flexibility across operational contexts [21]
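HIL-SERL's headline idea, as summarized above, is combining human demonstrations and corrections with a sample-efficient off-policy learner. The sketch below illustrates that general pattern: route human-intervention transitions and autonomous transitions into separate buffers and draw training batches from both. The buffer names, the 50/50 sampling split, and the `env`/`agent` interfaces are illustrative assumptions, not the published implementation.

```python
# Minimal sketch of mixing human demonstrations/corrections into an off-policy
# update, in the spirit of human-in-the-loop RL. Buffer names, the 50/50
# sampling ratio, and the env/agent stubs are placeholders.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition: dict) -> None:
        self.storage.append(transition)

    def sample(self, n: int) -> list[dict]:
        return random.sample(list(self.storage), min(n, len(self.storage)))

rl_buffer = ReplayBuffer()      # autonomous robot experience
human_buffer = ReplayBuffer()   # teleoperated demonstrations and corrections

def collect_step(policy, env, human_intervening: bool) -> None:
    """During rollout, route the transition to the matching buffer.
    `env.observe`, `env.human_action`, and `env.step` are placeholder hooks."""
    obs = env.observe()
    action = env.human_action() if human_intervening else policy(obs)
    next_obs, reward, done = env.step(action)
    transition = dict(obs=obs, action=action, reward=reward,
                      next_obs=next_obs, done=done)
    (human_buffer if human_intervening else rl_buffer).add(transition)

def update(agent, batch_size: int = 256) -> None:
    """Off-policy update on a batch drawn half from each buffer."""
    batch = rl_buffer.sample(batch_size // 2) + human_buffer.sample(batch_size // 2)
    agent.train_on_batch(batch)  # e.g. a SAC/Q-learning style update (placeholder)
```

Keeping human data in its own buffer and always sampling from it is one common way to let sparse corrections keep steering the policy without being drowned out by autonomous experience.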
Master VLA, VLA + Tactile, VLA + RL, Embodied World Models, and More in 3 Months!
具身智能之心· 2025-08-22 00:04
Core Viewpoint
- The exploration of Artificial General Intelligence (AGI) is increasingly focused on embodied intelligence, which emphasizes how intelligent agents interact with and adapt to physical environments, enabling them to perceive, understand tasks, execute actions, and learn from feedback [1]

Industry Analysis
- Over the past two years, numerous star teams in embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, which are advancing embodied-intelligence technology [3]
- Major domestic companies such as Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a robust embodied-intelligence ecosystem, while internationally Tesla and investment institutions back companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5]

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
- The first stage focused on grasp pose detection, which struggled with complex tasks due to the lack of context modeling [6]
- The second stage involved behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and in multi-target scenarios (a minimal sketch of this supervised setup follows this summary) [6]
- The third stage introduced Diffusion Policy methods, enhancing stability and generalization by modeling action sequences, followed by the Vision-Language-Action (VLA) model phase, which integrates visual perception, language understanding, and action generation [7][8]
- The fourth stage, starting in 2025, aims to integrate VLA models with reinforcement learning, world models, and tactile sensing to overcome current limitations [8]

Product and Market Development
- The evolution of these technologies has produced a range of products, including humanoid robots, robotic arms, and quadrupedal robots, serving manufacturing, home services, dining, and medical rehabilitation [9]
- As the industry shifts from research to deployment, demand for engineering and systems capability is rising, requiring stronger skills in policy training and simulation testing on platforms such as MuJoCo, Isaac Gym, and PyBullet [23]

Educational Initiatives
- A comprehensive curriculum has been developed to cover the entire embodied "brain + cerebellum" technology route, including practical applications and real-world projects, aimed at both beginners and advanced learners [10][20]
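Since the timeline above leans on behavior cloning as its second stage, here is a minimal supervised imitation sketch in PyTorch: regress expert actions from observations with an MSE loss. The network size, observation/action dimensions, and data are stand-ins chosen for illustration.

```python
# Minimal behavior-cloning sketch: supervised regression of expert actions
# from observations. Dimensions and data are illustrative stand-ins.
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = BCPolicy(obs_dim=32, act_dim=7)   # e.g. a 7-DoF arm (assumed shapes)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_update(obs_batch: torch.Tensor, expert_actions: torch.Tensor) -> float:
    """One supervised step: imitate the expert action for each observation."""
    pred = policy(obs_batch)
    loss = nn.functional.mse_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random stand-in data in place of real demonstrations:
loss = bc_update(torch.randn(64, 32), torch.randn(64, 7))
```

The weakness noted in the summary follows directly from this setup: the policy only ever sees expert states, so it generalizes poorly once its own errors push it off the demonstrated distribution.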
A Wheel-Legged Robot That Can Walk Sideways?
机器人大讲堂· 2025-08-19 10:32
Core Viewpoint
- The article highlights the innovative design and capabilities of FLORES, a new wheel-legged robot developed by the Hong Kong University of Science and Technology (Guangzhou), emphasizing its versatility across terrains and its energy efficiency compared with traditional robots [1][9]

Group 1: Robot Design and Capabilities
- FLORES can run on flat surfaces and climb stairs, making it a distinctive addition to the field of robotic technology [1][6]
- The robot traverses varied terrain, including smooth roads, cobblestones, and grass, while maintaining stability and speed, showcasing its multi-terrain adaptability [6][12]
- Experimental results indicate that FLORES consumes only 30% of the energy of traditional wheel-legged robots in straight-line movement and 35% when turning, marking it as an energy-efficient option in the robotics sector [9][12]

Group 2: Technical Specifications
- FLORES's hardware includes an Intel i7-1165G7 CPU, motors for joint movement, and a 44.4 V, 8 Ah lithium battery, which together support its performance [11]
- A distinctive front-leg configuration allows seamless transitions between wheeled and legged motion, enhancing navigation efficiency [15][18]

Group 3: Future Developments
- The research team plans to add a lightweight robotic arm for object manipulation and to enable more complex bipedal movements through advanced simulation techniques [20]
- FLORES is particularly suited to environments where flat surfaces and stairs coexist, making it ideal for material transport and patrolling in office buildings and shopping malls [20]
Tutorials on VLA, VLA + Tactile, VLA + RL, Embodied World Models, and More Are Here!
具身智能之心· 2025-08-18 00:07
Core Viewpoint
- The exploration of Artificial General Intelligence (AGI) is increasingly focused on embodied intelligence, which emphasizes how intelligent agents interact with and adapt to physical environments, enabling them to perceive, understand tasks, execute actions, and learn from feedback [1]

Industry Analysis
- Over the past two years, numerous star teams in embodied intelligence have emerged, leading to the establishment of valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, which are advancing embodied-intelligence technology [3]
- Major domestic companies such as Huawei, JD.com, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a robust embodied-intelligence ecosystem, while international players like Tesla and investment firms support companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5]

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
- The first stage focused on grasp pose detection, which struggled with complex tasks due to the lack of context modeling [6]
- The second stage involved behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and in multi-target scenarios [6]
- The third stage introduced Diffusion Policy methods, enhancing stability and generalization by modeling action sequences, followed by the emergence of Vision-Language-Action (VLA) models that integrate visual perception, language understanding, and action generation [7]
- The fourth stage, starting in 2025, aims to integrate VLA models with reinforcement learning, world models, and tactile sensing to overcome current limitations [8]

Product and Market Development
- The evolution of these technologies has produced a range of products, including humanoid robots, robotic arms, and quadrupedal robots, serving manufacturing, home services, dining, and medical rehabilitation [9]
- As the industry shifts from research to deployment, demand for engineering and systems capability is rising, requiring policy training and simulation testing on platforms such as MuJoCo, Isaac Gym, and PyBullet [23]

Educational Initiatives
- A comprehensive curriculum has been developed to cover the entire embodied "brain + cerebellum" technology route, including practical applications and advanced topics, aimed at both beginners and those seeking to deepen their knowledge [10][20]
VLA, VLA + Tactile, VLA + RL, Embodied World Models, and More! China's First Hands-On Tutorial on Embodied "Brain + Cerebellum" Algorithms
具身智能之心· 2025-08-14 06:00
Core Viewpoint
- The exploration of Artificial General Intelligence (AGI) is increasingly focused on embodied intelligence, which emphasizes how intelligent agents interact with and adapt to physical environments, enabling them to perceive, understand tasks, execute actions, and learn from feedback [1]

Industry Analysis
- Over the past two years, numerous star teams in embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, which are advancing embodied-intelligence technology [3]
- Major domestic companies such as Huawei, JD.com, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a robust embodied-intelligence ecosystem, while international players like Tesla and investment firms support companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5]

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
- The first stage focused on grasp pose detection, which struggled with complex tasks due to the lack of context modeling [6]
- The second stage involved behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and in multi-target scenarios [6]
- The third stage introduced Diffusion Policy methods, enhancing stability and generalization by modeling action sequences, followed by the emergence of Vision-Language-Action (VLA) models that integrate visual perception, language understanding, and action generation [7][8]
- The fourth stage, starting in 2025, aims to integrate VLA models with reinforcement learning, world models, and tactile sensing to overcome current limitations [8]

Product and Market Development
- The evolution of these technologies has produced a range of products, including humanoid robots, robotic arms, and quadrupedal robots, serving manufacturing, home services, dining, and medical rehabilitation [9]
- As the industry shifts from research to deployment, demand for engineering and systems capability is rising, requiring skills in policy training and simulation testing on platforms such as MuJoCo, Isaac Gym, and PyBullet [24]
OpenAI Co-Founder Greg Brockman: Talking with Jensen Huang, Predicting GPT-6, and an Era Where Algorithmic Bottlenecks Return
AI科技大本营· 2025-08-13 09:53
Core Insights
- The article emphasizes focusing on practical advances in AI infrastructure rather than purely theoretical discussions of AGI [1][3]
- It highlights a duality in the tech world, contrasting a "nomadic" mindset that embraces innovation and speed with an "agricultural" mindset that values order and reliability in large-scale systems [3][5]

Group 1: Greg Brockman's Journey
- Brockman's path from young programmer to leader in AI infrastructure mirrors the evolution of computing over the past 70 years [3][5]
- His early programming experiences were driven by a desire to build tangible solutions rather than abstract theories [9][10]
- The transition from academia to industry, particularly his decision to join Stripe, reflects a commitment to practical problem-solving and innovation [11][12]

Group 2: Engineering and Research
- The relationship between engineering and research is crucial to the success of AI projects; the two disciplines must collaborate effectively [27][29]
- OpenAI's approach treats engineering and research as equally important, fostering a culture of collaboration [29][30]
- The challenges of integrating engineering and research highlight the need for humility and mutual understanding within teams [34][35]

Group 3: AI Infrastructure and Future Directions
- The future of AI infrastructure requires a balance between high-performance computing and low-latency responses to meet diverse workload demands [45][46]
- The development of specialized accelerators for different classes of AI tasks is essential for optimizing performance [47][48]
- Mixture-of-experts models illustrate the industry's shift toward more efficient resource utilization in AI systems (a minimal top-k routing sketch follows this summary) [48]
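For the mixture-of-experts point above, here is a minimal top-k routing sketch in PyTorch: a gate scores the experts and only the k highest-scoring expert MLPs run for each token, which is where the compute savings come from. The dimensions, the gating scheme, and the absence of load balancing are simplifications for illustration, not any particular production architecture.

```python
# Minimal top-k mixture-of-experts routing sketch. Sizes and gating are
# illustrative assumptions; real systems add load balancing and parallelism.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the k highest-scoring experts per token.
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # (tokens, k)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(10, 64))   # only 2 of the 8 expert MLPs run per token
```

The design choice to illustrate is sparsity: total parameters grow with the number of experts, but per-token compute stays roughly constant because only k experts are activated.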
Why Is Reinforcement Learning So Hot in Silicon Valley? A Key Step Toward AGI
Hu Xiu· 2025-08-07 07:46
Group 1
- Reinforcement learning (RL) has become a mainstream trend in Silicon Valley for building technical architectures and model pre-training, following its earlier wave of popularity in the AlphaGo era [1][2][3]
- Top reinforcement learning talent is highly sought after by major tech companies and investors in Silicon Valley [1][2]

Group 2
- The discussion covers the evolution of models and the commercialization of AI agents, focusing on the latest technological directions [2][3]
- Meta's acquisition of Scale AI is driven by the need for high-quality data annotation, particularly in multimodal contexts such as video and image data [31][36]

Group 3
- There are two main decision-making frameworks in RL: one based on large language models (LLMs) and another that operates on actions rather than language tokens [5][6]
- RL is particularly effective for goal-driven tasks such as coding, mathematics, and financial analysis, where labeled data may be scarce [10][11]

Group 4
- The consensus is that supervised learning works well for tasks with abundant labeled data, while RL from human feedback (RLHF) can further align model behavior with human preferences (a minimal reward-modeling sketch follows this summary) [8][9]
- The challenges of RL pre-training include the need for counterfactual learning and the difficulty of generating data for unique tasks [27][28]

Group 5
- The conversation touches on the five levels of Artificial General Intelligence (AGI) as defined by OpenAI, noting the significant gap between agent-based AI and innovative AI [15][21]
- The potential for RL to discover new knowledge and contribute to superintelligence is discussed, emphasizing the importance of verification mechanisms [12][13]

Group 6
- Reward design in RL is critical, as it can significantly shape the behavior and outcomes of AI agents [55][56]
- The future of AI agents will depend on their ability to balance multiple objectives and optimize performance across varied tasks [56][63]

Group 7
- The landscape of AI companies is evolving, with significant mergers and acquisitions likely in the near future [64][65]
- Companies need to focus on technical paths that ensure profitability and sustainability, as high operational costs can stall growth [63][64]
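To make the RLHF and reward-design points above concrete, here is a minimal reward-modeling sketch: a pairwise Bradley-Terry loss that trains a scorer to rank the human-preferred response above the rejected one. The tiny linear scorer stands in for a language model with a scalar head, and the feature dimension is an arbitrary assumption.

```python
# Minimal sketch of the pairwise-preference loss behind reward modeling for
# RLHF. The linear scorer is a stand-in for an LM with a scalar head.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # placeholder for LM + scalar head

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)   # one scalar reward per item

def preference_loss(rm: TinyRewardModel,
                    chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    margin = rm(chosen) - rm(rejected)
    return -nn.functional.logsigmoid(margin).mean()

rm = TinyRewardModel()
loss = preference_loss(rm, torch.randn(16, 128), torch.randn(16, 128))
loss.backward()   # gradients push preferred responses toward higher scores
```

The learned scalar reward is then what a policy-gradient method optimizes against, which is why the reward-design question in Group 6 matters: the agent will exploit whatever the reward actually measures.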
China's First Full-Stack Hands-On Tutorial on Embodied "Brain + Cerebellum" Algorithms
具身智能之心· 2025-08-07 02:38
Core Insights
- The push toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1]
- The development of embodied intelligence is marked by an evolution from low-level perception to high-level task understanding and generalization [6][9]

Industry Analysis
- Over the past two years, numerous star teams in embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli and moving the technology from laboratories to commercial and industrial applications [3]
- Major domestic companies such as Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build an embodied-intelligence ecosystem, while internationally Tesla and investment firms support advances in autonomous driving and warehouse robotics [5]

Technological Evolution
- The evolution of embodied-intelligence technology has progressed through several stages:
- The first stage focused on grasp pose detection, which struggled with complex tasks due to the lack of context modeling [6]
- The second stage involved behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and in multi-target scenarios [6]
- The third stage introduced Diffusion Policy methods, enhancing stability and generalization through sequence modeling [7]
- The fourth stage, emerging in 2025, explores integrating VLA models with reinforcement learning and tactile sensing to overcome current limitations [8]

Product Development and Market Growth
- These advances have produced a range of products, including humanoid robots, robotic arms, and quadrupedal robots, serving manufacturing, home services, and healthcare [9]
- As the industry shifts from research to deployment, demand for engineering and systems capability is rising, necessitating stronger engineering skills [13]

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full spectrum of embodied-intelligence algorithms, covering topics from basic tasks to advanced models such as VLA and its integrations [9][13]
Revealed: How Did OpenAI Develop Its Reasoning Models?
Hua Er Jie Jian Wen· 2025-08-04 07:02
Core Insights
- OpenAI's path toward general AI agents began unexpectedly with a focus on mathematics, which laid the groundwork for its models' reasoning capabilities [2][3]
- The success of ChatGPT was a surprising outgrowth of this initially low-profile foundational work, which ultimately drew enormous consumer interest [2][3]
- OpenAI CEO Sam Altman envisions a future in which users simply state their needs and AI completes the tasks autonomously, highlighting the potential benefits of AI agents [3]

Group 1: Mathematical Foundations
- The early focus on mathematics mattered because math serves as a testbed for logical reasoning: a model capable of solving complex math problems demonstrates foundational reasoning ability [2][3]
- OpenAI's model recently achieved a gold-medal-level score at the International Mathematical Olympiad, showcasing the reasoning capabilities built on these mathematical foundations [3]

Group 2: Breakthrough Innovations
- In 2023, OpenAI achieved a significant leap in reasoning through an approach known internally as "Strawberry," which combined large language models, reinforcement learning, and test-time computation (a sketch of one public form of test-time computation follows this summary) [4][5]
- This combination led to chain-of-thought reasoning, in which models show their intermediate reasoning steps rather than only providing final answers [6]

Group 3: Nature of AI Reasoning
- OpenAI researchers take a pragmatic view of AI reasoning, focusing on whether models complete complex tasks effectively rather than whether they strictly mimic human reasoning processes [7]
- The company's bottom-up research culture prioritizes breakthrough ideas over short-term product gains, which enabled significant investment in reasoning models [7]

Group 4: Future Directions
- Current AI agents show promise on well-defined tasks but struggle with more subjective ones, pointing to a need for advances in training models for these areas [8]
- OpenAI is exploring new general-purpose reinforcement learning techniques to let models learn skills that are difficult to verify, as demonstrated by its IMO gold-medal model [8]

Group 5: Competitive Landscape
- OpenAI, once the clear leader in the AI industry, now faces strong competition from Google, Anthropic, xAI, and Meta, raising questions about whether it can maintain its lead in the race toward advanced AI agents [9]
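The "test-time computation" ingredient mentioned above has a simple, publicly documented cousin: self-consistency, which samples several chain-of-thought completions and majority-votes on the final answer. The sketch below illustrates only that general idea, not OpenAI's internal method; `sample_fn` and the `Answer:` convention are placeholders.

```python
# Minimal sketch of self-consistency, one public form of spending extra
# compute at inference time: sample several chain-of-thought completions and
# majority-vote on the extracted final answers. `sample_fn` is a placeholder
# for a real LLM sampling call with temperature > 0.
from collections import Counter
from typing import Callable

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a completion that ends with 'Answer: ...'."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(prompt: str,
                     sample_fn: Callable[[str], str],
                     n_samples: int = 16) -> str:
    """Draw several reasoning traces and return the most common final answer."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Usage with a stand-in sampler (a real one would call an LLM):
fake_samples = iter(["Think... Answer: 42", "Hmm... Answer: 41", "So... Answer: 42"])
print(self_consistency("What is 6 x 7?", lambda p: next(fake_samples), n_samples=3))  # "42"
```

The broader point the article makes is the same one this trick relies on: letting a model produce and evaluate intermediate reasoning, and spending more compute at inference, can buy accuracy that a single greedy answer would miss.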