Reinforcement Learning
The missing piece for unified understanding and generation? Tencent releases X-Omni: reinforcement learning revives discrete autoregressive generation and renders long-text images with ease
机器之心· 2025-08-10 04:31
Core Insights
- The article discusses advancements in image generation technology, focusing on the X-Omni model developed by Tencent's team, which significantly enhances the quality of autoregressive image generation through reinforcement learning [2][4][5].

Group 1: Model Development
- The X-Omni model uses reinforcement learning to improve the aesthetic quality of generated images and their adherence to complex instructions, showing superior performance in rendering long texts [5][6].
- The model architecture is based on discrete tokens and employs a diffusion decoder to generate images, allowing a unified approach to visual understanding and generation [6][11].

Group 2: Reinforcement Learning Approach
- The reinforcement learning process incorporates a comprehensive reward model that evaluates image generation quality along multiple dimensions, including human aesthetic preference and text-image semantic alignment (a minimal sketch of such a composite reward follows this summary) [9][12].
- The GRPO reinforcement learning method enhances the model's image generation capabilities, demonstrating that RL optimization surpasses traditional supervised fine-tuning [8][19].

Group 3: Performance Evaluation
- X-Omni outperforms existing models across benchmarks, achieving high scores in both text rendering and instruction following, with text-rendering scores of 0.901 in English and 0.895 in Chinese [13][14].
- In instruction-following assessments, X-Omni achieved an overall score of 87.65, indicating its effectiveness in understanding and executing complex prompts [14].

Group 4: Unique Findings
- Unlike traditional autoregressive models that rely heavily on classifier-free guidance (CFG) to improve generation quality, X-Omni produces high-quality images without CFG, demonstrating tight integration between the visual and language generation mechanisms [17].
- The research highlights the advantages of reinforcement learning in image generation, providing more comprehensive and efficient optimization signals than conventional methods [19].
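The summary describes a reward model that scores generated images along several dimensions and feeds the combined signal into GRPO-style optimization. The sketch below shows one plausible way to wire a composite reward to group-relative advantages; the scorer functions, weights, and names are hypothetical illustrations, not the X-Omni implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def composite_reward(image, prompt, scorers, weights):
    """Weighted sum of per-dimension scores in [0, 1].

    `scorers` maps a dimension name to a callable(image, prompt) -> float.
    Every scorer here is a hypothetical stand-in (e.g. a learned aesthetic
    predictor, a CLIP-style alignment model, an OCR check on rendered text).
    """
    return sum(weights[name] * fn(image, prompt) for name, fn in scorers.items())

def grpo_advantages(rewards):
    """Group-relative advantages: rewards for one prompt's group of sampled
    images are normalized against the group mean and std, as GRPO does."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Dummy scorers that just draw plausible scores; real ones would inspect the image.
scorers = {
    "aesthetic": lambda img, p: float(rng.uniform(0.4, 1.0)),
    "alignment": lambda img, p: float(rng.uniform(0.4, 1.0)),
    "ocr_text":  lambda img, p: float(rng.uniform(0.4, 1.0)),
}
weights = {"aesthetic": 0.3, "alignment": 0.4, "ocr_text": 0.3}

group = [f"image_{i}" for i in range(4)]  # four images sampled for one prompt
rewards = [composite_reward(img, "a poster with a long slogan", scorers, weights)
           for img in group]
print(grpo_advantages(rewards))  # advantages used to weight the policy update
```

The weighting between aesthetics, alignment, and text accuracy is a design choice; the article only states that multiple dimensions are combined.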
Two-stage SOTA! HKUST's FiM: rethinking trajectory prediction from a planning perspective
自动驾驶之心· 2025-08-09 16:03
Core Insights
- The article presents a novel approach to trajectory prediction in autonomous driving, emphasizing a "First Reasoning, Then Forecasting" strategy that integrates intention reasoning to enhance prediction accuracy and reliability [2][4][48].

Group 1: Methodology
- The proposed method introduces an intention reasoner based on a query-centric Inverse Reinforcement Learning (IRL) framework, which captures the behavior and intentions of traffic participants in a compact representation [2][6][48].
- A bidirectional selective state-space model (Bi-Mamba) is developed to improve trajectory decoding, effectively capturing the sequential dependencies of trajectory states (a rough sketch of the bidirectional decoding pattern follows this summary) [7][9][48].
- The framework uses a grid-level graph to represent the driving context, allowing efficient modeling of participant behavior and intentions [5][6][20].

Group 2: Experimental Results
- Extensive experiments on large datasets such as Argoverse and nuScenes show that the method significantly enhances prediction confidence and achieves competitive performance against state-of-the-art models [9][34][38].
- On the Argoverse 1 dataset, the proposed method (FiM) outperformed several strong baselines on key metrics such as Brier score and minFDE6, indicating robust predictive capability [34][35].
- Results on Argoverse 2 further validate the intention reasoning strategy, showing that longer-term intention supervision improves prediction reliability [36][37].

Group 3: Challenges and Innovations
- The article highlights the inherent difficulty of modeling intentions in complex driving scenarios and advocates the use of large reasoning models (LRMs) to enhance intention inference [5][6][12].
- A dense occupancy grid map (OGM) prediction head is introduced to model future interactions among participants, improving overall prediction performance [7][25][41].
- The study emphasizes the importance of intention reasoning in motion prediction, establishing a promising baseline for future research in trajectory prediction [48].
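The Bi-Mamba decoder described above processes the trajectory sequence in both directions so each future step is conditioned on context from both sides. The PyTorch sketch below illustrates only that bidirectional fusion pattern; it substitutes a GRU for the selective state-space (Mamba) block to stay self-contained, and all layer sizes and names are assumptions rather than the FiM architecture.

```python
import torch
import torch.nn as nn

class BiDirectionalTrajectoryDecoder(nn.Module):
    """Decode future trajectory states with a forward and a backward pass.

    A GRU stands in for the selective state-space (Mamba) block so the
    sketch runs without extra dependencies; the fusion pattern is the point.
    """

    def __init__(self, d_model=128, horizon=30, out_dim=2):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)   # forward scan
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)   # backward scan
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.head = nn.Linear(d_model, out_dim)                 # (x, y) per step
        self.horizon = horizon

    def forward(self, intent_query):
        # intent_query: [B, d_model] embedding produced by the intention reasoner.
        B, d = intent_query.shape
        tokens = intent_query.unsqueeze(1).expand(B, self.horizon, d)  # seed each step
        h_fwd, _ = self.fwd(tokens)                       # left-to-right context
        h_bwd, _ = self.bwd(torch.flip(tokens, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])               # re-align backward context
        h = self.fuse(torch.cat([h_fwd, h_bwd], dim=-1))
        return self.head(h)                               # [B, horizon, 2] waypoints

decoder = BiDirectionalTrajectoryDecoder()
waypoints = decoder(torch.randn(8, 128))
print(waypoints.shape)  # torch.Size([8, 30, 2])
```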
The largest high-quality post-training dataset for scientific reasoning to date is open-sourced, quickly turning Qwen3 and other models into "scientists"
量子位· 2025-08-09 07:01
Core Viewpoint
- The release of MegaScience, a large-scale open-source dataset for scientific reasoning, aims to improve the training and evaluation of general artificial intelligence systems in scientific domains, addressing the shortage of high-quality training data for scientific reasoning tasks [1][9][15].

Group 1: Dataset Overview
- MegaScience consists of approximately 1.25 million question-answer pairs across disciplines including biology, chemistry, computer science, economics, mathematics, medicine, and physics [1][15].
- The dataset was downloaded over 4,600 times within a week of release and ranked fourth on the HuggingFace Datasets Trending list, indicating strong interest from academic and industrial research communities [7].

Group 2: Performance and Evaluation
- Models trained on MegaScience significantly outperform the corresponding official Instruct models on scientific reasoning tasks, demonstrating the dataset's effectiveness [3][16].
- The dataset scales well: performance gains become more pronounced as the size of the base model increases [3][16].

Group 3: Challenges Addressed
- Existing scientific reasoning datasets suffer from unreliable benchmark evaluations, inadequate decontamination, low-quality reference answers, and superficial knowledge distillation [10][11][13].
- MegaScience addresses these challenges through a systematic approach, including a comprehensive scientific reasoning evaluation framework and rigorous data decontamination (a generic decontamination sketch follows this summary) [13][15].

Group 4: Data Construction Process
- Construction involved collecting data from multiple public datasets, applying deduplication and decontamination strategies, and using several data selection techniques to ensure high-quality outputs [27][28][30].
- The TextbookReasoning dataset, a component of MegaScience, was created with a fully automated pipeline that extracted and refined question-answer pairs from approximately 120,000 university-level textbooks [14][19][20].

Group 5: Evaluation Framework
- The evaluation framework includes 15 representative benchmark tasks designed to comprehensively assess the scientific reasoning capabilities of language models [37][39].
- It optimizes the answer extraction process to improve the accuracy of evaluation results and ensure fair comparison between models [39][41].

Group 6: Future Prospects
- Future research may explore combining reinforcement learning with MegaScience to further enhance scientific reasoning, leveraging the dataset's high-quality reference answers [47][48].
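The decontamination step mentioned above removes training questions that overlap with evaluation benchmarks. A common generic technique is word n-gram overlap matching; the sketch below shows that approach under assumed thresholds and is not the MegaScience pipeline itself.

```python
def ngrams(text, n=10):
    """Lower-cased word n-grams of a question string."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def decontaminate(train_questions, benchmark_questions, n=10):
    """Drop training questions sharing any word n-gram with a benchmark item.

    The n-gram length and exact-match criterion are illustrative choices; real
    pipelines often add fuzzy or embedding-based matching on top.
    """
    banned = set()
    for q in benchmark_questions:
        banned |= ngrams(q, n)
    return [q for q in train_questions if not (ngrams(q, n) & banned)]

train = ["What is the integral of x squared over the interval zero to one "
         "expressed as a fraction in lowest terms?"]
bench = ["Compute the integral of x squared over the interval zero to one "
         "expressed as a fraction in lowest terms."]
print(len(decontaminate(train, bench)))  # 0: the overlapping question is removed
```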
How substantive is Li Auto's VLA? An analysis and predictions of its key iteration directions
理想TOP2· 2025-08-09 06:18
Core Viewpoint
- The article emphasizes the innovative capabilities of Li Auto's VLA (Vision-Language-Action) model and its potential to significantly advance autonomous driving through tight integration of AI software and hardware, led by the company's founder, Li Xiang [2][3][4].

Group 1: Innovation and Technology
- Li Auto's VLA represents significant innovation at the MoE (Mixture of Experts) level, with a focus on original architecture and execution, drawing on contributions from across the AI community [2].
- The integration of AI software with hardware has reached an industry-leading level, with a clear distinction between the rapid iteration of software and the slower evolution of hardware [3].
- The core of Li Auto's VLA is based on reinforcement learning, which enables a more effective learning process than traditional imitation learning and enhances the vehicle's decision-making capabilities [9][10].

Group 2: Leadership and Vision
- Li Xiang plays a crucial role in the development of Li Auto's autonomous driving technology, similar to Elon Musk's influence at Tesla, keeping the company adaptable to industry changes and resource allocation [4][5].
- Li Xiang's ability to make key judgments about resource distribution and AI learning is vital for the company's long-term success and efficient use of resources [4].

Group 3: Future Directions and Predictions
- Key iteration directions for Li Auto's VLA include improving the speed, quality, and cost-effectiveness of simulation data, which is essential for reinforcement learning [8][12].
- The company aims to maximize the potential of existing vehicle hardware for autonomous driving while exploring new chip technologies to boost computational capability [13].
- Future advances may involve online learning architectures that allow real-time weight updates, significantly improving the model's adaptability and its understanding of the physical world [13].
A conversation with Qianxun Intelligent's Gao Yang: scientist founders aren't very "reliable", but entrepreneurship is like a game
36Kr· 2025-08-08 01:49
Core Viewpoint
- The article discusses the emergence of embodied intelligence in robotics, emphasizing the importance of building integrated hardware-and-software solutions, akin to Apple's approach, rather than a fragmented one like Android's [5][6].

Group 1: Company Overview
- Qianxun Intelligent, co-founded by Gao Yang and Han Fengtao, has raised over 1 billion RMB within 19 months, with investors including Huawei Hubble, JD.com, and CATL [4].
- Gao Yang, a former assistant professor at Tsinghua University, moved from academia to entrepreneurship, highlighting the challenges and lessons of that transition [5][12].

Group 2: Market Insights
- The robotics market is highly competitive, with established companies focusing on hardware while neglecting software, which Gao Yang believes is crucial for long-term success [9].
- The rise of embodied intelligence is seen as inevitable, driven by advances in AI such as ChatGPT, which have shifted perceptions of what AI can do [8].

Group 3: Technical Perspectives
- Integrating hardware and software is deemed essential in the early stage of robotics development, as seen in historical examples like IBM's approach to personal computers [6][7].
- Gao Yang emphasizes the importance of algorithms and data in evaluating robotic systems, noting that models must be able to handle complex tasks rather than only simple ones [28][29].

Group 4: Future Outlook
- Robots capable of performing complex tasks, referred to as "Robot GPT-3.5", are expected to significantly expand functionality in everyday scenarios [32].
- The current focus on large-scale data collection in robotics may be less valuable than assumed because robot form factors are evolving rapidly, suggesting the need for more effective pre-training methods [41][42].
ByteDance & MAP reshape the optimization priorities of large-model reasoning algorithms: reinforcement learning should focus on efficient exploration to raise the ceiling of LLMs
量子位· 2025-08-07 10:13
Core Viewpoint
- The article discusses the limitations of traditional reinforcement learning (RL) frameworks for large language models (LLMs), particularly premature convergence, which leads to a lack of exploration and diversity in generated outputs [1][2].

Group 1: Introduction to FR3E
- The FR3E framework, inspired by the idea of "First Return, Then Explore", addresses the exploration challenge in RL by balancing exploitation and exploration [2][4].
- This structured exploration framework was developed by a collaborative team from ByteDance, MAP, and the University of Manchester [2][5].

Group 2: Algorithm Framework
- The FR3E algorithm consists of two phases: First Return and Entropy-Eliciting Explore [10][14].
- In the First Return phase, the model performs multiple rollouts per prompt, exploring candidate solutions and collecting trajectories and reward signals [12].
- The Entropy-Eliciting Explore phase uses a dynamic advantage modulation mechanism to fine-tune learning signals based on the marginal improvement in estimated value from one state to the next (a simplified sketch follows this summary) [16][18].

Group 3: Data Construction
- The team uses a mixed-difficulty strategy for data construction: low-difficulty data stabilizes training while high-difficulty data challenges the model's reasoning capabilities [23].

Group 4: Experimental Results
- FR3E was evaluated on several authoritative mathematical reasoning benchmarks, including GSM8K, Math500, and others, across various model sizes [24].
- FR3E outperformed the strong baseline GRPO++ across multiple benchmarks, demonstrating superior generalization and reasoning capabilities [25][28].
- Notably, FR3E exhibited prolonged exploration behavior, with slower entropy decay and longer response lengths, overcoming the "stagnation" seen with traditional methods [26][27].

Group 5: Conclusion
- FR3E offers an innovative and efficient structured exploration paradigm that directly addresses the core bottleneck of insufficient exploration in LLM RL training [28].
- Its principle of "structured feedback + adaptive adjustment" shows promising scalability for future RL training of large models [29].
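The idea of modulating the learning signal by the marginal improvement in value between intermediate states can be pictured with a small numeric example. The sketch below computes per-segment advantages from value estimates at a few checkpoints along one rollout; the scaling rule, names, and temperature are assumptions, not the published FR3E algorithm.

```python
import numpy as np

def segment_advantages(values, final_reward, temperature=1.0):
    """Advantage per trajectory segment from value estimates at checkpoints.

    `values[i]` is the estimated chance of eventually solving the task from
    checkpoint i; the marginal improvement values[i+1] - values[i] scales how
    strongly tokens in segment i are reinforced. The final segment is credited
    with the gap between the terminal reward and the last estimate.
    """
    v = np.asarray(values, dtype=np.float64)
    deltas = np.append(np.diff(v), final_reward - v[-1])
    # Soften extreme deltas so a single lucky segment does not dominate the update.
    return np.tanh(deltas / temperature)

# Checkpoint values along one rollout, then a successful terminal reward of 1.
print(segment_advantages([0.2, 0.25, 0.6, 0.55], final_reward=1.0))
```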
Reinforcement learning + MCP = a killer combination? An open-source framework teaches AI to master tools in MCP to solve tasks, with measured results surpassing GPT!
量子位· 2025-08-07 10:13
Core Viewpoint
- The article introduces OpenPipe's new open-source reinforcement learning framework, MCP·RL, which lets agents autonomously discover tools, generate tasks, and learn optimal strategies through closed-loop feedback without extensive manual configuration [2][14][23].

Group 1: MCP·RL Overview
- MCP·RL enables an agent to connect automatically to an MCP server, discover the available tools, and generate training tasks from the tool information [18].
- The framework achieves state-of-the-art (SOTA) performance on two-thirds of benchmark tests, demonstrating its effectiveness [4][21].
- Unlike traditional approaches that require extensive setup, MCP·RL lets the model learn from experience without data annotation or custom MCP interfaces [23][24].

Group 2: Learning Process
- Training proceeds in four steps: discovering tools, generating tasks, learning how to use the tools, and testing the resulting strategies (a hypothetical outline of this loop follows this summary) [18][19].
- The framework emphasizes a "learning by doing" approach in which agents learn through practical experience rather than predefined configurations [7][14].
- The shift from humans using MCP to AI using MCP marks a significant change in how agents interact with tools [20].

Group 3: Practical Applications
- MCP·RL is designed to work with any MCP server out of the box, making it versatile across applications [23].
- The Agent Reinforcement Trainer (ART) component of MCP·RL allows real-world training and evaluation of agent strategies, improving reliability [24][25].
- Earlier tests of ART on the Qwen 2.5-14B model showed superior performance on email retrieval tasks, achieving SOTA results [26].
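The four-step loop described above can be pictured as a small training skeleton. Everything below is a hypothetical outline built from stub functions; it is not the OpenPipe MCP·RL or ART API, and the function names and signatures are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

def discover_tools(server_url: str) -> list[Tool]:
    """Stub: list the tools an MCP server exposes (name + description)."""
    return [Tool("search_email", "Search a mailbox"), Tool("send_email", "Send a message")]

def generate_tasks(tools: list[Tool], n: int) -> list[str]:
    """Stub: synthesize training tasks from tool descriptions, e.g. with an LLM."""
    return [f"Task {i}: exercise {tools[i % len(tools)].name}" for i in range(n)]

def rollout(policy, task: str, tools: list[Tool]) -> tuple[list, float]:
    """Stub: let the policy call tools to solve the task; return trajectory and reward."""
    return [], 0.0

def update(policy, batch):
    """Stub: one policy-gradient update from scored rollouts."""
    return policy

def train(policy, server_url: str, epochs: int = 3, tasks_per_epoch: int = 8):
    tools = discover_tools(server_url)              # step 1: discover tools
    tasks = generate_tasks(tools, tasks_per_epoch)  # step 2: generate tasks
    for _ in range(epochs):                         # step 3: learn by doing
        batch = [rollout(policy, t, tools) for t in tasks]
        policy = update(policy, batch)
    return policy                                   # step 4: evaluate on held-out tasks

train(policy=None, server_url="http://localhost:8000/mcp")
```

The point of the skeleton is that no dataset or custom interface appears anywhere: tasks are derived from the server's own tool descriptions and rewards come from executing them.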
Does DeepSeek's GRPO cause model collapse? A look at Qwen3's new paradigm, GSPO
机器之心· 2025-08-07 09:42
Core Viewpoint
- The article traces the evolution of reinforcement learning techniques in the post-training phase of large language models (LLMs), highlighting Group Sequence Policy Optimization (GSPO) as a solution to the instability associated with Group Relative Policy Optimization (GRPO) [2][10][31].

Group 1: Training Phases and Techniques
- Training of large language models typically consists of two phases, pre-training and post-training; the latter focuses on improving the model's understanding and execution of human instructions [1].
- Post-training employs reinforcement learning; early methods such as Reinforcement Learning from Human Feedback (RLHF) were time-consuming and costly because they relied on human annotators [2][3].

Group 2: Innovations and Comparisons
- DeepSeek introduced an automated approach to RLHF, significantly reducing cost and improving efficiency by letting the model learn from reward signals rather than manual evaluation [2].
- The DeepSeek team proposed the Group Relative Policy Optimization (GRPO) algorithm, which they argue is more effective than the Proximal Policy Optimization (PPO) used by OpenAI for ChatGPT [3][5].

Group 3: Issues with GRPO
- The Qwen team identified serious stability issues with GRPO, stemming from its reliance on token-level importance sampling, which can produce high variance and unstable training [10][11][12].
- The instability arises from applying importance sampling weights at the token level, where variance accumulates over long sequences and compounds the training difficulty [15][16][17].

Group 4: Introduction of GSPO
- To address these issues, the Qwen team proposed Group Sequence Policy Optimization (GSPO), which uses sequence-level importance sampling to improve training stability (a minimal sketch of the two ratio definitions follows this summary) [10][22][31].
- GSPO's design avoids the variance accumulation of token-level sampling, improving training efficiency and stability [23][24].

Group 5: Experimental Evidence and Advantages
- Experiments showed that GSPO outperformed GRPO across tasks, with better scalability and training efficiency [20][30].
- The Qwen team also noted that GSPO simplifies training of Mixture-of-Experts (MoE) models by removing the need for auxiliary strategies such as Routing Replay, which GRPO required for stable convergence [25][27][30].
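The core difference described above is where the importance ratio is taken: GRPO weights each token by its own probability ratio under the new versus old policy, while GSPO uses one length-normalized ratio per response. The PyTorch sketch below contrasts the two quantities; clipping and the surrounding loss are omitted, and the tensor names are assumptions.

```python
import torch

def token_level_ratios(logp_new, logp_old):
    """GRPO-style: one importance ratio per token; variance can accumulate
    over long sequences because every token contributes its own ratio."""
    return torch.exp(logp_new - logp_old)            # [B, T]

def sequence_level_ratio(logp_new, logp_old, mask):
    """GSPO-style: a single length-normalized ratio per sequence,
    exp(mean_t(log pi_new - log pi_old)), applied to the whole response."""
    diff = (logp_new - logp_old) * mask
    lengths = mask.sum(dim=1).clamp(min=1)
    return torch.exp(diff.sum(dim=1) / lengths)      # [B]

B, T = 2, 5
logp_old = torch.randn(B, T) - 2.0                   # per-token log-probs, old policy
logp_new = logp_old + 0.1 * torch.randn(B, T)        # per-token log-probs, new policy
mask = torch.ones(B, T)                              # 1 for response tokens, 0 for padding
print(token_level_ratios(logp_new, logp_old).shape)  # per-token weights
print(sequence_level_ratio(logp_new, logp_old, mask))  # one weight per response
```

Because the sequence-level ratio averages the per-token log-ratios before exponentiating, a few noisy tokens cannot blow up the weight of an otherwise ordinary response, which is the stability argument summarized above.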
The Embodied Intelligence Heart technology exchange group has been established!
具身智能之心· 2025-08-07 02:38
Group 1
- The newly established Embodied Intelligence Heart Technology Exchange Group focuses on a range of advanced topics, including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested individuals can add the assistant's WeChat AIDriver005 to join the community [2]
- To speed up the joining process, include a note with your institution/school, name, and research direction [3]
Success rate up 57%, the latest in VLA+RL! CO-RFT: efficient fine-tuning of VLA models (Beihang, Tsinghua, et al.)
具身智能之心· 2025-08-07 00:03
Core Insights
- The article presents Chunked RL, a new reinforcement learning framework designed for fine-tuning Vision-Language-Action (VLA) models, which show great potential for real-world robotic control [4][8].
- The proposed CO-RFT algorithm significantly improves on traditional supervised fine-tuning, achieving a 57% higher success rate and a 22.3% reduction in cycle time in real-world environments [4][29].

Section Summaries

Introduction
- VLA models integrate perception and language understanding for embodied control, showing promise for developing general strategies for real-world robotic control [6].
- The main challenge in fine-tuning VLA models is the dependence on the quality and quantity of task-specific data, which limits generalization to out-of-distribution (OOD) scenarios [6][7].

Methodology
- Chunked RL incorporates action chunking to improve sample efficiency and stability, making it particularly well suited to VLA models (a simplified sketch of a chunk-level TD target follows this summary) [8][12].
- The CO-RFT algorithm has two phases: imitation learning to initialize the backbone network and policy, followed by offline RL with action chunking to optimize the pre-trained policy [16][18].

Experimental Analysis
- Experiments were conducted on a robotic platform with six dexterous manipulation tasks, comparing CO-RFT against traditional methods [20][23].
- CO-RFT significantly outperforms supervised fine-tuning (SFT), with a 57% higher success rate and a 22.3% lower average cycle time across tasks [29][30].

Position Generalization
- CO-RFT shows strong position generalization, reaching a 44.3% success rate at previously unseen locations and outperforming SFT by 38% in OOD scenarios [4][29].

Importance of Data Diversity
- Data diversity is crucial to CO-RFT's performance: models trained on diverse datasets generalize significantly better than those trained on fixed datasets [32][33].
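Action chunking means the policy emits a short sequence (chunk) of low-level actions per decision step, and the RL update treats the chunk as a single macro-action. The sketch below shows a chunk-level TD target under that framing; the horizon, discounting, and tensor shapes are illustrative assumptions rather than the CO-RFT implementation.

```python
import torch

def chunk_td_target(rewards, next_q, gamma=0.99):
    """TD target for a chunk of H primitive actions treated as one macro-action.

    rewards: [B, H] per-step rewards collected while executing the chunk.
    next_q:  [B]    critic estimate at the state reached after the chunk ends.
    The chunk's return is the discounted sum of its H rewards plus the
    bootstrapped value discounted by gamma**H.
    """
    B, H = rewards.shape
    discounts = gamma ** torch.arange(H, dtype=rewards.dtype)
    chunk_return = (rewards * discounts).sum(dim=1)
    return chunk_return + (gamma ** H) * next_q

rewards = torch.tensor([[0.0, 0.0, 1.0], [0.1, 0.0, 0.0]])  # H = 3 steps per chunk
next_q = torch.tensor([0.5, 0.2])
print(chunk_td_target(rewards, next_q))
```

Bootstrapping only at chunk boundaries is one common motivation for chunking: it shortens the effective horizon the critic must reason over, which is consistent with the sample-efficiency argument summarized above.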