Slime
Join this salon for a look at SGLang's frontier practices: ultra-long-context extension, RL post-training frameworks, diffusion language models, and more
机器之心· 2026-01-29 08:12
At a critical juncture where AI is accelerating from the "chat" paradigm toward an era of agents that can actually get things done, hands-on exploration of LLM system optimization and technology deployment calls for deeper connection and shared experience among developers. To that end, the SGLang community, 机器之心, and 张江孵化器 are jointly hosting an offline Meetup, bringing contributors out from behind the screen and letting the optimizers behind the scenes share battle-tested know-how. On the afternoon of February 6, the SGLang Shanghai Meetup will be held on Floor 1, 800 Naxian Road, Pudong, Shanghai. The Meetup will offer in-depth talks on the SGLang technical roadmap, ultra-long-context extension, RL post-training frameworks, and diffusion language model exploration, along with an open networking session. Developers and researchers are warmly invited to attend and explore new possibilities for LLM system optimization and real-world deployment.

Latest agenda: the final schedule is out; scan the registration QR code below to secure your spot.

Agenda (Floor 1)
- 13:30-14:00 Sign-in
- 14:00-14:30 Talk 1: SGLang roadmap. 张柏舟, SGLang core developer
- 14:30-15:00 Talk 2: Omni-infer performance optimization for SGL. 郑锦焕, Omni-infer core developer
- 15:00-15:30 Talk 3: slime ...
The Design, Implementation, and Future Development of Reinforcement Learning AI Systems
AI前线· 2025-11-12 04:53
Core Insights
- The article discusses the application of Reinforcement Learning (RL) in the design of large language model systems and offers preliminary suggestions for future development [3]
- It emphasizes the complexity of RL systems, particularly in their engineering and infrastructure requirements, and highlights the evolution from traditional RLHF systems to more advanced RL applications [4][24]

Group 1: RL Theory and Engineering
- The engineering demands of RL algorithms are multifaceted, focusing on the integration of large language models with RL systems [4]
- The interaction between agents and their environments is crucial, with the environment defined as how the language model interacts with users or tools [7][8]
- Reward functions are essential for evaluating actions, and advancements in reward modeling have significantly impacted the application of RL in language models [9][10]

Group 2: Algorithmic Developments
- The article outlines the evolution of algorithms such as PPO, GRPO, and DPO, noting their respective advantages and limitations in various applications [13][19]
- The shift from human feedback to machine feedback in RL practice is highlighted, showcasing the need for more robust evaluation mechanisms [11][24]
- The GRPO algorithm's distinctive approach of estimating advantages without a traditional critic model is discussed, emphasizing its application in inference-heavy scenarios; a minimal sketch follows this summary [19]

Group 3: Large-Scale RL Systems
- The rapid advancement of RL applications is noted, with a transition from simple human alignment to more complex model-intelligence objectives [24]
- The challenges of integrating inference engines and dynamic weight updates in large-scale RL systems are outlined, emphasizing the need for efficient resource management [28][35]
- Future developments in RL systems will require a focus on enhancing inference efficiency and flexibility, as well as building more sophisticated evaluation frameworks [41][58]

Group 4: Open Source and Community Collaboration
- The article mentions open-source RL frameworks such as Open RLHF and VeRL, which aim to enhance community collaboration and resource sharing [50][56]
- The importance of a vibrant ecosystem that balances performance and compatibility in RL systems is emphasized, encouraging industry participation in collaborative design efforts [58]
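Neither summary includes code, but the group-relative trick that lets GRPO drop the critic model is compact enough to sketch. Below is a minimal illustration, assuming scalar per-response rewards and PyTorch tensors; the function name and shapes are hypothetical, not taken from the article.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantage estimation in the style of GRPO.

    Instead of training a separate critic (value) model, GRPO samples a
    group of responses per prompt, scores each with the reward function,
    and normalizes every reward against its own group's statistics.

    Args:
        rewards: shape (num_prompts, group_size), one scalar reward per
            sampled response.
    Returns:
        Advantages of the same shape, zero-mean per group.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy usage: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.2, 0.9, 0.4, 0.1]])
print(grpo_advantages(rewards))
```

The design point is that normalizing rewards within each prompt's sample group substitutes for a learned value baseline, which is what makes the approach attractive in inference-heavy settings where running a second critic model is costly.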
The Design, Implementation, and Future Development of Reinforcement Learning AI Systems
36Ke· 2025-11-04 12:52
Core Insights
- Reinforcement Learning (RL) is a crucial and complex component in enhancing the intelligence of large language models (LLMs) [1][2]
- The presentation by Alibaba's algorithm expert, Cao Yu, at AICon 2025 discusses the current state and future directions of RL systems, particularly in the context of LLMs [1][2]

Group 1: RL Theory and Engineering
- The engineering demands of RL algorithms are multifaceted, focusing on the integration of LLMs as agents within RL systems [3][4]
- The interaction between agents and their environments is essential, with the environment defined as how LLMs interact with users or tools [6]
- Key components include the reward function, which evaluates the quality of actions taken by the agent, and algorithms such as PPO, GRPO, and DPO that guide policy updates [7][8]

Group 2: Algorithm Development and Challenges
- The evolution of RL applications has seen a shift from human feedback to more complex reward modeling, addressing issues like reward hacking [9][12]
- The traditional PPO algorithm is discussed, highlighting its complexity and the need for a robust evaluation process to assess model capabilities [12][13]
- Newer algorithms like GRPO have emerged, focusing on improving the efficiency of the critic model and addressing challenges in training and inference [20][22]

Group 3: Large-Scale RL Systems
- Rapid advances in RL have driven a shift from simple human-aligned metrics to more sophisticated models capable of higher reasoning [25][28]
- Future RL systems will require enhanced capabilities for dynamic weight updates and efficient resource allocation in distributed environments; see the sketch after this summary [36][38]
- The integration of frameworks such as Ray and DeepSpeed is crucial for optimizing the performance of large-scale RL systems [49][57]

Group 4: Open Source and Community Collaboration
- The development of open-source frameworks like Open RLHF and VeRL reflects the industry's commitment to collaborative innovation in RL [53][55]
- Companies are encouraged to participate in the design and improvement of RL systems, focusing on efficiency, evaluation, and training balance [58]
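The dynamic weight update mentioned above (syncing fresh trainer weights into rollout workers between training steps) is a core systems problem in these frameworks. The following is a minimal sketch using Ray actors, since the article names Ray as an integration point; all class and method names are hypothetical stand-ins, and production systems typically move tensors via NCCL broadcast or CUDA IPC rather than Ray's object store.

```python
import ray

ray.init()

@ray.remote
class RolloutWorker:
    """Inference-side actor that generates rollouts with the current policy."""

    def __init__(self):
        self.weights = None
        self.version = -1

    def update_weights(self, weights, version):
        # A real system would load these tensors into the inference engine;
        # here we just store them to illustrate the synchronization flow.
        self.weights = weights
        self.version = version

    def generate(self, prompt):
        return f"rollout for {prompt!r} with policy v{self.version}"

@ray.remote
class Trainer:
    """Training-side actor that periodically publishes new policy weights."""

    def __init__(self):
        self.version = 0

    def train_step(self):
        self.version += 1
        # Placeholder for the real optimizer step; returns dummy weights.
        return {"layer.weight": [0.0]}, self.version

trainer = Trainer.remote()
workers = [RolloutWorker.remote() for _ in range(2)]

# One iteration of the train -> sync -> rollout loop.
weights, version = ray.get(trainer.train_step.remote())
ray.get([w.update_weights.remote(weights, version) for w in workers])
print(ray.get(workers[0].generate.remote("hello")))
```

The resource-allocation concern in the article maps onto where these actors are placed: colocating trainer and rollout workers saves GPUs but serializes the loop, while separating them enables asynchronous generation at the cost of staler weights.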
2025 National Toy Hall of Fame finalists revealed
NBC News· 2025-09-18 06:05
Industry Recognition
- The National Toy Hall of Fame announced this year's top finalists [1]
- Notable contenders include Connect Four, Battleship, Furby, and Tickle Me Elmo [1]
- Even Snow and Slime are among the finalists [1]

Public Engagement
- Public voting is available online [1]
- The winners will be announced in November [1]