Workflow
Reinforcement Learning
X @TechCrunch
TechCrunch· 2025-10-05 15:05
AI tasks that work well with reinforcement learning are getting better fast — and threatening to leave the rest of the industry behind. https://t.co/lFT3lyvg4o ...
Anthropic CEO: AGI Is Marketing
Alex Kantrowitz· 2025-09-30 16:58
AGI and superintelligence. You'll hear leaders of companies say we've achieved AGI and we're moving on to superintelligence, or that it's really exciting that someone stopped working on AGI and started working on superintelligence. So I think these terms are totally meaningless. I don't know what AGI is. I don't know what superintelligence is. It sounds like a marketing term. Yeah, it sounds like, you know, something designed to activate people's dopamine. So you'll see in public I ...
X @Herbert Ong
Herbert Ong· 2025-09-29 12:48
RT phil beisel (@pbeisel): FSD Version 14. Maybe this week we get version 14.0, as foretold by the profit [sic]. Key/expected features:
1. Version 14's model is 10x larger than version 13's, trained on more data and with many reinforcement learning (RL) use cases (see article).
2. Version 14 is multi-mode: it can operate in supervised, unsupervised, and unsupervised Robotaxi modes, all software-controlled*.
3. Version 14.1 incorporates more fine-tuning based on an expanded range of RL use cases. Use cases likely fielded from Robo ...
Z Event | SF Tech Week: October 8 Silicon Valley In-Person Event: Why Now? RL's Inflection Point and What Comes Next
Z Potentials· 2025-09-28 14:29
Core Insights
- Reinforcement Learning (RL) is transitioning from a niche area to a critical component in advancing reasoning, decision-making, and complex scene interactions, especially as developments in Large Language Models (LLMs) reach a bottleneck [3]

Group 1: Event Overview
- An event is scheduled for October 8th at 6:30 PM in San Francisco, featuring top experts from academia, industry, and startups to discuss the future of RL [4]
- The event is organized by Z Potentials in collaboration with HatTrick Capital and Future Builderz, focusing on connecting researchers, founders, and investors [8][9]

Group 2: Featured Speakers
- Notable speakers include Zeng Dong, an Assistant Professor at UCSB and former NVIDIA AI Researcher, who specializes in RL and intelligent decision-making [6]
- Qifei Wang, Research Lead at DeepMind, is leading explorations at the intersection of RL and multimodal integration [6]
- Bill Zhu, CEO of Pokee AI and former head of Applied RL at Meta, is working on large-scale RL applications in products [6]
- Other speakers include Mike Cheng, Andy Lyu, Daanish Khazi, and Robi Lin, influential figures in the RL space who represent a blend of research and entrepreneurial efforts [7]
From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
a16z· 2025-09-25 13:00
The big thing that we are targeting is producing an automated researcher, so automating the discovery of new ideas. The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant. And I was talking to some high schoolers and they were saying, "Oh, you know, actually the default way to code is vibe coding." I do think, you know, the future hopefully will be vibe researching. Thanks for coming, Jakub and Mark. Jakub, you're the chief scientist ...
X @Elon Musk
Elon Musk· 2025-09-19 13:48
1.21 Gigawatts of training compute! (Actually, slightly more)
SangBin Cho (@Saaaang94): We are hiring a numerics / quantization expert to scale RL (with @sehoonkim418)! There will be lots of exciting challenges coming with Jax + Sglang + the first gigawatt cluster in the world (with many hundreds of thousands of GB200/300s)! ...
DeepSeek Founder Liang Wenfeng Responds to Skeptics in Nature: R1 Really Did Cost $294,000 to Train
Xin Lang Cai Jing· 2025-09-19 00:03
Core Insights
- DeepSeek-R1 has made a significant impact in the AI field by being featured on the cover of Nature, highlighting its innovative approach to enhancing the reasoning capabilities of large language models (LLMs) through reinforcement learning (RL) [1][3][5]

Group 1: Achievements and Recognition
- The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" was published in January and has now been recognized on the cover of a leading journal, Nature [3]
- DeepSeek-R1 has become the most popular model on Hugging Face since its open-source release, with over 10.9 million downloads [5]
- The training cost for DeepSeek-R1 was remarkably low at $294,000, significantly less than the costs incurred by competitors like OpenAI and Google [6][7]

Group 2: Training Methodology
- DeepSeek-R1 uses a novel RL framework that relies only on the task format and on reward signals based on the correctness of the final answer, allowing reasoning capabilities to develop organically [10]
- The model's reasoning accuracy improved dramatically from 15.6% to 77.9% during training, with a peak accuracy of 86.7% when combined with self-consistency decoding [10]

Group 3: Self-Evolution and Advanced Strategies
- The model exhibited self-evolution behaviors, such as increasing the length of generated text and employing advanced reasoning strategies like self-reflection and systematic exploration of alternative solutions [12][14]
- A notable "Aha Moment" was observed when the model began using the word "wait" more frequently, indicating a shift in its reasoning approach [15][17]

Group 4: Future Development Plans
- To address the limitations of DeepSeek-R1, a multi-stage refinement plan has been initiated: a cold start on high-quality conversational data, followed by multiple rounds of RL and supervised fine-tuning [18][19]
- The model's performance improved by 17%-25% on various benchmarks after this multi-stage training process [21]

Group 5: Algorithm and Reward System
- DeepSeek employs the GRPO (Group Relative Policy Optimization) algorithm, which evaluates a group of sampled answers against one another rather than scoring a single best answer, reducing resource consumption while maintaining training stability [23][24]
- A dual reward system combines rule-based rewards for reasoning tasks with model-based rewards for general tasks, keeping the model aligned with human preferences without eroding its reasoning capabilities [25][26]

Group 6: Challenges and Limitations
- Despite its advancements, DeepSeek-R1 struggles with structured outputs and tool usage and is sensitive to prompts, which limits its effectiveness in complex scenarios [35][37]
- The potential for reward hacking exists, particularly in subjective tasks, and could undermine the model's performance if the reward signals are not robust [37]
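The GRPO mechanic described in Group 5 is compact enough to sketch. Below is a minimal, illustrative Python sketch of group-relative advantages paired with a rule-based reward; the function names, the 0.1 format bonus, and the epsilon constant are assumptions for illustration, not DeepSeek's actual code.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """Score each sampled answer against its own group's statistics.

    GRPO compares a group of G answers to the same prompt with one another,
    so no learned value network (critic) is needed as a baseline.
    """
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)  # epsilon guards a zero-variance group

def rule_based_reward(answer: str, reference: str, well_formatted: bool) -> float:
    """Dual-signal rule-based reward for reasoning tasks: final-answer
    correctness plus a small format bonus (the 0.1 weight is hypothetical)."""
    correctness = 1.0 if answer.strip() == reference.strip() else 0.0
    format_bonus = 0.1 if well_formatted else 0.0
    return correctness + format_bonus

# Example: four sampled answers to one prompt; two are correct.
rewards = np.array([
    rule_based_reward("42", "42", True),
    rule_based_reward("41", "42", True),
    rule_based_reward("42", "42", False),
    rule_based_reward("7", "42", False),
])
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```

Because advantages are normalized within each group of sampled answers, no separate critic network is trained, which is the main source of the resource savings over single-answer PPO-style scoring noted above.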
xAI's Colossus 2 – First Gigawatt Datacenter In The World, Unique RL Methodology, Capital Raise – SemiAnalysis
2025-09-18 13:09
Summary of xAI's Colossus 2 Conference Call

Company and Industry Overview
- The conference call focuses on xAI, a company involved in the development of advanced data centers, specifically the Colossus 2 project, which is positioned as the world's first gigawatt-scale data center [1][2][14]

Key Points and Arguments
1. **Colossus 2 Project Launch**: The Colossus 2 project was initiated on March 7, 2025, with the acquisition of a 1 million square foot warehouse in Memphis and adjacent sites totaling 100 acres [18]
2. **Cooling Capacity**: By August 22, 2025, xAI had installed 11 air-cooled chillers, providing approximately 200 MW of cooling capacity, sufficient to support around 110,000 GB200 GPUs in NVL72 racks [18]
3. **Speed of Construction**: xAI completed the Colossus 2 project in six months, a significant reduction compared to the 15 months taken by competitors like Oracle, Crusoe, and OpenAI [19]
4. **Power Infrastructure**: The power infrastructure for Colossus 2 is being developed in Southaven, Mississippi, where xAI acquired a former Duke Energy power plant and received temporary approval to operate gas turbines [24][31]
5. **Partnership with Solaris Energy**: xAI has partnered with Solaris Energy Infrastructure, which owns a fleet of 100 MW gas turbines, to enhance power generation capabilities [33][34]
6. **Future Capacity Plans**: xAI aims to scale its power capacity to over 1.5 GW, with plans to deploy additional turbines and infrastructure [40]
7. **Funding Needs**: The required capital expenditures for Colossus 2 are projected to be in the tens of billions of dollars, raising concerns about xAI's ability to generate meaningful external revenue [51]
8. **Middle East Expansion**: xAI is considering large-scale expansion in the Middle East, leveraging existing relationships with regional investors and potential funding sources [56][58]

Additional Important Insights
- **Technological Edge**: xAI is using unique reinforcement learning methodologies that may allow it to surpass competitors like OpenAI and Anthropic in AI capabilities [14]
- **Internal Revenue Generation**: A significant portion of xAI's revenue may come from inter-company transfers, raising questions about the sustainability of its revenue model [67]
- **Investor Sentiment**: xAI's valuation, nearing $200 billion, is difficult to justify in comparison to competitors like Anthropic [58]

This summary encapsulates the critical aspects of xAI's Colossus 2 project and its strategic positioning within the data center and AI industry, highlighting both the opportunities and the challenges ahead.
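As a rough sanity check on the cooling figure in point 2 (our own back-of-the-envelope arithmetic, not from the source, and assuming the 110,000 figure counts individual GPUs):

$$\frac{200\ \text{MW}}{110{,}000\ \text{GPUs}} \approx 1.8\ \text{kW per GPU}$$

This is in the right range for GB200 NVL72 deployments, where a rack of 72 GPUs is commonly budgeted at roughly 120 kW, or about 1.7 kW per GPU all-in.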
Just In: Liang Wenfeng Publishes in Nature
36氪· 2025-09-18 10:18
Core Viewpoint
- DeepSeek's R1 reasoning model has achieved significant recognition by being published in the prestigious journal Nature, marking a milestone in AI research and industry transparency [4][22][36]

Group 1: Model Development and Achievements
- The DeepSeek-R1 model, developed by Liang Wenfeng's team, is the first mainstream large language model to undergo peer review, closing a significant gap in the AI industry [4][11][22]
- The model has become the most popular open-source reasoning model globally, with over 10.9 million downloads on Hugging Face [4]
- DeepSeek-R1's research addresses a major issue in AI, enhancing reasoning capabilities through reinforcement learning without relying on extensive human labeling [14][16]

Group 2: Transparency and Peer Review
- Nature's editorial highlights the importance of peer-reviewed publication in clarifying how large models work and in verifying that their performance matches vendor claims [24][25][34]
- The peer review process for DeepSeek-R1 involved eight external experts who provided over a hundred specific comments, improving the paper's clarity and credibility [26][29][34]
- DeepSeek's commitment to transparency is evident in its detailed disclosures about model training and safety assessments, which are crucial for mitigating the risks associated with AI technologies [11][18][36]

Group 3: Safety and Data Integrity
- DeepSeek conducted a comprehensive safety evaluation of the R1 model, demonstrating superior safety compared to contemporaneous models [11][18]
- The model's training data underwent rigorous decontamination to prevent bias and to ensure that evaluation results accurately reflect its problem-solving capabilities [17][20]
- While acknowledging potential contamination in some benchmark tests, DeepSeek has deployed external risk-control systems to enhance safety in production [18][19]

Group 4: Industry Impact and Future Directions
- DeepSeek's open-source model is positioned as a representative of domestic Chinese AI technology on the global stage, potentially setting a standard for research transparency in the AI industry [36]
- The call for more AI companies to submit their models for peer review reflects a growing recognition of the need for verified claims and greater credibility in AI research [36]
X @外汇交易员
外汇交易员· 2025-09-18 02:30
The DeepSeek-R1 paper has made the cover of Nature. It is the paper DeepSeek posted on arXiv this January, "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning," with Liang Wenfeng as corresponding author. Nature's editors argue that peer review benefits the development of large language models: because benchmarks can be gamed, putting a model's design, methodology, and limitations before independent external experts effectively deflates inflated claims and curbs hype in the AI industry. 🗒️ DeepSeek-R1 is regarded as the first large language model to pass peer review at an authoritative academic journal. ...