Reinforcement Learning
AI Legend Launches $1 Billion Startup, Taking an Unconventional Path
Sou Hu Cai Jing· 2026-02-21 07:38
Meanwhile, the company founded by David Silver plans to bypass large language models and train AI directly through reinforcement learning (IT之家 note: reinforcement learning), with the ultimate goal of creating "superintelligence." Notably, David Silver's career has been legendary: a decade ago he led the development of AlphaGo and AlphaStar, which defeated a Go world champion and top StarCraft players respectively, reshaping public perceptions of AI. After Google acquired DeepMind in 2014, he also became a key driver behind models such as Gemini. According to people familiar with the matter, the company is valued at roughly $4 billion (about RMB 27.663 billion at current exchange rates); David Silver is still negotiating with investors, and the terms may change at any time. Even at this stage, however, the scale of the fundraising shows that investors are eager to back top industry figures striking out on their own. Reportedly, David Silver's departure from Google DeepMind late last year immediately set off fierce competition among venture capital firms. Sequoia Capital partners Alfred Lin and Sonya Huang sought a meeting shortly after his departure, and giants such as NVIDIA, Google, and Microsoft have also expressed interest in investing. IT之家 reported on February 21 that on February 18 local time, British AI researcher Da ...
Applovin(APP) - 2025 Q4 - Earnings Call Transcript
2026-02-11 23:02
Financial Data and Key Metrics Changes
- Revenue for Q4 2025 was $1.66 billion, representing a 66% year-over-year increase, driven by advancements in technology and seasonal strength [13]
- Adjusted EBITDA for Q4 was $1.4 billion, up 82% year-over-year, with an 84% margin, reflecting a 700 basis point expansion from the previous year [13][15]
- Free Cash Flow for Q4 was $1.31 billion, an 88% increase year-over-year, contributing to a cash balance of $2.5 billion [15]
- For the full year 2025, revenue reached $5.48 billion, growing 70% year-over-year, with Adjusted EBITDA of $4.51 billion, up 87% year-over-year [15][16]

Business Line Data and Key Metrics Changes
- The e-commerce initiative has shown strong growth, with existing customers increasing their spend significantly as models improve [21]
- The self-service platform for e-commerce was launched in Q4, leading to new customer acquisition and increased spending from existing customers [21][22]

Market Data and Key Metrics Changes
- The MAX auction is critical for the ecosystem, with increased competition leading to higher bid density and overall growth in the market [8][9]
- The company is not seeing evidence of a declining mobile gamer demographic, indicating a stable market for casual gaming [10][11]

Company Strategy and Development Direction
- The company focuses on leveraging AI to enhance its platform and improve monetization for publishers, believing that increased content creation will lead to more opportunities [10][11]
- The strategy includes helping smaller businesses scale through the platform, similar to its approach in the gaming sector [51][52]

Management's Comments on Operating Environment and Future Outlook
- Management expressed confidence in the company's strong operating performance despite market volatility and competition concerns, emphasizing the disconnect between market sentiment and actual business performance [6][12]
- The outlook for Q1 2026 anticipates revenue between $1.745 billion and $1.775 billion, indicating 5%-7% sequential growth [17]

Other Important Information
- The company repurchased approximately 800,000 shares for $482 million in Q4, with a total of 6.4 million shares repurchased for $2.58 billion in 2025 [16]
- The company maintains a remaining share repurchase authorization of approximately $3.28 billion [16]

Q&A Session Summary

Question: E-commerce opportunity and self-service launch
- Management noted that the e-commerce business is performing well, with significant increases in spend from existing customers and new customer acquisition through the self-service platform [20][21]

Question: Automation of ad creatives
- The company is still early in the automation process for ad creatives, with plans to roll out generative AI tools to help customers create video ads more efficiently [25][28]

Question: Black box nature of the model
- Management acknowledged the challenges in providing clear metrics for investors but emphasized the potential for growth as the platform matures and more advertisers come on board [34][36]

Question: Impact of AI on the business
- Management believes that AI will lower content creation costs, leading to an explosion of content, which will enhance the company's advertising solutions [44][46]

Question: Changes in customer types due to self-service
- The self-service launch has allowed smaller businesses to enter the platform, leading to direct correlations between ad spend and revenue growth for these companies [49][50]

Question: Marketing investment and growth expectations
- Management indicated that they are cautious about ramping up marketing until the necessary tools are in place, but they are optimistic about the potential for growth [67][70]
Uber launches an 'AV Labs' division to gather driving data for robotaxi partners
TechCrunch· 2026-01-27 13:00
Core Insights - Uber is launching a new division called Uber AV Labs to provide data to its more than 20 autonomous vehicle partners, focusing on democratizing access to valuable real-world driving data [1][9] Group 1: Uber's Strategy and Operations - Uber is not returning to developing its own robotaxis but will collect data using its own vehicles equipped with sensors for partners like Waymo and Lucid Motors [2] - The new AV Labs division currently operates with a single vehicle, a Hyundai Ioniq 5, and is in the process of equipping it with necessary sensors [10] - Uber's VP of engineering stated that the lab aims to build a foundational data set before determining product market fit, emphasizing the company's responsibility to accelerate the autonomous vehicle ecosystem [10] Group 2: Data Collection and Value - Real-world driving data is increasingly valuable for training self-driving systems, as companies shift from rules-based operations to reinforcement learning [3] - The physical limit of an autonomous vehicle company's fleet restricts data collection, making extensive real-world driving essential for addressing edge cases [5] - Uber's approach to data collection is targeted, allowing for deployment in specific cities based on partner needs, which contrasts with Tesla's broader scale of data collection [13][14] Group 3: Collaboration with Partners - Partners will not receive raw data; instead, Uber will process the data to fit the needs of its partners, enhancing the semantic understanding for driving software [11] - Uber plans to run partner driving software in "shadow mode" to identify discrepancies and improve model training, aiming to make autonomous vehicles drive more like humans [12] - Partners have expressed a strong desire for any helpful data, recognizing that Uber's data collection capabilities far exceed their own [15]
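The "shadow mode" workflow described above can be sketched as a simple discrepancy filter: the partner's driving software runs on logged frames without ever controlling the vehicle, and frames where its proposed action diverges from the human driver's become candidates for model training. The schema, names, and threshold below are illustrative assumptions, not Uber's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One timestep of logged driving data (hypothetical schema)."""
    human_steering: float   # what the safety driver actually did
    model_steering: float   # what the partner's software would have done

def find_discrepancies(frames, threshold=0.2):
    """Return indices where the shadow model diverged from the human driver.

    In shadow mode the model's outputs are computed but never actuated;
    large divergences mark frames worth adding to the training set.
    """
    return [
        i for i, f in enumerate(frames)
        if abs(f.human_steering - f.model_steering) > threshold
    ]

frames = [
    Frame(human_steering=0.0, model_steering=0.05),   # close agreement
    Frame(human_steering=0.3, model_steering=-0.1),   # model would have turned the other way
    Frame(human_steering=-0.4, model_steering=-0.35),
]
print(find_discrepancies(frames))  # → [1]
```

Only the flagged frames need expensive labeling and processing, which matches the article's point that Uber's data collection is targeted rather than exhaustive.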
Did Silicon Valley's "Too Much Money" Ruin AI?! Former OpenAI o1 Lead Fires Back: Don't Hype Google, Q-Star Was Turned Into a Soap Opera, and Seven Years of High Pressure "Drove People Mad"!
Xin Lang Cai Jing· 2026-01-25 01:24
Core Insights
- Jerry Tworek's departure from OpenAI highlights a growing divide between AI research and commercialization, as he seeks to pursue riskier foundational research that is increasingly difficult within a company focused on user growth and commercial strategies [2][3][4]
- Tworek criticizes the AI industry for a lack of innovation, noting that major companies are developing similar technologies, which pressures researchers to prioritize short-term gains over experimental breakthroughs [3][4][24]
- He emphasizes that OpenAI's slow response to competition from Google was a significant factor in its current position, suggesting that the company made critical missteps despite its initial advantages [3][4]

Company Dynamics
- Tworek points out that employee turnover can indicate deeper issues within a company, suggesting that if many key personnel leave due to misalignment in direction or decision-making, it reflects underlying problems [4][24]
- He contrasts OpenAI's organizational rigidity with the agility of competitors like Anthropic, which he praises for its focused and effective execution in AI research [4][5]
- The current state of the AI industry resembles a dramatic narrative, where personal movements and internal conflicts are sensationalized, creating a high-pressure environment for researchers [6][7][44]

Research and Innovation
- Tworek believes that the AI field is overly focused on scaling existing models, particularly those based on the Transformer architecture, and argues for the need to explore new methodologies and architectures [19][36]
- He identifies two underappreciated research directions: architectural innovation beyond Transformers and the integration of continual learning, which he sees as essential for advancing AI capabilities [36][37]
- The industry is at a crossroads where researchers must balance the pursuit of groundbreaking ideas with the pressures of existing corporate structures and funding constraints [28][30]

Future Outlook
- Tworek expresses cautious optimism about the potential for breakthroughs in AI, suggesting that while significant progress has been made, there are still many unexplored avenues that could lead to substantial advancements [38][40]
- He acknowledges the challenges of achieving AGI, emphasizing the importance of integrating continuous learning and multimodal perception into AI systems [39][40]
- The conversation around AI's impact on society is evolving, with a recognition that new technologies will have profound effects on various aspects of life, including interpersonal relationships and economic productivity [42][43]
Did Silicon Valley's "Too Much Money" Ruin AI?! Former OpenAI o1 Lead Fires Back: Don't Hype Google, Q-Star Was Turned Into a Soap Opera, and Seven Years of High Pressure "Drove People Mad"!
AI前线· 2026-01-24 05:33
Core Viewpoint
- The departure of Jerry Tworek from OpenAI highlights the growing divide between AI research and commercialization, emphasizing the need for risk-taking in foundational research that is increasingly difficult in a competitive corporate environment [3][4][5]

Group 1: Departure and Industry Insights
- Jerry Tworek's exit from OpenAI was met with shock among employees, indicating his significant influence within the company [3][10]
- Tworek criticized the AI industry for a lack of innovation, stating that major companies are developing similar technologies, which pressures researchers to prioritize short-term gains over experimental breakthroughs [4][5]
- He pointed out that Google's success in catching up with OpenAI was due to OpenAI's own missteps, including slow actions and failure to leverage its initial advantages [4][5]

Group 2: Organizational Challenges
- Tworek identified organizational rigidity as a barrier to innovation, where team structures limit cross-team research and collaboration [4][22]
- He expressed concern that the current state of the AI industry resembles a soap opera, where personal movements and internal conflicts overshadow genuine research progress [6][7]

Group 3: Future Research Directions
- Tworek emphasized the importance of exploring new research paths rather than following the mainstream trajectory, advocating for more diversity in AI model development [30][31]
- He highlighted two underexplored areas: architectural innovation beyond the Transformer model and the integration of continual learning into AI systems [45][47]
- Tworek believes that significant advancements in AI will require a shift away from the current focus on scaling existing models and towards more innovative approaches [26][28]

Group 4: AGI and Industry Evolution
- Tworek updated his perspective on the timeline for achieving AGI, acknowledging that while current models are powerful, they still lack essential capabilities like continuous learning and multimodal perception [49][50]
- He noted that the rapid evolution of AI technology and increasing investment in the field could lead to breakthroughs sooner than previously anticipated [51]
Why hasn't reinforcement learning achieved successful real-world deployment in autonomous driving?
自动驾驶之心· 2026-01-13 03:10
Core Viewpoint
- The article discusses the challenges and advancements in reinforcement learning (RL) for autonomous driving, emphasizing the need for a balanced reward system to enhance both safety and efficiency in driving models [2][5]

Group 1: Challenges in Reinforcement Learning
- Reinforcement learning faces significant issues such as reward hacking, where increased safety requirements can lead to decreased efficiency, and vice versa [2]
- Achieving a comprehensive performance improvement in RL models is challenging, with many companies not performing adequately [2]
- The complexity of autonomous driving requires adherence to various driving rules, making it essential to optimize through RL, especially in uncertain decision-making scenarios [2][5]

Group 2: Model Development and Talent Landscape
- The current industry leaders have developed a complete model iteration approach that includes imitation learning, closed-loop RL, and rule-based planning [5]
- The high barriers to entry in the autonomous driving sector have led to generous salaries, with top talents earning starting salaries of 1 million and above [6]
- There is a notable gap in practical experience among many candidates, as they often lack the system-level experience necessary for real-world applications [7]

Group 3: Course Offerings and Structure
- The article promotes a specialized course aimed at practical applications of end-to-end autonomous driving systems, highlighting the need for hands-on experience [8]
- The course covers various chapters, including an overview of end-to-end tasks, two-stage and one-stage algorithm frameworks, and the application of navigation information [13][14][15][16]
- It also addresses the integration of RL algorithms and trajectory optimization, emphasizing the importance of combining imitation learning with RL for better performance [17][18]

Group 4: Practical Experience and Knowledge Requirements
- The final chapter of the course focuses on sharing production experiences, analyzing data, models, scenarios, and rules to enhance system capabilities [20]
- The course is designed for advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming skills [21][22]
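The safety-versus-efficiency tension behind reward hacking can be illustrated with a scalarized reward function. The terms and weights below are hypothetical, not any company's production reward; the point is that turning up one weight inevitably shifts behavior on the other objectives, which is why balanced tuning is hard.

```python
def driving_reward(collision, progress_m, jerk,
                   w_safety=1.0, w_eff=0.1, w_comfort=0.05):
    """Scalarized reward for a driving policy (illustrative weights only).

    Over-weighting safety pushes the policy toward standing still
    (safe but inefficient); over-weighting progress invites risky
    maneuvers. This is the reward-hacking trade-off described above.
    """
    r_safety = -100.0 if collision else 0.0   # large penalty on collision
    r_efficiency = w_eff * progress_m         # meters advanced this step
    r_comfort = -w_comfort * abs(jerk)        # penalize harsh control
    return w_safety * r_safety + r_efficiency + r_comfort

# A cautious step: no collision, modest progress, smooth control.
print(round(driving_reward(collision=False, progress_m=2.0, jerk=0.5), 3))  # → 0.175
```

A policy that learns to idle forever scores 0.0 per step under these weights, which beats any trajectory that ever risks a collision: a concrete instance of reward hacking that reward shaping must rule out.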
We are recruiting partners in these areas (world models / 4D annotation / RL)
自动驾驶之心· 2026-01-12 09:20
Core Viewpoint
- The autonomous driving industry has entered its second phase, requiring more dedicated individuals to address its challenges and pain points [2]

Group 1: Industry Direction
- The main focus areas include but are not limited to: autonomous driving product management, 4D annotation/data loop, world models, VLA, large models for autonomous driving, reinforcement learning, and end-to-end solutions [4]

Group 2: Job Description
- The positions are primarily aimed at training collaborations in autonomous driving, targeting B-end (enterprises, universities, research institutes) and C-end (students, job seekers) for course development and original article creation [5]

Group 3: Contact Information
- For discussions regarding compensation and collaboration methods, interested parties are encouraged to add the WeChat contact wenyirumo for further communication [6]
Out of Nowhere, DeepSeek R1 Drops an 86-Page Paper Update: This Is What "Open" Really Means
36Kr· 2026-01-09 03:12
Core Insights
- DeepSeek has significantly updated its R1 paper from 22 pages to 86 pages, demonstrating that open-source models can compete with closed-source ones and even teach them new methodologies [1][2][4]
- The updated paper serves as a fully reproducible technical report for the open-source community, showcasing the advancements made in AI reasoning capabilities through reinforcement learning [2][4]

Summary by Sections

Paper Update and Content
- The R1 paper now includes precise data specifications, detailing a dataset of 26,000 math problems and 17,000 code samples, along with the creation process [4]
- Infrastructure details are provided, including a diagram of the vLLM/DualPipe setup [4]
- The training cost is broken down, totaling approximately $294,000, with R1-Zero utilizing 198 hours of H800 GPU time [4][24]
- A retrospective on failed attempts is included, explaining why the Process Reward Model (PRM) did not succeed [4]
- A comprehensive 10-page safety report outlines safety assessments and risk analyses [4]

Performance Comparison
- DeepSeek R1's performance is comparable to OpenAI's o1, even surpassing o1-mini, GPT-4o, and Claude 3.5 in several metrics [5][10]
- In educational benchmarks like MMLU and GPQA Diamond, R1 outperforms previous models, particularly excelling in STEM-related questions due to reinforcement learning [10][12]
- R1's performance in long-context question-answering tasks is notably strong, indicating excellent document understanding and analysis capabilities [10]

Reinforcement Learning and Distillation
- The paper discusses the effectiveness of distilling reasoning capabilities from larger models to smaller ones, confirming that learned reasoning can be transferred without re-exploring the reward space [20][22]
- The training data distribution for reinforcement learning includes 26,000 math problems, 17,000 code samples, and 66,000 general knowledge tasks [19]

Safety and Risk Assessment
- DeepSeek R1's safety evaluation includes a risk control system that filters potential risk dialogues and assesses model responses against predefined keywords [31][32]
- The model's performance in safety benchmarks is comparable to other advanced models, although it shows weaknesses in handling intellectual property issues [35][37]
- A multi-language safety testing dataset has been developed, demonstrating R1's safety performance across 50 languages [42]

Conclusion
- The advancements made by DeepSeek R1 represent a significant milestone in open-source AI, showcasing competitive performance against proprietary models while maintaining lower operational costs [17][18]
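The distillation result described in the paper (transferring learned reasoning to smaller models without re-exploring the reward space) amounts to supervised fine-tuning on teacher-generated reasoning traces. A toy sketch of the data-construction step, where `teacher_generate` is a stand-in for sampling chain-of-thought completions from the large model:

```python
def build_distillation_set(problems, teacher_generate):
    """Distillation as described: the large model writes full reasoning
    traces, and the small model is then trained on them with plain
    supervised fine-tuning. No reward model or RL exploration is needed
    on the student side.
    """
    dataset = []
    for p in problems:
        trace = teacher_generate(p)          # chain-of-thought + final answer
        dataset.append({"prompt": p, "target": trace})
    return dataset

# Toy "teacher" that reasons by spelling out the arithmetic.
teacher = lambda p: f"Think: {p} ... Answer: {eval(p)}"
data = build_distillation_set(["1+2", "3*4"], teacher)
print(data[1]["target"])  # → Think: 3*4 ... Answer: 12
```

In practice the teacher's traces would be filtered for correctness (rejection sampling) before fine-tuning, but the student's objective remains ordinary next-token prediction on the traces.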
Clearing Out the Backlog: DeepSeek Suddenly Completes the R1 Technical Report, Detailing Its Training Path for the First Time
36Kr· 2026-01-09 03:12
Core Insights
- DeepSeek has released an updated version of its research paper on the R1 model, adding 64 pages of technical details, significantly enhancing the original content [4][25]
- The new version emphasizes the implementation details of the R1 model, showcasing a systematic approach to its training process [4][6]

Summary by Sections

Paper Update
- The updated paper has expanded from 22 pages to 86 pages, providing a comprehensive view of the R1 model's training and operational details [4][25]
- The new version includes a detailed breakdown of the training process, which is divided into four main steps: cold start, inference-oriented reinforcement learning (RL), rejection sampling and fine-tuning, and alignment-oriented RL [6][9]

Training Process
- The cold start phase utilizes thousands of CoT (Chain of Thought) examples to perform supervised fine-tuning (SFT) [6]
- The inference-oriented RL phase enhances model capabilities while introducing language consistency rewards to address mixed-language issues [6]
- The rejection sampling and fine-tuning phase incorporates both reasoning and general data to improve the model's writing and reasoning abilities [6]
- The alignment-oriented RL phase focuses on refining the model's usefulness and safety to align more closely with human preferences [6]

Safety Measures
- DeepSeek has implemented a risk control system to enhance the safety of the R1 model, which includes a dataset of 106,000 prompts to evaluate model responses based on predefined safety criteria [9][10]
- The safety reward model employs a point-wise training method to distinguish between safe and unsafe responses, with training hyperparameters aligned with the usefulness reward model [9]
- The risk control system operates through two main processes: potential risk dialogue filtering and model-based risk review [9][10]

Performance Metrics
- The introduction of the risk control system has led to a significant improvement in the model's safety performance, with R1 achieving benchmark scores comparable to leading models [14]
- DeepSeek has developed an internal safety evaluation dataset categorized into four main categories and 28 subcategories, totaling 1,120 questions [19]

Team Stability
- The core contributors to the DeepSeek team have largely remained intact, with only five out of over 100 authors having left, indicating strong team retention in a competitive AI industry [21][24]
- Notably, a previously departed author has returned to the team, highlighting a positive team dynamic compared to other companies in the sector [24]
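The point-wise training method mentioned for the safety reward model scores each response independently against a binary safe/unsafe label, typically via a sigmoid and binary cross-entropy, rather than ranking pairs of responses as in preference-based reward modeling. A minimal sketch under that assumption (the logit values are placeholders, not DeepSeek's actual data or hyperparameters):

```python
import math

def pointwise_reward_loss(scores, labels):
    """Point-wise reward-model training: each response gets its own raw
    logit from the model and is judged against a binary safe/unsafe
    label with sigmoid + binary cross-entropy. No pairwise ranking.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))                    # P(safe)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(scores)

# Two well-separated responses: one judged safe, one unsafe.
print(round(pointwise_reward_loss([3.0, -3.0], [1, 0]), 4))  # → 0.0486
```

Because each example is scored in isolation, the same dataset of labeled prompts (like the 106,000-prompt set described above) can be consumed directly, without constructing response pairs.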
RL Environments and RL for Science: Data Foundries and Multi-Agent Architectures
2026-01-07 03:05
Summary of Key Points from the Conference Call

Industry Overview
- The focus of the conference call is on the scaling of Reinforcement Learning (RL) and its applications across various domains, including AI capabilities, coding environments, and data foundries [2][3][51]

Core Insights and Arguments
1. **Scaling RL as a Critical Path**: The scaling of RL is identified as essential for unlocking further AI capabilities, with significant performance gains attributed to increased RL compute [2][4]
2. **OpenAI's Model Performance**: OpenAI has demonstrated that improvements in model performance over the past 18 months were primarily driven by post-training and scaling up RL compute, using the same base model across various flagship models [4][6]
3. **Challenges in Scaling RL**: The scaling of RL faces challenges due to the need for a continuous stream of tasks for models to learn from, which is labor-intensive compared to pre-training that utilizes vast internet data [7]
4. **Task Aggregation**: Companies like Windsurf and Cursor have managed to create competitive models by aggregating tasks and data, even without lab-level resources [9]
5. **Utility and Capability Evaluation**: OpenAI's GDPval evaluation measures model improvements across 1,000+ tasks in 44 occupations, indicating a shift from abstract intelligence measurement to real-world utility [10][14]
6. **Autonomous AI Development**: Companies like OpenAI and Anthropic are targeting the development of autonomous AI researchers by 2028 and 2027, respectively, indicating a trend towards models that can operate independently for longer periods [16]

Additional Important Content
1. **Outsourcing Data Tasks**: The need for significant data and task curation has led to outsourcing, with companies like Scale AI historically being major contractors but now absorbed by Meta [19][21]
2. **Emergence of New Companies**: Over 35 companies have emerged to provide RL environments, focusing on various domains, including website cloning and more sophisticated software environments [24][29]
3. **Demand for Coding Environments**: There is a high demand for coding environments, with companies acquiring defunct startups for their GitHub repositories to create these environments [37][38]
4. **Expert Contractors**: Firms like Surge and Mercor are utilized to hire domain-specific experts for task creation, with Surge being a significant player with an estimated annual recurring revenue of around $1 billion [55]
5. **Chinese Market Dynamics**: Chinese VC firms are attempting to establish local data foundry competitors to serve the ecosystem at lower costs, with most Chinese labs still in early stages of scaling RL [58][59]

This summary encapsulates the key points discussed in the conference call, highlighting the advancements, challenges, and market dynamics within the RL and AI landscape.
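An "RL environment" of the kind these vendors package is, at its core, a task wrapped in a reset/step loop with a programmatic reward check. The interface below is a generic Gym-style sketch with illustrative names; it is not any vendor's actual product API.

```python
class RLEnvironment:
    """Minimal task-as-environment interface (names illustrative):
    an observation, an action, and a programmatically computed reward.
    Vendors sell large curated collections of such tasks for RL training.
    """
    def __init__(self, target):
        self.target = target   # the hidden answer the agent must produce
        self.guess = None

    def reset(self):
        """Start a fresh episode and return the initial observation."""
        self.guess = None
        return "observation: guess the target number"

    def step(self, action):
        """Apply the agent's action; return (observation, reward, done)."""
        self.guess = action
        reward = 1.0 if action == self.target else 0.0
        done = True                      # single-turn task
        return f"you guessed {action}", reward, done

env = RLEnvironment(target=7)
env.reset()
_, reward, done = env.step(7)
print(reward, done)  # → 1.0 True
```

The labor-intensive part the call describes is not this loop but authoring thousands of realistic tasks and verifiable reward functions to fill it, which is exactly what the data foundries sell.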