Just now: Thinking Machines Lab blog proposes on-policy distillation, with Qwen mentioned 38 times
36Kr · 2025-10-28 02:00
Core Insights
- Thinking Machines Lab (TML) has introduced a new training method called on-policy distillation, which combines the on-policy error correction of reinforcement learning (RL) with the dense reward signal of supervised fine-tuning (SFT), achieving superior performance at a lower cost [1][17].

Group 1: Methodology and Applications
- On-policy distillation is effective for small models, enhancing their domain performance and continuous-learning capabilities [1][17].
- The method is inspired by the Qwen team's research and heavily utilizes the Qwen3 series models in its experiments [3][34].
- The training pipeline consists of three stages: pre-training, mid-training, and post-training, targeting general capabilities, domain knowledge, and target behavior respectively [6][7].

Group 2: Advantages of On-Policy Distillation
- Small models trained with on-policy distillation often outperform larger general models in specialized fields, thanks to benefits such as local deployment, easier continuous training, and reduced inference costs [7][17].
- The method provides dense reward signals, allowing more efficient learning than traditional RL, whose feedback is sparse [9][18].

Group 3: Performance and Cost Efficiency
- TML's experiments show that on-policy distillation can match RL's performance at a fraction of the cost, with reported costs at only one-tenth of traditional RL methods [34][41].
- The method is computationally efficient, requiring 7-10 times fewer gradient steps than RL to reach similar performance levels [58].

Group 4: Continuous Learning and Personalization
- On-policy distillation is positioned as a promising tool for continuous learning, allowing models to be updated without degrading previously learned behaviors [66][70].
- The approach can personalize models effectively, enabling them to adapt to specific tasks while retaining core capabilities [42][53].
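The dense per-token reward at the heart of the method can be sketched in a few lines. A minimal illustration over toy next-token distributions (all names and logits are invented for the example): the student samples on-policy, and each sampled token is scored by the teacher's log-probability relative to the student's, a one-sample estimate of the negative reverse KL.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """Exact reverse KL(student || teacher) for one token position."""
    s, t = softmax(student_logits), softmax(teacher_logits)
    return sum(si * (math.log(si) - math.log(ti)) for si, ti in zip(s, t))

def distill_step(student_logits, teacher_logits, rng):
    """One on-policy step: the student samples its *own* next token,
    then receives a dense per-token reward equal to the teacher's
    log-prob minus its own log-prob at that token -- a one-sample
    estimate of the negative reverse KL."""
    s, t = softmax(student_logits), softmax(teacher_logits)
    tok = rng.choices(range(len(s)), weights=s)[0]
    reward = math.log(t[tok]) - math.log(s[tok])
    return tok, reward

rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
tok, r = distill_step(logits, logits, rng)  # teacher == student -> reward is 0
```

Because the student generates the trajectory itself, every token carries a graded signal, which is the "dense reward" contrast with RL's single sparse end-of-episode reward.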
Hangzhou's reign atop global open-source large models is over: Shanghai's MiniMax M2 sells out on launch, at just 8 RMB per million tokens
QbitAI (量子位) · 2025-10-28 01:18
Core Insights
- The open-source model throne has shifted to MiniMax M2: previous leaders DeepSeek and Qwen, both based in Hangzhou, have been displaced by Shanghai-based MiniMax [1]

Performance and Features
- MiniMax M2 scored 61 in the Artificial Analysis test, ranking it as the top open-source model, just behind Claude 4.5 Sonnet [2]
- The model is designed specifically for agents and programming, showcasing exceptional coding capabilities and agent performance [4]
- MiniMax M2 is economical: its reasoning speed is twice that of Claude 3.5 Sonnet, while its API pricing is only 8% of Claude's [5][9]
- The model has 230 billion total parameters but only 10 billion active parameters, allowing rapid execution [9][10]
- It employs an interleaved thinking format, crucial for planning and verifying operations across multiple dialogue turns, which strengthens agent reasoning [11]

Comparative Analysis
- In the overall performance ranking, M2 placed fifth in the Artificial Analysis test, securing the top position among open-source models [14]
- The test used ten popular datasets, including MMLU Pro and LiveCodeBench, to evaluate model performance [15]
- M2's pricing is $0.3 per million input tokens and $1.2 per million output tokens, only about 8% of Claude 3.5 Sonnet's cost [16]

Agent Capabilities
- MiniMax has deployed M2 on an agent platform for limited free use, showcasing a variety of existing projects built with the model [32][35]
- The platform lets users create diverse web applications and even replicate classic games in a web environment [36][38]
- Users have built projects such as an online Go game platform, demonstrating M2's programming capabilities [40][43]

Technical Insights
- M2 initially planned a hybrid attention mechanism combining full attention and sliding-window attention, but the sliding-window component was abandoned due to performance concerns [45][46]
- The final choice of attention mechanism reflects MiniMax's strategy of optimizing performance on long-range dependency tasks [49][54]
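The "8% of Claude's price" claim is easy to check with per-million-token arithmetic. A quick sketch using the article's M2 prices ($0.3 / $1.2 per million input/output tokens) and, as an assumption not stated in the article, Claude Sonnet's commonly cited $3 / $15 per million:

```python
def api_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """API bill in dollars, given per-million-token prices."""
    return (input_tokens / 1e6 * in_price_per_m
            + output_tokens / 1e6 * out_price_per_m)

# One million tokens in, one million tokens out:
m2_cost = api_cost(1_000_000, 1_000_000, 0.3, 1.2)       # M2 prices from the article
claude_cost = api_cost(1_000_000, 1_000_000, 3.0, 15.0)  # assumed Claude Sonnet prices
ratio = m2_cost / claude_cost  # ~0.083 for this mix, in line with the ~8% cited
```

The exact ratio depends on the input/output mix: output tokens alone come to 1.2/15 = 8%, input tokens to 0.3/3 = 10%.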
Thinking Machines' new research goes viral: combining the strengths of RL and fine-tuning makes small-model training more cost-effective
QbitAI (量子位) · 2025-10-28 01:18
Core Insights
- The article discusses research by Thinking Machines on a new training method for small language models, On-Policy Distillation, which deepens their command of specialized fields [1][4].

Summary by Sections

Methodology
- On-Policy Distillation combines the strengths of two traditional training methods, reinforcement learning (self-exploration) and supervised fine-tuning (direct answers), into a more efficient training framework [3][8].
- The method lets the AI learn through practical problem-solving while receiving immediate guidance when it runs into difficulty, improving training efficiency by a reported 50-100 times [4][5].

Training Phases
- The training process has three main phases: pre-training (general capabilities), mid-training (domain-specific knowledge), and post-training (target-behavior guidance) [9].
- The research focuses on the post-training phase, where the model learns to perform specific tasks effectively [6][9].

Evaluation Metrics
- The method uses reverse KL divergence as its key training signal: the student model learns effectively by minimizing its divergence from the teacher model's distribution, with the negative reverse KL serving as the reward [12][15].

Experimental Results
- Experiment 1 showed that with On-Policy Distillation, a smaller model (8B) could reach a 70% score on a math benchmark at significantly lower computational cost than traditional methods [19][22].
- Experiment 2 showed that the method mitigates "catastrophic forgetting," allowing models to retain general capabilities while learning new knowledge [23][25].

Implications
- The research indicates that On-Policy Distillation can let resource-constrained individuals and small companies train effective specialized models, broadening access to AI development [5][19].
- The findings point to a promising avenue for lifelong learning in AI systems, balancing new knowledge acquisition with retention of existing skills [26].
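Why reverse rather than forward KL? Reverse KL is mode-seeking: the student is punished severely for putting probability where the teacher puts almost none, so it commits to behaviors the teacher would actually produce. A tiny numeric illustration (the distributions are made up):

```python
import math

def reverse_kl(student, teacher):
    """KL(student || teacher) for discrete probability vectors."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher))

teacher = [0.49, 0.49, 0.02]   # bimodal teacher: token 2 is almost never produced
on_mode = [0.98, 0.01, 0.01]   # student commits to one of the teacher's modes
off_mode = [0.01, 0.01, 0.98]  # student commits to the teacher's dead zone

kl_on = reverse_kl(on_mode, teacher)    # small: mass stays where the teacher has mass
kl_off = reverse_kl(off_mode, teacher)  # large: heavy penalty for off-teacher mass
```

Forward KL (teacher || student) would instead push the student to spread mass over all teacher modes, which is less desirable when the goal is reliably imitating teacher behavior.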
Can Moonshot AI win a round back?
Huxiu APP (虎嗅APP) · 2025-10-28 01:06
Core Insights
- The article discusses recent financing rumors around Moonshot AI (月之暗面), highlighting the potential involvement of notable VC firms and speculation about an IPO, although some of the claims are deemed untrue [5][6][7].

Financing and Valuation
- The key open questions about the financing are the identity of the lead investor, Moonshot AI's post-financing valuation, and its future market positioning [6].
- Among the "six small dragons" of large models, Zhipu AI (智谱AI) currently holds the highest valuation at 40 billion RMB, followed by MiniMax at 30 billion RMB; the outcome of Moonshot AI's financing could alter its competitive standing [6].

Strategic Shifts
- Moonshot AI is attempting to pivot toward consumer (toC) commercialization despite the challenging domestic environment for content and subscription services; the company has launched a subscription plan and is exploring international markets [8][10].
- It is also shifting its product focus toward coding and agent capabilities, aiming to move beyond basic search-and-response functionality [13][15].

Kimi's Performance
- Kimi, Moonshot AI's chatbot, has seen its monthly active users (MAU) decline sharply to roughly 27 million, while competitors Doubao (豆包) and DeepSeek stand at 250 million and 170 million MAU respectively [10][12].
- The competitive landscape has changed dramatically, with Kimi failing to achieve its anticipated growth and being overtaken by newer entrants [12][17].

Self-Rescue Measures
- In response to its declining performance, Moonshot AI has cut marketing expenditures and is focusing on coding and agent capabilities as its key growth areas [13][15].
- The company has introduced a tiered subscription model, aiming for a more sustainable revenue stream by targeting professional users who need in-depth research capabilities [15][16].

Open Source Strategy
- Moonshot AI has adopted an open-source approach to build market presence and developer engagement, releasing components related to its AI models and agent functionality [18][19].
- The strategy is seen as a way to blunt competitive pressure from larger players while establishing a foothold in the developer community [18][19].

Challenges Ahead
- Despite these strategic pivots, Moonshot AI faces significant challenges in user acquisition and retention as it struggles to establish a strong market presence [28][30].
- The company must balance operational costs against user engagement to sustain growth, especially as competition intensifies with rivals' upcoming model releases [30][32].
Investment Targets for JPMorgan Chase’s Security and Resiliency Initiative Analyzed in New Report
Crowdfund Insider · 2025-10-28 00:49
Core Insights
- JPMorgan Chase plans to allocate approximately $1.5 trillion over the next decade through its Security and Resiliency Initiative, including up to $10 billion in direct equity investments to enhance US national security [1][2]
- The initiative aims to support US-based firms across 27 identified sub-areas, focusing on capital deployment and financing strategies [2]
- A new class of industrial AI startups is emerging to rebuild the nation's productive capacity in key sectors such as manufacturing, materials, and mobility [2]

Investment Focus
- Startups such as Skild AI, Charge Robotics, and Cartken are working to cut onshore production costs by automating physical work and logistics [3]
- Earth AI and Periodic Labs are leveraging AI to accelerate materials discovery and secure domestic access to critical minerals and battery inputs [3]
- Materials-development platforms are drawing investor interest as AI and quantum computing unlock new capabilities, marking them as a hot emerging market in manufacturing [3]

Security and Infrastructure
- As AI becomes integral to critical systems, the attack surface for cyber threats expands significantly [3]
- Emerging startups are combining AI-native security, quantum-safe encryption, and infrastructure hardening to secure the intelligent economy [3]
- TXOne Networks and Xage Security protect industrial and energy assets with zero-trust architectures, while TrustLogix and Concentric AI safeguard sensitive enterprise data [3]

M&A Activity
- TrustLogix is identified as a potential acquisition target among AI security startups, as larger cyber players actively pursue M&A to integrate AI security features [3]
- Quantum Xchange provides secure data transmission with quantum-resistant encryption, and HiddenLayer protects AI models from various attacks [3]

Overall Objective
- The collective aim of these emerging technologies is to establish a secure AI infrastructure spanning digital, data, and physical domains, ensuring that as intelligence scales, trust scales with it [3]
- Investment in this layer is not solely about cybersecurity; it is also about safeguarding the "nervous system" of the digital economy [3]
DeepMind in Nature again: an AI agent produced the strongest RL algorithm
36Kr · 2025-10-28 00:35
Core Insights
- The main objective of artificial intelligence (AI) is to design agents capable of autonomously predicting, acting, and achieving goals in complex environments; the open challenge has been enabling such agents to independently develop efficient reinforcement learning (RL) algorithms [1][2].

Group 1: Discovery Methodology
- Google DeepMind introduced a method called DiscoRL, which lets agents autonomously discover RL rules through interactions in various environments; the discovered rules outperformed existing RL algorithms on both known and challenging benchmarks [1][2].
- The discovery process involves two types of optimization: agent optimization and meta-optimization. Agents optimize their parameters by updating their strategies and predictions, while the meta-network optimizes the objectives of the RL rule to maximize cumulative rewards [3][5].

Group 2: Performance Evaluation
- DiscoRL was evaluated using the interquartile mean (IQM) as its performance metric and outperformed established RL algorithms such as MuZero and Dreamer on the Atari benchmark [7][8].
- The Disco57 rule, trained on 57 Atari games, achieved an IQM score of 13.86, surpassing all current RL rules with significant efficiency gains over MuZero [8][14].

Group 3: Generalization and Robustness
- Disco57's generalization was tested across 16 independent benchmarks, where it outperformed all published methods, including MuZero and PPO; it was also competitive on the Crafter benchmark and ranked third in the NetHack NeurIPS 2021 challenge without using domain-specific knowledge [9][11].
- Disco103, discovered across 103 environments, matched Disco57 on Atari and reached human-level performance on Crafter, indicating that more complex and diverse environments yield stronger, more generalizable RL rules [11][14].

Group 4: Efficiency and Scalability
- Disco57 reached its optimal performance within approximately 600 million steps per game, markedly more efficient than traditional human-designed RL rules, which require more experimental iterations and time [14][18].
- The discovered rules' performance improved as the number of training environments grew, suggesting that the effectiveness of discovered RL scales with the available data (environments) and computational resources [14][17].
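The two nested optimizations can be sketched with a deliberately tiny stand-in: a scalar agent climbing a toy reward surface (inner loop) and a meta-level search over the update rule's parameter that keeps whichever rule earns the most reward (outer loop). This is an illustrative caricature of the paper's setup, not its actual algorithm; the environment, names, and hill-climbing meta-step are all invented for the example.

```python
def reward(theta):
    # Toy "environment": reward peaks at theta = 3.
    return -(theta - 3.0) ** 2

def agent_optimize(eta, steps=20, theta0=0.0):
    """Inner loop: the agent improves its own parameters under an
    update rule parameterized by eta (a stand-in for the learned
    rule that DiscoRL's meta-network defines)."""
    theta = theta0
    for _ in range(steps):
        grad = -2.0 * (theta - 3.0)  # d reward / d theta
        theta += eta * grad
    return reward(theta)

def meta_optimize(candidate_etas, inner_steps=20):
    """Outer loop: score each candidate update rule by the reward
    its trained agent achieves, and keep the best (hill-climbing
    in place of the paper's meta-gradient)."""
    best_eta, best_ret = None, float("-inf")
    for eta in candidate_etas:
        ret = agent_optimize(eta, steps=inner_steps)
        if ret > best_ret:
            best_eta, best_ret = eta, ret
    return best_eta, best_ret

best_eta, best_ret = meta_optimize([0.01, 0.1, 0.4])
```

The key structural point survives the simplification: the inner loop optimizes agent behavior under a fixed rule, while the outer loop optimizes the rule itself against the agents' realized rewards.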
a16z's latest insight: video models go from breakneck growth to differentiation, and productization is the next opportunity
36Kr · 2025-10-28 00:18
Core Insights
- The video generation model industry is transitioning from a phase of rapid performance gains to a "product era," focused on diversity and specialization rather than raw model parameters and benchmark scores [2][4][12]
- There is growing recognition that no single model can dominate every video generation task, driving a trend toward specialization in which different models excel in specific areas [4][11][12]
- The need for better integrated products to simplify the creative process is increasingly apparent, as many creators still rely on multiple tools to achieve their desired outcomes [13][15][16]

Group 1: Industry Trends
- The pace of progress in video generation models has slowed, with most mainstream models now capable of generating impressive 10-15 second videos with synchronized audio [1][6]
- The concept of a single "superior model" in the video domain is being challenged, as recent releases like Sora 2 have not consistently outperformed predecessors like Veo 3 [4][11]
- The industry is shifting toward models tailored to specific capabilities, such as physical simulation and multi-shot editing, rather than one-size-fits-all solutions [2][11][12]

Group 2: Product Development
- While video generation capabilities have improved, product development has not kept pace, leaving a gap in user experience and creative efficiency [13][15]
- Companies are beginning to close this gap with tools that let users modify video elements more intuitively, such as Runway's suite of tools and OpenAI's Sora Storyboard [15][16]
- The future is expected to bring more models specialized for particular industries or scenarios, alongside comprehensive creative toolkits that integrate various media elements into a cohesive workflow [16]
01.AI announces three new executive appointments; former Tmall Genie president Peng Chao starts a company aiming at general intelligence through sports AI hardware | AIGC Daily
Cyzone (创业邦) · 2025-10-28 00:10
Group 1
- 01.AI (零一万物) announced a new round of executive appointments: co-founder Shen Pengfei will oversee domestic ToB and ToG business expansion and the sales system, while Zhao Binqiang and Ning Ning were promoted to vice presidents, focusing on model-platform technology and international business development respectively [2]
- Former Tmall Genie president Peng Chao has launched a new company, "Yun Jue Technology," aiming to develop sports AI hardware that integrates wearable devices with intelligent agents, with a focus on self-evolving capabilities in high-frequency sports environments [2]
- Apple is reportedly planning to introduce advertising in Apple Maps, allowing businesses to pay for top placement in search results; the integration is expected as early as next year and will use AI to enhance the relevance and utility of results [2]

Group 2
- Volcano Engine officially launched the Doubao video generation model 1.0 pro fast, a significant efficiency breakthrough: generation speed is up roughly 3x and costs are down 72% [2]
Altman is considering ads for ChatGPT: using 800 million users to shoulder trillion-scale debts
36Kr · 2025-10-27 23:55
Core Insights
- OpenAI has reached 800 million weekly active users and is projected to generate approximately $13 billion in annual recurring revenue (ARR), with 30% coming from enterprise clients [1][4].
- OpenAI's subscription model is insufficient to cover the astronomical costs of training and operating advanced AI models, prompting the exploration of new revenue streams such as advertising [1][16].
- Anthropic, in contrast, focuses on enterprise clients, deriving 80% of its revenue from that segment, with a projected ARR of $7 billion to $9 billion [4][26].

OpenAI's Business Model
- OpenAI's revenue is driven primarily by consumer subscriptions, yet only about 3% of individual users are willing to pay for generative AI services, a significant monetization challenge [7][9].
- The company is restructuring into a Public Benefit Corporation (PBC) to pave the way for a future IPO, with a total funding target of $300 billion from SoftBank [13][15][16].
- OpenAI's operating costs are projected to reach $16 billion this year and $40 billion next year, requiring substantial capital to sustain its growth ambitions [16][30].

Anthropic's Business Strategy
- Anthropic has taken a more conservative path, focusing on enterprise solutions and leveraging partnerships with Google and Amazon for cloud infrastructure and resources [25][31].
- The company has built a strong presence in sectors such as programming and legal documentation, with its Claude model capturing a 42% share of the code generation market [5][10].
- Anthropic's revenue model rests on usage-based APIs and customized solutions, which may yield more stable cash flow than OpenAI's aggressive growth strategy [31][32].

Market Positioning
- OpenAI is characterized by a consumer-facing, high-profile approach, while Anthropic pursues a low-key, value-driven strategy aimed at enterprise clients [3][10].
- The contrasting business models reflect differing market bets: OpenAI seeks to capitalize on its vast user base, while Anthropic focuses on delivering reliable solutions to businesses [34][36].
- Both companies are vying for dominance in AI, but their paths diverge sharply, with OpenAI betting on mass-market appeal and Anthropic prioritizing steady, incremental growth [26][30].
Musk's AI encyclopedia is here: Grokipedia is live with over 885,000 articles
Sohu Caijing · 2025-10-27 23:39
Core Insights
- Grokipedia, an online encyclopedia powered by xAI's Grok, launched and then crashed shortly after its debut; it currently hosts over 885,000 articles [1]

Group 1: Project Overview
- Grokipedia aims to address what founder Elon Musk describes as biases in Wikipedia; he frames it as a necessary step toward xAI's understanding of the universe [3]
- The platform's launch was delayed by the need to eliminate promotional content, and some entries closely mirror those in Wikipedia, carrying a disclaimer noting the adaptation from Wikipedia [3]

Group 2: Elon Musk's Background and Ventures
- Elon Musk, born June 28, 1971, in Pretoria, South Africa, is recognized for founding and leading companies in electric vehicles, reusable spaceflight, artificial intelligence, and digital payments [5]
- His major ventures include SpaceX, founded in 2002, and Tesla, where he became CEO in 2008, alongside Neuralink and The Boring Company (both 2016) and xAI (2023) [5]
- His net worth is driven by equity in Tesla and SpaceX, with SpaceX valued at over $400 billion by 2025, positioning him as the world's richest person with an estimated $428 billion as of September 2025 [5]