Is "Importance Sampling" Not So "Important"? Kuaishou and Tsinghua's ASPO Tackles the Importance-Sampling Weight Mismatch
量子位· 2025-10-15 10:20
Core Insights
- Reinforcement Learning (RL) has become a crucial component in the post-training phase of Large Language Models (LLMs) like ChatGPT and DeepSeek [1]
- A significant issue has emerged with the increasing scale of model parameters: the importance sampling (IS) mechanism may not be as beneficial as previously thought [2][5]
- The research team from Kuaishou and Tsinghua University identified a deep-rooted "weight mismatch" phenomenon in existing supervised RL paradigms, which leads to model overconfidence and to problems such as entropy collapse and premature convergence [2][6]

Importance Sampling Issues
- Importance sampling is intended to correct the distribution difference between the old and new policies, allowing models to reuse old data without deviating from the target distribution [5]
- In small-scale RL, IS is effective; it breaks down, however, in supervised RL for large language models [6]
- Experiments showed that in the GRPO algorithm, IS did not provide the expected benefits and instead contributed to training instability [7]

Weight Mismatch and Self-Reinforcing Loops
- The research revealed that advantage values in supervised RL are inaccurate, since different tokens contribute differently to the final answer [8]
- The average IS weight of positive-advantage tokens is higher than that of negative-advantage tokens, driving entropy down [9]
- In supervised RL algorithms, IS has shifted from a correction term into a token-level weight, creating a self-reinforcing loop that keeps strengthening high-scoring tokens while neglecting low-probability ones [11][12]

ASPO Algorithm Introduction
- The proposed ASPO (Asymmetric Importance Sampling Policy Optimization) algorithm addresses these issues by inverting the IS weights of positive-advantage tokens, so that low-probability tokens receive stronger updates [3][18]
- ASPO incorporates a Dual-Clipping mechanism to bound the extreme values the inverted weights can produce, ensuring stability while maintaining effective gradient flow [20]

Experimental Results
- ASPO showed clear advantages on a range of benchmarks, including mathematical reasoning and code generation, outperforming traditional methods [24]
- Average performance improved by 12.5% on mathematical tasks and 17.0% on code generation tasks, with smoother training curves and reduced entropy collapse [26]
- ASPO achieved notable results on the LiveCodeBench v5 benchmark, indicating its superiority over mainstream RL methods [26][27]
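The core idea described above, inverting the IS ratio only for positive-advantage tokens and dual-clipping the result, can be sketched as follows. This is a minimal illustration of the mechanism as the summary describes it, not the paper's actual code; the function name and the clip bounds (0.5, 2.0) are assumptions for the example.

```python
import numpy as np

def aspo_token_weights(logp_new, logp_old, advantages,
                       clip_low=0.5, clip_high=2.0):
    """Illustrative ASPO-style token weighting (a sketch, not the paper's code).

    Standard IS weight: r = pi_new / pi_old. For positive-advantage tokens
    the weight is inverted (1 / r), so tokens that are low-probability under
    the new policy receive *stronger* updates; negative-advantage tokens keep
    the standard weight. A dual clip bounds the result on both sides.
    """
    r = np.exp(logp_new - logp_old)        # standard per-token IS ratio
    inv = 1.0 / r                          # inverted ratio
    w = np.where(advantages > 0, inv, r)   # flip only where advantage > 0
    return np.clip(w, clip_low, clip_high) # dual clipping: lower AND upper bound

# Toy check: with positive advantage, the low-probability token now gets the
# larger weight, reversing the ordering standard IS would produce.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.9, 0.1]))
adv = np.array([1.0, 1.0])
w = aspo_token_weights(logp_new, logp_old, adv)
```

Under standard IS the first token's ratio (1.8) would dominate the second's (0.2); after inversion and clipping the second, low-probability token carries the larger weight, which is the self-reinforcing-loop fix the article describes.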
Google AI's Biggest Bombshell of the Year: Leaked Tests Show It Recreating macOS, More Worth Anticipating Than GPT-5
36Kr· 2025-10-15 09:29
Core Insights
- The article discusses the advancements of Google's Gemini 3.0 AI model, highlighting its superior coding capabilities compared to competitors like GPT-5 and Claude [1][3][51]
- Gemini 3.0 is reported to generate fully functional web applications, including a macOS-like web operating system, showcasing significant improvements in both functionality and design [6][7][22]
- The model's inference speed has also improved, with tasks completed in 1-2 minutes, faster than its predecessors [8][22]

Group 1: Model Performance
- Gemini 3.0 has demonstrated the ability to generate a fully functional web operating system, allowing users to interact with applications as if they were using a real computer [6][7]
- The model's coding capabilities have been tested against various tasks, showing a trend of outperforming GPT-5 and even Claude in certain areas [3][5][51]
- Users have reported that Gemini 3.0 can create complex applications, including video editors and interactive games, indicating a leap in its programming abilities [24][44]

Group 2: User Experience and Feedback
- Feedback indicates that Gemini 3.0's design and functionality are impressive, with many users noting its ability to create aesthetically pleasing and functional web applications [21][22]
- Some users have expressed concerns about the model's default design choices, suggesting that while improvements have been made, there are still areas for enhancement [22][24]
- The model's ability to generate unique and creative outputs has led to speculation that it may dominate the front-end development space, similar to its predecessor, nano banana [21][55]

Group 3: Competitive Landscape
- The advancements of Gemini 3.0 position Google as a strong competitor in the AI space, particularly in coding and application development, challenging the established dominance of OpenAI's GPT-5 and Anthropic's Claude [51][55]
- While OpenAI continues to leverage its large user base for continuous application development, Google is catching up with innovative features in Gemini 3.0 [51][55]
- The competitive dynamics of the AI industry are shifting, with Gemini 3.0's capabilities potentially altering user preferences and market positioning [55]
Corporate Scholarships at Universities Should Not Be Read Simply as "Talent Grabbing"
Nan Fang Du Shi Bao· 2025-10-15 00:00
Group 1
- Tencent has launched the Qinyun Scholarship, focusing on fundamental research and application innovation in artificial intelligence, targeting master's and doctoral students from mainland China, Hong Kong, Macau, and Taiwan [1]
- The scholarship's first phase will select 15 winners, each receiving a cash award of 200,000 yuan and cloud heterogeneous computing resources valued at 300,000 yuan, along with potential internship or employment opportunities at Tencent [1]
- The emphasis on applicants having a forward-looking research vision highlights the need for disruptive innovation in AI, as current large language models are seen as inadequate for true reasoning and scientific discovery [2][4]

Group 2
- Young AI scholars face significant funding challenges, as research costs rise with technological advancement, particularly for deploying large models that require expensive hardware [3]
- The 300,000 yuan in cloud computing resources can support approximately three months of continuous use of cutting-edge GPU instances, providing crucial support for young AI researchers [4]
- Establishing scholarships not only fulfills corporate social responsibility but also helps with talent acquisition and may lead to the discovery of future groundbreaking technologies, a win-win for companies, society, and students [4]
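The "three months of continuous GPU use" figure can be sanity-checked with back-of-envelope arithmetic. The implied hourly rate below is derived from the article's two numbers, not quoted anywhere in it:

```python
# Check the claim: 300,000 yuan of cloud credit ~ three months of
# round-the-clock GPU-instance use. The implied per-hour rate is our
# derivation, not a figure from the article.
credit_yuan = 300_000
hours = 3 * 30 * 24              # ~3 months of continuous use
implied_rate = credit_yuan / hours  # yuan per hour the claim implies
```

That works out to roughly 139 yuan per hour, a plausible order of magnitude for a high-end cloud GPU instance, which is consistent with the article's three-month estimate.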
Large Models Reach for the Stars: GPT and Gemini Win Gold at the International Astronomy Olympiad
机器之心· 2025-10-13 04:21
Core Insights
- The article discusses the remarkable advancements of large language models (LLMs) like GPT-5 and Gemini 2.5 Pro, which achieved gold-medal performances on the International Olympiad on Astronomy and Astrophysics (IOAA) [4][18]

Group 1: AI Model Performance
- GPT-5 and Gemini 2.5 Pro excelled at the IOAA, demonstrating strong reasoning and problem-solving capabilities in astronomy and astrophysics [4][12]
- On the theoretical exams, GPT-5 averaged 84.2% and Gemini 2.5 Pro 85.6%, outperforming the other models tested by 7 to 25 percentage points [12][13]
- The models reached gold-medal level, with GPT-5 scoring 86.8% on the 2025 paper, 89.6% on the 2023 paper, and 93.0% on the 2022 paper, consistently outperforming the best human participants [19][18]

Group 2: Evaluation Framework
- The study introduced a more rigorous framework for assessing LLMs in scientific research, focusing on complex reasoning and problem solving rather than simple knowledge recall [9][10]
- The IOAA was chosen as a benchmark for its ecological validity: it covers a wide range of astronomical topics and requires multi-step reasoning [10][9]

Group 3: Error Analysis
- The models showed a significant performance gap across question types, with better accuracy on physics/mathematics problems (67-91%) than on geometric/spatial problems (49-78%) [26]
- Common errors included conceptual misunderstandings and geometric-reasoning failures, indicating fundamental difficulties in achieving deep physical understanding [26][25]
X @Anthropic
Anthropic· 2025-10-09 16:28
Previous research suggested that attackers might need to poison a percentage of an AI model's training data to produce a backdoor. Our results challenge this: we find that even a small, fixed number of documents can poison an LLM of any size. Read more: https://t.co/HGMA7k1Lnf ...
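The distinction the tweet draws, a fixed fraction of the corpus versus a fixed absolute count, can be made concrete with a few lines. All numbers below are illustrative assumptions, not figures from Anthropic's paper:

```python
# Contrast the two threat models the tweet describes. Under the older
# "percentage" assumption, the attacker's required poison count scales
# with corpus size; under the new finding, a small fixed count suffices.
# The fraction and the count of 250 are illustrative only.
corpus_sizes = [10**6, 10**8, 10**10]           # training documents
percentage_based = [n // 10_000 for n in corpus_sizes]  # 0.01% of corpus
count_based = [250 for _ in corpus_sizes]       # fixed number, any size

# percentage_based grows a thousandfold across these corpora;
# count_based stays flat, which is what makes the finding alarming.
```

The practical implication is that scaling up the training corpus does not, by itself, dilute this kind of attack.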
The Competition Is Relentless! Right After DeepSeek's Update, Zhipu Updates Too: GLM-4.6, the Strongest Domestic Model for Code
量子位· 2025-09-30 08:26
Core Insights
- The article discusses the launch of GLM-4.6 by Zhipu, which is claimed to have the strongest coding capabilities among domestic models, surpassing Claude Sonnet 4 [2][5]
- GLM-4.6 has shown significant improvements across benchmarks, aligning closely with Claude Sonnet 4 in most assessments [6]
- The model has cut average token consumption by over 30% compared to its predecessor, GLM-4.5, making it the most efficient in its category [8]

Performance Testing
- Zhipu ran tests in real programming scenarios, where GLM-4.6 generated a shooting game in under a minute [14]
- The model built an interactive animation using p5.js, showcasing its speed and efficiency on coding tasks [18]
- On a classic physics problem, GLM-4.6 accurately simulated a ball bouncing inside a rotating hexagon while adhering to physical laws [22]

Mathematical and Reasoning Abilities
- Given an AIME 2025 math problem, GLM-4.6 correctly identified the answer as 70, highlighting its mathematical and multimodal capabilities [25]
- The model's reasoning abilities have been enhanced, allowing it to call tools during inference [28]

Technological Advancements
- GLM-4.6 achieved a significant milestone by implementing FP8+Int4 mixed-precision quantization on domestic chips, the first successful integration of this technology [27]
- The context window has been expanded from 128K to 200K tokens, enabling it to handle longer code and agent tasks [28]
- Deployment on Moore Threads' new generation of GPUs demonstrates the model's compatibility and adaptability within the domestic ecosystem [30]

Pricing Strategy
- Zhipu has reduced pricing for its GLM Coding Plan, offering a subscription at one-seventh the cost of competitors while providing 90% of Claude's intelligence [34]
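To make the int4 half of "FP8+Int4 mixed-precision quantization" concrete, here is a minimal symmetric int4 weight-quantization sketch. This illustrates the general technique only; GLM-4.6's actual scheme (per-channel scales, FP8 activations, chip-specific kernels) is certainly more involved, and the function names here are our own:

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization (illustrative sketch only).

    Weights are mapped onto the signed 4-bit range [-7, 7] via a single
    scale factor; storage would pack two such values per byte.
    """
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights; the gap to the originals is the
    quantization error the mixed-precision design must keep small."""
    return q.astype(np.float32) * scale

w = np.array([0.7, -0.3, 0.1, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
```

Int4 halves weight memory relative to FP8, which is one reason mixed schemes pair low-bit weights with higher-precision activations.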
Shanghai Synyi Medical Technology Co., Ltd.(H0050) - Application Proof (1st submission)
2025-09-29 16:00
Application Proof of Shanghai Synyi Medical Technology Co., Ltd. 上海森億醫療科技股份有限公司 (the "Company") (A joint stock company incorporated in t ...
Prediction: Wall Street's Most Valuable Public Company by 2030 Will Be This Dual-Industry Leader (No, Not Nvidia)
The Motley Fool· 2025-09-28 07:06
Core Insights
- A historically inexpensive trillion-dollar business is positioned to surpass Nvidia, Apple, and Microsoft by the end of the decade [1]
- Wall Street's trillion-dollar businesses, including Nvidia, Apple, Broadcom, and TSMC, are key drivers of ongoing market outperformance [2]

Company Analysis
- Only 11 publicly traded companies have reached a $1 trillion market cap, 10 of them listed on U.S. exchanges, including the "Magnificent Seven" and Berkshire Hathaway [3]
- Nvidia currently holds a market cap exceeding $4.3 trillion and is projected to potentially surpass $6 trillion based on optimistic analyst targets [6]
- Nvidia's dominance in AI GPUs is supported by strong demand and significant order backlogs for its advanced AI chips [7]
- Despite Nvidia's competitive advantages, historical trends suggest its position may not be secure, given potential market corrections and competition [9][10]
- Amazon is identified as a strong candidate to become Wall Street's most valuable company by 2030, leveraging its e-commerce and cloud services [14]
- Amazon's e-commerce segment holds a 37.6% share of U.S. online retail sales, while its AWS platform commands a 32% share of global cloud infrastructure spending [15][17]
- AWS is growing in the high-teens percentage range year over year and is projected to generate over $123 billion in annual run-rate revenue [18][19]
- Amazon's advertising and subscription services contribute significantly to its revenue, enhancing its pricing power [20]
- Amazon is currently valued at only 8 times projected 2029 cash flow, indicating potential for substantial market-value growth [22]
Looking Far with Clarity and Wisdom: 机器之心's 2025 Annual AI Lists Officially Launched
机器之心· 2025-09-26 03:31
Core Viewpoint
- The article emphasizes the ongoing advancement of artificial intelligence (AI) as of 2025, highlighting the rapid iteration of large models and the emergence of new applications, particularly in China, where domestic models are approaching or surpassing international standards [2][3][4]

Summary by Sections

AI Development Trends
- In 2025, AI continues to evolve with significant breakthroughs in large models, including GPT-4.5, GPT-5, and Genie 3, enhancing capabilities in understanding, generation, and reasoning [3][4]
- Advances in model capability are producing new application forms, such as automated code generation and multi-step task completion by intelligent agents [4]

Domestic AI Landscape
- China's AI development in 2025 is marked by domestic large models not only matching but in places leading international counterparts, backed by a strong open-source ecosystem [4]
- Recent rankings show that all of the top 15 open-source AI models on the Design Arena leaderboard are from China [4]

Recognition of AI Leaders
- The article outlines a curated set of 2025 lists recognizing companies and products with significant technological strength and innovation [6][7][8][9][10][11][12][13]:
  - **Top 10 Companies with Strong Technical Strength**: companies with long-term investment in AI technology that maintain a leading position in the field [7]
  - **Top 20 AI Leading Companies**: firms with comprehensive operational capabilities and competitive advantages in AI technology and applications [8]
  - **Top 20 Best Large Models**: representative and powerful foundational models in the domestic market [9]
  - **Top 20 Best Large Model Products**: valuable new products and applications built on large models [10]
  - **Top 10 Leading Companies in Embodied Intelligence**: companies with systematic technology layouts and continuous innovation in embodied intelligence [12]
  - **Top 10 Leading Companies in ScienceAI**: firms at the intersection of AI and other scientific disciplines, driving industry development through innovative solutions [13]