The Soulless GPT-5 Gets an Update, but Turns Awkward
36Kr · 2025-11-14 01:44
Core Insights
- OpenAI's GPT-5.1 has been released, but it has drawn criticism for falling short of user expectations and lacking the emotional depth of its predecessor GPT-4o [3][25][28]
- The GPT-5.1 updates improve instruction adherence and response time, but overall performance is perceived as underwhelming [8][17][21]

Group 1: Product Performance
- GPT-5.1 struggles with basic tasks, such as responding correctly to simple prompts, indicating a decline in performance relative to GPT-4o [5][12][14]
- GPT-5.1's emotional engagement is criticized: it appears to analyze feelings from a distance rather than empathize with users [12][25]
- The adaptive thinking feature in GPT-5.1 budgets thinking time according to task difficulty, which may benefit API users by reducing costs [17][21]

Group 2: Market Position
- OpenAI's market share is reportedly declining, with competitors gaining ground rapidly, suggesting a challenging landscape for the company [25][28]
- Despite initial success, the excitement around GPT-5 has faded, raising concerns about its long-term viability in a competitive AI market [25][28]
- Similarweb's October statistics indicate that users are increasingly turning to alternative AI products, underscoring the need for OpenAI to strengthen its offerings [28][29]
Whose AI Is Better at Making Money? Chinese Models Take the Top Two in the Large-Model Investment Contest
Di Yi Cai Jing Zi Xun · 2025-11-04 09:13
Core Insights
- The AI model investment competition "Alpha Arena" concluded with two Chinese models, Qwen3 Max and DeepSeek chat v3.1, taking first and second place, while all four leading American models incurred losses; GPT-5 suffered the largest loss at over 62% [1][4]

Group 1: Competition Overview
- The competition was initiated by the startup Nof1, which gave each model $10,000 in starting capital to trade cryptocurrencies in real markets rather than in simulation [4]
- Qwen3 Max achieved a return of 22.32%, ending with a balance of $12,232, while DeepSeek chat v3.1 followed with a return of 4.89% and a balance of $10,489 [4]
- The remaining models, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and GPT-5, ranked third through sixth, all with losses exceeding 30%; GPT-5's balance fell to $3,734 [4][5]

Group 2: Model Performance and Strategies
- DeepSeek's steady performance is attributed to its parent company, a quantitative trading firm; it ran a simple strategy without frequent trading or stop-loss orders [7]
- Qwen3 Max went "all in" on a single asset with high leverage and, despite earlier losses, ended as the most profitable model [7]
- Grok 4 traded aggressively with high-frequency trend following, producing significant volatility [7]
- Gemini 2.5's trading style was likened to that of retail investors, frequently switching strategies and incurring higher costs from overtrading [7]

Group 3: Future of AI in Finance
- Nof1's team believes financial markets are the next ideal training environment for AI, much as DeepMind used games to advance AI a decade ago [8]
- The team aims for AI to evolve through open-ended learning and large-scale reinforcement learning to tackle complex challenges [8]
- Some financial professionals remain skeptical about relying on AI for investment decisions, citing AI's lack of insight into individual users' circumstances and its inherent inability to predict future outcomes [8]
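As a quick arithmetic check on the balances reported above, here is a minimal sketch (the GPT-5 return is back-derived from its reported $3,734 final balance, since the article states only "over 62%"):

```python
# Recompute Alpha Arena final balances from the reported returns.
# Each model started with $10,000 of real capital.
START = 10_000.0

def final_balance(start: float, ret: float) -> int:
    """Balance, rounded to the dollar, after a simple end-to-end return."""
    return round(start * (1 + ret))

returns = {
    "Qwen3 Max": 0.2232,            # reported: +22.32%
    "DeepSeek chat v3.1": 0.0489,   # reported: +4.89%
    "GPT-5": 3_734 / START - 1,     # back-derived from its $3,734 balance
}

for model, ret in returns.items():
    print(f"{model}: ${final_balance(START, ret):,}")
```

The results agree with the article's figures: $12,232 for Qwen3 Max, $10,489 for DeepSeek chat v3.1, and $3,734 for GPT-5.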
Whose AI Made a Killing with $10,000? DeepSeek First, GPT-5 Last
Di Yi Cai Jing · 2025-10-21 12:33
Core Insights
- The article covers a live investment competition called "Alpha Arena," initiated by the startup Nof1, in which six AI models trade real cryptocurrencies, each with $10,000 in starting capital [3][4]
- The competition began on October 18 and runs for two weeks, concluding on November 3, with performance and trading strategies tracked in real time [4][6]
- The participating models are DeepSeek chat v3.1, Claude Sonnet 4.5, Grok 4, Qwen3 Max, Gemini 2.5 Pro, and GPT-5, with varying performance and trading styles observed [4][6]

Performance Summary
- As of the fourth day, DeepSeek had performed steadily, initially reaching a return close to 40% before settling around 10% after market fluctuations [4][6]
- Grok 4 traded aggressively amid volatility, while Claude climbed from third to second place, trailing DeepSeek closely [6][8]
- Gemini 2.5 and GPT-5 suffered significant losses, down over 30% and over 40% respectively [6][8]

Trading Styles
- DeepSeek's strategy is characterized by stability and a diversified portfolio, taking a simple approach without frequent trading [8][10]
- By contrast, Gemini 2.5's erratic style has been likened to that of retail investors, leading to higher trading costs and losses [10][12]
- Grok 4 is noted for aggressive trading, while Claude shows strong analytical ability but struggles with decisiveness [12][13]

AI's Role in Investment
- The competition highlights AI's potential in trading, with some users already adopting DeepSeek's strategies [12][13]
- Industry experts caution, however, that AI does not understand individual investors' circumstances and cannot predict future market movements [12][13]
- The consensus is that while AI can supply logical investment strategies, combining rational tools with human insight is likely to yield the best results [13]
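The contrast drawn above between DeepSeek's infrequent trading and Gemini 2.5's costly overtrading can be illustrated with a toy model (the fee and return numbers are hypothetical, not from the article): with a proportional fee charged per trade, the same gross return shrinks as the trade count grows.

```python
def net_balance(start: float, gross_return: float, fee: float, n_trades: int) -> float:
    """Final balance after a gross return, with a proportional fee per trade."""
    return start * (1 + gross_return) * (1 - fee) ** n_trades

# Same 10% gross return, 0.05% fee per trade (assumed numbers):
buy_and_hold = net_balance(10_000, 0.10, 0.0005, 4)    # a handful of trades
churning     = net_balance(10_000, 0.10, 0.0005, 200)  # constant re-positioning
print(round(buy_and_hold), round(churning))
```

Under these assumed numbers, the churning style gives most of its gain back in fees, which mirrors the retail-investor behavior the article attributes to Gemini 2.5.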
Six AI Models Duel with $10,000 Each: DeepSeek Leads in Returns, GPT-5 Last, Currently Down Over 40%
第一财经 · 2025-10-21 12:12
Core Viewpoint
- The article covers the ongoing AI investment competition "Alpha Arena," initiated by the startup Nof1, in which AI models trade real cryptocurrencies with $10,000 each in starting capital, demonstrating their investment ability in a live environment [3][5]

Group 1: Competition Overview
- The competition began on October 18 and lasts two weeks, ending on November 3, featuring six AI models: DeepSeek chat v3.1, Claude Sonnet 4.5, Grok 4, Qwen3 Max, Gemini 2.5 Pro, and GPT-5 [5][9]
- As of October 21, DeepSeek was leading with a return of roughly 10%, having earlier peaked near 40% [5][7]
- Other models showed mixed results: Grok 4 initially tracked DeepSeek closely but later fluctuated around the breakeven point [7][9]

Group 2: Performance Analysis
- DeepSeek's steady performance is attributed to its professional background in quantitative trading and a simple strategy without frequent trading [9][11]
- Gemini 2.5, by contrast, has been criticized for an erratic trading style that produced significant losses, at one point down more than 30% [11][13]
- Grok 4 is noted for an aggressive trading approach, while Claude's analytical skill is undermined by indecision, leading to frequent trading mistakes [13][14]

Group 3: Insights on AI Trading
- The competition highlights the models' distinct "personalities," akin to human traders, with each exhibiting its own strategy and risk profile [9][11]
- Despite AI's potential to supply logical investment strategies, industry experts caution that it cannot predict future market movements and does not understand individual investors' circumstances [13][14]
- The article emphasizes that while AI can help curb emotional bias in trading, combining rational tools with human insight is likely to yield the best outcomes [14]
Whose AI Made a Killing with $10,000? DeepSeek First, GPT-5 Last
Di Yi Cai Jing · 2025-10-21 11:24
Core Insights
- The article covers a live investment competition, "Alpha Arena," initiated by the startup Nof1, in which six AI models trade real cryptocurrencies with $10,000 each in starting capital [5][9]
- The competition began on October 18 and lasts two weeks, ending on November 3, showcasing how the models perform in a volatile market [5][7]
- The participants are DeepSeek chat v3.1, Claude Sonnet 4.5, Grok 4, Qwen3 Max, Gemini 2.5 Pro, and GPT-5, with varying strategies and results [5][9]

Performance Summary
- As of the fourth day, DeepSeek had performed steadily, initially reaching a return close to 40% before settling around 10% after market fluctuations [5][7]
- Grok 4 traded aggressively amid significant volatility, while Claude climbed from third to second place, trailing DeepSeek closely [7][9]
- Gemini 2.5 and GPT-5 have underperformed, with losses exceeding 30% and 40% respectively, pointing to struggling strategies [7][9]

Model Characteristics
- DeepSeek's success is attributed to its professional background and a simple strategy of holding a full position without frequent adjustments [9][11]
- Gemini 2.5 has been criticized for an erratic, retail-investor-like style that drove up transaction costs and losses [11][13]
- Grok 4 is characterized by high-frequency trading with significant exposure across multiple assets, while Claude shows strong analysis but indecisive execution [13][14]

Industry Perspectives
- The competition highlights both the potential and the limits of AI in trading; industry experts note that AI does not understand individual investors' circumstances and cannot predict future market movements [13][14]
- AI's strength lies in logical, emotion-free analysis, but it is no substitute for human judgment in navigating complex market dynamics [14]
The AI Era: Pray for Reading
Jing Ji Guan Cha Bao · 2025-06-30 06:20
Group 1
- The article discusses the overwhelming access to information and knowledge brought by AI language models, which has transformed how individuals interact with knowledge [2][3][8]
- It highlights the shift from a knowledge-scarce era to one of abundance, in which individuals can explore vast information without intermediaries [2][4]
- The article critiques the superficial engagement with knowledge that AI facilitates, suggesting that reliance on AI for information may erode critical thinking and cognitive ability [10][11][12]

Group 2
- It examines AI's impact on reading and writing: while AI can summarize and condense information, it may ultimately degrade the quality of human cognition and creativity [10][12][13]
- The article stresses the importance of deep reading and the traditional value placed on literature, contrasting them with the rapid information consumption AI promotes [15][16]
- It warns that easy access to information through AI could devalue literary works and diminish appreciation for the effort involved in creating and understanding complex texts [17][20]
Gaokao Math Rematch on the Full Paper! One Question Stumps Every Large Model; Newcomer Gemini Takes First, Doubao and DeepSeek Tie for Second
机器之心 · 2025-06-10 17:56
Core Viewpoint
- The article evaluates how various AI models perform on college entrance examination (gaokao) mathematics questions, highlighting both progress and remaining gaps in the models' mathematical reasoning and image-recognition capabilities [2][26]

Group 1: Objective Questions Performance
- The AI models were tested on 14 objective questions and 5 subjective questions from the 2025 new curriculum mathematics paper, worth 150 points in total [3][9]
- Performance on the objective questions was closely matched, with a maximum score gap of only 3 points, while the image-based item (Question 6) posed significant difficulty for most models [7][20]
- Objective-question scores were generally high: models such as Doubao, Qwen3, Gemini 2.5 Pro, and DeepSeek R1 scored around 68 points, while o3 performed worst [20][21]

Group 2: Subjective Questions Performance
- The subjective questions were the models' main weakness; only Gemini 2.5 Pro achieved the full 77 points [8][11]
- Doubao and DeepSeek R1 each lost only one point, while o3 lost two, reflecting varying levels of performance [8][9]
- Among the weaker performers, hunyuan-t1-latest and 文心 X1 Turbo scored 68 and 66 points respectively [9][11]

Group 3: Image Recognition Challenges
- All participating models struggled with the image-recognition item (Question 6), exposing a significant shortcoming in integrating visual and textual information [27]
- The models' failure to interpret the image-based question accurately underscores the need for stronger multi-modal understanding [26][27]

Group 4: Overall Assessment
- The evaluation concluded that while the models' mathematical reasoning has improved notably, substantial gains are still needed in complex reasoning, rigorous proof, and multi-step calculation [26][28]
- The results suggest the current models have potential but must address their limitations in both mathematical problem-solving and image recognition to become more effective overall [26][27]
An AI Founder's Reflections: The Overlooked "Fast" and "Long"
Founder Park · 2025-06-10 12:59
Core Insights
- The article emphasizes the importance of "speed" and "long context" in AI entrepreneurship, arguing that these factors are crucial to product direction and how the technology is applied [1]

Group 1: Importance of Speed
- The author reflects on how much speed shapes user experience, noting that convenience can strongly influence user habits, as seen with ChatGPT and Perplexity [3][4]
- Having previously underestimated speed's impact, the author saw usage rates decline, reinforcing that fast-loading, smooth experiences are invaluable [4]

Group 2: Long Context Utilization
- The article discusses the practical effect of long context in AI models, particularly with models that can handle 1 million tokens, which significantly expands product capabilities [7][8]
- The author critiques earlier industry assumptions about context usage, arguing that many claims about enterprise knowledge bases were misleading until genuinely capable models emerged [7]

Group 3: Market Dynamics and Product Strategy
- The article highlights a shift in market dynamics: products with low Average Revenue Per User (ARPU) can now offer strong sales and customized experiences, challenging earlier assumptions about product distribution [6]
- The author suggests that AI capabilities are disrupting traditional marketing strategies, enabling more effective customer engagement and retention [6]

Group 4: Product Development and Experimentation
- The article stresses that product managers should engage deeply with AI models, advocating hands-on experimentation and A/B testing to refine product features [9]
- It argues that understanding the underlying model's capabilities matters more than focusing solely on user interface and experience [9]

Group 5: Future of AI Products
- The author predicts that the most successful AI-era products will be those that fully exploit recommendation algorithms and user-generated-content ecosystems [10]
- The article closes by noting leading tech companies' strategic focus on building superior models, suggesting that successful business models will follow [10]
Look Closely: These Are the Real Scores of 7 Large Models on Gaokao Math
数字生命卡兹克 · 2025-06-08 22:05
Core Viewpoint
- The article emphasizes the importance of a fair, objective, and rigorous assessment of AI models' mathematical ability, here in the context of the college entrance examination (gaokao) [1]

Testing Methodology
- The test used the 2025 national gaokao mathematics paper, covering only the objective questions and excluding the subjective ones to keep scoring unambiguous [1]
- Questions were formatted in LaTeX to represent mathematical symbols accurately, avoiding misreadings that image recognition could introduce [1]
- One question involving a chart was excluded to prevent ambiguity in interpretation [1]

Scoring System
- Scoring followed actual gaokao rules, with specific point allocations by question type: single-choice questions were worth 5 points each, multiple-choice questions 6 points each, and fill-in-the-blank questions 5 points each [3]
- Each question was posed to the models three times to reduce noise, with the final score weighted by the proportion of correct answers [3]
- The models were tested without external prompts, internet access, or code execution, to assess pure reasoning ability [3]

Model Performance
- The models tested included OpenAI o3, Gemini 2.5 Pro, DeepSeek R1, and others, with results showing varying performance across the board [5]
- Gemini 2.5 Pro achieved the highest accuracy, while models such as DeepSeek and Qwen3 fared less well due to minor errors on specific questions [10]
- Overall, the performance gaps among the models were small, with most errors traceable to minor misreadings rather than fundamental flaws in reasoning [10]
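The scoring rule described above (5 or 6 points per question, each question run three times, credit weighted by the proportion of correct runs) can be sketched as follows; this is an interpretation of the article's description, not the author's actual script:

```python
# Points per question type, as described in the article.
POINTS = {"single_choice": 5, "multiple_choice": 6, "fill_in_blank": 5}

def question_score(qtype: str, correct_runs: int, total_runs: int = 3) -> float:
    """Score for one question: full points scaled by the fraction of correct runs."""
    return POINTS[qtype] * correct_runs / total_runs

def total_score(results: list[tuple[str, int]]) -> float:
    """Sum question scores over (question type, correct-run count) pairs."""
    return sum(question_score(qtype, correct) for qtype, correct in results)

# e.g. a model that answers a multiple-choice question correctly in 2 of 3 runs
# earns 4 of its 6 points.
print(question_score("multiple_choice", 2))
```

Averaging over three runs in this way smooths out the run-to-run variance that a single attempt would hide, which is the stated point of the methodology.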
Conclusion
- The article concludes that the rigorous testing process yielded valuable insight into the mathematical abilities of AI models, underscoring the need for objective and fair evaluation methods in AI assessments [10]
DeepSeek's New R1 Closes In on OpenAI o3! Hands-On Results: This "Minor Version Update" Is Anything but Minor
量子位 · 2025-05-29 01:08
Core Viewpoint
- DeepSeek has released a significant update, version R1-0528, that is comparable to leading models such as OpenAI's o3-high, marking a major advance in capability [1][10]

Group 1: Model Performance
- The new R1 can solve complex numerical problems that challenge top models such as o3, Gemini 2.5 Pro, and Claude 4 [4]
- The model shows improved reasoning, supporting deeper analysis similar to Google's models [10]
- In hands-on tests, R1 demonstrated stronger programming skills and generated executable solutions in less time [17][20]

Group 2: Features and Improvements
- R1 handles writing tasks better, producing more natural and better-formatted output [10]
- It can think for extended periods, with a maximum contemplation time of 30-60 minutes per task [10]
- Its distinctive reasoning style is quick yet thoughtful, and it weighs how interesting its answers will be to the user [14][15]

Group 3: Community and Open Source Impact
- The release of R1-0528 is seen as a significant win for open-source AI, as it competes effectively with closed-source models [31]
- The community has engaged actively with the new model, sharing insights and test results, highlighting the collaborative nature of open-source development [9][28]