The competition is fierce! A new-generation Arena leaderboard dedicated to coding has arrived, and a Chinese model tops it
机器之心· 2025-11-13 10:03
Core Insights
- The article highlights rapid advances in large-model programming and the intensifying competition among model vendors as they improve coding capabilities and ship new tools [2][3]
- LMArena's new Code Arena marks a significant shift in how the coding capabilities of large models are evaluated, focusing on real-world application development rather than code generation alone [4][6]
Model Performance
- The new Code Arena ranks the Chinese model GLM-4.6 at the top, alongside Claude and GPT-5, showcasing its coding abilities [6][10]
- GLM-4.6 has a success rate of 94.9% in code modification tasks, closely trailing Anthropic's Claude Sonnet 4.5 at 96.2% [11]
- The performance gap between open-source models and top proprietary models has narrowed from 5-10 percentage points to mere basis points, indicating rapid convergence in capabilities [14]
Industry Trends
- Users are noticeably shifting toward GLM-4.6 for daily tasks, reflecting its growing acceptance and recognition in the AI programming community [15]
- Cerebras has adopted GLM-4.6 as its default recommended model and is phasing out the previous one, which underscores the model's rising prominence in the industry [16]
- The article emphasizes the acceleration of Chinese models, which are moving from catching up to leading the market, particularly in the open-source ecosystem [17][18]
X @Demis Hassabis
Demis Hassabis· 2025-11-12 02:16
RT Josh Woodward (@joshwoodward) Excited to announce our partnership with Cassava Technologies at AfricaCom today! Together, we're enabling data-free access to the @GeminiApp and a 6-month extended trial of Google AI Plus (including Gemini 2.5 Pro, 200GB storage, @NotebookLM and @FlowbyGoogle) 🧵 https://t.co/0o9ubqmYkm ...
Second globally, first in China! A hands-on test of ERNIE 5.0 Preview, the strongest text model
机器之心· 2025-11-09 11:48
Core Viewpoint
- Baidu's ERNIE-5.0-Preview-1022 model ranked second globally and first in China in the latest LMArena Text Arena rankings with a score of 1432, on par with leading models from OpenAI and Anthropic [2][4][43].
Model Performance
- ERNIE-5.0 Preview excels in creative writing, understanding long and complex questions, and instruction following, outperforming many mainstream models including GPT-5-High [5][41].
- In creative writing tasks, it ranks first, indicating a substantial improvement in content generation speed and quality [5][41].
- In understanding long and complex questions, it ranks second, showcasing its capability in academic Q&A and knowledge reasoning [5][41].
- In instruction following tasks, it ranks third, which improves its applicability in smart assistant and business automation scenarios [5][41].
Competitive Landscape
- The LMArena platform, created by researchers from UC Berkeley, ranks models by real user preference votes, providing a dynamic ranking mechanism that reflects real-world performance [4][5].
- Baidu's model is positioned in the first tier of global general-purpose intelligent models, reinforcing its competitive standing in the AI landscape [4][41].
Technological Infrastructure
- Baidu's success is supported by a full "chip-framework-model-application" stack, which includes the PaddlePaddle deep learning platform and self-developed Kunlun chips for AI model training and inference [41][42].
- The PaddlePaddle framework has been updated to version 3.2, improving model performance through optimizations in distributed training and hardware communication [41][42].
Industry Implications
- The advances in ERNIE-5.0 Preview reflect a broader transition in China's AI technology from "technological catch-up" to "capability leadership" [43][44].
- Baidu aims to apply its model capabilities across content generation, search, office automation, and other applications to drive industry adoption [42][43].
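Live trading with $10,000! The world's first AI investment competition wraps up: Chinese large models all profitable, US GPT-5 finishes last with a loss of over 62% [with large-model industry outlook analysis]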
Sou Hu Cai Jing· 2025-11-05 07:41
Group 1
- In the "Alpha Arena" competition, China's Qwen3-Max achieved a return of over 20%, outperforming all the American models, every one of which incurred losses, including GPT-5 with a loss of over 60% [2]
- The competition lasted 17 days and involved six top AI models from China and the US, highlighting the competitive landscape in AI investment [2][3]
- The event reflects the rapid development and innovation in China's AI model industry, with significant participation from both established tech giants and startups [3]
Group 2
- As of Q1 2024, China had released a total of 478 AI models, ranking second globally after the US and indicating a strong presence in the AI research field [4]
- The number of AI researchers in China grew from under 10,000 in 2015 to 52,000 in 2024, a compound annual growth rate of 28.7% [4]
- The language model sector is identified as a key area for technological breakthroughs and applications across industries, with the market projected to exceed 220 billion yuan by 2030, growing at over 40% annually [4]
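The researcher-growth figures can be sanity-checked with the standard compound annual growth rate formula. The sketch below assumes nine annual compounding steps (2015 to 2024) and uses the hypothetical helper names `cagr` and `implied_start`. The summary gives the 2015 figure only as "under 10,000", so the starting value here is back-computed from the reported rate and is not a quoted number.

```python
def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.2 means 20% per year)."""
    return (end / start) ** (1 / periods) - 1


def implied_start(end: float, rate: float, periods: int) -> float:
    """Starting value that grows to `end` at `rate` per period."""
    return end / (1 + rate) ** periods


PERIODS = 2024 - 2015       # assumption: nine annual compounding steps
RESEARCHERS_2024 = 52_000   # figure from the summary
REPORTED_CAGR = 0.287       # figure from the summary

# Back-compute the 2015 headcount implied by the reported growth rate.
start_2015 = implied_start(RESEARCHERS_2024, REPORTED_CAGR, PERIODS)
print(f"Implied 2015 headcount: {start_2015:,.0f}")

# Round trip: the implied start should reproduce the reported rate.
print(f"Round-trip CAGR: {cagr(start_2015, RESEARCHERS_2024, PERIODS):.1%}")
```

Under these assumptions the implied 2015 headcount comes out well below 10,000, which is consistent with the "under 10,000" wording.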
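AI large-model live investment competition ends, Alibaba's Qwen wins; WeChat Pay launches AI menu recognition for small and medium-sized merchants | AIGC Daily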
创业邦· 2025-11-05 00:08
Group 1
- The AI model competition "Alpha Arena" concluded with Alibaba's Qwen winning the championship, achieving a return of 22.32% over 17 days, while four major US models incurred losses, with GPT-5 losing over 62% [2]
- OpenAI reportedly discussed a merger with competitor Anthropic shortly after Sam Altman's brief departure as CEO, but the talks did not materialize due to practical obstacles [2]
- WeChat Pay launched an AI menu recognition feature for small and medium-sized businesses, allowing merchants to upload photos of their menus for automatic content recognition and payment processing [2]
Group 2
- The AI glasses market is growing rapidly, with major tech companies like Google and Apple accelerating their investments, as AI glasses are seen as the next generation of human-computer interaction [2]
- Reports indicate that global shipments of AI glasses are expected to reach 4.065 million units in the first half of 2025, a year-on-year increase of 64.2%, with projections suggesting shipments could exceed 40 million units by 2029 [2]
The world's first AI investment competition wraps up: Alibaba's Qwen wins, all four major US models lose money
Guan Cha Zhe Wang· 2025-11-04 14:52
Core Insights
- The AI investment competition "Alpha Arena" concluded with Alibaba's Qwen model achieving a return of over 20% and securing the championship [2][5]
- DeepSeek ranked second, a strong showing for Chinese models, while all four leading American models reported losses, with GPT-5 losing more than 60% [2][7]
Competition Overview
- The competition lasted 17 days and involved six top AI models: Qwen3-Max, DeepSeek v3.1, GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Grok 4, each given $10,000 in starting capital and real-time market data [2][3]
- The models operated under a unified input system to ensure fairness and transparency, with real-time trading records and account values publicly available [3]
Performance Highlights
- Qwen3-Max achieved a final account value of $12,232, a return of +22.32%, while DeepSeek v3.1 reached $10,489, a return of +4.89% [8]
- In contrast, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and GPT-5 reported significant losses, with GPT-5 at -62.66% [7][8]
Industry Context
- The success of Qwen and DeepSeek in the competition underscores the growing capabilities of Chinese AI models in real-world applications and their potential to address practical challenges [9]
- The competition's results may influence how AI models are perceived globally, particularly in the context of the ongoing competition between Chinese and American AI technologies [9]
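The return figures above follow directly from the final account values, assuming each model started from $10,000 (as other reports of the same competition state). The sketch below is only illustrative arithmetic; the helper names `simple_return_pct` and `implied_final_value` are hypothetical, and the GPT-5 account value is derived from its reported return, not quoted from the source.

```python
START_CAPITAL = 10_000.0  # assumption: per-model starting capital in USD


def simple_return_pct(start_value: float, end_value: float) -> float:
    """Simple (non-annualized) return in percent."""
    return (end_value - start_value) / start_value * 100.0


def implied_final_value(start_value: float, return_pct: float) -> float:
    """Account value implied by a reported percentage return."""
    return start_value * (1.0 + return_pct / 100.0)


# Final account values reported in the summary.
final_values = {"Qwen3-Max": 12_232.0, "DeepSeek v3.1": 10_489.0}
returns = {
    model: round(simple_return_pct(START_CAPITAL, value), 2)
    for model, value in final_values.items()
}

# Derived, not reported: the account value implied by GPT-5's -62.66% return.
gpt5_implied_value = implied_final_value(START_CAPITAL, -62.66)

for model, pct in returns.items():
    print(f"{model}: {pct:+.2f}%")
print(f"GPT-5 implied account value: ${gpt5_implied_value:,.0f}")
```

The same two functions can be pointed at any of the other models once their final account values are known.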
Investment competition: Alibaba's Qwen and DeepSeek made money, GPT-5 lost big
Nan Fang Du Shi Bao· 2025-11-04 13:41
Core Insights
- The first AI large-model trading competition, initiated by the American AI research lab nof1, has concluded, with six leading models trading autonomously on market data without human intervention [1][5][7]
- Two Chinese models, Alibaba's Qwen3 Max and DeepSeek Chat V3.1, achieved positive returns, with Qwen3 Max leading at a return of 22.3% and a profit of $2,232 [1][2][3]
Performance Summary
- Qwen3 Max achieved a return of 22.3%, with an account value of $12,232 and a win rate of 30.2% [3]
- DeepSeek Chat V3.1 had a return of 4.89%, with an account value of $10,489 and a win rate of 24.4% [3]
- The other models, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and GPT-5, experienced significant losses, with GPT-5 losing 62.66% [2][3]
Trading Dynamics
- The competition involved trading cryptocurrency derivatives, including Bitcoin, Ethereum, and Dogecoin, with each model starting with $10,000 [5]
- Models were required to process quantitative data and execute trades without access to news or market information [5]
- Qwen3 Max maintained the largest position size throughout the competition, while Grok 4 had the longest holding period [6]
Model Behavior
- Grok 4, GPT-5, and Gemini 2.5 Pro sold short more often than the others, while Claude Sonnet 4.5 rarely did [6]
- Qwen3 Max had the narrowest stop-loss and take-profit distances, indicating a more conservative exit strategy [6]
- The competition highlighted the need to test models dynamically in real market conditions rather than on static benchmarks [7]
The first AI trading competition ends after six AIs traded crypto for two weeks: Qwen and DeepSeek made money, GPT-5 lost a painful $6,000
3 6 Ke· 2025-11-04 11:13
Core Insights
- The inaugural Nof1 AI Model Trading Competition has concluded; it was designed to measure AI investment capabilities and has been likened to a "Turing test" for the crypto space [1]
- Six AI models participated, representing the latest technology from both Chinese and American developers, with Qwen3 Max emerging as the top performer [1][12]
Competition Overview
- The competition ran from October 17 to November 3, 2025, with each model starting with $10,000 in initial capital [1]
- Trading was conducted on Hyperliquid, focusing on six popular cryptocurrencies: BTC, ETH, SOL, BNB, DOGE, and XRP [3]
- The trading strategies were limited to buying, selling, holding, or closing positions, with a focus on mid-frequency trading [3]
Performance Results
- Qwen3 Max ranked first with a return of 22.3%, total profit of $2,232, and a win rate of 30.2% over 43 trades [2][5]
- DeepSeek Chat V3.1 secured second place with a return of 4.89%, total profit of $489.08, and a win rate of 24.4% over 41 trades [2][5]
- The other models, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and GPT-5, experienced significant losses, with GPT-5 showing the worst performance at -62.66% [4][11]
Model Characteristics
- Qwen3 Max exhibited an aggressive trading style with a high return and significant trading frequency, reflected in its Sharpe ratio of 0.273 [9]
- DeepSeek Chat V3.1 demonstrated a more conservative approach with a higher Sharpe ratio of 0.359, indicating better risk management [9]
- Claude Sonnet 4.5 and Grok 4 showed cautious strategies but suffered from low win rates and high losses [10]
- Gemini 2.5 Pro and GPT-5 were characterized by high trading activity but poor performance, indicating ineffective strategies [11]
Industry Implications
- The competition has garnered significant attention, with industry leaders like Binance's founder commenting on the potential impact of AI trading strategies on market dynamics [7]
- The results suggest that AI models from China, particularly Qwen3 Max and DeepSeek, are currently outperforming their American counterparts in risk control and trend identification [12]
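The reported win rates and trade counts can be cross-checked against each other. The sketch below assumes win rate means winning trades divided by total trades, which the summary does not state explicitly; the helper names `implied_wins` and `win_rate_pct` are hypothetical, and the resulting win counts are derived values, not figures from the source.

```python
def implied_wins(reported_rate_pct: float, trades: int) -> int:
    """Nearest whole number of winning trades for a reported win rate."""
    return round(reported_rate_pct / 100 * trades)


def win_rate_pct(wins: int, trades: int) -> float:
    """Win rate in percent, assuming wins / total trades."""
    return wins / trades * 100


# (reported win rate in percent, number of trades) from the summary.
reported = {
    "Qwen3 Max": (30.2, 43),
    "DeepSeek Chat V3.1": (24.4, 41),
}

for model, (rate, trades) in reported.items():
    wins = implied_wins(rate, trades)
    recomputed = win_rate_pct(wins, trades)
    print(f"{model}: about {wins} of {trades} trades won ({recomputed:.1f}%)")
```

A win rate below one in three alongside a positive return suggests that, for the profitable models, the average winning trade was larger than the average losing one.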
The first AI trading competition ends after six AIs traded crypto for two weeks: Qwen and DeepSeek made money, GPT-5 lost a painful $6,000
机器之心· 2025-11-04 08:52
Core Insights
- The first Nof1 AI model trading competition concluded with unexpected results, showcasing the investment capabilities of AI models in cryptocurrency trading [1][5][9]
Group 1: Competition Overview
- The competition was designed as a benchmark test for AI investment capabilities, referred to as the "Turing Test of the cryptocurrency world," and was run by Nof1.ai from October 17 to November 3, 2025 [1]
- Six AI models participated: DeepSeek Chat V3.1, Grok 4, Gemini 2.5 Pro, GPT-5, Qwen3 Max, and Claude Sonnet 4.5, representing the latest technology from both Chinese and American suppliers [1][3]
- Each model started with $10,000 in initial capital and traded autonomously on Hyperliquid, focusing on six popular cryptocurrencies: BTC, ETH, SOL, BNB, DOGE, and XRP [3][4]
Group 2: Trading Performance
- Qwen3 Max ranked first with a return of 22.3%, total profit of $2,232, and a win rate of 30.2% over 43 trades [5][7]
- DeepSeek Chat V3.1 secured second place with a return of 4.89%, total profit of $489.08, and a win rate of 24.4% over 41 trades [5][7]
- The remaining models, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and GPT-5, experienced significant losses, with returns of -30.81%, -45.3%, -56.71%, and -62.66% respectively [6][15]
Group 3: Model Characteristics
- Qwen3 Max exhibited an aggressive trading strategy with a high return and significant trading frequency, while maintaining a Sharpe ratio of 0.273 [13]
- DeepSeek Chat V3.1 demonstrated a more conservative approach with a higher Sharpe ratio of 0.359, indicating better risk management [13]
- In contrast, Gemini 2.5 Pro and GPT-5 performed poorly due to excessive trading and a lack of effective market judgment, reflected in their negative Sharpe ratios of -0.566 and -0.525 respectively [15][16]
Group 4: Market Implications
- The competition has garnered significant attention, with industry leaders commenting on the potential impact of AI trading strategies on market dynamics [9][11]
- There is speculation that widespread use of similar AI models could influence market behavior, potentially driving prices up through collective demand [10][11]
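For readers unfamiliar with the Sharpe ratios quoted above, the textbook definition is the mean excess return divided by the standard deviation of excess returns. The sketch below is a minimal illustration under assumptions: the summary does not say which return period, risk-free rate, or annualization the competition used, so the function name `sharpe_ratio`, the zero risk-free rate, and the sample return series are all hypothetical.

```python
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation.

    `returns` are per-period returns as fractions (0.01 means 1%).
    No annualization is applied; the competition's method is unknown.
    """
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # sample std dev; needs >= 2 points
    if spread == 0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return statistics.mean(excess) / spread


# Hypothetical per-period returns, not data from the competition.
steady_gains = [0.01, 0.03, -0.01, 0.02]
steady_losses = [-0.02, -0.01, -0.03, 0.01]

print(f"Positive-mean series: {sharpe_ratio(steady_gains):.3f}")
print(f"Negative-mean series: {sharpe_ratio(steady_losses):.3f}")
```

The sign of the ratio follows the sign of the mean excess return, which is why the loss-making models show negative values, and a higher ratio with a lower raw return (as with DeepSeek Chat V3.1 versus Qwen3 Max) indicates less volatility per unit of return.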
The AI "gladiator arena" live trading competition ends: Alibaba's Qwen wins, GPT-5 loses heavily, Gemini becomes a "doomsday short"
硬AI· 2025-11-04 06:48
Core Insights
- The article highlights the performance of AI models in a real-world investment competition, with Alibaba's Qwen achieving a 22.32% return, while top American models like OpenAI's GPT-5 and Google's Gemini 2.5 Pro suffered significant losses of 62.66% and 56.71% respectively [3][24].
Group 1: Competition Overview
- The "Alpha Arena" competition, initiated by the American AI research lab Nof1, aimed to test AI models' decision-making abilities in a chaotic and dynamic environment, in contrast to traditional academic benchmarks [6][32].
- Six leading AI models participated, including Alibaba's Qwen3-Max and DeepSeek, alongside OpenAI's GPT-5 and Google's Gemini 2.5 Pro [7][8].
Group 2: Performance Analysis
- Qwen and DeepSeek emerged as the only two profitable models, while the four American models incurred losses [31].
- Qwen's strategy involved a straightforward long position on Bitcoin, demonstrating strong conviction in a high-volatility market [16][30].
- DeepSeek adopted a similar bullish strategy, utilizing high leverage [15].
Group 3: Trading Strategies
- The competition revealed three distinct trading camps:
  - **Eastern Winners**: Qwen and DeepSeek, both employing clear bullish strategies [14].
  - **Lost Geniuses**: GPT-5 and Gemini, which consistently lost due to poor decision-making and excessive caution [17][18].
  - **Observant Players**: Grok and Claude, which displayed unique and less effective trading strategies [19][20].
Group 4: Key Takeaways
- Qwen's victory was attributed to its effective risk management and timely defensive actions, particularly in the competition's final moments [22][30].
- The competition underscored the disparity between academic intelligence and practical market decision-making, with Qwen and DeepSeek exemplifying successful strategies in real-world conditions [28][32].