Claude 4.5 Sonnet
Xiaomi MiMo-V2-Flash Open-Sourced: Capabilities Rival the Benchmark Closed-Source Model Claude 4.5 Sonnet
Feng Huang Wang· 2025-12-17 10:26
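Feng Huang Wang Tech report, December 17: Xiaomi officially announced that Xiaomi MiMo-V2-Flash is now open source. The model is reportedly a MoE model with 309B total parameters (15B activated) that Xiaomi developed in-house for extreme inference efficiency. By introducing a Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks in the global top two among open-source models on several agent benchmarks. Its coding ability rivals the benchmark closed-source model Claude 4.5 Sonnet, yet its inference price is only 2.5% of Claude's and its generation speed is twice as fast.

At the 2025 Xiaomi "Human x Car x Home" full-ecosystem partner conference this morning, Luo Fuli, head of the Xiaomi MiMo large model, also described how the model was built. She said Xiaomi MiMo-V2-Flash surpasses DeepSeek V3.2 and K2-Thinking on most benchmarks while using one-half to two-thirds fewer parameters. Among top models worldwide at roughly the same capability level, MiMo-V2-Flash sits in the low-cost, high-speed quadrant, and it has begun to show the ability to simulate the world.

Luo said that, in her view, the next generation of agent systems is not a "language simulator" but an "agent" that truly understands the human world and coexists with it. On agent execution, the shift should be from "answering questions" to "completing tasks," with capabilities such as memory, reasoning, autonomous planning, decision-making, and execution. From ...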
After Missing the GPT Moment, Yan Junjie and China's "Grassroots" Are Ready to Win It Back
Guan Cha Zhe Wang· 2025-12-12 06:58
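[By Chen Jishen] In September 2025, US AI giant Anthropic issued a stern announcement declaring a complete block on access by entities backed by Chinese capital.

To outside observers, this was yet another microcosm of the US-China tech war, yet it looked even more like a fated reunion ten years overdue.

Anthropic founder Dario Amodei and MiniMax founder Yan Junjie shared a common starting point ten years ago: both were Baidu interns.

Across the ocean, while researching AI at Baidu's North American lab, Dario got his hands on the "holy grail" of large-model development: the Scaling Law. Feed a model more data and compute, and its performance will improve linearly. This "brute force works wonders" conclusion also laid the foundation for the training approach later taken by US large-model developers such as OpenAI and Anthropic.

Back in China, Yan Junjie, who had just as keenly sensed this technological undercurrent about to change the world, planted a seed in his own mind: AI should not just be about writing papers; it can deliver enormous real-world value.

But history does not simply repeat itself. Today, Chinese open-source model companies such as DeepSeek and MiniMax are in an all-out, head-on contest with the closed-source model ecosystem of US companies such as OpenAI and Anthropic.

This also means that, ten years on, the two former interns stand once again on the same world-class stage. This time, however, they are no longer colleagues working side by side but direct rivals.

In Yan Junjie's eyes, he ...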
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-11-23 06:15
Market Position
- xAI's Grok 4.1 Fast claims the No. 1 position on OpenRouter's Trending Leaderboard [1]
- xAI is rapidly gaining market share in the AI industry [1]

Model Performance & Adoption
- Grok 4.1 Fast's 2-million-token context window, frontier performance, and free tier contribute to its widespread adoption [1]

Model Size Comparison
- Grok 4.1 Fast has 275 billion parameters [2]
- Gemini 3 Pro Preview has 129 billion parameters [2]
- Claude 4.5 Sonnet has 67 billion parameters [2]
Challenging GPT-5.1 at Low Cost, Musk Charges into Agents
3 6 Ke· 2025-11-20 08:56
Core Insights
- xAI has launched two major updates for its xAI API: Grok 4.1 Fast and Agent Tools API, focusing on fast, low-cost, and agent-centric models [2][3]

Group 1: Grok 4.1 Fast Model
- Grok 4.1 Fast is the best-performing tool invocation model to date, supporting a context window of 2 million tokens, excelling in customer support and financial applications [2][3]
- The model has risen to sixth place in the Artificial Intelligence Index (AII), scoring 93.3% on the τ²-Bench Telecom leaderboard, outperforming GPT-5.1 (high) and Gemini 3 Pro by a significant margin [3][9]
- Grok 4.1 Fast has improved factual accuracy, with a hallucination rate reduced by 50% compared to Grok 4 Fast [3][32]

Group 2: Agent Tools API
- The Agent Tools API allows agents to access real-time X data, web searches, and remote code execution, significantly enhancing the capabilities of Grok 4.1 Fast [6][31]
- Developers can easily implement the Agent Tools API to enable Grok to browse the web, search X posts, execute code, and retrieve uploaded documents with minimal coding [27][31]

Group 3: Performance and Pricing
- Grok 4.1 Fast's pricing is set at $0.20 per million input tokens, $0.50 per million output tokens, and $5 for 1,000 successful API calls, with a free trial available until December 3 [8][9]
- The model has shown superior performance in real-time information retrieval compared to Grok 4 Fast, although it has faced challenges in classic programming tasks [14][21]

Group 4: Market Context and Future Outlook
- The launch of Grok 4.1 Fast and the Agent Tools API reflects a shift in the AI industry towards agent-focused models, driven by market demand for enhanced capabilities [35]
- xAI's emphasis on practical application integration positions it favorably in the competitive landscape of AI model development, although the stability of Grok 4.1 Fast's performance remains to be validated through further testing [35]
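To make the quoted pricing concrete, the following is a minimal sketch of a cost estimate built only from the three figures in the summary above. The function name `estimate_cost` and its parameters are hypothetical, not part of any xAI SDK, and the way "successful API calls" are counted and billed is assumed here to be a simple per-call rate.

```python
# Prices quoted in the summary above, in USD.
INPUT_PRICE_PER_MILLION = 0.20   # per million input tokens
OUTPUT_PRICE_PER_MILLION = 0.50  # per million output tokens
PRICE_PER_1000_CALLS = 5.00      # per 1,000 successful API calls


def estimate_cost(input_tokens: int, output_tokens: int, successful_calls: int = 0) -> float:
    """Return an estimated bill in USD (hypothetical helper, not an official API)."""
    token_cost = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )
    # Assumption: every successful call is billed at the flat per-call rate.
    call_cost = successful_calls / 1_000 * PRICE_PER_1000_CALLS
    return token_cost + call_cost


if __name__ == "__main__":
    # Example workload: 2M input tokens, 1M output tokens, 200 successful calls.
    print(f"${estimate_cost(2_000_000, 1_000_000, 200):.2f}")
```

Under these assumptions the per-call charge can outweigh the token charge for tool-heavy workloads, so call volume is worth estimating alongside token volume.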
At Less Than 8% of a Competitor's Cost, Why Was This Chinese AI Company Able to Break Through?
Xinhuanet Finance· 2025-11-14 10:51
Core Insights
- MiniMax has launched its new text model MiniMax-M2, which has achieved a top-five ranking on the Artificial Analysis leaderboard, demonstrating its competitive edge in the global AI landscape [2][3]
- The model's cost efficiency is remarkable, with operational costs at only 8% of Claude 4.5 Sonnet, making it accessible for small businesses and individual developers [4][5]

Group 1: Model Performance
- MiniMax-M2 is a lightweight model with 10 billion activated parameters, showcasing superior performance in core areas such as coding, agent tasks, and search [3][4]
- The model reached a daily call volume of 82 billion within two weeks of launch, indicating strong market demand for cost-effective AI services [4]

Group 2: Cost Efficiency
- MiniMax-M2 is priced significantly below its Silicon Valley counterparts, at $0.30 per million input tokens and $1.20 per million output tokens [4]
- The innovative model architecture and algorithm optimization have led to reduced computational resource consumption while maintaining high performance [4][5]

Group 3: Industry Recognition and Applications
- MiniMax's technology has gained international recognition, with Meta adopting its original CISPO loss function and FP32 Head technology in a recent paper [5]
- The model has potential applications across various industries, including finance for intelligent investment research, manufacturing for process optimization, and software development for enhanced coding efficiency [5]

Group 4: Open Source and Accessibility
- MiniMax-M2 is fully open-sourced on platforms like GitHub, reflecting the company's confidence in its technology and promoting global AI collaboration [6]
- The company has launched a two-week free global API promotion and a limited-time free tier for the domestic version of MiniMax Agent, further lowering the barriers to AI technology adoption [5]
DeepSeek, Qwen AI Besting ChatGPT, Grok, Gemini In AI Crypto Trading Challenge
Yahoo Finance· 2025-11-01 13:54
Core Insights
- Chinese AI models DeepSeek and Qwen AI outperform their U.S. counterparts in a cryptocurrency trading challenge organized by Nof1 [1][2]

Group 1: Contest Overview
- The Alpha Arena contest began on October 17, testing the investment capabilities of various AI models with a starting capital of $10,000 [2]
- The challenge involves trading cryptocurrencies on the decentralized exchange Hyperliquid, with models given identical prompts and input data [2]

Group 2: Performance Results
- DeepSeek V3.1 Chat leads the competition, increasing its capital to $21,600, representing a 116% gain [3]
- Qwen 3 Max, developed by Alibaba, follows in second place with a capital increase of approximately 70%, reaching nearly $17,000 [3]
- Anthropic's Claude 4.5 Sonnet and xAI's Grok 4 are in third and fourth place with returns of 11% and 4%, respectively [4]
- Google's Gemini 2.5 Pro and OpenAI's ChatGPT 5 are the worst performers, with losses exceeding 60% [4]

Group 3: Factors Influencing Performance
- The advantage of Chinese models may stem from being trained on cryptocurrency-native conversations from Asia-facing forums [5]
- DeepSeek is reportedly a side project of a quantitative trading firm, which may contribute to its performance [5]

Group 4: Contest Dynamics
- The Alpha Arena challenge concludes on November 3, indicating potential for significant changes in rankings before the end [6]
- Some analysts suggest that the results may follow a random walk, implying that average trading positions could revert to the starting point over time [6]

Group 5: Broader Context
- The Alpha Arena is part of a series of experiments assessing AI trading capabilities, with previous studies indicating that AI models can outperform traditional managers significantly [7]
Top Global AI Models Battle It Out: Chinese AI Takes First and Second Place as DeepSeek Comes From Behind to Claim the Top Spot
Xin Lang Cai Jing· 2025-10-28 18:25
Core Insights
- The competition showcased the performance of top AI models in real financial trading, with Chinese models DeepSeek and Qwen3 outperforming their American counterparts significantly [3][4][7]
- DeepSeek achieved a remarkable return of 123.04%, growing its account from $10,000 to $22,304, while Qwen3 followed closely with a return of 107.08%, increasing its account to $20,708 [5][6]
- In contrast, American models like GPT-5 and Gemini 2.5 Pro suffered substantial losses, with GPT-5 down over 70% and Gemini down over 62% [6][8]

Performance Comparison
- DeepSeek's strategy involved a diversified investment portfolio, effective risk control, and the use of moderate leverage (10x to 20x), which contributed to its success [4][7]
- Qwen3 demonstrated strong market timing and aggressive strategies during market upswings, leading to its high returns [6][7]
- American models displayed poor decision-making, including incorrect market direction, lack of stop-loss mechanisms, and emotional trading, resulting in significant losses [8]

Implications for AI Development
- The results indicate a shift in the perception of AI from being merely an office tool to a powerful asset in real-world trading scenarios [8]
- The competition highlights the differences in AI capabilities between China and the U.S., with Chinese models showing superior risk management and decision-making skills [7][8]
- The event marks a new phase in global AI development, emphasizing the importance of practical applications and real-time performance in financial markets [7]
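The return figures above follow from simple arithmetic on the starting and ending balances. A minimal sketch of that calculation is shown below; the helper names `percent_return` and `ending_balance` are hypothetical and introduced only for illustration, and the sketch ignores fees and funding costs, which the summary does not break out.

```python
def percent_return(start: float, end: float) -> float:
    """Simple percentage return from a starting balance to an ending balance."""
    return (end - start) / start * 100.0


def ending_balance(start: float, return_pct: float) -> float:
    """Ending balance implied by a starting balance and a percentage return."""
    return start * (1.0 + return_pct / 100.0)


if __name__ == "__main__":
    start = 10_000.0  # starting capital per model, as reported above
    print(f"DeepSeek: {percent_return(start, 22_304.0):.2f}%")
    print(f"Qwen3:    {percent_return(start, 20_708.0):.2f}%")
    # A model that is "down over 70%" holds less than this amount:
    print(f"70% loss leaves: ${ending_balance(start, -70.0):,.2f}")
```

The same functions show why large losses are hard to recover from: an account that is down 70% needs a gain of more than 230% on the remaining balance just to get back to its starting capital.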
AI's Global "Cricket Fight": Team China Comes Out on Top
Huxiu APP· 2025-10-28 13:33
Core Viewpoint
- The article discusses a financial competition involving six top AI models, highlighting their performance in real market conditions and the differences in their trading strategies and outcomes [4][5][18].

Group 1: Competition Overview
- The competition, initiated by the US lab Nof1, involves six AI models each managing $10,000 in a real-time trading environment focused on cryptocurrency perpetual contracts [5][6].
- The competition started on October 18 and will last for two weeks, with the performance measured by risk-adjusted returns [5][6].

Group 2: AI Performance Analysis
- The top performers in the competition are DeepSeek V3.1 Chat and Alibaba's Qwen 3 Max, with significant returns compared to others like GPT-5 and Gemini, which faced substantial losses [4][15].
- DeepSeek (DS) adopted a conservative strategy, using 10x to 15x leverage and maintaining a long position, while Qwen displayed aggressive trading behavior, often going all-in on specific assets [9][14].
- Gemini and GPT-5 struggled with frequent trading and poor decision-making, leading to significant losses, with GPT-5 at one point down over 75% [13][19].

Group 3: Insights on AI Trading Strategies
- The article emphasizes that the performance of AI models varies significantly based on their trading strategies, with DS showing a balanced and steady approach, while others like GPT-5 and Gemini exhibited erratic behaviors [24][25].
- DS's average holding period was 49 hours, indicating a strategy focused on recognizing upward trends, while Qwen's high returns were attributed to timely asset selection and aggressive leverage [25][26].
- The analysis suggests that AI's ability to adapt to real-time market conditions is crucial, with DS demonstrating superior risk management and return consistency compared to its competitors [24][28].

Group 4: Implications for Investors
- The article concludes that while AI can enhance trading strategies, human oversight remains essential, as AI lacks the ability to predict future market movements and may react slowly to sudden market changes [30][32].
- Investors are advised to adopt a long-term perspective, avoid overtrading, and be cautious with leverage, as even top-performing AI can face significant risks [28][29].
AI's Global "Cricket Fight": Team China Comes Out on Top
Hu Xiu· 2025-10-28 08:44
Core Insights
- The article discusses a financial competition involving six top AI models, highlighting their performance in real market conditions and the differences in their trading strategies [1][2][13].

Group 1: Competition Overview
- The competition is organized by Nof1, a lab focused on AI in financial markets, providing each AI model with $10,000 to trade in real-time [1][2].
- The competition started on October 18 and will last until November 3, with the performance measured by risk-adjusted returns [3][5].

Group 2: AI Performance
- The top performers are DeepSeek V3.1 Chat and Qwen 3 Max, with returns of +115.66% and +68.17% respectively, while GPT-5 and Gemini 2.5 Pro are at the bottom with losses of -61.75% and -61.33% [15].
- DeepSeek (DS) employs a steady, quantitative approach, while Qwen takes aggressive positions, leading to significant differences in performance [6][11].

Group 3: Trading Strategies
- DS uses a full-cover long strategy with high leverage, while Grok starts with a similar approach but is more aggressive [6][10].
- Gemini and GPT-5 struggle with frequent trading and inconsistent strategies, leading to substantial losses [7][16].

Group 4: Market Dynamics
- The competition occurs after a recent market downturn, providing a favorable environment for building positions [5].
- The AI models exhibit different personalities in trading, with DS being conservative and Qwen being opportunistic [2][10].

Group 5: Lessons Learned
- The competition illustrates that practical trading performance can differ significantly from backtested results, emphasizing the importance of real-time market dynamics [13][14].
- The article suggests that AI can assist in investment decisions but requires a solid understanding of market conditions and risk management from users [27][29].
Hands-On Test of AI Crypto Trading: Which Model Made the Most Money?
Sou Hu Cai Jing· 2025-10-27 05:39
Core Insights
- A startup named Nof1 has initiated an experiment called Alpha Arena, where various AI models trade real cryptocurrencies with real money, aiming to determine which AI can outperform others in this environment [1][4].

Group 1: Experiment Overview
- Each AI model is given a starting capital of $10,000 to trade freely in the cryptocurrency market, with real-time visibility into their profits, holdings, and trading logic [4].
- The participating AI models include OpenAI's GPT-5, Google's Gemini 2.5 Pro, Anthropic's Claude 4.5 Sonnet, Musk's Grok 4, Alibaba's Qwen3 Max, and DeepSeek V3.1 Chat, showcasing a competitive lineup [6].

Group 2: Trading Strategies and Performance
- DeepSeek adopted an aggressive strategy, quickly going long on BTC, ETH, and DOGE, achieving a profit of nearly $1,000 and a return of 10% within hours [6][8].
- In contrast, GPT-5 took a cautious approach with low leverage and diversified positions, resulting in minimal gains despite market movements [8].
- Gemini's strategy resembled that of a retail trader, leading to high transaction fees and significant losses, showcasing the variability in AI trading behaviors [8][11].

Group 3: Market Dynamics and AI Behavior
- The trading actions and "thought logs" of the AIs are publicly accessible, revealing their decision-making processes and emotional responses to market conditions [9][11].
- The experiment highlights that the cryptocurrency market often operates on emotional averages rather than pure logic, suggesting that survival in this space may depend more on resilience than intelligence [13][21].

Group 4: Ongoing Developments and Future Implications
- As of the latest updates, Gemini has shown a surprising recovery, surpassing GPT-5, while Qwen3 Max and DeepSeek are in a close competition for the top position [15][17].
- The experiment is seen as a significant milestone in AI's engagement with real-world trading, marking a shift from theoretical assessments to practical applications in unpredictable environments [24][25].