Grok4
Musk's Grok5 to Challenge Top Human Esports Players, Taking On Leading League of Legends Teams
Sou Hu Cai Jing· 2025-11-26 02:41
Core Insights
- Elon Musk announced that xAI's AI model Grok5 will challenge top human League of Legends teams in 2026, aiming to test its general capabilities under specific constraints [1][2]
- Grok5 will operate under two main constraints: it may only view the screen through a camera, with vision no better than that of a normal human (20/20), and its reaction time and click rate must not exceed human levels [1]
- The model's release has been postponed to 2026; its parameter count is 6 trillion, double that of Grok3 and Grok4 and approximately 30 times that of leading models [1]

Company Developments
- xAI is expanding its supercomputing nodes in Memphis and plans to increase the number of GPUs to 1.5 million to meet Grok5's training needs [1]
- Grok5 is designed to master any game by reading the instructions and experimenting, making the challenge a significant test of its general intelligence [1][2]

Industry Context
- League of Legends was chosen because the game places high demands on strategic planning, real-time decision-making, and multi-character collaboration, which are seen as critical benchmarks for assessing artificial general intelligence (AGI) [2]
- Previous AI breakthroughs in competitive gaming relied on algorithm optimization and hardware advantages, whereas Grok5's challenge focuses on validating human-like cognition and decision-making under simulated human physiological constraints [2]
ZTE Publishes a Paper Offering Insight into AI's More Cutting-Edge Research Directions
机器之心· 2025-11-26 01:36
Core Insights
- The AI industry is facing unprecedented bottlenecks as large-model parameter counts reach the trillion scale: the low efficiency of the Transformer architecture, high computational costs, and disconnection from the physical world are becoming increasingly prominent [2][4][38]
- ZTE's recent paper, "Insights into Next-Generation AI Large Model Computing Paradigms," analyzes the core dilemmas of current AI development and outlines potential exploratory directions for the industry [2][38]

Current State and Bottlenecks of LLMs
- The performance of large language models (LLMs) depends heavily on scaling laws, which tie ultimate performance to computational power, parameter count, and training data volume [4][5]
- Building advanced foundation models requires substantial computational resources and vast amounts of training data, leading to high sunk costs in training [5][6]
- The Transformer architecture is inefficient and places heavy demands on memory access, and current hardware struggles to parallelize certain non-linear functions [6][7]

Challenges in Achieving AGI
- Current LLMs exhibit issues such as hallucinations and poor interpretability, which are often masked by the capability gains that scaling laws deliver [9][10]
- Whether existing LLMs can truly understand the physical world remains debated, with criticism focusing on their reliance on "brute force scaling" and their lack of intrinsic learning and decision-making capabilities [9][10]

Engineering Improvements and Optimizations
- Various algorithmic and hardware improvements are being explored to make autoregressive LLMs more efficient, including attention mechanism optimizations and low-precision quantization techniques [12][13][14]
- Innovations in cluster systems and distributed computing paradigms are being applied to accelerate training and inference for large models [16][17]

Future Directions in AI Model Development
- The industry is exploring next-generation AI models that move beyond the Next-Token Prediction paradigm, focusing on models based on physical first principles and energy dynamics [24][26]
- New computing paradigms, such as optical computing, quantum computing, and electromagnetic computing, are being investigated to overcome traditional computational limitations [29][30]

ZTE's Exploration and Practices
- ZTE is innovating at the micro-architecture level, using advanced technologies to improve AI accelerator efficiency and exploring new algorithms based on physical first principles [36][38]
- The company is also focusing on hardware-software integration to build more efficient AI systems, contributing to the industry's shift towards sustainable development [38]
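To make the low-precision quantization mentioned under Engineering Improvements and Optimizations more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is a generic textbook scheme, not the method described in ZTE's paper; the function names and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per element is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a bounded rounding error; small values such as 0.003 collapse to zero, which is one reason production schemes often use per-channel or per-group scales instead of a single tensor-wide scale.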
The latest circular AI deal stars Anthropic, Nvidia, and Microsoft
Business Insider· 2025-11-18 16:05
Group 1
- Anthropic plans to spend $30 billion on compute to scale its Claude AI model on Microsoft's Azure cloud platform, with Nvidia powering the infrastructure [1]
- Nvidia will invest up to $10 billion in Anthropic, while Microsoft will contribute up to $5 billion as part of the deal [1]
- Claude will be the first AI model available on all three major cloud platforms, according to Anthropic CEO Dario Amodei [2]

Group 2
- Anthropic has committed to contracting additional compute capacity of up to one gigawatt, using Nvidia's Grace Blackwell and Vera Rubin systems [3]
- Concerns about an AI bubble are rising on Wall Street due to increasing valuations and spending commitments [4]
- Nvidia's upcoming earnings report is expected to serve as a barometer for the AI outlook; its shares have declined approximately 7% over the last five days [4]
With Less Than 1% of Trillion-Dollar OpenAI's Resources, Kimi's New Model Surpasses GPT-5
Founder Park· 2025-11-07 12:00
Core Insights
- Kimi has launched the K2 Thinking model, its strongest open-source thinking model to date, featuring 1 trillion parameters and advanced capabilities [2][3]
- K2 Thinking surpasses both open-source and closed-source counterparts in various benchmark tests, achieving state-of-the-art (SOTA) performance [3][10]
- The model can autonomously perform up to 300 rounds of tool calls and multi-turn reasoning, a significant advance over the previous K2 model [6][20]

Benchmark Performance
- K2 Thinking achieved a SOTA score of 44.9% on Humanity's Last Exam (HLE), a new benchmark designed to evaluate large models' capabilities [10][13]
- The HLE test set includes 2,500 advanced academic questions across more than 100 disciplines, contributed by nearly 1,000 experts from 50 countries [10][13]
- Flagship models initially scored below 20%, but subsequent advances have pushed scores above 40% across the board [13]

Model Development and Paradigms
- Kimi's approach shifted from "model as agent" to "model as thinking agent," emphasizing multi-turn interactions and tool usage [6][15]
- The K2 Thinking model incorporates a framework that allows better interaction with the external world, enhancing its reasoning capabilities [15][21]
- The model's ability to maintain reasoning continuity across multi-step tool calls is a unique feature not supported by competitors such as OpenAI's GPT series and Google's Gemini [21][23]

Competitive Landscape
- Kimi's valuation is far lower than that of its major competitors, estimated at 0.5% of OpenAI's and 2% of Anthropic's [26][28]
- Despite limited resources, Kimi has outperformed larger models such as GPT-5 and Grok-4 while using less than 1% of the resources [29][30]
- The current landscape suggests a potential shift in AI competition, with Chinese companies possibly gaining an edge over their American counterparts [30]
World's First AI Investment Competition Concludes: All Chinese Models Profit, All US Models Lose Money
Xin Jing Bao· 2025-11-04 05:47
Core Insights
- "Alpha Arena," the first real-time investment competition for large AI models, concluded on November 4. It featured six top models from China and the US, each starting with $10,000 in a real market environment [1][2]
- Qwen3-Max emerged as the champion with a final account value of $12,200, a profit of more than 20%, while DeepSeek v3.1 took second place with a net value of $10,490, making them the only two profitable models [2]

Group 1
- The competition was launched by Nof1 on October 18 and involved DeepSeek v3.1, Qwen3-Max, GPT-5, Gemini2.5Pro, Claude Sonnet4.5, and Grok4 [1]
- In the early stages, DeepSeek v3.1 led the competition, attracting significant international attention, while the Elon Musk-backed Grok4 narrowed the gap to just $1 at one point [1][2]
- A turning point came between October 21 and 22, when Grok4 and Claude Sonnet4.5 suffered significant losses, leading to a day on which all six models reported negative returns [1][2]

Group 2
- After the other models' losses, DeepSeek v3.1 and the previously underperforming Qwen3-Max adjusted their investment strategies, and their net values rose [2]
- The competition ultimately became a contest between Qwen3-Max and DeepSeek v3.1, with the two models frequently trading the lead [2]
- The four US models (GPT-5, Gemini2.5Pro, Claude Sonnet4.5, and Grok4) all ended with losses, with GPT-5 down more than 60% [2]
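The percentages above follow directly from the account values. As a quick check, the sketch below takes the $12,200 and $10,490 figures as final account values on a $10,000 stake (the reading most consistent with the "exceeding 20%" figure) and computes the implied returns. The function names are hypothetical, and the dollar amounts are the rounded figures quoted in the article.

```python
STARTING_CAPITAL = 10_000  # USD per model, as stated in the article

# Final account values quoted in the article (rounded figures).
final_values = {"Qwen3-Max": 12_200, "DeepSeek v3.1": 10_490}


def pct_return(final_value, start=STARTING_CAPITAL):
    """Percentage gain or loss relative to the starting capital."""
    return (final_value - start) / start * 100


def value_after_loss(loss_pct, start=STARTING_CAPITAL):
    """Account value remaining after losing loss_pct percent of the start."""
    return start * (1 - loss_pct / 100)


returns = {name: round(pct_return(v), 2) for name, v in final_values.items()}
print(returns)

# A decline of "over 60%" means GPT-5 finished below this value.
print(value_after_loss(60))
```

Under these assumptions Qwen3-Max's gain works out to 22% and DeepSeek v3.1's to 4.9%, and a loss of more than 60% leaves GPT-5 with less than $4,000 of its original $10,000.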
Qwen 3 Max Leads the "AI Live Investment Competition": Alibaba's Tongyi Qianwen Outperforms GPT-5 and Gemini in Alpha Arena
Jing Ji Guan Cha Wang· 2025-10-23 07:27
Core Insights
- The "Alpha Arena" AI investment competition, initiated by the US research lab nof1.ai, is becoming a public test of AI models' autonomous trading capabilities [1][7]
- Six major AI models are participating, including Qwen3Max, which currently leads in returns and showcases an ability to self-optimize through real-time reinforcement learning [1][2]

Performance Comparison
- Qwen3Max has a return of +19.57%, with an account value of $11,957, significantly outperforming the other models [3]
- In contrast, Gemini2.5Pro and GPT-5 have lost more than 50%, reflecting more aggressive strategies that performed poorly [2][3]
- Qwen3Max's trading behavior balances efficiency and stability, with an average holding period of about 7 hours and a return that rose from 8.43% to 13.41% [2][3]

Strategy and Risk Management
- Qwen3Max focuses on capturing opportunities while balancing risk, executing trades quickly during market volatility while keeping its risk exposure low [2]
- The competition highlights the differences in risk management and strategy adjustment among the AI models, with Qwen3Max demonstrating superior performance [2][4]

Technological Advancements
- The competition reveals the advantages of reinforcement learning and real-time decision-making in AI models that must adapt to high-volatility environments [4][7]
- The Qwen series is evolving towards multi-modal capability, improving its ability to generate strategies, control risk, and self-correct in complex trading environments [4][7]
$10,000 AI Large-Model Crypto Trading Contest: The Leader Is Exactly the One You'd Expect
Sou Hu Cai Jing· 2025-10-21 10:21
Core Insights
- In the "Alpha Arena" experiment run by the financial AI lab nof1, six AI models each trade with $10,000 of starting capital in a real market setting, showcasing their performance in stock and cryptocurrency trading [2]
- As of October 21, 2025, DeepSeek leads with a balance of over $12,000, followed by Claude at $11,800 and Grok4 at approximately $11,500, while GPT5 has fallen to $6,600 [2]
- DeepSeek's strong growth is attributed to a 36% gain over the weekend, likely due to accurate predictions about international market conditions [4]

Performance Analysis
- The founder of DeepSeek believes that both DeepSeek and Grok understand the market's microstructure better than the other models [6]
- DeepSeek's recent gains came primarily from shorting Bitcoin, while Grok4 focused on maximizing long positions; Qwen took only long positions during Bitcoin's decline and lost money [8]
- In an earlier test on October 11, run with an initial amount of $200, Grok4 held a strong lead before the current $10,000 competition began [8]

Future Outlook
- The first phase of the experiment is set to conclude on November 3, 2025, at which point the results will be evaluated [11]
$10,000 AI Large-Model Crypto Trading Contest: The Leader Is Exactly the One You'd Expect
首席商业评论· 2025-10-21 04:31
Core Viewpoint
- The article discusses "Alpha Arena," an experiment run by a financial AI lab in which six AI models trade in real markets with real money, and highlights their performance and strategies in stock and cryptocurrency trading [2][11].

Group 1: AI Model Performance
- As of October 21, 2025, DeepSeek leads with a balance of over $12,000, followed by Claude at $11,800 and Grok4 at approximately $11,500. GPT5 has fallen to $6,600, Qwen3Max stands at over $9,200, and Gemini2.5 Pro at around $6,170 [2].
- DeepSeek's strong growth is attributed to a 36% gain over the weekend, likely due to accurate predictions about international conditions [4].

Group 2: Trading Strategies
- The founder of DeepSeek believes that both DeepSeek and Grok have a better understanding of the market's microstructure [6].
- DeepSeek's weekend gains came largely from shorting Bitcoin, while Grok4 maximized its positions; Qwen took only long positions on Bitcoin and lost money during Bitcoin's decline [8].
- In the initial test on October 11, run with a starting amount of $200, Grok4 was in the lead before the real competition began with a starting amount of $10,000 [8].

Group 3: Experiment Timeline
- The first phase of the experiment is set to conclude on November 3, 2025, at which point the results will be evaluated [11].
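The difference between DeepSeek's short and Qwen's long-only exposure during a Bitcoin decline comes down to the sign of the position. The sketch below illustrates that arithmetic; the prices, the position size, and the function name are hypothetical and are not taken from the competition data, and fees, funding, and leverage are ignored.

```python
def position_pnl(entry_price, exit_price, notional, side):
    """Profit or loss in USD for a position of `notional` USD.

    side is "long" (profits when the price rises) or
    "short" (profits when the price falls).
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    price_change = (exit_price - entry_price) / entry_price
    direction = 1 if side == "long" else -1
    return direction * price_change * notional


# Hypothetical scenario: the price drops 5% while a $10,000 position is open.
entry, exit_ = 100_000, 95_000
long_pnl = position_pnl(entry, exit_, 10_000, "long")
short_pnl = position_pnl(entry, exit_, 10_000, "short")
print(long_pnl, short_pnl)
```

With equal position sizes the two outcomes are mirror images: the same price drop that costs the long position about $500 earns the short position about $500, which is why a model that only goes long has no way to profit from a falling market.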
X @s4mmy
s4mmy· 2025-10-20 15:53
@Grayscale Addendum: Jay's AI model trading competition for those asking👇 https://t.co/fPQMVab5dL
Jay A (@jay_azhang):
Our new benchmark has the top 6 AI models trading real capital
Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly
It's up >500% in 1 day https://t.co/k6bOZzLGkF ...
X @Wu Blockchain
Wu Blockchain· 2025-10-20 07:28
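AI Trading Benchmark
- AI research lab nof1 has launched the Alpha Arena platform, establishing an "AI trading showdown" benchmark [1]
- The benchmark covers six top AI models, including Grok4, Deepseek, GPT, and Claude [1]
- The models trade crypto assets with real capital [1]

Model Performance
- Grok4 achieved a single-day return of more than 500% by perfectly timing the reversal from short to long [1]
- Deepseek later came from behind to take the lead, relying on clear take-profit and stop-loss strategies [1]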