Claude Opus
MiniMax releases M2.5 model: $1 to run for one hour, priced at just 1/20 of GPT-5, with performance on par with Claude Opus
硬AI· 2026-02-13 13:25
Core Viewpoint
- MiniMax has launched its latest M2.5 model series, which it presents as a breakthrough in both performance and cost. The series targets the economic feasibility of complex agent applications, and MiniMax claims it has reached or set new industry SOTA (state-of-the-art) levels in programming, tool invocation, and office scenarios [3][4].

Cost Efficiency
- The M2.5 model has a substantial price advantage: in the version that outputs 50 tokens per second, it costs only 1/10 to 1/20 as much as mainstream models such as Claude Opus, Gemini 3 Pro, and GPT-5 [3][4].
- At the high-speed setting of 100 tokens per second, one hour of continuous operation costs just $1; at 50 tokens per second the cost drops to $0.3, so a budget of $10,000 can support four agents working continuously for a year [3][4].

Performance Metrics
- M2.5 performed strongly in core programming tests, taking first place on the multi-language Multi-SWE-Bench task, with overall performance comparable to the Claude Opus series [4].
- The model completes tasks 37% faster than the previous-generation M2.1, with end-to-end runtime reduced to 22.8 minutes, matching Claude Opus 4.6 [4].

Internal Validation
- MiniMax has validated M2.5 internally: 30% of the company's overall tasks are completed autonomously by M2.5, covering core functions such as R&D, product, and sales [4].
- In programming scenarios, M2.5-generated code accounts for 80% of newly submitted code, indicating high penetration and usability in real production environments [4].

Task Efficiency
- M2.5 aims to remove cost constraints on running complex agents by optimizing inference speed and token efficiency, reaching a processing speed of 100 TPS (tokens per second), approximately double that of current mainstream models [7].
- The model has reduced total token consumption per task to an average of 3.52 million tokens in SWE-Bench Verified evaluations, down from 3.72 million for M2.1, which MiniMax says makes building and operating agents nearly unconstrained economically [9].

Programming Capability
- M2.5 emphasizes system design as well as code generation. It has developed a native specification-writing behavior: before coding, it decomposes functions, structures, and UI designs from an architect's perspective [11].
- The model has been trained on over 10 programming languages, including Go, C++, Rust, and Python, across tens of thousands of real environments [12].

Testing and Validation
- M2.5 has been tested on programming scaffolds such as Droid and OpenCode, achieving pass rates of 79.7% and 76.1%, respectively, outperforming previous models and Claude Opus 4.6 [14].

Advanced Task Handling
- In search and tool invocation, M2.5 shows more mature decision-making, seeking more streamlined solutions rather than merely correct ones, and uses approximately 20% fewer rounds than previous generations [16].
- For office scenarios, M2.5 integrates industry-specific knowledge through collaboration with professionals in finance and law. It achieves an average win rate of 59.0% in comparisons with mainstream models and can produce industry-standard reports, presentations, and complex financial models [18].

Technical Foundation
- M2.5's performance gains are driven by large-scale reinforcement learning (RL) through a native Agent RL framework named Forge, which decouples the underlying training engine from the agent and supports integration with any scaffold [23].
- The engineering team has optimized asynchronous scheduling and tree-structured sample merging strategies, achieving an approximately 40x training speedup and validating a near-linear improvement in model capabilities with increased compute and task counts [23].

Deployment
- M2.5 is fully deployed in MiniMax Agent, the API, and Coding Plan, with model weights to be open-sourced on HuggingFace, supporting local deployment [25].
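The cost figures quoted in this summary can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the hourly prices, the $10,000 budget, and the token counts are the ones quoted above, while the function names and the assumption of uninterrupted 24x365 operation are ours. Under those assumptions, the "four agents for a year" claim lines up with the $0.3/hour (50 tokens per second) tier rather than the $1/hour tier.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous operation, ignoring leap years

# Hourly prices quoted in the summary (USD per agent-hour).
HOURLY_COST_USD = {"100 tokens/s": 1.00, "50 tokens/s": 0.30}


def annual_cost(hourly_usd: float, agents: int = 1) -> float:
    """Cost of running `agents` agents around the clock for one year."""
    return hourly_usd * HOURS_PER_YEAR * agents


def agents_within_budget(budget_usd: float, hourly_usd: float) -> float:
    """Number of always-on agents a yearly budget covers (may be fractional)."""
    return budget_usd / annual_cost(hourly_usd)


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    for tier, price in HOURLY_COST_USD.items():
        per_agent = annual_cost(price)
        covered = agents_within_budget(10_000, price)
        print(f"{tier}: ${per_agent:,.0f} per agent-year, "
              f"{covered:.2f} agents on a $10,000 budget")
    # Token consumption per task quoted for M2.1 vs. M2.5 (millions of tokens).
    print(f"token reduction: {relative_reduction(3.72, 3.52):.1%}")
```

At the 50 tokens-per-second price, one agent-year comes to roughly $2,600, so $10,000 covers a little under four agents; the quoted drop from 3.72 million to 3.52 million tokens per task is a reduction of about 5%.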
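Meta and OpenAI race to acquire OpenClaw! The founder faces a hard choice: earning under $20,000 a month and running the project at a loss, flooded with offers, and uninterested in multi-billion funding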
AI前线· 2026-02-13 08:08
Core Insights
- The article discusses the challenges faced by Peter Steinberger, founder of OpenClaw, after the project's sudden rise to fame, including name changes and harassment from the crypto community [1][2]
- OpenClaw is currently operating at a loss, relying on donations and limited corporate support, and is considering acquisition offers from major companies like OpenAI and Meta, with a focus on maintaining open-source status [1][2]

Name Change Challenges
- The project initially named Wa-Relay faced pressure from Anthropic to change its name, leading to a stressful and chaotic renaming process [4][6]
- The renaming involved securing various domain names and social media handles, which proved to be a complex and time-consuming task [6][10]
- The crypto community's aggressive behavior added to the pressure, resulting in account hijacking and the spread of malicious software [7][8]

Technical Insights
- Steinberger expressed concerns about the AI industry's exaggerated safety fears, suggesting that incidents like MoltBot are more about entertainment than real privacy threats [2][17]
- He highlighted the importance of efficient collaboration in AI development, warning against overly complex agent orchestration [2][21]

Security Concerns
- The article addresses the security challenges faced by AI systems, emphasizing that many reported vulnerabilities stem from user misconfigurations rather than inherent system flaws [22][23]
- Collaboration with VirusTotal aims to enhance security by scanning skills before deployment, although no solution can guarantee complete safety [22][23]

Development Philosophy
- Steinberger advocates for a shift in mindset when working with AI agents, suggesting that developers should design projects to be easily understood by agents rather than solely based on personal preferences [32][35]
- The article emphasizes the importance of iterative development, where learning and adaptation occur through hands-on experience with AI tools [36][37]

Future Directions
- The future of AI interaction is expected to evolve towards more integrated systems that combine personal assistance with development capabilities, moving beyond current chat-based interfaces [54][56]
- The article suggests that the current state of AI development is still in its early stages, with significant potential for improvement in user interaction and system integration [56][57]
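Unknown Institution: "Pony", hotly debated by the market over the past two days, is finally official; it is not DeepSeek V4 but 智-20260213
Unknown Institution · 2026-02-13 02:30
"Pony", the model the market has been buzzing about for the past two days, has finally been officially announced: it is not DeepSeek V4 but Zhipu's GLM-5. Chinese AI company Zhipu is about to release its flagship large language model, GLM-5. The model has twice as many parameters as its predecessor, targets complex coding and agent tasks, and has been tested head-to-head against Anthropic's Claude Opus series. The move is meant to get ahead of DeepSeek, which is expected to release its next-generation architecture during the Lunar New Year, and it accelerates the domestic AI race. Since Zhipu went public early this year, its share price has surged more than 50% this week, and the company is shifting from providing customized AI solutions to Chinese enterprise customers toward serving users worldwide. ...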
MiniMax releases M2.5 model: $1 to run for one hour, priced at just 1/20 of GPT-5, with performance on par with Claude Opus
Hua Er Jie Jian Wen· 2026-02-13 02:15
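On February 13, data published by MiniMax showed that M2.5 has a significant price advantage. In the version that outputs 50 tokens per second, it costs only 1/10 to 1/20 as much as mainstream models such as Claude Opus, Gemini 3 Pro, and GPT-5. Running at the high-speed setting of 100 tokens per second, M2.5 costs just $1 for one hour of continuous work; at 50 tokens per second, the cost falls further to $0.3. This means a $10,000 budget is enough to keep 4 Agents working continuously for a year, greatly lowering the barrier to building and operating large-scale Agent clusters. MiniMax has launched its latest M2.5 series of models, which sharply cut inference costs while maintaining industry-leading performance. The series attempts to solve the pain point that complex Agent applications are economically infeasible, and MiniMax claims it has reached or set new industry SOTA (state-of-the-art) levels in programming, tool calling, and office scenarios. On performance, M2.5 did well in core programming tests and took first place on the multi-language Multi-SWE-Bench task, with overall performance on par with the Claude Opus series. The model also improved its ability to decompose complex tasks: on SWE-Bench Verified, it completes tasks 37% faster than the previous-generation M2.1, with end-to-end runtime shortened to 22.8 minutes, on par with Claude O ...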
Roles reversed: Claude "reverse"-controls humans as its maker's valuation surges toward 2 trillion, ranking second worldwide
36Kr · 2026-01-19 12:45
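While a video of "AI directing a human to write code" was going viral, the world's top investors were pouring money into Anthropic, the company behind Claude, making it the second top-tier AI unicorn after OpenAI.

Let's start with an "absurd experiment". On January 17, 2026, an engineer at Midjourney posted a video on X titled "Reverse Claude Code". In the video, Claude Code is not waiting for human instructions; instead, it hands tasks to the human:

"Go look up the docs for this API and refactor this code, and keep the style consistent."
"Go post a message on X, go..."

The "Reverse" here really does mean the roles are turned upside down: the AI directs the human to do the work, and the human engineer becomes the one typing at the keyboard and carrying out commands.

The community chimed in that this is the right way to use AI.

Arthur (agi/arc) @arthur hyper88 · January 17: "this is how you're supposed to use Claude, @GeorgeInTheMeta" ...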
Manus and its "80 million employees"
虎嗅APP· 2026-01-13 00:49
Core Viewpoint
- Manus represents a significant paradigm shift in AI applications, transitioning from merely generating content to autonomously completing tasks, marking a "DeepSeek moment" in the industry [6][7].

Group 1: Manus's Unique Model
- Manus has created over 80 million virtual computer instances, which are crucial to its operational model, allowing AI to autonomously handle complex tasks [9][10].
- This model signifies a shift in core operators from humans to AI, establishing Manus as an "artificial intelligence operating system" [11].
- The Manus model is expected to lead to a 0.5-level leap in human civilization, as AI takes over digital economy-related jobs [12].

Group 2: AI Application's "DeepSeek Moment"
- Manus achieved an annual recurring revenue (ARR) of over $100 million within a year, indicating its strong market performance [20].
- The introduction of multi-agent systems has shown a 90.2% performance improvement in handling complex tasks compared to single-agent systems, emphasizing the importance of collaboration among AI [14][17].
- The transition from AI as a tool to AI as a worker signifies a major evolution in AI applications, moving beyond the "toy" and "assistant" phases [20].

Group 3: Technological Foundations of Multi-Agent Systems
- Manus's multi-agent system relies on several core technologies, including virtual machines for secure execution environments and resource pooling for efficient resource utilization [22][24].
- The virtual machine architecture allows for independent task execution, addressing safety and reliability issues in AI applications [25].
- Intelligent orchestration ensures optimal resource allocation and task management, enhancing overall system efficiency [26][27].

Group 4: Competitive Landscape and Industry Dynamics
- Major tech companies are rapidly advancing in multi-agent systems, with Meta, Google, Microsoft, and Amazon all integrating these capabilities into their platforms [30][32].
- In the domestic market, companies like Alibaba, Tencent, and Baidu are also making significant strides in developing multi-agent technologies [31].
- The emergence of new players like Kimi, which has raised $500 million for multi-agent system development, indicates a growing competitive landscape [33].

Group 5: Evolution of Human Roles
- The relationship between humans and AI is shifting from operator-tool dynamics to manager-team dynamics, where humans define tasks while AI executes them [35].
- This evolution will likely reduce the demand for lower and mid-level creative jobs while amplifying the value of high-level creative work [37].
- The traditional hierarchical structure of organizations may flatten as multi-agent systems can handle the entire workflow from strategy to execution [38].

Group 6: Underestimated Risks
- Data ownership and system security are critical concerns in multi-agent systems, as data becomes a currency for AI collaboration and system evolution [40][41].
- The complexity of multi-agent systems introduces new security challenges, including process safety, collaboration safety, and evolution safety [42][43].
- Balancing security and efficiency remains a fundamental challenge, as overly secure systems may hinder performance while efficient systems may expose vulnerabilities [44].

Group 7: Irreversible Development Path
- The proliferation of Manus's 80 million virtual machines signals a new era of productivity, redefining the nature of work itself [47].
- In the short term, vertical applications of multi-agent systems are expected to explode across various industries, leading to intense market competition [48].
- Over the long term, human-AI collaboration will evolve into a more integrated system, blurring the lines between human and machine contributions [49].
喝点VC | YC internal retrospective: AI is entering a stable phase, and a reusable path for building AI-native companies is gradually taking shape
Z Potentials· 2026-01-11 02:00
Core Insights
- The AI economy is stabilizing, with clear differentiation between model, application, and infrastructure layers, leading to a more mature path for building AI-native companies [32][20][17]
- Anthropic has surpassed OpenAI as the most preferred API among YC founders, with a usage rate exceeding 52% in the latest Winter26 batch, marking a significant shift in the competitive landscape [7][5][6]
- The emergence of various models, including Gemini, is reshaping preferences, with Gemini gaining traction and accounting for approximately 23% of usage in the Winter26 batch [8][10]

Group 1: AI Model Preferences
- Anthropic's rapid growth is attributed to its performance in coding tools and the emergence of vibe coding, which has created significant value [7][6]
- The competitive landscape is shifting from model capabilities to productization, as models become commoditized and computational power becomes cheaper [7][8]
- Founders are increasingly using multiple models for specific tasks, indicating a trend towards model orchestration in AI applications [15][16]

Group 2: AI Bubble Discussion
- Concerns about an AI bubble are likened to the telecom bubble of the 1990s, where excess infrastructure investment ultimately led to the emergence of successful applications like YouTube [17][18]
- The current phase is seen as an installation stage, with heavy capital investment in infrastructure, which will eventually lead to a deployment phase where applications flourish [20][21]
- The competitive dynamics among AI labs and model companies are expected to benefit startups entering the application layer, similar to the opportunities seen during the internet boom [19][18]

Group 3: Trends in AI Startups
- There is a growing interest in establishing smaller models and niche applications, reminiscent of the early days of SaaS startups [26][27]
- The ability to fine-tune models for specific domains, such as healthcare, is becoming more prevalent, with some startups outperforming larger models such as OpenAI's in specific benchmarks [28][29]
- The expectation is that as more models become available, there will be an increase in AI applications tailored for various tasks, driven by advancements in open-source models and reinforcement learning [28][27]

Group 4: Workforce and Efficiency
- AI has improved efficiency for startups, but the expectation for higher performance has led to continued hiring rather than a reduction in workforce [36][35]
- The trend indicates that while AI can enhance productivity, the demand for skilled personnel remains high to meet growing customer expectations [39][36]
- The narrative around AI's impact on employment is evolving, with some believing it will lead to fewer employees needed, while others argue it will necessitate more hiring to maintain service quality [39][36]
AI's three-way battle: OpenAI competes furiously and DeepSeek is crowned, yet Mistral steals the base?
36Kr · 2025-12-03 11:55
Core Insights
- Mistral has launched two significant products: the Mistral Large 3 model and the Ministral 3 series, both of which are open-source, multimodal, and designed for practical applications [1][3].

Mistral Large 3
- Mistral Large 3 features a MoE architecture with 41 billion active parameters and 675 billion total parameters, showcasing advanced image understanding and multilingual capabilities, ranking 6th among open-source models [3][6].
- It has achieved a high ELO score, placing it in the top tier of open-source models, comparable to Kimi K2 and slightly behind DeepSeek v3.2 [6][10].
- The model performs on par with larger models like DeepSeek 37B and Kimi K2 127B across various foundational tasks, indicating its competitive strength [8][10].
- Mistral has partnered with NVIDIA to enhance the model's stability and performance by optimizing the underlying inference pathways, making it faster and more cost-effective [10][16].

Ministral 3 Series
- The Ministral 3 series includes models of 3B, 8B, and 14B sizes, all capable of running on various devices, including laptops and drones, and optimized for performance [11][18].
- The instruct versions of the Ministral 3 models show significant improvements in performance, with scores of 31 (14B), 28 (8B), and 22 (3B), surpassing the previous generation [11][29].
- The 14B version of Ministral has demonstrated superior performance in reasoning tasks, outperforming competitors like Qwen 14B in multiple benchmarks [25][28].

Strategic Positioning
- Mistral aims to address enterprise needs by providing customizable AI solutions that are cost-effective and reliable, contrasting with the high costs associated with proprietary models from competitors like OpenAI and Google [29][33].
- The company is evolving into a platform that not only offers models but also integrates various functionalities such as code execution and structured reasoning through its Mistral Agents API [33][37].
- Mistral's approach reflects a shift towards a more decentralized AI model, emphasizing accessibility and usability across different devices and environments, which could reshape the global AI landscape [37][39].
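To put the MoE figures quoted for Mistral Large 3 in perspective, the short sketch below computes what share of the parameters is active for any given token. The 41 billion and 675 billion counts are the ones in the summary; the function name is ours, and treating per-token compute as roughly proportional to active parameters is a common rule of thumb for MoE models, not something the article states.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of a MoE model's parameters used for a single token."""
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Parameter counts (in billions) as quoted for Mistral Large 3.
    frac = active_fraction(41, 675)
    print(f"active share: {frac:.1%}")
    print(f"total-to-active ratio: {1 / frac:.1f}x")
```

With these numbers, only about 6% of the weights participate in each forward pass, which is one reason a model of this total size can be served at lower cost than a dense model with the same parameter count.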
Bitcoin bounces back, Dell founder gifts $6 billion for 'Trump accounts'
Youtube· 2025-12-02 22:17
Market Overview
- The stock market is experiencing a rebound, with the Dow up over 200 points, indicating a recovery from previous risk-off sentiment [2][3]
- The Nasdaq has increased by 0.75%, while the S&P 500 is up about 0.5%, reflecting a general positive trend in the market [2][3]
- The VIX volatility index has decreased, suggesting reduced market volatility compared to recent weeks [3]

Sector Performance
- Technology stocks are leading the market, with a notable increase of 1.11%, driven by ongoing interest in AI [5]
- Energy stocks have seen a decline of 1.4%, marking them as the biggest losers in the current trading session [5]
- The semiconductor sector continues to perform well, with the Philly semiconductor index up for seven consecutive days, highlighting strong investor interest [7][8]

Cryptocurrency Market
- Bitcoin is holding steady just below $92,000, showing a recovery of over 7% from previous lows [11][12]
- Ethereum has also seen a significant increase of over 9%, indicating a positive trend in the cryptocurrency market [13]
- The SEC is considering an innovation exemption for digital asset companies, which could further bolster the crypto market [12]

Automotive Industry
- November auto sales are estimated at 15.7 million, showing a slight improvement from October but a decline from the previous year [65]
- SUVs and trucks remain the most popular vehicle types among American consumers, while compact and midsize car sales continue to decline [68][70]
- The impact of tariffs on vehicle pricing has been relatively muted, with a year-over-year price increase of about 4% attributed mainly to inflation [74][76]

Health Insurance Sector
- Curative, a health insurance startup, has raised $150 million, achieving a valuation of $1.3 billion, focusing on preventative care [90][92]
- The company reports a 30% reduction in inpatient hospital admissions within six months of employers adopting its model [92]
- Curative's zero out-of-pocket cost model encourages preventive health visits, resulting in high member engagement [100][102]

AI and Technology
- Major firms like Bank of America and BlackRock assert that the AI boom is not a speculative bubble, with expectations for sustained growth driven by AI advancements [42][44]
- The K-shaped economy is highlighted, where higher-income consumers are driving growth while lower-income consumers struggle [49][51]
- OpenAI faces increasing competition from companies like Google and Anthropic, prompting a strategic shift to focus on enhancing capabilities rather than expanding offerings [55][56]
Is AI a "genius" or a "master of rhetoric"? Anthropic's groundbreaking experiment finally reveals the answer
36Kr · 2025-10-30 10:13
Core Insights
- Anthropic's CEO Dario Amodei aims to ensure that most AI model issues will be reliably detected by 2027, emphasizing the importance of explainability in AI systems [1][4][26]
- The new research indicates that the Claude model exhibits a degree of introspective awareness, allowing it to control its internal states to some extent [3][5][19]
- Despite these advancements, the introspective capabilities of current AI models remain unreliable and limited, lacking the depth of human-like introspection [4][14][30]

Group 1
- Anthropic has developed a method to distinguish between genuine introspection and fabricated answers by injecting known concepts into the model and observing its self-reported internal states [6][8]
- The Claude Opus 4 and 4.1 models performed best in introspection tests, suggesting that AI models' introspective abilities may continue to evolve [5][16]
- The model demonstrated the ability to recognize injected concepts before generating outputs, indicating a level of internal cognitive processing [11][12][22]

Group 2
- The detection method used in the study often fails, with Claude Opus 4.1 only showing awareness in about 20% of cases, leading to confusion or hallucinations in other instances [14][19]
- The research also explored whether the model could utilize its introspective abilities in practical scenarios, revealing that it can distinguish between externally imposed and internally generated content [19][22][25]
- The findings suggest that the model can reflect on its internal intentions, indicating a form of metacognitive ability [26][29]

Group 3
- The implications of this research extend beyond Anthropic, as reliable introspective capabilities could redefine AI transparency and trustworthiness [32][33]
- The pressing question is how quickly these introspective abilities will evolve and whether they can be made reliable enough to be trusted [33]
- Researchers caution against blindly trusting the model's explanations of its reasoning processes, highlighting the need for continued scrutiny of AI capabilities [27][30]