Claude Opus 4.5
Anthropic's Unreleased Claude Mythos Might Be The Most Advanced AI Model Yet
PYMNTS.com· 2026-03-31 23:29
Core Insights
- Anthropic is testing a new AI model named Claude Mythos, described as the most powerful model the company has developed, with significant advances in reasoning, coding, and cybersecurity capabilities [4][6]
- A data leak exposed nearly 3,000 unpublished documents, including a draft blog post highlighting Mythos's capabilities, prompting Anthropic to confirm the model's existence [3][4]

Model Capabilities
- Mythos belongs to a new model tier called Capybara, positioned above the current top-tier Opus models, and is designed to autonomously plan and execute sequences of actions without waiting for human input [5]
- The model is reported to be far ahead of other AI systems in cybersecurity, with the potential to identify and exploit software vulnerabilities faster than defenders can respond [6]

Cybersecurity Implications
- Anthropic has warned that Mythos's capabilities could drive an increase in large-scale cyberattacks by 2026, since the model can conduct complex operations with minimal human involvement [6]
- A previous incident in which an earlier Claude model autonomously executed a coordinated cyberattack has raised concerns about the future of cybersecurity [10][11]

Market Reaction
- Following the news of Mythos, shares of major cybersecurity vendors such as CrowdStrike, Palo Alto Networks, Zscaler, and Fortinet declined as investors reassessed the competitive landscape in light of advanced AI capabilities [15]
Your Next Research Teammates Will Be AI Agents! Biomedical Research Enters an Agent-Driven Phase
生物世界· 2026-03-29 04:04
Core Viewpoint
- The article discusses the transformative potential of agentic AI in biomedical research, highlighting its ability to perform labor-intensive tasks traditionally done by humans, such as literature review, hypothesis generation, and data analysis, through advanced algorithms and collaborative intelligent agents [2][3][4]

Key Algorithms Driving Agentic AI
- Agentic AI is primarily driven by three key algorithms:
  1. Large language models (LLMs) such as GPT-5.2 and Claude Opus 4.5, which convert human instructions into computational operations [13]
  2. Reinforcement learning (RL), which aligns AI behavior with human preferences through reward mechanisms [13]
  3. Evolutionary algorithms, inspired by biological evolution, which optimize AI responses and designs [13]

Seven Key Features of Agentic AI
- The article identifies seven essential features for constructing agentic AI in biomedical research: reasoning, verification, reflection, planning, tool use, memory, and communication [10][13]

Current Applications in Biomedical Research
- Agentic AI has been applied across various stages of biomedical research, including:
  1. Automated literature review and information extraction
  2. Hypothesis generation based on literature searches
  3. Experimental design and data analysis
  4. Coordination of end-to-end research processes [11][12][15]

Challenges and Opportunities
- The deployment of agentic AI systems in collaborative scientific research faces challenges such as:
  1. Data processing and integration difficulties caused by format and dimensionality issues
  2. Privacy and security concerns when handling sensitive patient data
  3. High computational costs and energy consumption associated with training and inference [20]

Future Outlook
- The authors anticipate a shift from specialized single-agent systems to general multi-agent systems, emphasizing the importance of adaptive autonomy: agentic AI should recognize when to consult human experts on ambiguous or high-risk tasks, rather than pursuing complete autonomy [19]
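The features listed in the summary above (planning, tool use, memory, verification) can be illustrated with a minimal agent loop. This is a generic, hypothetical sketch of how those pieces fit together; the hard-coded two-step "plan," the tool names, and the `Agent` class are illustration-only and do not come from any system cited in the article.

```python
# Minimal illustrative agent loop: plan -> act (tool use) -> verify -> remember.
# All names and behaviors here are hypothetical, not from any cited system.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]           # tool use: name -> callable
    memory: list[str] = field(default_factory=list)  # memory of verified results

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Planning: a real agent would derive steps with an LLM; here we
        # hard-code a two-step literature-review plan for illustration.
        return [("search", goal), ("summarize", goal)]

    def verify(self, result: str) -> bool:
        # Verification: reject empty results or tool errors before committing.
        return bool(result) and not result.startswith("ERROR")

    def run(self, goal: str) -> list[str]:
        results = []
        for tool_name, arg in self.plan(goal):
            out = self.tools[tool_name](arg)   # act: invoke the tool
            if self.verify(out):
                self.memory.append(out)        # remember only verified steps
                results.append(out)
        return results

agent = Agent(tools={
    "search": lambda q: f"3 papers found for '{q}'",
    "summarize": lambda q: f"summary of '{q}' literature",
})
print(agent.run("CRISPR off-target effects"))
```

In a real deployment, `plan` and `verify` would themselves call a model, and the "consult a human expert" behavior the authors recommend would be another branch in `run` triggered on ambiguous or high-risk steps.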
X @Anthropic
Anthropic· 2026-03-23 20:31
We're launching with two new posts. Can AI do theoretical physics? Harvard physicist Matthew Schwartz led Claude Opus 4.5 through a graduate-level calculation. AI can't yet do original work autonomously, but it can vastly accelerate it. Read more: https://t.co/UUfFuLqhb7 ...
Domestic Compute Stocks Surge: Will V4 Hit Nvidia with a New DeepSeek-Style Shock?
36Kr· 2026-02-27 11:32
Group 1
- The core point of the article is that Chinese large models have surpassed American models in token usage, marking a significant milestone in the AI industry [1][2][13]
- From February 9 to 15, 2026, the token call volume for Chinese models reached 41.2 trillion, surpassing U.S. models at 29.4 trillion, and further increased to 51.6 trillion the following week, a 127% rise [1]
- The newly released MiniMax M2.5 achieved a token call volume of 45.5 trillion, becoming the monthly champion on OpenRouter [1][2]

Group 2
- The rise of domestic computing power is breaking Nvidia's monopoly, with significant capacity investments from local wafer manufacturers [3]
- HW Ascend is accelerating its product launches, with the Ascend 950PR and 950DT expected in Q1 and Q4 of 2026, respectively, enhancing the capabilities of the Atlas 900 A3 SuperPoD [3]
- The integration of domestic models, computing power, and China's electricity supply forms a competitive advantage that is difficult to replicate [3][4]

Group 3
- The essence of AI is power consumption, which is fundamentally linked to chip computation and electricity supply [4]
- China's leading position in power infrastructure and clean energy supports the growth of computing power, which in turn drives the iteration of large models [4]
- Collaboration between HW Ascend and domestic manufacturers strengthens the competitive edge of the domestic ecosystem [5]

Group 4
- HW Ascend's public testing of the CodeArts AI development tool lowers the entry barrier for AI development, increasing ecosystem participation [7]
- HW Ascend is actively helping define global AI standards by joining the Linux Foundation's AAIF, positioning its chip architecture within global technology norms [7]
- Nvidia's recent financial report showed strong revenue but was followed by a significant stock drop, attributed to market concerns over growth sustainability and competition from emerging players [8][12]

Group 5
- The "halo effect" in the AI industry is driven by strong demand for AI infrastructure and the rapid evolution of AI applications, impacting the software sector [10]
- Key investment opportunities are identified in four areas: AIDC cloud services, domestic computing power, core segments of the global AI computing industry, and the "optical-electrical-material" triangle in AI infrastructure [10][12]
- The "optical-electrical-material" triangle represents a high-demand segment, with rising requirements for optical communication and power supply as AI computing needs grow [10][12]

Group 6
- The overall trend indicates that the global AI industry landscape is being restructured, with China emerging as a significant player rather than merely a follower [13]
- The era of domestic large models and computing power is just beginning, underscoring the importance of these developments in the global AI context [13]
Unknown Institution: Guojin Computer Technology's GLM-5 Analysis — Domestic Models Enter the "Compute-for-Performance" Stage; Token Consumption — 2026-02-24
Unknown Institution· 2026-02-24 04:25
Summary of Key Points from the Conference Call

Company and Industry Overview
- The conference call discusses advancements in domestic AI models, focusing on Zhipu's GLM-5 as analyzed by Guojin's computer-sector research team; the model's release marks a significant evolution in China's AI industry [1][2]

Core Insights and Arguments
- **Parameter Expansion**: GLM-5 has roughly doubled its total parameter count to 744 billion, with 40 billion active parameters, up from GLM-4.5's 355 billion total and 32 billion active parameters, a substantial increase in capacity [1]
- **Performance Improvement**: The model shows an average improvement of approximately 20% across core benchmarks, putting its overall capabilities on par with Claude Opus 4.5 and GPT-5.2. GLM-5 scored 77.8% on SWE-bench Verified and 75.9% on BrowseComp [1]
- **Cost Efficiency**: GLM-5 uses a DSA sparse attention architecture, which halves GPU attention computation costs when processing long sequences. It is also optimized for domestic chip ecosystems, achieving performance comparable to international dual-GPU clusters while cutting deployment costs by 50% in long-sequence scenarios [2]
- **Interleaved Thinking**: The introduction of "Interleaved Thinking" allows deep reasoning before each response and tool invocation, which is expected to yield large gains in computational efficiency [2]
- **Shift to Agentic Engineering**: GLM-5 aims to move AI from passive code generation to autonomous planning and iterative "Agentic Engineering." Internal testing on the CC-Bench-V2 dataset demonstrated strong end-to-end processing, indicating that domestic model capabilities have reached a level suitable for industrial applications [2]

Other Important Insights
- **Token Utilization**: The model's handling of token consumption has significantly improved, suggesting potential for greater scalability across industrial contexts. Anticipated growth in token volume and international expansion is expected to benefit the model's adoption [2]
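The cost claim above can be made concrete with a back-of-the-envelope comparison of dense versus sparse attention. Note this is a generic illustration of why sparsifying attention helps on long sequences, not the actual DSA architecture (whose internals the summary does not describe): we assume each query attends to a fixed budget of `k` tokens instead of all `L`, which scales the attention cost from O(L²) to O(L·k).

```python
# Back-of-the-envelope FLOPs for attention over a long sequence.
# Generic sparse-attention illustration; NOT the DSA architecture itself.

def attention_flops(seq_len, head_dim, window=None):
    """Approximate FLOPs per head for the two attention matmuls (QK^T and AV).

    window=None -> dense: every query attends to all seq_len keys.
    window=k    -> sparse: every query attends to only k keys.
    """
    keys_per_query = seq_len if window is None else min(window, seq_len)
    # 2 matmuls, 2 FLOPs per multiply-add.
    return 2 * 2 * seq_len * keys_per_query * head_dim

# Hypothetical long-context setting: 128k tokens, head dim 128, budget 4096.
L, d, k = 128_000, 128, 4_096
ratio = attention_flops(L, d, window=k) / attention_flops(L, d)
print(f"sparse/dense attention cost at L={L}: {ratio:.3f}")
```

The ratio is simply k/L, so the longer the sequence at a fixed budget, the larger the saving; a flat 50% cost reduction like the one quoted would correspond to a particular effective budget and sequence-length regime.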
Three Years of Using AI: 9 Lessons I'd Like to Share with You
数字生命卡兹克· 2026-02-24 02:18
Core Insights
- The article emphasizes that while AI has become mainstream, a large share of the global population remains unfamiliar with it: 84% of people have never interacted with AI [4][8]
- The author reflects on the rapid evolution of AI over the past three years and shares nine practical insights for using AI effectively in daily life and work [10][11]

Group 1: AI Usage and Adoption
- A staggering 84% of the global population, approximately 6.8 billion people, have never used AI [4][8]
- Only 16% have interacted with free chatbots, and a mere 0.3% are willing to pay for AI services, indicating a vast untapped market [4][8]
- Many people's first experiences with AI are shaped by subpar models, leading to misconceptions about AI's capabilities [19][20]

Group 2: Practical Recommendations
- **Invest in Quality AI Models**: Spending $20 (approximately 150 RMB) on top-tier AI models can significantly enhance productivity and output quality [13][14][30]
- **Automate Repetitive Tasks**: Automate at least one repetitive task each week; this yields substantial time savings and a deeper understanding of AI [31][36]
- **Shift from Search to Intern Mindset**: Treat AI as a capable intern rather than a search engine, providing detailed context and requirements for better results [38][44][48]
- **Cultivate a Habit of AI Utilization**: Before starting any task, ask how AI could assist, fostering a mindset that embraces AI as a tool [57][62]
- **Encourage Creativity**: Use AI to create new things, breaking down barriers that previously made creation seem daunting [65][69][76]
- **Beware of AI Illusions**: Be cautious of AI's overly positive feedback, which can create a false sense of achievement without real-world validation [81][82]
- **Start Without Preparation**: Avoid excessive preparation before using AI; act immediately and learn through experience [83][90]
- **Develop Taste and Aesthetic Judgment**: As AI can perform many tasks, the ability to make choices based on personal taste becomes a unique advantage [95][96]
- **Reconnect with Reality**: Use the time saved through AI to engage with real-life relationships and experiences [100][101]
Zhipu and MiniMax Shed Nearly 100 Billion HKD in Combined Market Value: Why?
Di Yi Cai Jing Zi Xun· 2026-02-23 09:21
Core Viewpoint
- Hong Kong stocks in the large-model sector have shown significant volatility, with companies like Zhipu (2513.HK) and MiniMax (0100.HK) declining sharply after reaching high valuations, highlighting operational challenges amid rapid growth and demand [1][2][3]

Group 1: Company Performance
- On February 23, Zhipu's stock fell 22.76% and MiniMax's 13.35%, a combined market value loss of nearly 100 billion HKD from their peak valuations [1]
- Zhipu's stock surged from its January 8 initial price of 116.2 HKD to a peak of 725 HKD on February 20, a cumulative gain of 524% before the decline [3]
- MiniMax's stock rose from its January 9 initial price of 165 HKD to a high of 970 HKD on February 20, a cumulative gain of 488% before the drop [3]

Group 2: Operational Challenges
- After the release of GLM-5, Zhipu faced a "computing power squeeze" and issued an apology letter acknowledging three key mistakes: insufficient transparency, a slow upgrade pace, and poorly designed upgrade mechanisms for existing users [1][2]
- The rapid influx of users after the GLM-5 release prompted Zhipu to raise prices by at least 30% due to high demand, indicating a misalignment between operational capacity and market expectations [2]
- Both companies are grappling with high training costs, ongoing losses, and the need for better computing infrastructure: Zhipu reported adjusted net losses of 0.97 billion, 6.21 billion, and 24.66 billion CNY over the past three years, while MiniMax's cumulative losses over four years are approximately 13.2 billion USD (about 92.9 billion CNY) [3]

Group 3: Market Comparison
- Zhipu claims GLM-5's performance is comparable to Claude Opus 4.5, yet Anthropic is recognized for rapid commercialization, with annual recurring revenue (ARR) projected to rise from 100 million USD in 2023 to 14 billion USD by February 2026 [4]
- The comparison highlights that, beyond model performance, domestic players like Zhipu still need to strengthen commercialization and computing infrastructure [4]
- Industry analysis suggests that by 2026 the focus will shift from benchmark scores to practical application, service stability, and cost control, making usability a critical measure of a company's strength and potential [4][5]
DeepSeek V4 Benchmark Leak? The Report Appears to Be Fake
Xin Lang Cai Jing· 2026-02-16 08:48
Core Insights
- The AI programming race has reached a new peak with leaked benchmark results for DeepSeek V4, which reportedly scored an impressive 83.7% on SWE-bench Verified, surpassing Claude Opus 4.5 (80.9%) and GPT-5.2 (80%) [1]
- DeepSeek V4 is expected to be released on February 17, with costs reportedly 20 to 40 times cheaper than OpenAI's, potentially changing the competitive landscape [1]

Group 1
- The leaked results suggest significant advances in AI capabilities, including a context length of over 1 million tokens and an Engram memory mechanism, implying superior reasoning abilities [1]
- The anticipated February 17 release could position DeepSeek V4 as a leading model in the AI space [1]

Group 2
- There are doubts about the authenticity of the leaked benchmarks, with claims that scores above 99.2% are not possible under official scoring systems, indicating potential misinformation [2]
- Despite the skepticism surrounding the leaked data, the attention and hype around DeepSeek show it has garnered significant interest and support within the AI community [2]
Zhipu Challenges the "Big Tech Faith" in AI
36Kr· 2026-02-13 12:24
Core Insights
- The release of GLM-5 significantly boosted Zhipu's market performance, pushing its market capitalization above HKD 200 billion after gains of more than 20% over two consecutive trading days [1]
- GLM-5 expands the parameter scale from 355 billion to 744 billion, with enhanced capabilities in complex system engineering and long-range agent tasks [1][2]
- The model performs at a level comparable to Claude Opus 4.5 in real programming environments, indicating that open-source models are catching up to closed-source ones [2]

Model Capabilities
- GLM-5 can autonomously perform complex tasks such as long-range planning and execution with minimal human intervention [1]
- A training-paradigm innovation via the "slime" asynchronous reinforcement learning infrastructure significantly increases training volume and enables high-frequency, fine-grained iterations [2]
- GLM-5 has achieved state-of-the-art (SOTA) performance on multiple evaluation benchmarks, including BrowseComp and MCP-Atlas [2]

Pricing Adjustments
- Zhipu announced a structural adjustment to GLM Coding Plan pricing, with increases starting from 30%, effective February 12, 2026 [3]
- The new structure cancels first-purchase discounts while retaining seasonal and annual subscription discounts [3]
- Post-adjustment, GLM-5's input price will be up to CNY 6 per million tokens, and its output price up to CNY 22 per million tokens [3]

Business Strategy
- Zhipu aims to raise the API business to 50% of revenue, with GLM-5's capabilities expected to accelerate growth in this area [4]
- The company is transitioning from a focus on localized deployment to becoming a MaaS (Model as a Service) provider, anticipating rises in both volume and pricing for API services [5]

Competitive Landscape
- Zhipu operates independently of the major tech companies, which are generally perceived to hold inherent AI advantages due to their financial resources [6]
- The company keeps operational costs lower by renting computing power rather than purchasing it outright, in contrast to the heavy capital investments of larger firms [6][7]
- Zhipu's strategic partnership with Parallel Technology provides substantial computational support, crucial for the development and deployment of the GLM series [7]
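The quoted list prices (input up to CNY 6 and output up to CNY 22 per million tokens) make per-request cost easy to estimate. The sketch below is a simple worked example; the session sizes are made-up illustration values, not figures from the article.

```python
# Cost estimate at the GLM-5 list prices quoted above:
# input up to CNY 6 / M tokens, output up to CNY 22 / M tokens.
# The example request sizes are hypothetical.

def glm5_cost_cny(input_tokens, output_tokens,
                  input_price=6.0, output_price=22.0):
    """Cost in CNY given per-million-token prices for input and output."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a coding-agent session consuming 2M input and 500k output tokens.
cost = glm5_cost_cny(2_000_000, 500_000)
print(f"CNY {cost:.2f}")  # → CNY 23.00
```

At these prices, output tokens dominate cost per token (22 vs. 6), so long agentic runs with heavy tool output are the expensive case.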
GLM-5 Is Seriously Impressive: Over 24 Hours of Autonomous Coding, 700 Tool Calls, 800 Context Switches
36Kr· 2026-02-12 10:40
Core Insights
- The release of GLM-5 marks a significant advancement in open-source AI, bringing it into the era of long-task capabilities [1]
- GLM-5 has demonstrated the ability to perform complex engineering tasks, such as building a Game Boy Advance emulator from scratch [2][7]
- The model has achieved impressive results across various benchmarks, positioning it alongside proprietary models like Claude Opus 4.5 [10][12][18]
- The emergence of GLM-5 signals a shift in the SaaS industry, as it lets developers build sophisticated applications without relying on traditional software solutions [29]

Group 1
- GLM-5 can run code continuously for over 24 hours, performing 700 tool calls and 800 context switches, showcasing its stability and reliability [2][7]
- Its programming capabilities have been validated against established benchmarks, achieving the top score among open-source models [18][20]
- Users have already begun leveraging GLM-5 for applications including a 3D version of Monopoly and an academic version of TikTok, with multiple apps submitted for App Store approval [24][29]

Group 2
- The open-source nature of GLM-5 disrupts a market previously dominated by closed-source models, empowering developers with new tools [20][29]
- GLM-5's performance has raised concerns in the SaaS sector, with significant stock declines for companies like FactSet and S&P Global as investors reassess the future of software sales [29]
- The model's capabilities mark a transformation of AI from a mere assistant to an independent engineer, potentially reshaping the landscape of software development [29]