Claude 3.7 Sonnet

Citing "strong opposition" to a US AI company's anti-China rhetoric, Yao Shunyu announces he is switching jobs
Xin Lang Cai Jing· 2025-10-09 10:25
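According to an October 8 report in Hong Kong's South China Morning Post, a Chinese scholar in the field of artificial intelligence (AI) has announced that he is leaving the US AI startup Anthropic to join its competitor, Google's DeepMind lab. He said that Anthropic's "anti-China rhetoric" was one of the major reasons for his departure. The South China Morning Post reported that in recent years several US AI companies, including OpenAI, have stepped up their negative statements about China, in some cases naming the Chinese competitor DeepSeek directly. A former employee who asked to remain anonymous said that some technical staff at OpenAI from China and other countries felt uneasy about the company's statements. According to an article that Yao Shunyu (Shunyu Yao) posted on his personal blog on October 6, he left Anthropic, the developer of the large language model Claude, after less than a year there. He said he "strongly opposes" the company's "anti-China rhetoric". Last month, Anthropic announced that it would stop providing AI services to "companies controlled by Chinese entities", and it listed China as an "adversarial nation" in an internal document. On this point, Yao wrote: "To be clear, I believe most Anthropic employees do not agree with this characterization, but I think there is no way for me to stay." The report said that, according to public information, Yao Shunyu received his undergraduate degree from Tsinghua University, later earned a PhD in theoretical and mathematical physics from Stanford University, and worked at the University of California, Berkeley on ...
The other Yao Shunyu has also switched jobs: a fundamental disagreement with Anthropic over values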
量子位· 2025-10-08 04:25
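By Heng Yu, from Maihao Temple. 量子位 | WeChat official account QbitAI. The other "Yao Shunyu" has changed teams too! Google DeepMind has just welcomed a new research scientist, Yao Shunyu (姚顺宇): an alumnus of Tsinghua University's Department of Physics, a recipient of Tsinghua's top undergraduate scholarship, and a star student who published in Physical Review Letters while still an undergraduate. His personal homepage states that he left Anthropic on September 19 and formally joined Google DeepMind ten days later as a senior research scientist, where he continues to work on AI research. During his year at Anthropic, he helped build the company's reinforcement learning foundations team and was responsible for the Claude 3.7 Sonnet framework as well as the basic reinforcement learning theory behind the Claude 4 series. In an essay on his homepage, Yao signed off with a rather breezy line: "So Ant, it was good with you, but it is better without you :)" Apart from being busy, busy, busy, what did he take away from the sudden switch from theoretical physics researcher to scientist on a large-model AI team? In his own essay, "My infant year as an AI researcher ...
Express | Used by both Claude and OpenAI: Sequoia leads an AI code-review round as Irregular raises $80 million at a $450 million valuation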
Z Potentials· 2025-09-18 02:43
Core Insights
- Irregular, an AI security company, has raised $80 million in a new funding round led by Sequoia Capital and Redpoint Ventures, bringing its valuation to $450 million [1]
Group 1: Company Overview
- Irregular, formerly known as Pattern Labs, is a significant player in the AI assessment field, with its research cited in major AI models like Claude 3.7 Sonnet and OpenAI's o3 and o4-mini [2]
- The company has developed the SOLVE framework for assessing model vulnerability detection capabilities, which is widely used in the industry [3]
Group 2: Funding and Future Goals
- The recent funding aims to address broader goals, focusing on the early detection of new risks and behaviors before they manifest [3]
- Irregular has created a sophisticated simulation environment to conduct high-intensity testing on models before their release [3]
Group 3: Security Focus
- The company has established complex network simulation environments where AI acts as both attacker and defender, allowing for clear identification of effective defense points and weaknesses when new models are launched [4]
- The AI industry is increasingly prioritizing security, especially as risks from advanced models become more apparent [4][5]
Group 4: Challenges Ahead
- The founders of Irregular view the growing capabilities of large language models as just the beginning of numerous security challenges [6]
- The mission of Irregular is to safeguard these increasingly complex models, acknowledging the extensive work that lies ahead [6]
Large models meet genuinely hard problems: on 500 questions, o3 Pro passes only 15%
机器之心· 2025-09-14 03:07
Core Insights
- The article discusses the development of a new benchmark called UQ (Unsolved Questions) to evaluate the capabilities of large language models, focusing on unsolved problems that reflect real-world challenges [2][3][5]
- UQ consists of 500 challenging questions sourced from the Stack Exchange community, designed to assess the reasoning, factual accuracy, and browsing capabilities of models [3][8]
- The study highlights the limitations of existing benchmarks, which often prioritize difficulty over real-world applicability, and proposes a continuous evaluation method through community validation [1][5]
Group 1
- UQ is a test set of 500 unsolved questions covering various topics, including computer science, mathematics, and history, aimed at evaluating model performance in a realistic context [3][8]
- The selection process for UQ involved multiple filtering stages, reducing an initial pool of approximately 3 million questions to 500 through rule-based, model-based, and manual reviews [10][11]
- The best-performing model, o3 Pro, passed UQ validation on only 15% of the questions, indicating the high difficulty level of the benchmark [5][7]
Group 2
- The UQ validation process employs a composite verification strategy that leverages the strengths of different models to assess candidate answers without requiring standard answers [14][26]
- The study found that using a composite validator significantly reduces self-bias and over-optimism in model evaluations, which is a common issue when models assess their own performance [24][25][26]
- Results showed that a stronger answer generation model does not necessarily correlate with better answer validation performance, highlighting the complexity of model capabilities [27][28]
GPT-5: a "choose your own adventure" for front-end developers
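The composite validation idea in the summary above can be illustrated with a short sketch. This is not the UQ authors' implementation: the function name `composite_validate`, the `author` and `min_votes` parameters, and the rule of excluding the answer's own model are all assumptions made for illustration, and the toy validators stand in for real model calls.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: a validator judges (question, answer) without a reference answer.
Validator = Callable[[str, str], bool]


def composite_validate(
    question: str,
    answer: str,
    validators: Dict[str, Validator],
    author: str,
    min_votes: Optional[int] = None,
) -> bool:
    """Accept an answer only if enough independent validators approve it.

    Assumption: the validator backed by the model that wrote the answer is
    excluded, as a simple guard against self-bias.
    """
    judges = {name: v for name, v in validators.items() if name != author}
    if not judges:
        raise ValueError("no independent validators available")
    votes = sum(1 for judge in judges.values() if judge(question, answer))
    # Default to unanimity, the strictest (least optimistic) setting.
    required = len(judges) if min_votes is None else min_votes
    return votes >= required


# Toy validators standing in for model-backed judges (illustrative only).
toy_validators: Dict[str, Validator] = {
    "model_a": lambda q, ans: True,            # an over-optimistic judge
    "model_b": lambda q, ans: len(ans) > 3,    # rejects very short answers
    "model_c": lambda q, ans: "42" in ans,     # checks for a specific claim
}
```

With these toy judges, an answer written by `model_a` is checked only by `model_b` and `model_c`, so the over-optimistic judge cannot approve its own output; lowering `min_votes` trades strictness for recall.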
3 6 Ke· 2025-09-05 10:33
Core Insights
- OpenAI claims that GPT-5 excels in front-end coding, outperforming its predecessor in 70% of internal tests [2]
- Mixed reviews from developers indicate that the initial excitement around GPT-5 may be overstated, with some users reporting a decline in performance [3][4]
- A poll conducted by AI engineer Shawn Wang revealed that over 40% of respondents rated GPT-5 as "average" or "poor" [4]
Developer Experiences
- Influential developer Theo Browne initially praised GPT-5 but later expressed disappointment, stating that its performance had worsened over time [3]
- A GitHub Copilot user criticized GPT-5 for its weak summarization and explanation capabilities, comparing it unfavorably to Claude Sonnet 4 [3]
- Developers are exploring the potential of GPT-5 to create applications without traditional frameworks like React, suggesting a shift in front-end development practices [7][8]
Performance Comparisons
- The ability of GPT-5 to create websites without frameworks has impressed some developers, raising questions about the necessity of tools like React [8]
- Differences in performance between various versions of GPT-5 have been noted, with some users experiencing less impressive results with non-premium versions [10]
- A study by Sonar highlighted the varying coding styles and effectiveness of different AI models, indicating that GPT-5's coding personality is still being evaluated [11]
The 40 AI companies that Anthropic's investors rate most highly | Jinqiu Select
锦秋集· 2025-08-31 07:01
Core Trend
- The AI industry is shifting from a focus on "showcasing generative capabilities" to building "operational and manageable automated workflows" [3][4].
Changes in Company Listings
- In the 2025 IA40 list, the number of companies focused on workflow and agentification increased from 12 to 14, representing a rise from 26.7% to 31.1% of the total [5][6].
- Among the 28 new companies in 2025, 10 (approximately 36%) belong to the agentification category, including Distyl, Pylon, and Clarify [5].
Application Form Changes
- The 2024 list included projects focused on "personal or single-point automation," which have now been replaced by companies deeply integrated into specific business processes [6].
- New entries like Pylon (customer support) and Clarify (CRM) indicate a transition of AI from peripheral tools to core operational processes within enterprises [6].
Ecosystem Support
- The ecosystem supporting this "productionization" is evolving, with infrastructure companies now providing specialized components for the agent production process [7].
- Companies like CrewAI and Browserbase are enabling collaborative work among different AI agents and providing foundational environments for automated web operations [7].
Developer Workflow Enhancements
- New entrants like Cursor and Lovable form a complete ecosystem from development to deployment, indicating that engineering teams are integrating "agent-based coding" into their main development processes [9].
Content Creation Trends
- There is a noticeable decline in focus on design and content production, with the number of related companies decreasing from 5 to 3 [10].
- Conversely, the voice and audio sector saw a slight increase, with the number of companies rising from 1 to 2, reflecting a shift towards real-time dialogue and audio interaction applications [10].
Healthcare Sector Evolution
- The healthcare sector is witnessing a shift from backend operations to frontline clinical applications, with the number of companies increasing from 1 to 2 [11].
- New entrants like Abridge focus on clinical documentation automation, indicating a move towards supporting clinical decision-making directly [11].
DeepSeek and GPT-5 lead the shift to hybrid reasoning: not a single token to waste
机器之心· 2025-08-30 10:06
Core Insights
- The article discusses the trend of hybrid reasoning models in AI, emphasizing the need for efficiency in computational resource usage while maintaining performance [12][11].
- Companies are increasingly adopting adaptive computing strategies to balance cost and performance, with notable implementations from major AI firms [11][12].
Group 1: Industry Trends
- The phenomenon of "overthinking" in AI models leads to significant computational waste, prompting the need for adaptive computing solutions [3][11].
- Major AI companies, including OpenAI and DeepSeek, are implementing models that can switch between reasoning modes to optimize token usage, achieving reductions of 25-80% in token consumption [7][10][11].
- The emergence of hybrid reasoning models is expected to become the new norm in the large model field, with a focus on balancing cost and performance [11][12].
Group 2: Company Developments
- OpenAI's GPT-5 introduces a routing mechanism that allows the model to select the appropriate reasoning mode based on user queries, enhancing user experience while managing computational costs [36][41].
- DeepSeek's v3.1 model combines reasoning and non-reasoning capabilities into a single model, offering a cost-effective alternative to competitors like GPT-5 [45][46].
- Other companies, such as Anthropic, Alibaba, and Tencent, are also exploring hybrid reasoning models, each with unique implementations and user control mechanisms [18][19][34][35].
Group 3: Economic Implications
- Despite decreasing token costs, subscription fees for AI models are rising due to the demand for state-of-the-art (SOTA) models, which are more expensive to operate [14][16].
- The projected increase in token consumption for advanced AI tasks could lead to significant cost implications for users, with estimates suggesting that deep research calls could rise to $72 per day per user by 2027 [15][16].
- Companies are adjusting subscription models and usage limits to manage costs, indicating a shift in the economic landscape of AI services [16][43].
Group 4: Future Directions
- The future of hybrid reasoning will focus on developing models that can intelligently self-regulate their reasoning processes to minimize costs while maximizing effectiveness [57].
- Ongoing research and development in adaptive thinking models are crucial for achieving efficient AI systems that can operate at lower costs [52][57].
From leaving OpenAI to a $170 billion valuation: in four years, Anthropic has Silicon Valley giants betting heavily on it
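As a rough illustration of the routing idea described in this summary, the sketch below picks a "fast" or "thinking" mode per query and assigns a reasoning-token budget. It is a toy heuristic under stated assumptions, not how GPT-5, DeepSeek, or any other vendor actually routes: the names `Route`, `route_query`, and `REASONING_HINTS`, the keyword list, and the budget formula are all invented for the example, and production routers are typically learned rather than keyword-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    mode: str                   # "fast" or "thinking" (hypothetical mode names)
    max_reasoning_tokens: int   # budget for hidden reasoning; 0 in fast mode


# Hypothetical cues that a query needs multi-step reasoning.
# Matching is by plain substring, so this is deliberately crude.
REASONING_HINTS = ("prove", "derive", "debug", "step by step", "optimize")


def route_query(query: str, force_thinking: bool = False) -> Route:
    """Choose a reasoning mode so that easy queries spend no reasoning tokens."""
    text = query.lower()
    score = sum(hint in text for hint in REASONING_HINTS)
    if len(text.split()) > 60:  # long prompts get a small bump
        score += 1
    if force_thinking or score >= 1:
        # Scale the budget with the difficulty score, capped at 4096 tokens.
        return Route("thinking", 1024 * min(score + 1, 4))
    return Route("fast", 0)
```

The `force_thinking` flag mirrors the user-control mechanisms mentioned above: a caller can override the router when it knows a query deserves deeper reasoning.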
量子位· 2025-07-30 09:44
Core Viewpoint
- Anthropic, the company behind Claude, is set to raise $5 billion in a new funding round, bringing its valuation to $170 billion, making it the second AI unicorn to reach a valuation of over $100 billion after OpenAI [1][2].
Funding and Valuation
- In March, Anthropic's valuation was $61.5 billion, indicating a nearly threefold increase in less than six months [3][5].
- The latest funding round, led by Iconiq Capital, will significantly boost Anthropic's total funding to approximately $20 billion [8][16].
- Amazon, a major investor, is expected to participate in this funding round, further solidifying its position as Anthropic's largest investor with a total investment of $4 billion [9][14].
Competitive Landscape
- The rapid growth of Anthropic's valuation puts pressure on competitors like OpenAI and xAI, both of which are also raising substantial funds for data centers and talent acquisition [4].
- OpenAI's latest valuation stands at $300 billion, while xAI aims for a valuation of $200 billion [4].
Product and Revenue Growth
- Anthropic's Claude models, particularly Claude 3.7 Sonnet, have established a strong competitive edge in AI programming, outperforming GPT-4 in benchmark tests [20][22].
- The company generates 70-75% of its revenue from API usage, with significant earnings from token consumption, while traditional consumer services contribute only 10-15% [25][26].
- Annualized revenue has surged from $1 billion at the beginning of the year to $4 billion, with projections reaching $9 billion by year-end, driven by its advantages in code generation [27][28].
Agents are booming, and Chinese founders are winning big
36氪· 2025-07-24 10:36
Core Viewpoint
- The article discusses the emergence of AI Agents, particularly highlighting the rapid growth and success of Chinese companies in this sector, such as MainFunc and its product Genspark, which achieved $36 million in annual recurring revenue (ARR) within 45 days of launch [4][5][25].
Group 1: Industry Trends
- The AI Agent wave is characterized by a significant increase in user engagement and revenue, with Manus achieving 23 million monthly active users (MAU) shortly after its launch [9][19].
- The competitive landscape has shifted, with startups outpacing larger companies in the AI Agent space, as evidenced by the rapid ARR growth of Genspark compared to established firms [25][26].
- The article notes a decline in user engagement for some leading products, with Manus's monthly visits dropping from 23.76 million in March to 17.3 million in June [19][34].
Group 2: Key Players
- MainFunc's Genspark and Manus are highlighted as leading products in the AI Agent market, with Genspark's rapid revenue growth and Manus's significant user base [5][9].
- Other notable players include Flowith, Fellou, and MiniMax, each achieving substantial web traffic and user engagement [15].
- The article emphasizes the role of Claude and Manus as catalysts for the current AI Agent boom, with Claude's advanced model capabilities enhancing the overall ecosystem [16][37].
Group 3: Challenges and Future Directions
- Despite initial success, there are concerns about sustaining growth, as the novelty of AI Agents begins to wear off, leading to declining user metrics [19][34].
- The geopolitical landscape poses challenges for Chinese companies operating internationally, with Manus reportedly withdrawing from the Chinese market due to external pressures [20][21].
- The article suggests a potential shift from general-purpose Agents to vertical-specific Agents, as the latter may better meet user needs and provide a competitive edge against larger firms [37][40].
MiniMax raising another 2.2 billion yuan? Its new agent can build a concert seat-selection system
Nan Fang Du Shi Bao· 2025-07-17 04:58
Group 1: Company Developments
- MiniMax is reportedly nearing completion of a new financing round of nearly $300 million, which will elevate its valuation to over $4 billion [1]
- MiniMax has launched the MiniMax Agent, a full-stack development tool that allows users to create complex web applications using natural language input without programming skills [1]
- The MiniMax Agent can deliver various functionalities such as API integration, real-time data handling, payment processing, and user authentication [1]
Group 2: Industry Trends
- Agent technology has emerged as a significant trend in the tech industry, following the success of products like Manus and Devin, with a focus on code capabilities and information retrieval [3]
- Major companies like OpenAI and Google are competing in the development of advanced agents with strong programming capabilities [3]
- The industry is shifting towards hybrid reasoning models, exemplified by Anthropic's release of Claude 3.7 Sonnet, which combines fast and slow thinking processes [3]
Group 3: Technological Innovations
- MiniMax introduced MiniMax-M1, the first open-source large-scale hybrid architecture reasoning model, which is efficient in processing long context inputs and deep reasoning [4]
- The hybrid architecture is expected to become mainstream in model design due to increasing demands for deployment efficiency and low latency [4]
- Future research in hybrid attention architectures is encouraged to explore diverse configurations beyond simple stacking of attention layers [4]