海外独角兽
Harness is the New Dataset: The Next Key Direction for Improving Model Intelligence
海外独角兽· 2026-03-26 12:08
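Author: Celia. Editor: Siqi. Recently, harness engineering has become the newest buzzword, following prompt engineering and context engineering. Behind it lies an increasingly clear shift: as base model capabilities mature, what really determines an agent's ceiling is no longer the model itself but the whole system built around it. For model companies in particular, whoever gets the harness running smoothly first gets the earliest chance to capture high-quality execution trajectories, and whoever can keep capturing those trajectories is more likely to build a stronger data flywheel. Philipp Schmid, a Staff Engineer at DeepMind, put it bluntly: "The Harness is the Dataset. Competitive advantage is now the trajectories your harness captures." So we recently dug into the concept and surveyed Anthropic, OpenAI, Goo ...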
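The claim that a harness "captures trajectories" can be made concrete with a minimal sketch. Everything in it is hypothetical and not taken from the article: the `Step` and `Trajectory` types, the `run_with_harness` loop, and the toy policy are stand-ins for a real model and tool layer.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    action: str       # name of the tool the policy chose
    argument: str     # argument passed to that tool
    observation: str  # what the tool returned


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    succeeded: bool = False


Policy = Callable[[str, List[Step]], Optional[Tuple[str, str]]]


def run_with_harness(task: str, policy: Policy,
                     tools: Dict[str, Callable[[str], str]],
                     max_steps: int = 5) -> Trajectory:
    """Run the policy inside the harness and record every step it takes."""
    traj = Trajectory(task=task)
    for _ in range(max_steps):
        decision = policy(task, traj.steps)
        if decision is None:  # the policy signals that it is done
            traj.succeeded = True
            break
        action, argument = decision
        observation = tools[action](argument)
        traj.steps.append(Step(action, argument, observation))
    return traj


def toy_policy(task: str, history: List[Step]) -> Optional[Tuple[str, str]]:
    # Hypothetical stand-in for a model: call "echo" once, then stop.
    if not history:
        return ("echo", task)
    return None


trajectory = run_with_harness("say hello", toy_policy,
                              {"echo": lambda s: s.upper()})
# The serialized record is the kind of artifact a data flywheel would collect.
record = json.dumps(asdict(trajectory))
```

The point of the sketch is that the log is a by-product of running the agent: the harness, not the model, decides what gets recorded and in what shape.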
Why Is Harness Engineering the "Cybernetics" of the Agent Era?
海外独角兽· 2026-03-18 04:17
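In 1948, Norbert Wiener named this pattern cybernetics. So the question really worth asking may not be "Will AI replace programmers?" but rather: when the feedback loop can finally close at the level of architectural decisions, what do engineers need to do to make the mechanism actually work? Author: George Zhang (OpenClaw maintainer). This article is George Zhang's interpretation of harness engineering; the original was published on his X account. This February, OpenAI published an article, Harness engineering: leveraging Codex in an agent-first world, describing a new way of working: engineers no longer write code directly, but instead design environments and set rules so that agents do the coding within them. The article quickly sparked broad discussion in tech circles. Some saw it as the end of software engineering, others as just another round of hype. In fact, the narrative around AI coding has kept evolving: from the earliest prompt engineering, to context engineering, to today's harness engineering, engineers' attention has gradually shifted from "how to ...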
OpenClaw Ignites AI Security Anxiety: Will Armadin's Closed-Loop Agent Offense and Defense Become the New Paradigm?
海外独角兽· 2026-03-17 12:07
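Compiled by: Haozhen. Editor: Siqi. OpenClaw's rapid rise has showcased how capable agents are at autonomously executing complex tasks, but its high-privilege operations also bring unprecedented cybersecurity risks: when agents can run continuously, call tools, and carry out complex tasks, attackers can likewise automate reconnaissance, exploitation, and lateral movement, significantly expanding the scale, speed, and reach of cyberattacks. Traditional security programs that rely on manual penetration testing and periodic scanning find it increasingly hard to cover real attack paths. Historically, every shift in technology paradigm has given rise to new security companies, and security's strategic position keeps moving forward. Last week Google acquired the cloud security company Wiz for about $32 billion, which is not only the largest acquisition in Google's history but also the largest deal ever in cybersecurity. The transaction sends a clear signal: AI is making enterprise cloud environments more dynamic and their boundaries blurrier, traditional static defense architectures are failing, and in the agent era a global security view that understands nonlinear attack paths in complex environments has become a necessity. Armadin is a cybersecurity company founded against this backdrop. Unlike traditional low-frequency, low-coverage penetration testing, Armadin has built an agent swarm system that continuously simulates the behavior of real attackers and, across large-scale environments, keeps ...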
Legora and Mercor Both Use It: Can Reducto Become an Independent Data Entry Point for LLMs?
海外独角兽· 2026-03-12 12:08
Core Insights
- Reducto addresses the critical bottleneck of "accurate data ingestion" in AI applications, focusing on transforming complex documents into structured inputs that large language models (LLMs) can understand [2][3][4]
- The company achieved a valuation of $600 million after completing two funding rounds led by Benchmark and a16z within six months, tripling its valuation [3][4]
- The primary challenge for Reducto is whether its Agentic OCR technology will remain a standalone data ingestion layer or be absorbed by the capabilities of foundational models [2][6]
Industry Pain Points
- A significant portion of enterprise data (approximately 80%) exists in unstructured formats like PDFs and Excel files, which traditional OCR struggles to interpret accurately [3][4]
- The demand for precise data analysis has increased as businesses transition from proof of concept (PoC) to production environments, where even minor parsing errors can be magnified in automated decision-making processes [3][4]
Reducto's Technology and Market Position
- Reducto employs a three-layer proprietary architecture that includes computer vision layout analysis, VLM semantic understanding, and Agentic OCR for multi-round self-correction, enabling it to outperform traditional competitors in complex document scenarios [4][5]
- The company has secured clients across various sectors, including AI-native companies, data annotation firms, and Fortune 10 enterprises, indicating a broad market appeal [5][31]
Competitive Landscape
- Reducto faces competition from various players, including native multimodal models like Google Gemini, cloud infrastructure providers like AWS Textract, and AI data processing platforms like Unstructured.io [44][49]
- The rise of multimodal model capabilities poses a significant threat to Reducto, particularly in simpler document scenarios where foundational models may soon surpass Reducto's accuracy and cost-effectiveness [6][47]
Product Development and Features
- Reducto's product has evolved from a document parsing API to a comprehensive data connection layer, offering functionalities such as document editing, structured information extraction, and content classification [16][21]
- The company utilizes a usage-based pricing model, which may limit its market share due to relatively high costs compared to cloud providers [29][49]
Team and Funding
- Founded in 2023 by Adit Abraham and Raunak Chowdhuri, Reducto has raised a total of $108.4 million across four funding rounds, with a lean team primarily composed of engineers and researchers [55][61]
- The company has demonstrated strong early traction, achieving an annual recurring revenue (ARR) of over $1 million with a small team [55][61]
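The three-layer idea described in this summary (layout analysis, semantic extraction, then multi-round self-correction) can be sketched as a toy pipeline. This is an illustration under loud assumptions, not Reducto's implementation: plain string functions stand in for the computer vision and VLM layers, and every function name here is hypothetical.

```python
from typing import Dict, List, Tuple


def analyze_layout(document: str) -> List[str]:
    # Layer 1 stand-in: treat blank-line-separated chunks as layout blocks.
    return [b.strip() for b in document.split("\n\n") if b.strip()]


def extract_fields(blocks: List[str]) -> Dict[str, str]:
    # Layer 2 stand-in: read "key: value" lines as structured fields.
    fields: Dict[str, str] = {}
    for block in blocks:
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip().lower()] = value.strip()
    return fields


def find_problems(fields: Dict[str, str], numeric_keys: List[str]) -> List[str]:
    # Validator: flag numeric fields that do not look like plain numbers.
    return [k for k in numeric_keys
            if k in fields and not fields[k].replace(".", "", 1).isdigit()]


def correct(fields: Dict[str, str], problems: List[str]) -> Dict[str, str]:
    # Correction pass: strip currency symbols and thousands separators.
    fixed = dict(fields)
    for key in problems:
        fixed[key] = fixed[key].replace(",", "").replace("$", "")
    return fixed


def ingest(document: str, numeric_keys: List[str],
           max_rounds: int = 3) -> Tuple[Dict[str, str], int]:
    """Layer 3 stand-in: re-check and repair the output for up to max_rounds."""
    fields = extract_fields(analyze_layout(document))
    rounds = 0
    while rounds < max_rounds:
        problems = find_problems(fields, numeric_keys)
        if not problems:
            break
        fields = correct(fields, problems)
        rounds += 1
    return fields, rounds


sample = "Invoice\nVendor: Acme\n\nTotal: $1,200.50"
fields, rounds = ingest(sample, ["total"])
```

The bounded loop is the design choice worth noting: each round validates the previous output and only reworks the fields that failed, which is one simple way to read "multi-round self-correction".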
Why Have Top Investment Banks All Chosen Rogo, the Financial Agent?
海外独角兽· 2026-03-05 12:07
Core Insights
- The article discusses the emergence of Rogo, a company aiming to integrate AI into the financial analysis workflow, addressing the industry's pain points related to repetitive tasks and data accessibility [2][4][5].
Industry Pain Points
- The global investment banking sector handles over $3.5 trillion in transactions annually, primarily relying on junior bankers who often work over 100 hours a week on repetitive tasks [4].
- Major banks like JP Morgan and Bank of America have implemented strict work hour limits due to severe burnout among employees, highlighting the low-value nature of many tasks performed [5].
- Financial workflows present three significant challenges for AI integration: low tolerance for errors, strong data barriers due to proprietary databases, and complex internal workflows that are difficult to automate [6][5].
Company Overview
- Rogo was founded in January 2022 by Princeton alumni Gabriel Stengel and John Willett, who have firsthand experience in investment banking [7][10].
- The company aims to embed AI capabilities directly into existing analyst workflows, integrating with core data sources like Capital IQ and FactSet [2][12].
Product Development
- Initially, Rogo's product was a natural language query interface for financial data, but it pivoted to a generative AI architecture following the success of ChatGPT [9].
- Rogo's platform now serves over 50 top financial institutions, with daily active users exceeding 25,000 and annual recurring revenue (ARR) growing 27-fold within two years [3][10].
Product Features
- Rogo's platform integrates research, modeling, document processing, and data operations into a single interface, enhancing the efficiency of financial analysts [12].
- The product includes a research assistant that provides access to over 50 million financial documents, allowing analysts to query data in natural language and receive structured answers with source citations [12][18].
Business Model
- Rogo operates on a seat-based subscription model, charging several thousand dollars per seat annually, which can be offset by the savings from reducing the headcount of junior analysts [30].
- The company has established a prestigious client list, including major investment banks, which enhances its credibility and facilitates customer acquisition [30][31].
Market Potential
- The core financial data and research retrieval market, dominated by companies like Bloomberg and S&P Capital IQ, generates annual subscription revenues of $25-30 billion [32].
- Rogo aims to use AI to convert high operational costs into low marginal costs; even a 10% reduction in inefficiencies could represent a vast total addressable market (TAM) [32][36].
Competitive Landscape
- Rogo competes with AI-native players like Hebbia and Boosted.ai, each focusing on different aspects of financial analysis and document processing [54][66].
- Major AI model providers like Anthropic and OpenAI are also entering the financial services space, creating a competitive environment for Rogo [67].
Chinese Models' Spring Festival Exam: A Frontline Retrospective from MiniMax, GLM, and Seedance Developers | Best Ideas
海外独角兽· 2026-02-28 09:43
Core Insights
- The article discusses the rapid advancement of domestic AI models in China, particularly in the context of the recent "Spring Festival Exam" for these models, highlighting their growing influence and competitive edge against international counterparts [5][6].
Group 1: Model Performance and Trends
- Four domestic open-source models (MiniMax M2.5, Kimi 2.5, GLM-5, and DeepSeek V3.2) accounted for 84.4% of the top 5 models' total token usage, indicating a significant shift towards domestic capabilities in AI [6].
- The transition of model capabilities from verifiable tasks to fuzzy tasks is noted, emphasizing the need for models to evolve in self-criticism and self-improvement for complex tasks [10][12].
- Continual learning is identified as a key trend, with ongoing exploration needed to enhance models' ability to update their internal states post-deployment [12][13].
Group 2: Data and Infrastructure
- The importance of data acquisition and processing is highlighted as a critical differentiator for model capabilities, with Seedance 2.0 showcasing significant advancements in handling long-tail data [14][15].
- Domestic companies face challenges in infrastructure compared to international players, particularly in inference speed and stability, but are innovating to improve efficiency [15][16].
- The labor-intensive nature of data collection in China is seen as an advantage, with a large number of independent video production teams capable of generating high-quality data [16][17].
Group 3: Market Dynamics and Commercialization
- The AI market is expected to evolve into a diverse ecosystem rather than a winner-takes-all scenario, with various companies carving out their niches [10][11].
- The AI coding market is projected to be worth at least $100 billion, with the video AI market potentially being equally significant due to high user engagement on platforms like TikTok [11][12].
- The article warns of potential market "involution" due to low-price competition among domestic models, which could lead to thin profit margins similar to other manufacturing sectors [30][31].
Group 4: Future Outlook and Strategies
- The article predicts a significant increase in token consumption driven by advancements in open-source models and the rise of video generation capabilities [29][30].
- Strategies for international expansion include leveraging partnerships and delayed open-sourcing to maximize revenue while maintaining competitive pressure on closed-source models [32][33].
- The focus on consumer-oriented products is emphasized as a potential path for domestic companies to navigate geopolitical challenges and achieve global market success [34][35].
OpenClaw Is a Signal | 2026 Long-Horizon Agent Investment Map
海外独角兽· 2026-02-26 12:04
Core Insights
- The emergence of OpenClaw signifies a new phase for AI Agents, transitioning from mere software tools to digital labor capable of executing long-term tasks and managing complex workflows [2][3]
- Long-Horizon Agents (LHA) are redefining the economic model of software, shifting from functionality-based pricing to outcome-based pricing, where clients pay for results rather than features [4][5]
- The article explores the implications of AI Agents moving from coding environments to real-world enterprise processes, highlighting the potential for significant value migration within industries [2][12]
Group 1: AI Agents as Labor
- OpenClaw represents a new form of AI Agent that can perform long-term tasks and operate across systems, approaching the capabilities of digital employees [3]
- Long-Horizon Agents can break down vague goals into sub-tasks, maintain state over extended periods, and self-correct during execution, marking a significant advancement in AI capabilities [3][4]
Group 2: Economic Transformation
- The introduction of Service-as-Software is unlocking a $13 trillion labor expenditure market in the U.S., representing a 30x expansion opportunity compared to traditional SaaS [5]
- Companies are increasingly adopting outcome-based pricing models, where payment is tied to completed tasks or saved labor costs, reflecting a shift from selling tools to selling labor [5][6]
Group 3: Changing Competitive Landscape
- The transition from System of Record to System of Action indicates a new competitive advantage for AI Agents, where execution data becomes a critical asset [7]
- Workflow Data Gravity is emerging as a new moat, as agents accumulate unique execution data that enhances their performance in specific enterprise environments [7][8]
Group 4: Future of AI Agents
- By 2026, AI Agents are expected to evolve from passive responders to proactive participants, capable of observing environments and executing tasks autonomously [8][12]
- The integration of Voice Agents is crucial, as they will serve as the interface for AI Agents, handling complex emotional interactions and compliance requirements [17][18]
Group 5: Investment Opportunities
- Companies that enable Long-Horizon capabilities to solve high-value business problems are poised for growth, particularly those focusing on Reasoning Orchestrators and Process Intelligence [12][14]
- The article identifies key investment theses, including companies that sell labor directly, emphasizing those that charge based on FTE or outcome pricing [15][16]
Group 6: Industry Applications
- The article highlights various sectors where AI Agents can replace full-time employees, particularly in high-compliance industries like insurance and finance [45][46]
- Companies like Serval and Distyl AI are leading the charge in automating enterprise workflows, demonstrating the potential for significant operational efficiencies [40][43]
When Humans Can't Read AI Code, How Does Traversal Become the AI Doctor for Enterprise Operations?
海外独角兽· 2026-02-11 12:06
Core Insights
- The article emphasizes the growing complexity of code operations due to advancements in AI coding, particularly highlighting the "Claude Hole" phenomenon where AI-generated code logic becomes difficult for humans to understand [2][4][14]
- Traversal, a startup founded by professors and quantitative traders from MIT and Berkeley, aims to address these operational challenges by utilizing causal inference to create an autonomous decision-making SRE agent [2][4][5]
Industry Pain Points
- Traditional observability tools like Datadog can only display metrics without explaining the underlying causes, leading to high costs for engineers who must rely on experience for troubleshooting [4][10]
- The increasing complexity of code, driven by AI coding, has resulted in a significant rise in operational difficulties, with companies facing annual losses of approximately $400 billion due to downtime [10][14]
- The total addressable market (TAM) for software operations is estimated at $1.1 trillion, with a significant portion driven by the need for commercial software to replace self-built systems [8][9]
Traversal's Unique Proposition
- Traversal's causal inference-based architecture allows for precise fault localization by simulating scenarios and scanning code changes, achieving over 90% attribution accuracy in high-stakes incidents for major clients like American Express and Digital Ocean [4][5]
- The founding team’s strong academic and quantitative background enables Traversal to approach SRE challenges from first principles, differentiating it from traditional log analysis methods [5][23]
Competitive Landscape
- Traversal faces competition from established observability giants like Datadog and emerging AI SRE tools, but its unique capabilities in causal analysis and automated remediation position it favorably in the market [3][6][62]
- The article notes that while traditional tools focus on data visualization and correlation, Traversal aims to provide a comprehensive understanding of system behavior and root cause analysis [62][70]
Business Model
- Traversal employs a hybrid pricing model based on results, charging a fixed fee related to system scale and a variable fee based on the value created through successful incident resolutions [48][49]
- This model addresses the common issue in traditional tools where costs increase with data volume without a corresponding increase in value [48]
Customer Validation
- Traversal has demonstrated significant improvements in operational efficiency for clients, with reported reductions in mean time to recovery (MTTR) by up to 90% and enhanced root cause analysis success rates [50][53]
- Notable clients include Digital Ocean, American Express, and other Fortune 100 companies, highlighting the effectiveness of Traversal's solutions in real-world scenarios [50][53]
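The hybrid pricing described under Business Model (a fixed fee tied to system scale plus a variable fee tied to value created) reduces to simple arithmetic. The sketch below assumes a per-service fixed fee and a revenue share expressed in basis points; the function name, parameters, and all numbers are hypothetical, since the summary gives no actual rates.

```python
from typing import List


def hybrid_fee(num_services: int, fee_per_service: int,
               resolved_values: List[int], share_bps: int) -> int:
    """Fixed fee scales with system size; variable fee scales with value created.

    resolved_values holds the estimated value of each successfully resolved
    incident. share_bps is the vendor's share in basis points (1,000 = 10%).
    Integer arithmetic avoids floating-point rounding in the example.
    """
    fixed = num_services * fee_per_service
    variable = sum(resolved_values) * share_bps // 10_000
    return fixed + variable


# Hypothetical month: 100 monitored services at 50 each, two resolved
# incidents valued at 20,000 and 30,000, and a 10% share of that value.
monthly_fee = hybrid_fee(100, 50, [20_000, 30_000], 1_000)

# With no resolved incidents, only the fixed component is charged.
quiet_month_fee = hybrid_fee(100, 50, [], 1_000)
```

Under these assumed numbers the fee grows with resolved-incident value rather than with data volume, which is the contrast with traditional tools that the summary draws.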
In-Depth Discussion of OpenClaw: High-Value Agents Unlock 10x Token Consumption, and Anthropic's Path to Surpassing Microsoft Begins
海外独角兽· 2026-02-05 12:18
Core Insights
- The article discusses the emergence of high-value Agents in 2026, showcasing their ability to take over complex tasks and integrate into core workflows, significantly impacting existing SaaS models and human-machine collaboration [4][6].
- OpenClaw, a notable product, is highlighted for its innovative features, including pre-installed Claude Skills, enabling it to operate continuously and proactively [8][10].
- The discussion emphasizes the shift in the value of Agents, with predictions of a tenfold increase in token consumption by 2026, driven by the demand for high-value tasks [23][24].
Group 1: OpenClaw and Its Features
- OpenClaw's design allows for continuous operation on local devices or cloud virtual machines, transforming it into a proactive agent that can monitor tasks and push notifications [10][11].
- The integration of IM Gateway enables OpenClaw to embed itself into users' daily communication flows, enhancing its effectiveness compared to traditional chatbots [10][12].
- OpenClaw's success is attributed to its pre-installed Claude Skills, which lowers the barrier for user adoption by providing a ready-to-use ecosystem [10][11].
Group 2: Market Dynamics and Predictions
- The article notes that high-value Agents are expected to disrupt enterprise salary budgets, as they can perform tasks traditionally done by human workers, leading to a shift in how companies allocate their budgets [21][22].
- Predictions indicate that token consumption will increase by at least ten times in 2026, driven by the efficiency of high-value task execution by Agents [23][24].
- The emergence of open-source models achieving a "usable lower limit" is seen as a catalyst for this token consumption explosion, allowing for broader commercial applications [25][27].
Group 3: The Future of Software and Agents
- The article posits that software may evolve into mere tools as Agents take over more tasks, potentially leading to a significant reduction in the need for traditional software interfaces [48][49].
- There is a debate on whether Agents will completely replace software or merely transform it into a backend tool, emphasizing the need for stability and accuracy in enterprise applications [52].
- The article suggests that the future of Agents will require a robust infrastructure designed specifically for their needs, addressing current limitations in cross-platform task execution and security [38][39].
Group 4: User Adoption and Market Penetration
- The article highlights the challenge of scaling Agent usage from millions to billions, proposing three distinct product paths targeting different user demographics [53][54].
- The first path focuses on technical users, the second on knowledge workers, and the third aims at the general public through social interaction, leveraging network effects for broader adoption [54][55].
- This multi-faceted approach is seen as essential for bridging the gap between current Agent usage and potential widespread adoption [53][54].
How To Play AI Beta: Shixiang's 2026 AGI Investment Thinking, Open-Sourced
海外独角兽· 2026-02-02 01:14
Core Insights
- The rapid evolution of AI is outpacing market expectations, with significant shifts in consensus and narratives occurring almost monthly [2]
- The report aims to recalibrate the understanding of the current AI competitive landscape and identify key technological and product trends that may dominate by 2026 [2]
Current Landscape
- The leading AI models are dominated by OpenAI, Anthropic, and Google, forming a top tier where slight advantages in model capabilities translate into substantial commercial value [6]
- The competitive state among AI labs is characterized by alternating leadership and differentiation [4]
Trends in AI Development
- **Trend 1: Differentiation in Technical Approaches**
  - OpenAI focuses on consumer applications, maintaining a significant lead with ChatGPT, which has around 480-500 million daily active users, compared to Gemini's approximately 90 million [7]
  - Anthropic targets business applications and coding, with Claude Opus 4.5 being a strong performer in software development [7]
  - Google prioritizes multimodal capabilities, with Gemini 3 leading in this area but still catching up in text and coding capabilities [8]
- **Trend 2: Two Major Computing Camps**
  - The industry is forming two camps: GPU (NVIDIA) and TPU (Google), with Google creating an integrated ecosystem while NVIDIA supports a broader alliance [10]
  - Current performance favors GPUs, but TPUs show potential for better cost control [10]
Future Predictions
- **Prediction 1: Continual Learning as a Key Paradigm**
  - Continual Learning is emerging as a critical paradigm, with expectations for significant advancements by 2026 [15]
  - This approach emphasizes models' ability to learn autonomously from interactions, moving from static to dynamic learning [16]
- **Prediction 2: AGI Competition as a Long-term Battle**
  - The race for AGI resembles a marathon, requiring extensive data collection and long-term investment [21]
  - Companies like Google and ByteDance are positioned as strong contenders due to their cash flow and talent density [23]
Business Model Considerations
- The market is questioning the sustainability of AI investments, particularly regarding OpenAI's projected $1.4 trillion financial obligations [24]
- OpenAI's revenue potential is estimated at $200-300 billion, which may not cover its capital expenditures [25]
Key Investment Strategies
- The ideal AGI investment strategy involves betting on the most promising model companies, necessary computing infrastructure, and the benefits of leading model technologies [32]
- A recommended AGI basket includes OpenAI, ByteDance, Google, Anthropic, NVIDIA, and TSMC [32]
Emerging Trends
- **Trend 1: Models as Products**
  - The concept of "models as products" highlights that significant product improvements often stem from advancements in underlying models [36]
- **Trend 2: Voice Agents as New OS Interfaces**
  - Voice agents are evolving into a new operating system layer, with a shift towards real-time speech-to-speech solutions [53]
- **Trend 3: LLM Cost Deflation**
  - The cost of LLM inference is rapidly decreasing, with a reported 1000-fold reduction since GPT-3's launch [60]
Competitive Dynamics
- The release of Gemini 3 has altered the competitive landscape, leading to a decline in ChatGPT's user engagement, although ChatGPT maintains higher user retention and engagement metrics [62][63]