Composer Model
Bye-bye, SWE-Bench! Cursor just released an AI coding benchmark, and it is hard enough to make Claude cry
QbitAI (量子位) · 2026-03-14 03:51
Core Insights
- The article covers the launch of CursorBench, a new benchmark designed to evaluate how efficiently AI programming assistants carry out complex tasks, which sets it apart from traditional benchmarks such as SWE-Bench [1][11][6]

Group 1: Benchmarking Differences
- CursorBench focuses on how efficiently a problem is solved, whereas SWE-Bench measures whether a program can solve the problem at all, a significant difference in evaluation criteria [3][5]
- Claude Haiku 4.5 and Claude Sonnet 4.5 performed poorly on CursorBench, with scores dropping from 73.3 to 29.4 and from 77.2 to 37.9, respectively, a stark contrast in performance under the new benchmark [2][8]

Group 2: Issues with Existing Benchmarks
- Existing benchmarks face three main issues: unrealistic task types, unreasonable scoring mechanisms, and data contamination, all of which undermine how well they reflect real-world programming scenarios [12][16][20]
- Traditional benchmarks often assume a single correct answer to each problem, which does not match the reality that programming problems have multiple valid solutions [17][18]

Group 3: CursorBench Evaluation Methodology
- CursorBench uses a hybrid evaluation method that combines online and offline assessments: models complete a set of standardized tasks and are evaluated on correctness, code quality, efficiency, and interaction behavior [22][23]
- The tasks are derived from real developer requests and internal codebases, which keeps them relevant and reduces the risk that models saw them during training [26][29]

Group 4: Task Characteristics
- CursorBench tasks are larger in scale, and their complexity has grown significantly: the number of lines of code and the average number of files both doubled from the initial version to CursorBench-3 [30][31]
- The tasks deliberately retain some ambiguity, reflecting real-world interactions in which developers communicate with AI in less precise terms [34]

Group 5: Performance and User Experience
- Results on CursorBench separate the leading models more clearly, indicating that the benchmark aligns more closely with real user experience [49][51]
- Cursor plans to develop the next generation of assessment tools to keep pace with the evolving landscape of AI programming assistants, focusing on longer-running intelligent agents [54]
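The score drops reported for the two Claude models can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the first figure in each pair is the model's SWE-Bench score and the second is its CursorBench score, and the names `scores` and `score_drop` are hypothetical. Because the two benchmarks measure different things, the relative drop is a rough indicator and not a like-for-like comparison.

```python
# Scores as quoted in the summary. Assumption: each pair is
# (SWE-Bench score, CursorBench score) for the same model.
scores = {
    "Claude Haiku 4.5": (73.3, 29.4),
    "Claude Sonnet 4.5": (77.2, 37.9),
}


def score_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper output: one (points, percent) pair per model.
drops = {model: score_drop(*pair) for model, pair in scores.items()}

for model, (points, percent) in drops.items():
    print(f"{model}: -{points} points ({percent}% lower)")
```

Under these assumptions, Haiku 4.5 loses about 60% of its score and Sonnet 4.5 about 51%, which is the gap the summary describes as a stark contrast.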
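No KPIs, yet it took off? A Cursor star engineer built a core feature single-handedly! The CEO's hands-on run went 7 days without crashing, and AI coding myths get debunked
AI Frontline (AI前线) · 2026-01-17 06:25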
Core Insights
- Cursor has built a browser using GPT-5.2 in a run that lasted a continuous week; the result contains over 3 million lines of code and features a rendering engine written from scratch in Rust [2][3]
- Coding agents have evolved significantly over the past year, moving from simple code completion to more complex interactions and multi-file management [7][8]
- Developers' acceptance of and trust in coding agents have increased, changing how they interact with coding tools [9][10]

Development and Features
- The browser's capabilities include HTML parsing, CSS cascading, layout, text formatting, and rendering, along with a customized JavaScript virtual machine [2]
- During its testing phase, the coding agent autonomously wrote over 1 million lines of code across 1,000 files [3]
- The team is focusing on improving multi-agent collaboration, so that agents can work concurrently while minimizing conflicts and redundancy [8][9]

User Interaction and Experience
- Developers increasingly rely on agents for coding tasks, and some top engineers run multiple agents simultaneously for efficiency [11][12]
- A newly introduced debugging mode lets agents generate logs for self-evaluation, which improves the debugging process [12][13]
- The interaction model is moving toward a more natural, dialogue-like experience that reduces the need for manual operations [23][24]

Future Directions
- The company expects growing trust in agents to lead to longer run times and more complex task handling [18][19]
- The design of the integrated development environment (IDE) is crucial to the software development lifecycle because it enables seamless integration of various functions [19]
- Future developments may include more intuitive interaction modes that let users communicate with agents more conversationally [23][24]

Internal Processes and Feedback
- The internal workflow emphasizes high-frequency feedback and collaboration among engineers, which accelerates product iteration [25][26]
- The product roadmap is shaped by both internal needs and external user feedback, with a significant portion driven by the desire to improve team efficiency [26][27]
- The company maintains a lean operational structure, which allows rapid development and deployment of new features [27][28]
Cursor's valuation soars to $29.3 billion; each of the four founders is now worth more than $1.3 billion
36Kr · 2025-11-14 09:41
Group 1: Company Overview
- Anysphere, the developer of the AI code editor Cursor, announced a $2.3 billion funding round that raises its valuation to $29.3 billion [1]
- Cursor's core product is an AI-driven code editor that supports models from various companies and can automatically write code, edit files, and fix errors [2]
- The company has achieved over $100 million in annualized revenue and serves around 50,000 engineering teams, including major clients such as NVIDIA, Adobe, and Uber [3]

Group 2: Funding and Investment
- The recent funding round was led by existing investor Accel and new investor Coatue, with participation from Google and NVIDIA [1]
- The funding will be used for technology development and to enhance the in-house model Composer, which aims to reduce reliance on third-party models [3]

Group 3: Strategic Acquisitions
- Cursor announced the acquisition of Growth by Design Talent (GBD), a talent strategy company, to strengthen its organizational capabilities [8]
- GBD has a history of building world-class teams for tech companies and will support Cursor during its rapid growth phase [9]

Group 4: Founders and Wealth Growth
- The four co-founders of Cursor, all under 30 and graduates of MIT, have become billionaires following the company's valuation increase [12]
- Each founder holds approximately 4.5% of the company, giving each a net worth exceeding $1.3 billion at the latest valuation [12]
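The founders' net-worth figure follows directly from the stake and valuation quoted above. The sketch below is a back-of-the-envelope check only: the constant and variable names are hypothetical, the 4.5% stake is the approximate figure from the summary, and the calculation ignores dilution, share classes, taxes, and any other assets.

```python
# Figures as quoted in the summary.
VALUATION_USD = 29.3e9   # post-money valuation: $29.3 billion
FOUNDER_STAKE = 0.045    # approximately 4.5% per founder
NUM_FOUNDERS = 4

# Paper value of one founder's stake (simplifying assumption:
# stake value = ownership fraction x headline valuation).
stake_value_usd = VALUATION_USD * FOUNDER_STAKE

# Combined ownership and paper value across all four founders.
combined_stake = FOUNDER_STAKE * NUM_FOUNDERS
combined_value_usd = stake_value_usd * NUM_FOUNDERS

print(f"Per founder: ${stake_value_usd / 1e9:.2f}B")
print(f"All founders: {combined_stake:.0%} of the company, "
      f"${combined_value_usd / 1e9:.2f}B")
```

With these inputs, a single stake is worth roughly $1.32 billion, consistent with the "more than $1.3 billion" figure in the headline.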