Context Engineering

Manus's Ji Yichao: Lessons learned from building Manus | Jinqiu Select
锦秋集· 2025-07-19 05:00
Core Viewpoint
- The article discusses the choice between end-to-end training and context engineering in developing general AI agents, highlighting the latter as a more adaptable approach in a rapidly evolving landscape of large models [1][3]

Group 1: Context Engineering Insights
- Manus AI's decision to adopt context engineering was influenced by past experience in which self-trained models quickly became obsolete after the release of GPT-3, emphasizing the need for flexibility in model development [4][5]
- The article outlines six core practices derived from Manus's experience, which reduced product iteration cycles from weeks to hours, showcasing an effective technical path for startups [2][3]

Group 2: Key Practices for KV-Cache Optimization
- The KV-cache hit rate is identified as the most critical metric for AI agents in production, directly affecting latency and cost, with a notable example showing a 10x cost difference between cached and uncached tokens [7][8]
- Strategies to enhance KV-cache hit rates include maintaining stable prompt prefixes, using an append-only context, and employing the file system as external memory to overcome context limitations (see the sketch after this summary) [8][19]

Group 3: Managing Tool Complexity
- The article advises against dynamically adding or removing tools in the agent's action space, suggesting instead that tool availability be managed through context-aware masking of token logits to maintain stability [12][13]
- This approach prevents confusion in the model when previous actions reference tools that are no longer defined, reducing the risk of erroneous actions [12][17]

Group 4: Utilizing External Memory
- Manus employs the file system as an externalized memory solution to address the limits of context windows, allowing persistent and effectively unlimited storage that the agent can manipulate directly [18][22]
- This method mitigates the risks of irreversible context compression, ensuring that critical information is not lost [22]

Group 5: Attention Manipulation Techniques
- Continuously updating a todo.md file with the current task goals keeps the model focused on its objectives and prevents it from losing track during complex tasks [23][26]
- This technique is especially useful in lengthy interactions requiring many tool calls [26]

Group 6: Learning from Errors
- Retaining failed attempts in the context is emphasized as a crucial learning mechanism, allowing the model to adapt and reducing the likelihood of repeated mistakes [30][31]
- The article argues that error recovery is a significant indicator of agent performance, yet it is underrepresented in academic benchmarks [30]

Group 7: Avoiding Few-Shot Traps
- The article warns against the pitfalls of few-shot patterns in agent systems, where repetitive structures in the context lead to suboptimal decision-making [32][34]
- Introducing structured variability in actions and observations helps break these patterns and improves adaptability [34]

Conclusion
- Context engineering is presented as an essential, still-emerging discipline for agent systems, with the design of context playing a pivotal role in defining agent behavior, speed, recovery, and scalability [35]
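To make Groups 2 and 3 concrete, here is a minimal Python sketch of a cache-friendly agent context: a byte-stable prompt prefix, an append-only history, and logit masking to constrain tool choice instead of editing the tool list mid-task. The tool schemas, function names, and masking hook are illustrative assumptions, not Manus's actual implementation; real serving stacks typically expose masking through constrained decoding or a logit-bias parameter.

```python
"""Sketch of a KV-cache-friendly agent context (assumed names, not Manus's code)."""
import json

# 1. Keep the prompt prefix byte-for-byte stable so the KV-cache can be reused:
#    no timestamps, no reordered tool lists, deterministic JSON serialization.
ALL_TOOLS = [
    {"name": "browser_open", "description": "Open a URL"},
    {"name": "shell_exec", "description": "Run a shell command"},
    {"name": "file_write", "description": "Write a file"},
]
SYSTEM_PREFIX = (
    "You are a step-by-step agent.\n"
    "## Tools\n" + json.dumps(ALL_TOOLS, sort_keys=True)
)

def build_context(history: list[str]) -> str:
    """Append-only: past actions/observations are never edited or removed,
    so every request shares the longest possible cached prefix."""
    return SYSTEM_PREFIX + "\n" + "\n".join(history)

# 2. Instead of adding/removing tool definitions mid-task (which invalidates the
#    cache and confuses the model), bias the logits of disallowed tool names.
def tool_logit_mask(allowed: set[str], tool_token_ids: dict[str, int],
                    vocab_size: int) -> list[float]:
    """Return an additive bias applied at the position where a tool name is
    decoded: 0 for allowed tool-name tokens, -inf for everything else."""
    bias = [float("-inf")] * vocab_size
    for name, token_id in tool_token_ids.items():
        if name in allowed:
            bias[token_id] = 0.0
    return bias
```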
After Manus "deleted its posts and walked away," the founder's first in-depth retrospective: product details made public, lessons summarized
36Kr· 2025-07-19 01:15
Core Insights
- Manus AI has abruptly withdrawn from the Chinese market, clearing all social media content and seemingly pausing the development of its Chinese version, following the relocation of its global headquarters to Singapore [1]
- The co-founder of Manus AI, Ji Yichao, published a technical blog to refocus attention on the product's technology amid the controversy, sharing valuable lessons learned during the development of Manus [3][9]

Group 1: Company Developments
- Manus AI has moved its global headquarters to Singapore and has offices in Tokyo and California, indicating a strategic shift in its operational focus [1]
- The company has faced scrutiny and speculation regarding potential layoffs and whether it is abandoning the Chinese market [1]

Group 2: Technical Insights from the Blog
- The blog emphasizes the importance of context engineering over traditional model training, allowing for quicker product updates [6][10]
- Key practices for improving KV-cache hit rates are outlined, including maintaining stable prompts, appending context only, and marking cache breakpoints [12][16][17]
- Using the file system for persistent context is recommended to manage the limits of context windows in modern AI models (see the sketch after this summary) [25][30]
- Continuously updating a todo list helps keep the model's attention on its goals [31][34]
- Retaining error logs improves model behavior and reduces the likelihood of repeating mistakes [35][38]
- Introducing structured variation in actions and observations prevents the model from falling into repetitive patterns [39][41]

Group 3: Future Implications
- The article concludes that context engineering is essential for the future of agent systems, as it defines the behavior, speed, recovery, and scalability of AI agents [42]
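As an illustration of the "file system as persistent context" practice mentioned above, the sketch below writes large observations to disk and keeps only a short, restorable stub in the model's context. The directory layout and helper names (externalize, restore) are assumptions for illustration, not taken from the blog.

```python
"""Minimal 'file system as context' sketch: large observations live on disk,
only a short restorable reference stays in context (assumed layout)."""
from pathlib import Path

WORKSPACE = Path("agent_workspace")
WORKSPACE.mkdir(exist_ok=True)

def externalize(observation: str, name: str, keep_chars: int = 200) -> str:
    """Persist the full observation and return a compressed, restorable stub."""
    path = WORKSPACE / f"{name}.txt"
    path.write_text(observation, encoding="utf-8")
    preview = observation[:keep_chars].replace("\n", " ")
    # The stub keeps enough to recognize the content, plus the path to restore it.
    return f"[observation saved to {path} | preview: {preview}...]"

def restore(saved_path: str) -> str:
    """Re-read the full content later, only when the agent actually needs it."""
    return Path(saved_path).read_text(encoding="utf-8")

# Example: a long web page is dropped from context but never lost.
page_html = "<html>...imagine tens of thousands of tokens here...</html>"
context_line = externalize(page_html, "page_001")
full_again = restore(str(WORKSPACE / "page_001.txt"))
```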
A first-hand share from Manus: how to build context engineering for AI agents?
Founder Park· 2025-07-18 18:51
Manus's official site published an article yesterday sharing the lessons they learned while building the right context engineering for Manus. The author is Ji Yichao (Peak), co-founder and chief scientist of Manus. The article was translated with Kimi K2, and we made some adjustments.

At the very start of the Manus project, my team and I faced a key decision: train an end-to-end agent using open-source foundation models, or build an agent on top of the in-context learning abilities of frontier models?

In my first decade in NLP, we did not have the luxury of that choice. Back when BERT came out (yes, that was seven years ago), a model had to be fine-tuned, and then evaluated, before it could transfer to a new task. Each iteration often took weeks, even though those models were tiny compared with today's LLMs. For fast-moving applications, especially before PMF, such a slow feedback loop is almost fatal. That was the painful lesson from my previous startup, where I trained models from scratch for open information extraction and semantic search. Then GPT-3 and Flan-T5 arrived, and my homegrown models became irrelevant overnight. Ironically, it was these new models that opened the door to in-context learning, and pointed us toward an entirely new path.

That hard-won lesson made the choice clear: Manus would bet on context engineering. This lets ...
When LLM programming falls into the "hallucination trap," how ByteDance engineers use ABCoder to stay precisely in control
AI科技大本营· 2025-07-16 06:19
Core Insights
- The article discusses the limitations of large language models (LLMs) in handling complex enterprise-level programming tasks, highlighting the "hallucination" problem where AI generates inaccurate or irrelevant code outputs [1]
- A study by METR revealed that using AI programming assistants did not improve efficiency but instead increased development time by an average of 19%, due to the high costs of reviewing and debugging AI-generated content [1]
- ByteDance has introduced ABCoder, a tool designed to address these challenges by providing a clear and unambiguous code "worldview" through deep parsing of abstract syntax trees (ASTs), enhancing the model's contextual understanding [2]

Group 1
- The hallucination problem in LLMs leads to inaccurate code generation, particularly in complex systems [1]
- The METR study involved 16 experienced engineers completing 246 programming tasks, showing a 19% increase in development time when using AI tools [1]
- ABCoder aims to improve the reliability of AI programming by enriching the model's ability to acquire context, reducing hallucinations and enabling more accurate code generation (see the illustrative sketch after this summary) [2]

Group 2
- ABCoder's implementation will be explained in a live session showcasing its real-world applications in backend development [3]
- The live session will feature a case study on the CloudWeGo project, demonstrating how ABCoder enhances code-development efficiency and optimizes the programming experience [3]
- ABCoder functions as a powerful toolbox for developers, offering tools for code understanding and conversion to tackle complex programming challenges [3]
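The general idea behind an AST-derived code "worldview" can be illustrated with Python's standard ast module: instead of feeding the model raw files, hand it a compact index of the symbols that actually exist, leaving less room to hallucinate APIs. This is only an analogy under stated assumptions; it is not ABCoder's implementation, and the function name code_worldview is hypothetical.

```python
"""Toy AST-based context builder (stdlib only, illustrative analogy)."""
import ast

def code_worldview(source: str) -> str:
    """Return one line per function/class: name, arguments, docstring head."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            doc = (ast.get_docstring(node) or "").split("\n")[0]
            lines.append(f"def {node.name}({args}): {doc}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: {ast.get_docstring(node) or ''}")
    return "\n".join(lines)

sample = '''
class Cart:
    """Shopping cart aggregate."""
    def add_item(self, sku, qty):
        """Add qty units of sku."""
'''
print(code_worldview(sample))  # a symbol index the model can rely on
```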
DeepSeek traffic plunges and its global AI dominance meets its Waterloo; a post-90s developer earns 80 million in six months; a humanoid-robot company raises 500 million in Series A | Chaos AI Weekly Focus
混沌学园· 2025-07-11 07:55
Core Trends
- The "Chaos AI Business Practical National Tour" has successfully commenced, aiming to ignite practical applications of AI across 20 innovative cities in China, with events already held in Changsha and Nanchang [1][2]
- The AI application landscape is evolving with lower entry barriers due to open-source models and context engineering, enabling disruptive innovations that empower ordinary individuals [2]
- AI penetration in vertical industries is increasing, particularly in pharmaceuticals, digital healthcare, and local services, indicating potentially transformative change [2]

AI Applications
- Feishu has launched a comprehensive upgrade of its AI product matrix, including knowledge Q&A and AI meetings, along with the industry's first AI application maturity standard to facilitate enterprise AI adoption [3][4]
- Google DeepMind's spinoff, Isomorphic Labs, is set to begin human trials of its AI-assisted cancer drug, a significant milestone for the pharmaceutical industry [12][13]

Investment and Financing
- Star Sea Map has raised over $100 million in its A4/A5 financing rounds, with total pre-A and A round financing of nearly 1.5 billion yuan, reflecting strong capital interest in the embodied intelligence sector [6][7]
- TARS, founded by former Huawei employees, completed a record $122 million angel round, showcasing investor confidence in embodied intelligence technologies [13]
- Cloud Deep has secured nearly 500 million yuan in financing, positioning itself as a leader in quadruped robots with over 600 industry projects [14]
- Star Motion Era has raised nearly 500 million yuan in its Series A, citing breakthroughs in humanoid robot technology and significant global demand [16]

Business Cases
- Wix's $80 million acquisition of AI startup Base44 highlights the trend of AI-enabled entrepreneurship; Base44 allows users to generate full-stack application code through natural language [7][8]
- The AI personal-finance assistant Kapi Accounting has gained over one million users in six months, indicating a shift in personal finance management through AI [21][22]

Market Insights
- The digital human market in China is projected to reach 30 billion yuan by 2025, with significant cost reductions in enterprise live streaming [19][20]
- The rise of "context engineering" in Silicon Valley is reshaping AI model development, enhancing efficiency and application quality [18][20]

Technology Developments
- Baidu has open-sourced ten major models, significantly lowering the barriers for AI development and enhancing multimodal capabilities [21]
- The Star Stream Agent, designed for Chinese designers, aims to transform the design industry with automated processes and multimodal content creation [24]
July 19, let's meet in Beijing to talk about the hottest ACL 2025 research
机器之心· 2025-07-10 08:35
Core Insights
- The AI field continues to be an exciting area in 2025, with numerous research releases from major tech companies and institutions [1]
- The rapid pace of technological advancement in AI is overwhelming, with new models and paradigms emerging almost weekly [3][4]
- Developers and researchers are increasingly engaging in conferences and academic exchange to stay current with cutting-edge research [5]

Event Overview
- The ACL conference, a flagship event in NLP, received over 8,000 submissions this year, a historical high [6]
- ACL 2025 will take place from July 27 to August 1 in Vienna, Austria, featuring keynote speeches, paper presentations, roundtable discussions, and poster sessions [6][7]
- The event aims to provide a platform for domestic AI talent, with a full schedule of presentations and discussions announced [6]

Keynote Speakers and Topics
- The keynote address on "Trends and Outlook for ACL 2025" will be delivered by Che Wanxiang, a prominent professor at Harbin Institute of Technology [9][17]
- Liu Pengfei of Shanghai Jiao Tong University will present on "Reinforcement Learning and Complex Reasoning in Large Models" [11][19]

Paper Presentations
- Presented papers will cover topics such as the intrinsic self-correction of large language models and accelerating inference in large language models [9][12]
- The event will also feature poster sessions and opportunities for industry engagement [21]
Apple developer reveals Claude did 95% of the development; the app is now live
量子位· 2025-07-07 09:35
Wen Le, from Aofeisi | QbitAI (WeChat account: QbitAI)

An Apple developer has revealed he built an application with AI, and it is 95% Claude!

Here is the story: the developer recently released Context, a native macOS app for debugging MCP servers, an application built almost entirely by Claude Code.

The author, indragiek, has been writing software for the Mac since 2008.

This time his goal was to use Apple's SwiftUI framework to build a developer tool that feels smooth and genuinely useful on macOS.

What was different this time is that Claude Code handled 95% of the Context project's workload. indragiek claims that of this 20,000-line codebase, the code he wrote by hand amounts to fewer than 1,000 lines.

"Engineer" Claude is doing well for itself, even working for Apple (doge).

Joking aside, let's "study" how this developer actually uses Claude.

An Apple developer teaches you to "tame" Claude

Like many experienced engineers, Indragie keeps a list of "unfinished projects." He can build prototypes, but the last 20% of delivery work often consumes enormous time and effort, so projects end up shelved. As a result, he had gone six years without successfully shipping a single ...
Karpathy's latest brainwave, "bacterial coding": good code should have three traits of bacteria
量子位· 2025-07-07 04:02
Core Viewpoint
- The article discusses Andrej Karpathy's new concept of "Bacterial Code," which emphasizes small, modular, self-contained code blocks that are easy to copy and paste, inspired by the evolutionary strategies of bacteria [1][5][6]

Group 1: Concept of Bacterial Code
- Bacterial Code has three main characteristics: small code blocks, modularity, and self-containment, allowing for easy replication [1][6][12]
- The idea is that open-source communities can thrive through "horizontal gene transfer," similar to how bacteria share genetic material [2][12]
- Karpathy's insights are derived from the survival strategies of bacteria, which have evolved to colonize diverse environments through efficient genetic coding [7][8]

Group 2: Principles of Bacterial Code
- The first principle is "smallness": each line of code consumes energy, creating a natural self-optimization mechanism [8][11]
- The second principle is "modularity": code should be organized into interchangeable modules, akin to bacterial operons, promoting high cohesion and low coupling [11][12]
- The third principle is "self-containment": code snippets should be independent and not rely on complex configurations or external libraries (a toy illustration follows below) [13][14]

Group 3: Limitations and Future Directions
- While Bacterial Code is effective for rapid prototyping, it is not suitable for building complex systems, which require more intricate structures, like eukaryotic genomes [15][16]
- Karpathy suggests a hybrid approach, utilizing the strengths of both bacterial and eukaryotic coding strategies [16]

Group 4: Evolution of Software Development
- Karpathy has previously introduced concepts like Software 3.0, which represents a shift toward programming with natural language and models [18][25]
- He notes that software has undergone significant transformations in recent years, moving from traditional coding to model training and now to natural language programming [19][23][31]
- The future of software development will involve collaboration between humans and large models, leading to semi-autonomous applications [28][30]

Group 5: Context Engineering
- Context engineering is highlighted as a crucial skill for effectively utilizing large language models (LLMs), requiring a careful balance of information to optimize performance [36][39]
- The discipline involves understanding the behavior of LLMs and integrating elements such as task descriptions and multimodal data [40][41]
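As a toy illustration of the three properties summarized above (my own example, not Karpathy's), the snippet below is small, has a single responsibility, and depends only on the standard library, so it can be copied into another project unchanged:

```python
# Small, modular, self-contained: stdlib only, no project-specific configuration,
# one clear responsibility, easy to "horizontally transfer" into any codebase.
import re
from collections import Counter

def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common lowercase words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

if __name__ == "__main__":
    print(top_words("To be or not to be, that is the question."))
```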
Tencent Research Institute AI Briefing 20250707
腾讯研究院· 2025-07-06 14:05
Group 1
- Grok 4 achieved a score of 45% on "Humanity's Last Exam" (HLE), surpassing Gemini 2.5 Pro and Claude 4 Opus and sparking discussion [1]
- Elon Musk stated that Grok 4 is built on "first principles" reasoning, analyzing problems from fundamental axioms [1]
- Grok 4 is expected to enhance coding capabilities and may be released in two versions, Grok 4 and Grok 4 Code, anticipated after July 4 [1]

Group 2
- Gemini CLI has been updated to support audio and video input, significantly expanding its multimodal interaction capabilities, although it currently only processes text, images, and PDF files [2]
- The update enhances Markdown functionality, adds table rendering and file import, and integrates the VSCodium and Neovim editors to improve the development experience [2]
- The technology stack has been upgraded to Ink 6 and React 19, introducing new themes and privacy-management features and optimizing the history-compression algorithm for better performance and stability [2]

Group 3
- Kunlun Wanwei launched the new Skywork-Reward-V2 series of reward models, topping the leaderboards of seven mainstream reward-model evaluations, with parameter scales ranging from 600 million to 8 billion [3]
- The models employ a "human-machine collaboration, two-stage iteration" data-selection pipeline, filtering 26 million high-quality samples from 40 million and balancing data quality with scale [3]
- Smaller models prove "small but powerful": a 1.7-billion-parameter model performs close to a 70-billion one, indicating that high-quality data can offset limits in parameter scale [3]

Group 4
- The German company TNG has open-sourced the DeepSeek-TNG-R1T2-Chimera model, developed from three major DeepSeek models using an innovative AoE architecture [4]
- The Chimera version improves inference efficiency by 200% over the R1-0528 version while significantly reducing inference cost, and outperforms standard R1 models on multiple mainstream tests [5]
- The AoE architecture uses MoE's fine-grained structure to construct capability-specific sub-models from the parent model in linear time, optimizing performance with weight interpolation and selective merging [5]

Group 5
- Shortcut has become the "first Excel agent to surpass humans," solving Excel World Championship problems in 10 minutes, ten times faster than humans, with over 80% accuracy [6]
- The tool offers near-perfect Excel compatibility, handling complex financial modeling, data analysis, and visualization, and can even create pixel-art images [6]
- Currently in early preview, it lets users log in with Google accounts for three free trials, though it has limitations in formatting, long-dialogue performance, and handling complex data [6]

Group 6
- Shanghai AI Lab, in collaboration with multiple organizations, launched the Sekai high-quality video dataset, covering over 5,000 hours of first-person video from 750+ cities across 101 countries [7]
- The dataset is divided into a real-world part (Sekai-Real) and a virtual-scene part (Sekai-Game), with multi-dimensional labels such as text descriptions, locations, and weather, plus a curated 300-hour high-quality subset, Sekai-Real-HQ [7]
- The interactive video world-exploration model Yume was trained on the Sekai data, supporting mouse-and-keyboard-controlled video generation and aiding research in world generation, video understanding, and prediction [7]
Group 7
- ChatGPT identified a long-standing medical issue as the MTHFR A1298C gene mutation, generating discussion on Reddit and being referred to as a "Go moment" for the medical field [8]
- Microsoft's medical AI system MAI-DxO achieved an accuracy rate of 85% in diagnosing complex cases from NEJM, outperforming experienced doctors by more than four times at a lower cost [8]
- Medical AI is evolving into a comprehensive solution from search to diagnosis, potentially transforming healthcare models and reducing ineffective medical expenditure [8]

Group 8
- "Context engineering" has gained popularity in Silicon Valley, endorsed by figures like Karpathy, and is seen as a key factor for the success of AI agents, supplanting prompt engineering [9]
- Unlike prompt engineering, which focuses on a single text, context engineering emphasizes providing LLMs with a complete system, including instructions, history, long-term memory, retrieved information, and available tools [9]
- Context engineering is both a science and an art, focused on providing appropriate information and tools for the task; many agent failures stem from context rather than the model, highlighting the importance of delivering information at the right time [9]

Group 9
- Generative AI is reshaping market research, turning it from a lagging, one-off input into a continuous dynamic competitive advantage, with the traditional $140 billion in research spending shifting toward AI software [10]
- AI-native companies are using "generative agent" technology to create "virtual societies" that simulate real user behavior without recruiting real human samples, fundamentally reducing costs and enabling real-time research [10]
- Successful market-research AI does not require 100% accuracy; CMOs believe that 70% accuracy combined with faster speed and real-time updates offers more commercial value than traditional methods, emphasizing rapid market entry and deep integration over perfect accuracy [10]

Group 10
- The core challenge of enterprise-level AI product entrepreneurship lies in moving from impressive demonstrations to practical products, addressing unpredictable user behavior and messy data in real environments [11]
- AI companies are growing at a rate far exceeding traditional SaaS firms, with top AI companies achieving annual growth above tenfold, driven by changes in enterprise purchasing behavior and AI's direct replacement of headcount budgets [11]
- Establishing lasting competitive barriers is crucial, which can be achieved by becoming a source of data authority (SoR), creating workflow lock-in, deep vertical integration, and solidified customer relationships [11]
Karpathy: I'm not trying to coin a new term; "context engineering" is just that important for agents
Founder Park· 2025-07-04 13:10
Core Viewpoint
- The concept of "context engineering" has gained traction in the AI industry, emphasizing that the effectiveness of AI applications relies more on the quality of the context provided than on the prompts used to query the AI [1][3]

Group 1: Definition and Importance of Context Engineering
- Context engineering is defined as the discipline of designing and building dynamic systems that provide appropriate information and tools to large language models (LLMs) at the right time and in the right format [19]
- The quality of context provided to an AI agent is crucial to its effectiveness, surpassing the complexity of the code or framework used [24]
- A well-constructed context can significantly enhance agent performance, as demonstrated by examples where rich context leads to more relevant and useful responses [25]

Group 2: Components of Context Engineering
- Context engineering encompasses various elements, including prompt engineering, current state or dialogue history, long-term memory, and retrieval-augmented generation (RAG) [15][11]
- The distinction between prompts, prompt engineering, and context engineering is clarified: prompts are the immediate instructions given to the AI, while context engineering involves a broader system that dynamically generates context based on task requirements [15][19]

Group 3: Strategies for Implementing Context Engineering
- Four common strategies are identified: writing context, selecting context, compressing context, and isolating context (a sketch of two of them follows below) [26]
- Writing context involves saving information outside the context window to assist the agent in completing tasks, such as maintaining a calendar or email history [28][29]
- Selecting context refers to pulling necessary information into the context window to aid the agent, for example by filtering relevant memories or examples [36][38]
- Compressing context focuses on retaining only the essential tokens needed for task execution, often through summarization [43][44]
- Isolating context involves distributing context across multiple agents or sandboxed environments to sharpen task focus and reduce token consumption [47][50]
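To ground the "compress" and "select" strategies, here is a minimal sketch. The summarize placeholder stands in for an LLM summarization call, and the word-overlap scorer stands in for a real embedding-based retriever; both are assumptions for illustration, not any specific framework's API.

```python
"""Minimal sketch of the 'compress' and 'select' context strategies (assumed names)."""

def summarize(turns: list[str]) -> str:
    # Placeholder: in practice this would be an LLM call that condenses older
    # turns into a few sentences of durable facts and decisions.
    return f"[summary of {len(turns)} earlier turns]"

def compress_context(history: list[str], keep_recent: int = 6) -> list[str]:
    """Keep recent turns verbatim; fold everything older into one summary."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

def select_memories(query: str, memories: list[str], k: int = 3) -> list[str]:
    """Naive relevance scoring by word overlap; real systems use embeddings."""
    query_words = set(query.lower().split())
    scored = sorted(memories,
                    key=lambda m: len(query_words & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

# Example: only the most relevant memories plus a compacted history reach the model.
context = compress_context([f"turn {i}" for i in range(20)]) + \
          select_memories("book a flight to Vienna",
                          ["user prefers aisle seats", "user is vegetarian",
                           "project deadline is Friday"])
```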