Context Engineering
AI Writes 70% of the Code, and the Remaining 30% Is Painfully Hard? Google Engineer: Code Review Has Become the "Biggest Bottleneck"
猿大侠· 2025-11-26 04:24
Compiled by Zheng Liyuan | Produced by CSDN (ID: CSDNnews). If you have recently noticed a strange phenomenon on your team - the people writing code are having an ever easier time while the people reviewing it are ever more miserable - you are not alone. AI code-writing speed has surged, and tools like GitHub Copilot, Gemini, and Claude have forced even engineers with over a decade of experience to admit: "productivity really has gone up." But reality is not as pleasant as imagined: PR volume has exploded, fixing one bug introduces three new ones, code that "looks like it runs" is actually full of redundancy, and the final 30% of engineering detail has become the most time-consuming part of the team's work. The people absorbing all of this pressure are usually the senior engineers responsible for code review. Recently, Google Chrome & Gemini engineer Addy Osmani unpacked this phenomenon on a podcast, and his view resonated strongly with many developers: "AI is raising output, but it is also turning code review into the new bottleneck." According to the latest Google DORA report: in other words, everyone is using it, yet everyone also feels uneasy about it. AI writes 70% for you, but the hardest remaining 30% lands on your head. This matches the trend in recent developer surveys: usage is rising, but trust is falling. What you see is a demo that runs, ...
Looking Up Information, Persuading the Boss, Writing Weekly Reports: A Large-Model Evaluation for Office Workers
晚点LatePost· 2025-11-25 15:01
Core Insights
- The article highlights the rapid growth in the usage of large model assistants in China, with over 100 million daily users, marking a 900% increase since April last year [3]
- A comprehensive evaluation of 14 large models was conducted, focusing on their performance in everyday work-related tasks rather than programming or deep research [3][5]
- The evaluation involved blind assessments of the models' responses to various prompts, revealing differences in their capabilities and user experiences [5][8]

Model Performance Summary
- The evaluation included models from companies like OpenAI, Anthropic, Google, and several Chinese firms, with most models priced around $20 per month [4]
- ChatGPT received the highest scores in the blind assessments, followed by StepFun and SenseNova, while MiniMax Agent scored the lowest due to its simplistic approach [8][13]
- The models were tested on their ability to handle complex tasks, such as role-playing and brainstorming, with varying degrees of success [6][7]

User Interaction and Feedback
- Users reported that while the models showed improvements in their capabilities, the practical experience did not always align with the benchmark scores advertised by the companies [3][5]
- The models were assessed on their ability to provide coherent and contextually relevant responses, with some models struggling with longer contexts or complex queries [8][23]

Long Text Processing and Document Handling
- The models were tested on their ability to process long documents, with none achieving perfect results, indicating ongoing challenges in this area [23][25]
- Gemini and Yuanbao performed relatively well in extracting participant information from a lengthy conference manual, but issues like hallucinations and incomplete data were noted [25][26]

Search and Information Retrieval
- The article discusses the models' capabilities in replacing traditional search engines, with some models successfully retrieving specific articles and documents, while others struggled [53][60]
- ChatGPT and Kimi excelled in finding relevant content, while models like DeepSeek and Qwen failed to provide accurate links or information [69]

Conclusion
- The evaluation indicates that while large models have made significant strides in user engagement and task performance, there are still notable gaps in their practical application and reliability [3][5][23]
Elastic(ESTC) - 2026 Q2 - Earnings Call Transcript
2025-11-20 23:00
Financial Data and Key Metrics Changes
- Total revenue for Q2 was $423 million, representing a growth of 16% year-over-year and 15% on a constant currency basis [21]
- Sales-led subscription revenue was $349 million, growing 18% as reported and 17% in constant currency [21]
- Current remaining performance obligation (CRPO) was approximately $971 million, growing 17% as reported and 15% in constant currency [22]
- Subscription gross margins were 82%, total gross margins were 78%, and operating margin was 16.5% [24]
- Adjusted free cash flow was approximately $26 million in Q2, representing a margin of 6% [25]

Business Line Data and Key Metrics Changes
- Strong execution in sales-led subscription revenue growth of 18%, with significant contributions from both Elastic Cloud and self-managed offerings [5][21]
- Over 30 commitments greater than $1 million in annual contract value were secured, with five exceeding $10 million [23]
- The number of customers spending over $100,000 annually increased to more than 1,600 [5]

Market Data and Key Metrics Changes
- The company saw a 13% increase in the number of customers with annual contract values over $100,000, indicating strong market demand [23]
- 23% of customers in the greater than $100,000 cohort are utilizing Elastic for GenAI use cases, up from 17% a year ago [24]

Company Strategy and Development Direction
- The company is focusing on AI and platform consolidation as top priorities for enterprises, driving demand for its solutions [20]
- The introduction of new products like Agent Builder aims to enhance user interaction with data and simplify the operational lifecycle of AI agents [13]
- The acquisition of Jina AI is part of the strategy to enhance capabilities in multilingual and multimodal embedding and re-ranking models [16][17]

Management's Comments on Operating Environment and Future Outlook
- Management expressed confidence in the market opportunity, driven by robust growth and a unique platform built for the AI era [19]
- The company raised its full fiscal year 2026 revenue guidance, expecting total revenue in the range of $1.715 billion to $1.721 billion, representing approximately 16% growth [28]
- Management noted that the business is seeing strong commitments and consumption, with a healthy sales pipeline [20][29]

Other Important Information
- The company initiated a $500 million share repurchase program, returning approximately $114 million in cash to shareholders during Q2 [25]
- The company will begin providing guidance for sales-led subscription revenue, a key metric for measuring success with larger strategic accounts [26]

Q&A Session Summary
- Question: Performance of non-AI-native customers - Management noted strong consumption trends across all customer segments, including traditional businesses, driven by increased commitments and consolidation onto the platform [30][31]
- Question: Billings lagging behind other metrics - Management acknowledged variability in billings due to seasonality and a government shutdown impacting renewals, but emphasized strong commitments and consumption [32][33][34]
- Question: Composition of sales-led subscription revenue guidance - Management clarified that the guidance focuses on commitments from both cloud and self-managed environments, with expectations for flat growth in self-serve monthly cloud [36][37]
- Question: Growth contributions from GenAI revenue - Management emphasized that consumption growth is driven by new workloads and increased data, rather than pricing changes alone [38][39][40]
- Question: Go-to-market changes and productivity - Management confirmed that changes made to the sales strategy are yielding positive results, with strong execution and commitments [41][42]
- Question: Competitive landscape in security - Management highlighted that the company is displacing incumbents in the security space, leveraging its data platform and AI capabilities [53][54][55]
- Question: Observability and security as two sides of the same coin - Management reiterated the importance of data in both observability and security, emphasizing the company's advanced capabilities in these areas [57][58]
Which Attention is All You Need?
机器之心· 2025-11-09 01:30
Core Insights
- The article discusses the ongoing innovations and challenges in the Attention mechanism within AI and Robotics, highlighting the need for breakthroughs in algorithm design to address computational complexities and enhance performance [5][7].

Group 1: Attention Mechanism Innovations
- The industry is focusing on optimizing the Attention mechanism due to the computational complexity of O(N^2) associated with standard self-attention, which poses a fundamental obstacle for efficient long-sequence modeling [9].
- Two main paths for improving Attention have emerged: Linear Attention, which aims to reduce complexity to O(N), and Sparse Attention, which seeks to limit calculations to a subset of important tokens [10][13].
- Kimi Linear, a recent development, has shown significant improvements over traditional full attention methods, achieving up to a 75% reduction in KV cache requirements and processing contexts of up to 1 million tokens six times faster than full attention [11][12].

Group 2: Linear Attention Approaches
- Linear Attention can be categorized into three main types: kernelized methods, forgetting mechanisms, and in-context learning, each aiming to optimize the attention process while maintaining performance [10][11].
- The Kimi Linear architecture, which incorporates a channel-wise gating mechanism, optimizes memory usage in RNNs and demonstrates superior performance across various scenarios [12].
- The design of Kimi Linear includes a hierarchical mixed architecture that combines linear and full attention layers, enhancing its efficiency and effectiveness [12].

Group 3: Sparse Attention Strategies
- Sparse Attention focuses on pre-selecting a subset of important tokens for attention calculations, utilizing methods such as fixed patterns, block-sparse, and clustering approaches [13][14].
- DeepSeek's NSA and DSA represent significant advancements in Sparse Attention, with DSA employing a token-wise sparse strategy that dramatically reduces attention complexity while maintaining performance [16][17] (see the sketch after this list).
- In tests, DSA has achieved a reduction in attention complexity from O(L^2) to O(Lk), resulting in cost reductions of 60%-70% during both pre-filling and decoding phases [17].
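To make the token-wise selection idea concrete, here is a minimal PyTorch sketch of top-k sparse attention. It is an illustrative assumption of how such selection can work, not DeepSeek's actual NSA/DSA implementation; for clarity it still materializes the full score matrix before masking, whereas a production kernel would compute scores only for the selected tokens to realize the O(Lk) cost.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: [batch, heads, seq_len, head_dim]; each query attends to its top-k keys."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # [B, H, L, L]
    top_k = min(top_k, scores.shape[-1])
    # Keep only the k highest-scoring keys per query; mask the rest to -inf before softmax.
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_vals)
    attn = F.softmax(masked, dim=-1)                         # rows have exactly k nonzeros
    return torch.matmul(attn, v)

# Usage: out = topk_sparse_attention(q, k, v, top_k=64)
```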
One Paper to Understand the Past and Present of Context Engineering
36Kr· 2025-11-07 07:11
Core Concept
- The article discusses the emerging field of "context engineering," defined as the art and science of providing the right information to prepare for subsequent reasoning, as proposed by Shopify CEO Tobi Lütke and AI expert Andrej Karpathy [1][3].

Summary by Sections

What is Context Engineering?
- Context engineering addresses the cognitive gap between humans and machines, where human communication is high-entropy and often ambiguous, while machines require low-entropy, clear instructions [3][14].
- The essence of context engineering is to reduce entropy through richer and more effective context, enabling better machine understanding of human intent [3][4].

Evolution of Context Engineering
- Context engineering has evolved from a focus on translation (1.0 era, 1990s-2020) to a focus on instruction (2.0 era, 2020-present), with the introduction of large language models allowing for more natural interactions [5][11].
- The transition from context engineering 1.0 to 2.0 reflects a shift in how users interact with machines, moving from structured programming languages to natural language prompts [12][13].

AI Communication Gaps
- The article identifies four main deficiencies in AI that contribute to the communication gap: limited sensory perception, restricted understanding capabilities, lack of memory, and scattered attention [14][15].
- These deficiencies necessitate the development of context engineering to facilitate better communication and understanding between humans and AI [15][16].

Framework of Context Engineering
- A comprehensive context engineering framework consists of three components: context collection, context management, and context usage [16][24] (see the sketch after this list).
- Context collection involves multi-modal and distributed methods to gather information beyond simple text inputs, addressing AI's sensory and memory limitations [18][20].
- Context management focuses on abstracting and structuring high-entropy information into low-entropy formats that AI can understand, enhancing its learning capabilities [23][24].
- Context usage aims to improve AI's attention mechanisms, ensuring relevant information is prioritized during interactions [25][26].

Future of Context Engineering
- The article anticipates the evolution of context engineering into 3.0 and 4.0 stages, where AI will achieve human-level and eventually superhuman intelligence, leading to seamless communication without the need for explicit context [30][34].
- Ultimately, the goal of context engineering is to become an invisible infrastructure that enhances AI usability without being a focal point of discussion [35].
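As a rough illustration of the collection / management / usage split described above, the sketch below wires the three stages into one small pipeline. All names here (ContextEngine, summarize, score) are assumptions for illustration; the paper does not prescribe a concrete API.

```python
from dataclasses import dataclass, field

@dataclass
class ContextItem:
    source: str        # e.g. "chat_history", "file", "sensor"
    text: str
    relevance: float = 0.0

@dataclass
class ContextEngine:
    items: list = field(default_factory=list)

    def collect(self, source: str, text: str) -> None:
        """Collection: gather raw, high-entropy inputs from many channels."""
        self.items.append(ContextItem(source, text))

    def manage(self, summarize) -> None:
        """Management: compress and structure items into a low-entropy form."""
        for item in self.items:
            item.text = summarize(item.text)

    def use(self, query: str, score, budget: int = 3) -> str:
        """Usage: rank items against the query and keep only what fits the budget."""
        for item in self.items:
            item.relevance = score(query, item.text)
        top = sorted(self.items, key=lambda i: i.relevance, reverse=True)[:budget]
        return "\n".join(i.text for i in top)
```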
"Context Engineering" Is Already 30 Years Old, and You May Have Only Just Heard of It
量子位· 2025-11-02 04:23
Core Insights
- The article discusses the evolution of Context Engineering, emphasizing its significance in bridging the cognitive gap between humans and machines [3][12][21]
- It highlights the transition from Era 1.0, characterized by limited machine understanding, to Era 2.0, where machines can comprehend natural language and context [22][40]
- The future of Context Engineering is envisioned as a collaborative relationship between humans and AI, where machines not only understand but also anticipate human needs [92][98]

Summary by Sections

Context Engineering Overview
- Context Engineering is defined as a process of entropy reduction aimed at bridging the cognitive gap between humans and machines [21]
- The concept has evolved over 30 years, with significant milestones marking its development [12][24]

Historical Context
- The origins of Context Engineering can be traced back to the 1990s, with foundational work by researchers like Bill Schilit and Anind Dey [8][39]
- The first era (1990s-2020) was marked by machines operating as state machines, requiring explicit commands from users [27][31]

Era 1.0: Sensor Era
- In this era, machines struggled to understand human intent, leading to cumbersome interactions requiring multiple steps to perform simple tasks [30][31]
- The introduction of sensors aimed to enhance machine awareness of user context, but limitations remained in machine understanding [32][34]

Era 2.0: Intelligent Assistant Era
- The release of GPT-3 in 2020 marked a significant shift, enabling machines to process natural language and engage in more intuitive interactions [41][43]
- Key advancements included multi-modal perception, allowing machines to interpret images, voice, and documents [45][46]
- The ability of machines to handle high-entropy inputs and provide proactive assistance represented a major leap forward [49][51]

Future Directions: Era 3.0 and Beyond
- Predictions for Era 3.0 suggest a seamless integration of context collection, management, and usage, leading to more fluid human-AI collaboration [68][81]
- The potential for AI to surpass human capabilities in certain tasks raises questions about the future of Context Engineering and its implications for human identity [92][94]

Actionable Insights
- The article emphasizes the need for a systematic framework for Context Engineering, focusing on collection, management, and usage of context [61]
- It calls for researchers and developers to explore the ethical implications and practical applications of advanced context management systems [101][102]
Why Do 95% of Agents Fail at Deployment? This Roundtable Surfaced Some Common Pitfalls
机器之心· 2025-10-28 09:37
Core Insights
- 95% of AI agents fail when deployed in production environments due to immature foundational frameworks, context engineering, security, and memory design rather than the intelligence of the models themselves [1][3]
- Successful AI deployments share a common trait: human-AI collaboration design, where AI acts as an assistant rather than a decision-maker [3][21]

Context Engineering
- Context engineering is not merely about prompt optimization; it involves building a semantic layer, metadata filtering, feature selection, and context observability [3][12]
- A well-structured Retrieval-Augmented Generation (RAG) system is often sufficient, yet many existing systems are poorly designed, leading to common failure modes such as excessive indexing or insufficient signal support [8][9]

Memory Design
- Memory should be viewed as a design decision involving user experience, privacy, and system impact rather than just a feature [22][23]
- Effective memory design includes user preferences, team-level queries, and organizational knowledge, ensuring that AI can provide personalized yet secure interactions [27][29]

Trust and Governance
- Trust issues are critical for AI systems, especially in sensitive areas like finance and healthcare; successful systems incorporate human oversight and governance frameworks [18][21]
- Access control and context-specific responses are essential to prevent information leaks and ensure compliance [20][21]

Multi-Model Inference and Orchestration
- The emerging design pattern of model orchestration allows for efficient routing of tasks to appropriate models based on complexity and requirements, enhancing performance and cost-effectiveness [32][34] (see the routing sketch after this list)
- Teams are increasingly using a decision DAG (directed acyclic graph) approach to manage model interactions, ensuring that the system can adapt and optimize over time [34]

User Experience and Interaction
- Not all tasks require conversational interfaces; graphical user interfaces may be more efficient for certain applications [39][40]
- The ideal use of natural language processing occurs when it lowers the learning curve for complex tools, such as business intelligence dashboards [40][41]

Future Directions
- Key areas for development include context observability, portable memory systems, domain-specific languages (DSL), and delay-aware user experiences [43][44][46]
- The next competitive barriers in generative AI will stem from advancements in memory components, orchestration layers, and context observability tools [49][52]
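The routing idea behind model orchestration can be sketched in a few lines. The model tiers and the complexity heuristic below are assumptions for illustration only; real orchestrators typically use a learned or rule-based classifier and fold routing decisions into a larger decision DAG.

```python
def estimate_complexity(task: str) -> float:
    """Crude heuristic: longer tasks with multi-step keywords score higher."""
    keywords = ("plan", "analyze", "multi-step", "compare", "reason")
    score = min(len(task) / 500.0, 1.0)
    score += 0.2 * sum(kw in task.lower() for kw in keywords)
    return min(score, 1.0)

def route(task: str) -> str:
    """Return which model tier should handle the task."""
    c = estimate_complexity(task)
    if c < 0.3:
        return "small-fast-model"   # cheap, low-latency
    if c < 0.7:
        return "mid-tier-model"     # balanced cost and quality
    return "frontier-model"         # expensive, strongest reasoning

# Usage:
#   route("Summarize this paragraph")                      -> likely "small-fast-model"
#   route("Plan and compare three rollout strategies ...") -> likely a stronger tier
```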
Fine-Tuning Is Dead! A "Consensus Mechanism" Lets Prompts Evolve Themselves, and Performance Soars
量子位· 2025-10-28 01:18
Core Viewpoint
- The article discusses a paradigm shift in the artificial intelligence field from "model fine-tuning" to "context engineering," emphasizing the importance of using clearer instructions and richer knowledge in inputs to enhance AI system performance without high training costs or reliance on open-source model weights [1][2].

Group 1: Context Engineering
- Context engineering is becoming the core paradigm for building high-performance, scalable, and self-improving AI systems [1].
- The shift towards context engineering is recognized as a significant trend, with the phrase "fine-tuning is dead" gaining traction in the AI community [2].

Group 2: Multi-Prompt Collaboration
- Single prompts have limited expressive power and often fail to comprehensively articulate all requirements of complex tasks [4].
- Multi-prompt collaboration is a natural solution to address the limitations of single prompts, allowing for better handling of specific inputs [4][5].

Group 3: C-Evolve Algorithm
- The C-Evolve algorithm, proposed by a team from Westlake University, utilizes a consensus mechanism to evolve a group of prompts rather than optimizing a single prompt [6].
- C-Evolve aims to extract consensus from multiple outputs to achieve optimal task performance, introducing a "consensus voting score" as an evolutionary metric [6][7] (see the sketch after this list).

Group 4: Evolutionary Process
- The evolutionary process of C-Evolve consists of two phases: a warm-up phase based on individual performance and a consensus evolution phase based on group collaboration [14][22].
- The warm-up phase uses individual scores as fitness ratings, while the consensus phase evaluates groups based on their collective performance [16][22].

Group 5: Performance Improvement
- C-Evolve has shown significant performance improvements across various tasks, including retrieval question answering, mathematical reasoning, and instruction compliance, applicable to both open-source and closed-source models [29][30].
- Experimental results indicate that C-Evolve outperforms previous methods, achieving notable gains in task performance metrics [30].

Group 6: Implications for AI Development
- The consensus mechanism provides a new approach to prompt optimization, enhancing model adaptability in complex tasks and potentially unlocking greater capabilities of large language models [34].
- The article highlights the practical significance of designing better prompts to leverage the capabilities of established commercial LLMs like Claude and GPT [34].
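A minimal sketch of the consensus-voting idea, assuming a generic ask(prompt, question) call and a labeled dev set. This is not the authors' code, but it shows how a group of prompts can be scored by the accuracy of its majority-voted answers, which is the role the "consensus voting score" plays during evolution.

```python
from collections import Counter

def group_answer(prompts, question, ask):
    """ask(prompt, question) -> answer string; returns the group's majority answer."""
    answers = [ask(p, question) for p in prompts]
    return Counter(answers).most_common(1)[0][0]

def consensus_score(prompts, dev_set, ask):
    """Fraction of dev questions the group answers correctly by majority vote."""
    correct = sum(group_answer(prompts, q, ask) == gold for q, gold in dev_set)
    return correct / len(dev_set)
```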
Long Context Windows and the Rise of Agents: Is RAG Dead?
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article discusses the evolving landscape of Retrieval-Augmented Generation (RAG) and its potential obsolescence due to advancements in context engineering and agent capabilities, suggesting that RAG is not dead but rather transforming into a more sophisticated retrieval paradigm [2][5][21].

Group 1: RAG's Evolution and Current Status
- RAG has become a standard solution for addressing the limitations of LLM input lengths, acting as an external knowledge base since 2022 [3][4].
- The emergence of long context windows and agent capabilities is challenging RAG's traditional role, leading to debates about its relevance [5][6].
- RAG is evolving into "agentic retrieval," where AI agents play a central role in advanced retrieval systems, moving beyond basic chunk retrieval [8][21].

Group 2: Stages of RAG Development
- The first stage of RAG involves basic "Top-k" retrieval, where documents are split into chunks, and the most relevant chunks are retrieved based on user queries [10][11] (see the sketch after this list).
- The second stage introduces lightweight agents for automatic routing, allowing the system to intelligently select the appropriate retrieval method based on user queries [15].
- The third stage expands to composite retrieval APIs, enabling the system to handle multiple document formats efficiently [17][19].

Group 3: RAG's Future and Integration with Agents
- The ultimate goal is to create a fully agent-driven knowledge system that can make intelligent decisions at every stage of the retrieval process [18][21].
- RAG is being redefined as a powerful component within an agent toolbox, rather than the default architecture for all applications [54].
- The future landscape will likely see a combination of various technologies tailored to specific application scenarios, emphasizing the importance of understanding the strengths and weaknesses of each paradigm [52][54].
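The stage-one "Top-k" loop mentioned above is simple enough to sketch directly. The embed() callable below is an assumption standing in for any embedding model; everything else is plain cosine-similarity ranking over fixed-size chunks.

```python
import numpy as np

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_k_chunks(query: str, docs: list[str], embed, k: int = 5) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query embedding."""
    chunks = [c for d in docs for c in chunk(d)]
    doc_vecs = np.array([embed(c) for c in chunks])          # [N, dim]
    q_vec = np.array(embed(query))                           # [dim]
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-8
    )
    best = np.argsort(-sims)[:k]
    return [chunks[i] for i in best]
```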
Tencent Research Institute AI Digest 20251017
腾讯研究院· 2025-10-16 23:06
Group 1: Google and AI Models
- Google launched the video generation model Veo 3.1, emphasizing enhanced narrative and audio control features, integrating with Gemini API and Vertex AI [1]
- The model supports 720p or 1080p resolution at 24fps, with a native duration of 4-8 seconds, extendable up to 148 seconds, capable of synthesizing multi-character scenes with audio-visual synchronization [1]
- Users have generated over 275 million videos in Flow, but the quality improvement over Veo 3 is limited, with basic physics performance improved but issues in character performance and complex scheduling remaining [1]

Group 2: Anthropic's Claude Haiku 4.5
- Anthropic released the lightweight model Claude Haiku 4.5, offering coding performance comparable to Claude Sonnet 4 at one-third the cost (1 USD per million input tokens, 5 USD output) and more than double the inference speed [2]
- Scoring 50.7% on OSWorld benchmarks, it surpasses Sonnet 4's 42.2%, and achieves 96.3% in mathematical reasoning tests using Python tools, significantly higher than Sonnet 4's 70.5% [2]
- The model targets real-time low-latency tasks like chat assistants and customer service, with a significantly lower incidence of misaligned behavior compared to other Claude models [2]

Group 3: Alibaba's Qwen Chat Memory
- Alibaba's Qwen officially launched the Chat Memory feature, allowing AI to record and understand important user information from past conversations, including preferences and task backgrounds [3]
- This feature enables personalized recognition across multiple conversations, marking a significant step towards long-term companion AI, unlike short-term context-based memory [3]
- Users can view, manage, and delete all memory content, retaining complete control, with the feature initially available on the web version of Qwen Chat [3]

Group 4: ByteDance's Voice Models
- ByteDance upgraded its Doubao voice synthesis model 2.0 and voice replication model 2.0, enhancing situational understanding and emotional control through Query-Response capabilities [4]
- The voice synthesis model offers three modes: default, voice command, and context introduction, allowing control over emotional tone, dialect, speed, and pitch, with automatic context understanding [4]
- The voice replication model can accurately reproduce voices of characters like Mickey Mouse and real individuals, achieving nearly 90% accuracy in formula reading tests, optimized for educational scenarios [4]

Group 5: Google and Yale's Cancer Research
- Google and Yale University jointly released a 27 billion parameter model, Cell2Sentence-Scale (C2S-Scale), based on the Gemma model, proposing a new hypothesis to enhance tumor recognition by the immune system [6]
- The model simulated over 4,000 drugs through a dual-environment virtual screening process, identifying the CK2 inhibitor silmitasertib as significantly enhancing antigen presentation only in active immune signal environments, validated in vitro [6]
- This research showcases the potential of AI models to generate original scientific hypotheses, potentially opening new avenues for cancer treatment, with the model and code available on Hugging Face and GitHub [6]

Group 6: Anthropic's Pre-training Insights
- Anthropic's pre-training team leader emphasized the importance of driving down the loss function in pre-training, exploring the balance between pre-training and post-training, and their complementary roles [7]
- The current bottleneck in AI research is limited computational resources rather than algorithm breakthroughs, with challenges in effectively utilizing computing power and addressing engineering issues in scaling [7]
- The core alignment issue involves ensuring models share human goals, with pre-training and post-training each having advantages, where post-training is suitable for rapid model adjustments [7]

Group 7: LangChain and Manus Collaboration
- LangChain's founder and Manus's co-founder discussed context engineering, highlighting performance degradation in AI agents executing complex long-term tasks due to context window expansion from numerous tool calls [8]
- Effective context engineering involves techniques like offloading, streamlining, retrieval, isolation, and caching to optimally fill context windows, with Manus designing an automated process using multi-layer thresholds [8] (see the sketch after this list)
- The core design philosophy is to avoid over-engineering context, with significant performance improvements stemming from simplified architecture and trust in the models, prioritizing context engineering over premature model specialization [8]

Group 8: Google Cloud DORA 2025 Report
- The Google Cloud DORA 2025 report revealed that 90% of developers use AI in their daily work, with a median usage time of 2 hours, accounting for a quarter of their workday, though only 24% express high trust in AI outputs [9]
- AI acts as a magnifying glass rather than a one-way efficiency tool, enhancing efficiency in healthy collaborative cultures but exacerbating issues in problematic environments [9]
- The report introduced seven typical team personas and the DORA AI capability model, including user orientation and data availability, which determine a team's evolution from legacy bottlenecks to harmonious efficiency [9]

Group 9: NVIDIA's Investment Insights
- Jensen Huang reflected on Sequoia's $1 million investment in NVIDIA in 1993; the company has since grown to over $1 trillion in market value, a roughly million-fold return, and he emphasized the importance of first principles in future breakthroughs [10]
- The creation of CUDA transformed GPUs from graphics devices to general-purpose acceleration platforms, with the 2012 AlexNet victory in the ImageNet competition marking a pivotal moment, leading to the development of the cuDNN library for faster model training [11]
- The core of AI factories lies in system integration rather than chip performance, with future national AI strategies likely to combine imports and domestic construction, making sovereign AI a key aspect of national competition [11]
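A minimal sketch of threshold-based context offloading in the spirit of the techniques LangChain and Manus discuss (offloading, compaction, caching). The token estimate, the thresholds, and the summarize() callable are assumptions for illustration, not their actual APIs.

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # rough heuristic: ~4 characters per token

def compact_context(messages, summarize, soft_limit=6000, hard_limit=8000):
    """Keep the most recent messages verbatim; offload older ones into a summary."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= soft_limit:
        return messages                       # under the soft threshold: do nothing
    kept, older = [], []
    budget = 0
    for m in reversed(messages):              # walk from newest to oldest
        budget += estimate_tokens(m)
        (kept if budget <= hard_limit // 2 else older).append(m)
    kept.reverse()                            # restore chronological order
    summary = summarize("\n".join(reversed(older))) if older else ""
    prefix = [f"[Summary of earlier context]\n{summary}"] if summary else []
    return prefix + kept
```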