Founder Park
First-Hand Insights from Manus: How to Build Context Engineering for AI Agents?
Founder Park· 2025-07-18 18:51
Core Insights
- The article emphasizes the importance of context engineering in building AI agents, highlighting that it allows for rapid improvements and adaptability in response to advancements in underlying models [3][33]
- Manus has adopted a strategy focused on context engineering, which enables faster iterations and keeps their products aligned with the evolving capabilities of foundational models [3][33]

Group 1: Context Engineering Principles
- KV cache hit rate is identified as the most critical metric for production-level AI agents, significantly impacting latency and cost [6][7]
- The article outlines several key practices to improve KV cache hit rates, including maintaining stable prompt prefixes and ensuring context remains additive rather than modifying previous actions or observations [10][11]
- The use of a context-aware state machine to manage tool availability is recommended to prevent inefficient action selection as the action space grows [10][15]

Group 2: Handling Context Limitations
- The article discusses the challenges of context length in AI agents, noting that while modern LLMs support large context windows, practical limitations often arise [17][19]
- Manus treats the file system as the ultimate context, providing effectively unlimited capacity and persistent memory that agents can manipulate directly [19][23]

Group 3: Attention Management and Error Handling
- Manus employs a distinctive attention management strategy: a todo.md file is created and updated throughout task execution to keep the agent focused on its goals [24][27]
- The article advocates retaining erroneous actions in context to help the model learn from mistakes, thereby improving its adaptability and reducing the likelihood of repeating errors [28][31]

Group 4: Avoiding Few-Shot Pitfalls
- Few-shot prompting can lead to undesirable outcomes in agent systems, as models may over-rely on repetitive patterns from similar action-observation pairs [32]
- Introducing controlled randomness into actions and observations is suggested to break fixed patterns and refresh the model's attention [32]

Conclusion
- Context engineering is presented as an emerging discipline essential for AI agent systems, influencing their speed, recovery capabilities, and scalability [33][34]
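The cache-friendly practices summarized above (stable prompt prefix, append-only context) can be sketched in a few lines. This is an illustrative sketch; `AgentContext` and its methods are hypothetical names, not Manus's actual implementation:

```python
import json

class AgentContext:
    """Append-only context builder that keeps the serialized prefix stable,
    so an inference server can reuse its KV cache across turns."""

    def __init__(self, system_prompt: str):
        # The system prompt never changes after construction: any edit to it
        # would invalidate the KV cache for every token that follows.
        self._events = [{"role": "system", "content": system_prompt}]

    def append(self, role: str, content: str) -> None:
        # Past actions and observations are never modified or reordered,
        # only appended, so each new turn strictly extends the cached prefix.
        self._events.append({"role": role, "content": content})

    def serialize(self) -> str:
        # Deterministic serialization (sorted keys, no timestamps) keeps the
        # prefix byte-identical across requests -- a cache-hit prerequisite.
        return "\n".join(json.dumps(e, sort_keys=True, ensure_ascii=False)
                         for e in self._events)

ctx = AgentContext("You are a coding agent.")
before = ctx.serialize()
ctx.append("assistant", "ls -la")
after = ctx.serialize()
# The old serialization is a strict prefix of the new one: cache-friendly.
assert after.startswith(before)
```

The key design choice is that nothing before the append point ever changes, which is exactly what makes prefix-based KV caching effective.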
MiniMax Closed-Door Technical Session: Long Context Is the Game Changer for Agents
Founder Park· 2025-07-18 18:24
Core Insights
- The article discusses the advancements in Reinforcement Learning (RL) and its potential to enhance model capabilities, particularly in the context of limited context lengths and the importance of pre-training data diversity [6][8][10]

Group 1: RL and Model Capabilities
- RL can indeed provide new capabilities to models, especially when dealing with limited context lengths, by altering the output distribution and reducing the number of tokens needed to solve specific problems [6]
- The pass@k metric is highlighted as a useful measure for evaluating model capabilities, with the definition of k being crucial depending on the problem context [7]
- Reward modeling remains a significant challenge in RL, particularly for non-outcome-based rewards, which complicates the training process [7]

Group 2: Pre-training and Data Distribution
- Pre-training is essential for exposing models to diverse data distributions, which are currently more varied than the narrower distributions used in RL training [8]
- The article emphasizes that while RL can potentially fill gaps in pre-training, the quality and diversity of pre-training data are critical for effective model training [8]

Group 3: Long Context and Agent Workflows
- Long context windows are identified as game-changers for agent workflows, allowing extensive information to be processed in a single pass, which enhances output quality [15][16]
- Long-context models are particularly beneficial in fields such as legal compliance analysis and customer research, where comprehensive data processing is required [17][18]

Group 4: Hybrid Architectures
- Hybrid attention mechanisms are positioned as the future of model design, combining the strengths of linear and full attention models to improve efficiency and performance [19][20]
- The article notes that effective deployment of hybrid architectures is currently limited by infrastructure challenges, despite their proven potential [20]

Group 5: Practical Applications and Challenges
- The implementation of hybrid architectures in real-world applications is crucial, especially for handling large-scale requests efficiently [22]
- The article discusses the need for unified abstraction layers to optimize both traditional and hybrid architectures in inference engines [21]

Group 6: Future Directions
- The exploration of latent reasoning and self-training models is highlighted as an exciting frontier in RL research, with implications for the development of more autonomous AI systems [13][14]
- Evaluating model performance against computational budgets rather than fixed output lengths is emphasized for a more accurate assessment of efficiency [24]
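The pass@k metric mentioned in Group 1 has a standard unbiased estimator, popularized by the HumanEval benchmark literature: given n samples of which c pass, it estimates the probability that at least one of k randomly drawn samples passes. This sketch shows that standard formula, not necessarily how the speakers compute it:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples with c passes.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that a random
    draw of k samples (without replacement) contains at least one pass.
    """
    if n - c < k:
        return 1.0  # every size-k draw must include a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 5 passes, pass@1 is simply the raw pass rate...
assert abs(pass_at_k(10, 5, 1) - 0.5) < 1e-12
# ...while pass@10 is certain, since at least one sample passed.
assert pass_at_k(10, 5, 10) == 1.0
```

As the article notes, the choice of k matters: small k measures reliability per attempt, while large k probes whether the capability exists in the model at all.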
OpenAI Core Researcher: More Important Than Prompt Engineering Is Spec-Writing
Founder Park· 2025-07-18 11:37
The most valuable skill for programmers is no longer writing code, but precisely communicating intent to AI. A well-crafted specification is the true "source code" that captures complete intent. This is the view put forward by OpenAI researcher Sean Grove in his talk at AIEWF 2025. Not long ago, Andrej Karpathy also shared his views on prompting; the difference is that Karpathy focused on "feeding the AI more material" so that it better understands your intent, arguing that providing complete and appropriate context often matters more than writing good prompts. Sean Grove's perspective instead centers on how to produce a complete, executable "specification" that conveys intent to AI precisely. To some extent, both views reflect the same deeper point: generating code is no longer the focus; the essence of software engineering is the "communication" between humans and AI. This can also be read as a response to the verifier's law proposed by Jason Wei, since a specification is itself a verifiable standard. In his talk, Sean Grove shared his view of software engineering from the angle of the "new code" of the AI era. He argues that prompts are specifications and should not be discarded after a single use; capturing the intent and values within them matters greatly, and the most valuable artifact is not the code but the source specification. In addition, Sean ...
A 4-Person Team Shipped Two Hit AI Education Products Back to Back: A Small Team's Playbook for Winning in the AI Era
Founder Park· 2025-07-18 11:37
Core Insights
- The article discusses the emergence of small, efficient teams in the AI era, exemplified by Oleve, which has achieved significant revenue with a minimal workforce [3][7]

Group 1: Company Overview
- Oleve is an AI startup with a team of only 4 people, generating annual revenue of $6 million (approximately 43 million yuan) [3]
- The company has developed three products, including two educational applications, Quizard AI and Unstuck AI, with Quizard ranking third among educational apps [3][9]

Group 2: Product Success
- Quizard, launched in January 2023, quickly gained traction, achieving profitability within 9 months [9][17]
- Quizard's marketing strategy included viral videos on TikTok, which garnered over 1 million views and converted into 10,000 users within 30 hours [13][15]
- Unstuck AI, the second product, reached 1 million users just 2 months after launch [19]

Group 3: Team and Growth Strategy
- Oleve employs a "lean growth" strategy with six principles, including hiring multi-skilled individuals and prioritizing profitability [26][28]
- The team focuses on continuous process improvement and utilizes "super tools" to streamline operations and enhance flexibility [31][32]
- Oleve's approach to automation includes using AI to analyze social media trends and inform product decisions, aiming for a three-phase automation system [36]

Group 4: Future Outlook
- The company is planning a third, non-education product, which has already achieved profitability [35]
- Oleve's model shows how small teams can leverage AI to create efficient workflows and innovative products, potentially leading to more "small and beautiful" AI startups in the education sector [36]
A Kimi Employee's Retrospective on K2: Why Focus on Agents, Why Open Source, and Why the DSv3 Architecture?
Founder Park· 2025-07-18 09:39
Core Viewpoint
- The article discusses the launch and features of the K2 model, highlighting its advances in coding capability and its recognition in the AI community, particularly as an open-source flagship model [1][4][13]

Group 1: Model Performance and Features
- K2 has become the top-ranked open-source model on LMArena, showcasing its strong coding performance [1][3]
- The model architecture is a trillion-parameter MoE (Mixture of Experts) design, emphasizing its innovative approach to agent tool use and coding ability [2][4]
- K2's coding capabilities have been acknowledged by the various coding products integrating with it, indicating its effectiveness in practical applications [3]

Group 2: Development Insights
- K2's development involved significant research into model structure and scaling experiments, leading to the decision to inherit the proven structure of the DSv3 model while optimizing parameters for cost efficiency [20][21]
- The team focused on keeping training and inference costs comparable to DSv3, ensuring the model remains viable for a smaller company [20][21]
- K2's design includes specific adjustments, such as the number of experts and attention heads, aimed at improving performance while managing resource constraints [22][24][30]

Group 3: Open Source Strategy
- The decision to open-source K2 is driven by the desire for greater visibility and community engagement, which can strengthen the model's technical ecosystem [13][14]
- Open-sourcing imposes higher technical standards, compelling the company to produce better models and aligning it more closely with the goal of achieving AGI (Artificial General Intelligence) [14][15]
- The article emphasizes that open-source models must demonstrate reproducibility and effectiveness, which can drive innovation and improvement in model development [15][13]

Group 4: Market Position and Competition
- The article reflects on the competitive landscape, noting that many agent products rely heavily on foundational models like Claude, underscoring the importance of strong underlying technology [16][19]
- Despite challenges in visibility and market presence, the company remains committed to core model development rather than diverting resources to less impactful areas [19]
- The success of competitors like DeepSeek is viewed positively, reinforcing the belief that strong model performance is the best form of promotion [19]
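To make the trillion-parameter MoE design discussed above concrete, here is a minimal NumPy sketch of top-k expert routing, the mechanism that lets a model hold enormous total parameters while activating only a few experts per token. All dimensions, names, and routing details here are invented for illustration and are not K2's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x:         (tokens, d_model) activations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: (n_experts, d_model, d_model) one weight matrix per expert
    Only top_k of n_experts run per token, which is how total parameter
    count can scale far beyond per-token compute.
    """
    logits = x @ gate_w                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert ids
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(top[t]):
            # Weighted sum of the chosen experts' transformations.
            out[t] += weights[t, j] * (x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = moe_forward(x, rng.normal(size=(8, 6)), rng.normal(size=(6, 8, 8)))
assert y.shape == x.shape
```

The tuning knobs the article mentions (number of experts, attention heads) trade capacity against memory bandwidth and serving cost, which is why they were adjusted relative to DSv3.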
OpenAI Launches ChatGPT Agent: Now Available to Paid Users, Similar to Manus
Founder Park· 2025-07-18 03:19
Core Viewpoint
- The article emphasizes that the major theme of AI in 2025 is the emergence of "Agent" capabilities, transitioning AI from merely "talking" to actively "doing" tasks [1][31]

Group 1: Introduction of Agent Mode
- OpenAI introduced Agent mode, allowing users to request tasks directly from ChatGPT, such as purchasing items or generating presentations, with the AI autonomously executing these tasks in a virtual environment [2][5]
- Agent mode can utilize three tools: a text browser, a visual browser, and a terminal, enabling it to perform complex tasks efficiently [6][7]

Group 2: User Experience and Interaction
- Users can interact with the Agent in real time, providing confirmations and new requirements during task execution, enhancing the collaborative experience [5][12]
- The Agent's ability to switch autonomously between tools and execute tasks significantly improves efficiency compared to traditional methods [6][30]

Group 3: Integration of Previous Tools
- Agent mode combines two previously launched tools, Operator and Deep Research, integrated to enhance user experience and task execution capabilities [15][17]
- This integration allows the Agent to perform tasks that require both browsing and in-depth research, streamlining the generation of comprehensive reports [18][22]

Group 4: Performance Metrics and Comparisons
- Agent mode has shown significant gains on performance benchmarks, scoring 42% on Humanity's Last Exam, a substantial improvement over previous models [22][30]
- While Agent mode still falls short of human performance on certain tasks, it demonstrates a notable advance in web operation capabilities [30]

Group 5: Future Implications and Challenges
- The rise of Agent capabilities raises questions about user trust and the extent of permissions granted to AI as it begins to handle more complex real-world tasks [36][37]
- The article highlights the potential impact on the workforce, questioning whether AI will empower or threaten jobs as it takes on more responsibilities [37][38]
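The three-tool dispatch described in Group 1 can be sketched as a simple loop. The tool names mirror the article's text browser, visual browser, and terminal, but the stubs and dispatch logic are hypothetical; a real agent would let the model choose the tool each turn rather than follow a fixed plan:

```python
def run_agent(plan, tools):
    """Execute a scripted plan by dispatching each step to the named tool.

    plan:  list of (tool_name, argument) pairs, standing in for the
           model's per-turn tool choices.
    tools: mapping from tool name to a callable implementing it.
    """
    transcript = []
    for tool_name, arg in plan:
        result = tools[tool_name](arg)           # run the chosen tool
        transcript.append((tool_name, result))   # result feeds back as context
    return transcript

# The article's three tools, stubbed out for illustration:
tools = {
    "text_browser":   lambda url: f"text of {url}",
    "visual_browser": lambda url: f"screenshot of {url}",
    "terminal":       lambda cmd: f"$ {cmd} -> ok",
}
log = run_agent([("text_browser", "example.com"),
                 ("terminal", "ls")], tools)
assert log[1] == ("terminal", "$ ls -> ok")
```

The transcript accumulating tool results is what lets the agent condition later steps on earlier observations, which is the core of the "browse, then research, then report" workflow the article describes.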
AI Video Is Eating the World: Where Are the Opportunities for Creators and Entrepreneurs?
Founder Park· 2025-07-17 11:25
Core Insights
- AI video generation is transforming the short-video creation ecosystem, leading to a new decentralized IP creation model that allows low-cost, large-scale content production [2][7]
- The emergence of AI-generated characters and content has the potential to create significant market value, with the first AI-native IP possibly being acquired by major platforms like Netflix [2][31]
- Commercialization opportunities in AI video include creator monetization, platform support, and underlying model development, with a focus on balancing production costs against revenue generation [30][34]

Group 1: AI Video Trends
- AI video generation is evolving rapidly, with a significant increase in user engagement and content creation on platforms like TikTok and Instagram [8][7]
- The formula for viral AI content combines familiarity with existing IP and novelty, capturing audience attention effectively [19][25]
- The rise of decentralized characters, such as the "Italian brain rot" meme, showcases the potential of community-driven content creation [9][11]

Group 2: Monetization Strategies
- Various monetization strategies are emerging, including ad revenue from social platforms, merchandise sales, and subscription-based models [30][31]
- High production costs remain a challenge, necessitating careful planning of monetization pathways to ensure a positive return on investment [32][30]
- The potential of AI-generated content as an effective advertising tool is being recognized, with creators leveraging their viral content to attract business opportunities [30][31]

Group 3: Content Creation Dynamics
- The interaction between creators and AI tools is fostering a collaborative environment where ideas and techniques are shared, leading to innovative content [27][29]
- The concept of "Prompt Theory" is evolving, exploring existential themes within AI-generated narratives and adding depth to the content [43][44]
- The ability to create relatable and engaging characters through AI is democratizing content creation, allowing diverse voices to emerge in the digital landscape [29][30]

Group 4: Platform and Model Insights
- The AI video ecosystem has a dual-layer structure: application platforms simplify model usage while core models provide the foundational technology [34][35]
- The complexity of certain models, such as Veo3, can deter creators, highlighting the need for user-friendly interfaces in the AI video space [36][35]
- The ongoing trend of content arbitrage across platforms shows that successful content can be repurposed for different audiences, reflecting each platform's unique characteristics [50][51]
Quietly Build an AI Hardware Product and Stun Everyone at the Bund Summit!
Founder Park· 2025-07-17 11:25
Core Insights
- The article emphasizes that while embodied intelligence may take time to develop, the integration of AI with hardware is timely, with products like LOOI, Oura Ring, and PLAUD gaining market traction [1]
- There is a call to action for an AI + hardware development competition aimed at discovering practical AI hardware solutions that address real user problems [2]

Group 1: AI + Hardware Development Competition
- The competition seeks innovative AI hardware products that solve real-life pain points through the integration of AI and hardware [3][5]
- The focus is on devices incorporating AI capabilities such as perception, recognition, prediction, learning, and decision-making, enhancing user interaction in everyday scenarios [7]
- The competition encourages participation from individuals or teams, with a preference for teams of no more than ten members [12]

Group 2: Competition Rewards and Schedule
- A total prize pool of 285,000 yuan is available, including 100,000 yuan for first prize and additional prizes for runners-up [11]
- The competition timeline includes registration, initial proposal submission, product development phases, and final presentations scheduled for September 11 [11][12]
- Participants are encouraged to explore innovative concepts from scratch or to submit existing projects with preliminary validation [12]
Full Transcript of Jensen Huang's Exchange Session: US-China Chips, the H20, CUDA Compatibility, and Praise for DeepSeek, Qwen, and Kimi
Founder Park· 2025-07-17 07:56
Core Viewpoints
- Jensen Huang (Huang Renxun), CEO of Nvidia, emphasized the advanced and complex nature of China's supply chain, highlighting its infrastructure, ecosystem, technology, and manufacturing scale as impressive [1][5]
- He noted the global interdependence of supply chains, with many multinational companies participating in the event, reflecting the interconnectedness of global technology ecosystems [1][5]
- Huang expressed openness to CUDA compatibility, stating that CUDA is inherently open and that he would not mind Chinese companies developing compatible products [1][27]

Group 1
- China's supply chain is a crucial foundation for global AI hardware and smart factory development, showcasing its manufacturing advantages [3]
- The H20 chip has been reapproved for sale, which is expected to bring more Blackwell-architecture products into China [3]
- Huang praised Chinese electric vehicle companies like Xiaomi, NIO, and Xpeng, calling them a global surprise and noting their role in reshaping the competitive landscape [3][30]

Group 2
- Huang highlighted Huawei's significant breakthroughs in chip technology, network solutions, and photonics, considering it a model to learn from [3]
- He acknowledged that China's education system has produced a large number of top AI researchers, with about 50% of the world's AI researchers based in China [3]
- Huang encouraged young people to maintain their passion for technology and emphasized that AI is still evolving rapidly [3]

Group 3
- Huang discussed the importance of continuous investment to sustain growth in a fast-developing market, emphasizing that competitors are not standing still [7]
- He mentioned that Nvidia's supply chain cycle currently takes nine months from wafer ordering to AI supercomputer delivery [8]
- Huang expressed optimism about the future of the Chinese market, viewing it as a vital and rapidly growing technology market [25]

Group 4
- Huang noted that the U.S. government has encouraged his visits to China and expressed pride in Nvidia's market value surpassing $4 trillion [10]
- He acknowledged the impact of U.S. semiconductor restrictions but indicated that most inventory could be restored through new customer demand [11]
- Huang emphasized the need for companies to adapt to changing trade and tax policies, highlighting Nvidia's ability to adjust its supply chain accordingly [11]

Group 5
- Huang described AI as a foundational technology that will become essential across all industries and countries, indicating that AI infrastructure development is still in its early stages [42]
- He pointed out that integrating AI and robotics will significantly reduce production costs and enhance overall efficiency [40]
- Huang expressed confidence in the future of China's robotics industry, citing the country's unique advantages in AI technology and manufacturing capabilities [45]
Key o1 Figure Jason Wei Responds to the "Second Half of AI": All Verifiable Tasks Will Be Solved by AI
Founder Park· 2025-07-16 12:44
Meta has poached another heavyweight from OpenAI: core OpenAI scientist, lead author of chain-of-thought (CoT) prompting, and key o1 figure Jason Wei. On his way out of OpenAI, Jason Wei published two back-to-back blog posts laying out his thinking on what comes after RL. The verifier's law: how easy it is to train AI to solve a task is proportional to how verifiable that task is. Every task that is both solvable and easy to verify will be solved by AI. In a sense, this offers his own answer to how today's AI capabilities should be defined, and also responds to Shunyu Yao's earlier discussion of the "second half of AI". In short, the frontier of AI progress is bounded first by whether we can verify results quickly and objectively. AI will keep breaking through in "easy-to-verify" domains, while progress in "hard-to-verify" domains will be slow. For founders, this is also a useful yardstick for choosing the right track. The second post is about RL's influence on his life: to surpass what you learn from, you must walk your own path. The translated version is reproduced from "Tencent Technology", with adjustments by Founder Park. Founder Park, together with the Bund Summit organizing committee and Jiangmen Ventures (将门创投), is soliciting AI hardware that can truly change lives, in search of new possibilities for AI hardware.
Understanding the asymmetry of verification through examples
If you look closely, you will ...
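The "asymmetry of verification" the heading above points to can be made concrete with a toy subset-sum problem: checking a proposed answer takes one pass, while finding one may require exhaustive search. This example is illustrative and not drawn from the article:

```python
from itertools import combinations

def verify(nums, subset, target):
    """Cheap verifier: a membership check plus a sum, linear in the answer."""
    return set(subset) <= set(nums) and sum(subset) == target

def solve(nums, target):
    """Expensive solver: exhaustive search over all 2^n subsets."""
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums, target = [3, 9, 8, 4, 5, 7], 15
answer = solve(nums, target)          # slow: may try many candidate subsets
assert verify(nums, answer, target)   # fast: one pass over the proposed answer
```

The gap between these two costs is the lever the verifier's law describes: when verification is this cheap and objective, training signal is abundant, and AI progress in that domain tends to be fast.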