Agent Development
A Traceable "Rap Sheet" for Agents: Former GitHub CEO Starts a New Company, Writing the Reasoning Process into Git
Sou Hu Cai Jing· 2026-02-11 08:48
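Author | Dongmei

For a long time, Thomas Dohmke was regarded as "the least CEO-like CEO." He would personally reply to GitHub Issues late at night, publicly demonstrate his own coding at launch events, and, during Copilot's earliest internal testing, repeatedly stress one line: "If this thing can't change the way developers work every day, it doesn't deserve to exist." But last August, the GitHub chief executive officially stepped down. Outside observers speculated about whether he would join another big tech company or turn to AI venture investing. A few months later he gave a more direct answer: starting a company again. Thomas Dohmke, now 48, has founded a new company called Entire, an open-source developer platform for the "intelligent coding era." He announced the news on X. Grab Chief Product Officer Philipp Kandal posted his congratulations on X.

1. Who is Entire?

Because agents are likely to use this database and its API endpoints far more often than humans use Git repositories, the team also needs to consider performance. Dohmke also said that, unlike traditional centralized Git repositories, this new kind of database can be built as a globally distributed network of nodes. For users who need (or want) to ensure data sovereignty, this is an important selling point ...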
Agent Skills in Production: No More "Running Naked", Building a Hybrid Architecture Where Determinism and Flexibility Coexist
AI前线· 2026-01-24 05:33
Core Insights
- The article discusses the challenges and solutions in developing an enterprise-level "intelligent document analysis agent" using a hybrid architecture that combines Java, DSL-encapsulated skills, and real-time rendering to ensure stability and security while retaining the flexibility of LLMs [2][28].

Group 1: Background and Challenges
- The initial implementation ran into trouble when users requested complex tasks, such as comparing DAU and revenue growth rates and generating Excel and PDF reports [3].
- The "pure skills" approach, which let LLMs write code independently, caused significant issues in production, including arithmetic precision, file generation, and handling unstructured data [4][5].

Group 2: Architectural Evolution
- The new architecture reclaims the "low-level operational rights" from LLMs, leaving them only "logical scheduling rights" [7].
- The system is divided into four logical layers: an ETL layer (Java) for data flow and security, a Brain layer (LLM) for intent understanding and code assembly, a Skills layer (Python sandbox) for executing calculations, and a Delivery layer (Java) for rendering outputs [8][10].

Group 3: Input and Output Management
- The input side now relies on Java for downloading and parsing files, ensuring that the data fed to LLMs is clean, safe, and standardized [10].
- The output strategy separates rendering from delivery: LLMs output high-quality Markdown, which the Java backend then converts to PDF/Word [16].

Group 4: Skills Implementation
- The DSL skills implementation prevents LLMs from performing low-level operations directly and instead provides a set of encapsulated functions for file generation [11][14].
- A decision tree guides the LLM on when to write code and when to output text, ensuring structured and standardized outputs [14].

Group 5: Key Takeaways
- The hybrid architecture retains the agent's ability to handle complex dynamic requirements while ensuring enterprise-level stability and compliance [28].
- The article emphasizes the importance of not overestimating LLMs' coding capabilities and of keeping Java's deterministic strengths in parsing, downloading, and security checks [28].
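The split between "logical scheduling rights" and "low-level operational rights" can be illustrated with a small sketch. This is not the article's implementation: the registry `SKILLS`, the decorator `skill`, the function `growth_rate`, and the plan format consumed by `run_plan` are all hypothetical names chosen for illustration. The idea shown is that the model only emits a plan naming pre-approved skills, while the arithmetic itself runs in deterministic code using `Decimal`, which avoids binary floating-point rounding drift.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical skill registry: the only functions a model-produced plan may call.
SKILLS = {}


def skill(fn):
    """Register a function as an allowed skill under its own name."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def growth_rate(previous: str, current: str) -> Decimal:
    """Percentage growth, computed with Decimal and rounded to two places."""
    prev, curr = Decimal(previous), Decimal(current)
    if prev == 0:
        raise ValueError("previous value must be non-zero")
    pct = (curr - prev) / prev * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def run_plan(plan):
    """Run a plan given as a list of {"skill": name, "args": [...]} steps.

    Any step that names a function outside the registry is rejected,
    so the plan cannot reach file, network, or OS operations.
    """
    results = []
    for step in plan:
        name = step["skill"]
        if name not in SKILLS:
            raise PermissionError(f"skill not allowed: {name}")
        results.append(SKILLS[name](*step["args"]))
    return results
```

With this shape, a request such as "compare DAU and revenue growth" would become two `growth_rate` steps in the plan, and a step naming anything else (for example a raw file write) fails before it runs.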
CITIC Securities: Watch Application Opportunities Led by Multimodality, and the New Compute Demand Driven by Model Development
智通财经网· 2025-11-20 01:00
Core Insights
- Google's release of the Gemini 3 Pro model highlights significant advances in multimodal understanding and logical reasoning, with a notable lead in multimodal performance. This suggests continued attention to developments in native multimodal technology and to the new application opportunities arising from multimodal reasoning [1][8].

Multimodal Performance
- Gemini 3 Pro is positioned as the "world's best multimodal understanding model." It scored 81.0% on MMMU-Pro and 87.6% on Video-MMMU, ahead of GPT-5.1's 76.0% and 80.4% [2].
- The model reaches 72.7% accuracy on the ScreenSpot-Pro test for GUI interaction, well ahead of Claude Sonnet 4.5's 36.2%, indicating new potential in desktop application development [2].

Reasoning Capabilities
- Gemini 3 Pro performs strongly on mainstream reasoning tests, scoring 91.9% on GPQA Diamond, slightly ahead of GPT-5.1, and 37.5% on the HLE test, compared with GPT-5.1's 26.5% [3].
- A deep thinking mode raises performance further, to 41% accuracy on the HLE test and 45.1% on the ARC-AGI-2 test, showing its potential to solve novel problems [3].

Agent Development
- The model shows improved tool invocation and long-text retrieval, with stronger task planning that allows efficient completion of multi-step tasks [4].
- Official demonstrations highlight its potential in various scenarios, such as compiling recipes from handwritten notes or analyzing sports performance [4].

Coding and UI Development
- While Gemini 3 Pro does not significantly outperform previous models in code generation, it emphasizes front-end development, scoring 1487 in the WebDev Arena and surpassing GPT-5.1 and Claude Sonnet 4.5 [5].
- The model's ability to transform user interfaces in real time is expected to reshape human-computer interaction, providing more intuitive and personalized feedback [5].

Ecosystem Development
- Google has launched a new agent development platform, Google Antigravity, which integrates models, code assistants, external tools, and a visual development environment, improving the agent development workflow [6].
- The Gemini App serves as a unified entry point for consumers, with over 650 million monthly active users, and more than 70% of Google Cloud users use Google's AI services [6].
Pitfalls and Fixes in Agent Development, by Yin Jie, Senior Product Manager, Baidu AI Cloud
Sou Hu Cai Jing· 2025-10-14 03:57
Core Insights
- The report discusses the challenges and solutions in the development of Agents, highlighting the contrast between ideal expectations and real-world difficulties [1][2].

Pre-Launch Phase
- Common pitfalls include unclear goals, neglecting data tools, lack of valuable business scenarios, and insufficient ROI evaluation [9][10].
- Solutions involve focusing on small, pain-point-driven topics, ensuring data accessibility and quality, clarifying customer needs, and setting quantifiable ROI metrics [9][10][11].

Development Phase
- Issues faced during development include model selection difficulties, improper usage, cost overruns, vague prompts, chaotic knowledge management, and weak security measures [2][20].
- Strategies to address these include using platforms such as Baidu's Qianfan for model selection, designing clear prompts in the same way one writes a PRD, optimizing knowledge management, and establishing a robust security framework [2][20][26].

Post-Launch Phase
- Common problems after launch include lack of monitoring alerts, inadequate scaling and disaster recovery mechanisms, and insufficient user feedback systems [2][20].
- Recommendations include identifying resource dependencies, configuring redundant capacity, establishing comprehensive logging and monitoring systems, and enhancing user feedback mechanisms for continuous optimization [2][20].

Overall Development Approach
- The development of Agents should adhere to a multi-faceted principle, balancing key elements to ensure high availability and continuous improvement, ultimately creating intelligent agents that meet user needs [1][2].
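MiniMax Launches Agent Full-Stack Development! Chat a Concert Seat-Selection System into Existence with One Sentence, with Seat Locking and Payment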
Sou Hu Cai Jing· 2025-07-16 16:35
Core Insights
- MiniMax, one of the "Six Little Tigers" of domestic large models, has officially launched its Agent full-stack development feature, allowing users to build complete applications with a single click, extending beyond front-end display alone [2][4].

Group 1: Features and Capabilities
- The generated systems support technical features such as Supabase backend hosting, Stripe payment integration, cron job scheduling, and long-lived connection maintenance, enabling API calls, real-time data processing, payment functions, LLM integration, scheduled task execution, and user authentication [2].
- Users can create a concert seat-selection system in 30 minutes without coding, with real-time seat locking, registration, email verification, login, and payment processing through Stripe [2].
- Investors can customize real-time dashboards to track the latest prices and industry news of 50 global tech stocks, with updates scheduled at 07:00, 12:00, and 17:00 [2].
- Solo entrepreneurs can create independent overseas sites, such as an e-commerce website for crystal bracelets, that support real transactions and product management [2][4].

Group 2: Modular Architecture and Development
- MiniMax employs a modular Agent architecture consisting of three core sub-Agents:
  1. Research Agent: Analyzes application requirements and generates complete technical plans, including API call specifications [4].
  2. Full-Stack Development Agent: Generates robust code based on industry best practices for complete front-end and back-end functionality [4].
  3. Testing Agent: Conducts interface-level testing and debugging for modular applications [4].

Group 3: Recent Innovations and Market Impact
- MiniMax has introduced the MCP builder feature, allowing users to develop any desired MCP with a single sentence, which can be reused within the MiniMax Agent or downloaded for flexible use [5].
- The platform has released 12 feature updates in just over a month, underscoring its commitment to innovation and its potential to lower the barriers to complex application development [5].
- The Agent full-stack development feature demonstrates MiniMax's ability to quickly implement simple applications and meet personalized development needs [5].
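The dashboard example above refreshes at 07:00, 12:00, and 17:00. As a rough illustration of what a fixed daily schedule like that involves, the sketch below computes the next refresh time from a given moment. It is an assumption-laden example, not MiniMax's scheduler: `UPDATE_TIMES` and `next_update` are hypothetical names, and the code works on naive local datetimes with no time-zone handling.

```python
from datetime import datetime, time, timedelta

# The three daily refresh times mentioned in the summary (assumed local time).
UPDATE_TIMES = (time(7, 0), time(12, 0), time(17, 0))


def next_update(now: datetime) -> datetime:
    """Return the first scheduled refresh strictly after `now`.

    `now` must be a naive datetime; time zones are out of scope here.
    """
    for slot in UPDATE_TIMES:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # All of today's slots have passed, so roll over to tomorrow's first slot.
    return datetime.combine(now.date() + timedelta(days=1), UPDATE_TIMES[0])
```

A production scheduler would more likely use a cron expression and explicit time zones; the function only makes the rollover rule at the end of the day explicit.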