Agents
Search documents
Andrej Karpathy devastates AI optimists...
Matthew Berman· 2025-10-20 21:22
AGI Timelines and Agent Development - Andre Karpathy 认为 AGI (Artificial General Intelligence,通用人工智能) 还需要 10 年以上的时间才能实现 [1] - 行业普遍认为 2025 年至 2035 年将是 Agent (代理) 的十年,但要使 Agent 真正可用并普及到整个经济领域,还需要大量的开发工作 [1] - 行业观察到 LLM (Large Language Model,大型语言模型) 在近年取得了巨大进展,但仍然存在大量的基础工作、集成工作、物理世界的传感器和执行器、社会工作、安全工作以及研究工作需要完成 [1] Learning Approaches and Model Capabilities - Karpathy 认为 LLM 的学习方式更像是“幽灵”,而不是动物,动物天生就具备大量通过进化预先设定的智能 [1][2] - 行业对强化学习 (RL) 的有效性表示怀疑,认为其每次计算所获得的学习信号较差,并倾向于 agentic 交互,即为 Agent 创建一个可以进行实验和学习的“游乐场” [2] - 行业正在探索系统提示学习 (System Prompt Learning),这是一种通过改变系统提示来影响模型行为的新学习范式,类似于人类做笔记 [2][3] Model Size and Memorization - 行业趋势是模型尺寸先增大后减小,认知核心 (Cognitive Core) 的概念是剥离 LLM 的百科全书式知识,使其更擅长泛化 [3] - 行业对当前 Agent 行业提出了批评,认为其在工具方面过度投入,而忽略了当前的能力水平,并强调与 LLM 协作,结合人类的优势和 LLM 的长处 [3]
X @Avi Chawla
Avi Chawla· 2025-10-20 19:45
Core Problem & Solution - The open-source Parlant framework introduces a new reasoning approach to prevent hallucinations in LLMs [1] - This new approach achieves a SOTA success rate of 90.2% [2] - It outperforms popular techniques like Chain-of-Thought [2] Key Features of Parlant - Parlant enables the building of Agents that do not hallucinate and follow instructions [1]
Build Hour: Responses API
OpenAI· 2025-10-14 13:08
Responses API Overview - OpenAI introduced the Responses API to evolve beyond the Chat Completions API, addressing design limitations and enabling new functionalities for building agentic applications [1] - The Responses API combines the simplicity of chat completions with the ability to perform more agentic tasks, simplifying workflows like tool use, code execution, and state management [1] - The core of the Responses API is an agentic loop, allowing multiple actions within a single API request, unlike Chat Completions which only allows one model sample per request [2] - The Responses API uses "items" for everything, including messages, function calls, and MCP calls, making coding easier compared to Chat Completions where function calling was bolted onto messages [2] - The Responses API is purpose-built for reasoning models, preserving reasoning from request to request, boosting tool calling performance by 5% in primary tool calling eval tobench [2] - The Responses API facilitates multimodal workflows, making it easier to work with images and other multimodal content, including support for context stuffing with files like PDFs [2] - Streaming is rethought in the Responses API, emitting a finite number of strongly typed events, simplifying development compared to Chat Completions' object deltas [2] - Long multi-turn rollouts with the Responses API are 20% faster and less expensive due to the ability to rehydrate context from request to request, preserving the chain of thought [2] Agent Platform and Tools - OpenAI is changing deployment with its agent platform, centering on the Responses API and Agents SDK for building embeddable, customizable UIs [3] - Agent Builder and Chatkit, built on the Responses API, make it easy to build workflows into applications with minimal effort [3] - The Responses API is at the core of the improvement flywheel, enabling distillation and reinforcement fine-tuning using stateful data, along with tools like web search and file search [3]
X @Polygon
Polygon· 2025-10-10 15:25
Industry Trend - Agents are leading the next evolution in technology by thinking, paying, and settling onchain [1] - The industry is rebuilding the world's money rails [1] Partnerships and Standardization - The Agent Payment Protocol was developed with over 60 payment and tech partners [1] - Key partners include American Express, Coinbase, Intuit, Mastercard, PayPal, Salesforce & ServiceNow [1] - The protocol aims to standardize how agents handle financial transactions [1]
AgentKit Demo
OpenAI· 2025-10-08 17:00
Agent Kit Features & Benefits - Agent Kit helps developers create agents faster than ever before by visually wiring nodes instead of starting with code [1][3] - Agent Kit provides pre-built tools like file search, guard rails, and human-in-the-loop, enabling developers to model complex workflows easily [4] - Agent Kit allows for customization using widgets, enhancing the user experience beyond plain text [6][23] - Agent Kit includes pre-built guardrails for security, protecting against hallucinations and blocking PII [11][12] - Agent Kit facilitates testing and evaluation of agents directly from the builder to ensure expected behavior before deployment [16] Deployment & Integration - Agent Kit enables fully deployed and published agents in production with a workflow ID for direct execution [17] - Agent Kit offers code export for running agents in custom environments [17] - Agent Kit allows for easy integration into existing websites using components and visual customization [19][20] - Changes to agents can be deployed directly to the site without requiring code modifications [22] Use Case: Devday Website Agent - An agent powered by Agent Kit was built and deployed within the Devday website in under eight minutes to assist users in navigating the event [2][22]
X @Avi Chawla
Avi Chawla· 2025-10-02 06:31
Technology & Innovation - Airweave addresses the limitations of RAG (Retrieval-Augmented Generation) in handling real-time data [1] - Airweave constructs live, bi-temporal knowledge bases to ensure agents utilize the most up-to-date information [1] - The system supports comprehensive agentic retrieval, incorporating semantic and keyword search, query expansion, and integration across 30+ sources [1] - Airweave is 100% open-source [1]
Getting Started with LangSmith (3/8): Debugging with Studio
LangChain· 2025-09-29 04:28
Core Functionality of Langsmith Studio - Langsmith Studio is an IDE for building and debugging AI agents, compatible with any Langraph agent [1] - It provides a visual representation of the agent's structure and execution flow [5] - Users can interact with agents by sending messages and observing real-time execution [6][7] - Studio allows for debugging through trace view, showing the Langchain trace generated by each call [7] Debugging Capabilities - Studio facilitates debugging by allowing users to step through the execution path and identify issues in each step [10][11] - It supports hot reloading, enabling quick testing of changes made to the application [8][14] - Interrupts (breakpoints) can be set to inspect the state of the agent at specific points during execution [15] - Forking allows users to go back to previous steps, edit the state, and rerun the execution from that point with modified data [17][18] Practical Applications and Problem Solving - The platform can identify problems such as overly complex language in AI responses and address them by modifying prompts [9][10][12][13] - It helps diagnose issues related to flaky tools that intermittently fail to return results [3][15][16][19]
X @Avi Chawla
Avi Chawla· 2025-09-25 06:34
Building Agents is about engineering “behavior” at scale. So you cannot vibe-prompt an Agent and expect it to work.Parlant gives the structure to build Agents that behave exactly as instructed.GitHub repo: https://t.co/kjVj5Rp7Xm(don't forget to star it ⭐) ...
X @Ethereum
Ethereum· 2025-09-17 17:59
RT MetaMask Developer (@MetaMaskDev)We’re at the dawn of the era of agents.Adoption is early, curiosity is relentless, and the frontier is wide open.Across @Consensys, we’re committed to supporting and expanding x402 (proposed by @CoinbaseDev) and we’re working with @Google to create an open economy of agentic payments. 🧵 ...
X @s4mmy
s4mmy· 2025-09-06 17:13
RT s4mmy (@S4mmyEth)I've had several Prediction Market protocols ask why I don't post more about their industry.I have zero doubt that the market segment will do ok, but ultimately (imo) it doesn't provide the same value that I see being created with AI & Agents.Take BIO as a prime example: It aims to disrupt medicine by launching agentic systems for the top scientists.That's a lofty vision.Its not important to me whether they'll succeed, but the fact they are trying something innovative is what gets my att ...