AI agents

Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
AI Agent Evaluation
- Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1]
- The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1]
- Companies are leveraging the SDK to enhance agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1]
Microsoft's AI Initiatives
- Microsoft is promoting AI in startups and facilitating the transition of research and startup products to the market [1]
- Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1]
Industry Expertise
- Cedric Vidal has experience as an Engineering Manager in the AI data labeling space for the self-driving industry and as CTO of a Fintech AI SAAS startup [1]
- He also has 10 years of experience as a software engineering services consultant for major Fintech enterprises [1]
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
AI Engineer· 2025-06-27 09:56
AI Agents and Memory
- The presentation focuses on the importance of memory in AI agents, emphasizing that memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6]
- The discussion highlights different forms of memory, including short-term, long-term, conversational entity memory, knowledge data store, cache, and working memory [8]
- The industry is moving towards AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1, 21]
MongoDB's Role in AI Memory
- MongoDB is positioned as a memory provider for agentic systems, offering features needed to turn data into memory and enhance agent capabilities [20, 21, 31]
- MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, geospatial query) are highlighted as key advantages for AI memory management [25]
- MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32, 33]
- Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34]
Memory Management and Implementation
- Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16, 17]
- Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18]
- The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21, 22, 30]
- Different memory types are explored, including persona memory, toolbox memory, conversation memory, workflow memory, episodic memory, long-term memory, and entity memory [23, 25, 26, 27, 29, 30]
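The memory-management steps listed in this summary (generation, storage, retrieval, integration, updating, forgetting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AgentMemory` class, its method names, the keyword-overlap scoring, and the age-based forgetting rule are all hypothetical, and none of it is taken from the talk, from MongoDB, or from the library it mentions.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    last_used: int  # logical clock value of the last write or read


class AgentMemory:
    """Toy in-process memory store (hypothetical, for illustration only)."""

    def __init__(self, max_age: int):
        self.max_age = max_age  # records unused for longer than this are forgotten
        self.tick = 0           # logical clock, advanced on every write or read
        self.records = {}       # key -> MemoryRecord

    def store(self, key: str, text: str) -> None:
        # Generation + storage; writing to an existing key is an update.
        self.tick += 1
        self.records[key] = MemoryRecord(text, self.tick)

    def retrieve(self, query: str, limit: int = 3) -> list:
        # Retrieval by naive keyword overlap (a stand-in for vector or text search).
        self.tick += 1
        words = set(query.lower().split())
        scored = []
        for key, rec in self.records.items():
            overlap = len(words & set(rec.text.lower().split()))
            if overlap:
                scored.append((overlap, key))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        hits = [key for _, key in scored[:limit]]
        for key in hits:
            self.records[key].last_used = self.tick
        return [self.records[key].text for key in hits]

    def integrate(self, query: str) -> str:
        # Integration: place retrieved memories in front of the question.
        context = "\n".join(self.retrieve(query))
        return f"Context:\n{context}\n\nQuestion: {query}"

    def forget(self) -> list:
        # Forgetting: drop records that have not been used recently.
        stale = [k for k, r in self.records.items()
                 if self.tick - r.last_used > self.max_age]
        for key in stale:
            del self.records[key]
        return stale
```

A production system would replace the dictionary with a database and the keyword overlap with real search, but the six lifecycle operations map onto the same method boundaries.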
What does Enterprise Ready MCP mean? — Tobin South, WorkOS
AI Engineer· 2025-06-27 09:31
MCP and AI Agent Development
- MCP is presented as a way of interfacing between AI and external resources, enabling functionalities like database access and complex computations [3]
- The industry is currently focused on building internal demos and connecting them to APIs, but needs to move towards robust authentication and authorization [9][10]
- The industry needs to adapt existing tooling for MCP due to its dynamic client registration, which can flood developer dashboards [12]
Enterprise Readiness and Security
- Scaling MCP servers requires addressing free credit abuse, bot blocking, and robust access controls [12]
- Selling MCP solutions to enterprises necessitates SSO, lifecycle management, provisioning, fine-grained access controls, audit logs, and data loss prevention [12]
- Regulations like GDPR impose specific logging requirements for AI workloads, which are not widely supported [12]
Challenges and Future Development
- Passing scope and access control between different AI workloads remains a significant challenge [13]
- The MCP spec is actively developing, with features like elicitation (AI asking humans for input) still unstable [13]
- Cloud vendors are solving cloud hosting, but authorization and access control are the hardest parts of enterprise deployment [13]
Fault lines beneath roaring AI trade
CNBC Television· 2025-06-26 18:44
>> Tech stocks are still ripping higher today. Nvidia, Microsoft, Broadcom. These are all fresh record highs. But Deirdre Bosa has a new deep dive on an overlooked problem with the latest wave of AI models, which has powered much of the recent rally. Deirdre, what can you tell us?
>> So, Kelly, the next frontier in AI. It's not just chatting or summarizing, it's reasoning. And that means thinking through problems, mak ...
X @Avi Chawla
Avi Chawla· 2025-06-26 06:49
Links:
- ML for Beginners: https://t.co/4BjD3ePOET
- AI for Beginners: https://t.co/RMGBL5sRfe
- NN Zero to Hero: https://t.co/BGKZvCTGeN
- Paper implementations: https://t.co/SN0DH2BLQq
- Made with ML: https://t.co/2xrM6s50X0
- Hands-on LLMs: https://t.co/KTZUVbsAFY
- Advanced RAG techniques: https://t.co/3n1fgpc72t
- Agents for Beginners: https://t.co/O52uS8quyh
- Agents towards production: https://t.co/3n1fgpc72t
- AI Engg. Hub: https://t.co/b2WVNQqcBA
Note: This roadmap moves toward LLMs, NLP, and AI agents after ...
Rubrik acquires Predibase to accelerate adoption of AI agents
TechCrunch· 2025-06-25 17:34
Acquisition Announcement
- Data cybersecurity company Rubrik announced its intent to acquire Predibase, a startup focused on training and fine-tuning open source AI models [1][2]
- The deal's financial terms were not disclosed, but reports suggest it falls between $100 million and $500 million [1]
Company Background
- Predibase was founded in 2021 and has raised over $28 million in venture capital from notable investors such as Felicis, Greylock, and Sancus Ventures [2]
- Rubrik, founded in 2014, has raised more than $1.6 billion in venture capital and went public in April 2024 [6]
Strategic Implications
- The integration of Predibase is expected to enhance Rubrik users' ability to build AI agents using platforms like Amazon Bedrock, Azure OpenAI, and Google Agentspace [2]
- Bipul Sinha, CEO of Rubrik, emphasized that combining Predibase's capabilities with Rubrik's secure data platforms can transform AI applications by addressing performance and cost issues [3]
Industry Trends
- Rubrik's acquisition is part of a broader trend where companies are acquiring firms to strengthen their technology stack for AI agent development [3]
- Other recent acquisitions in the industry include Salesforce acquiring Informatica for $8 billion and Snowflake acquiring Crunchy Data [4]
Andrej Karpathy on why we still need humans in the loop
Y Combinator· 2025-06-24 21:46
I think GUIs are very useful for auditing systems, and visual representations in general. And I think GUIs, for example, are extremely important to this because a GUI utilizes your computer vision GPU in all of our heads. Reading text is effortful and it's not fun. But looking at stuff is fun, and it's just kind of like a highway to your brain. We have to keep the AI on the leash. I think a lot of people are getting way overexcited with AI agents, and it's not useful to me to get a diff of 1,000 line ...
Salesforce launches Agentforce 3 with AI agent observability and MCP support
VentureBeat· 2025-06-23 21:03
Core Insights
- Salesforce has launched significant enhancements to its AI agent platform, Agentforce 3, aimed at addressing enterprise challenges in deploying digital workers at scale, particularly in monitoring performance and ensuring security across corporate systems [1][2][9]
Group 1: AI Agent Performance and Demand
- The introduction of a "Command Center" in Agentforce 3 provides executives with real-time visibility into AI agent performance and supports interoperability with numerous external business tools [2][9]
- There has been a 233% increase in AI agent usage within six months, with over 8,000 customers adopting the technology, leading to measurable returns such as a 15% reduction in customer case handling time for Engine and a 70% autonomous resolution rate for 1-800Accountant during peak tax season [3][22]
Group 2: Enterprise Integration and Transformation
- PepsiCo is leveraging Agentforce as part of its AI-driven transformation strategy, recognizing the need for better integration of data and systems to meet evolving customer demands [5][6][8]
- The deployment of AI agents is seen as essential for enhancing customer engagement and driving backend efficiency, with PepsiCo's long-standing partnership with Salesforce facilitating a swift transition to AI technologies [7][8]
Group 3: Operational Challenges and Solutions
- The new observability platform in Agentforce 3 addresses the operational challenges that arise post-deployment, providing analytics on agent interactions and health monitoring with real-time alerts [11][12]
- The system captures all agent activity using the OpenTelemetry standard, allowing integration with existing monitoring tools and ensuring oversight of AI agents within operational workflows [13]
Group 4: Interoperability and Security
- Salesforce's adoption of the Model Context Protocol (MCP) enhances AI agent interoperability, enabling connections with MCP-compliant servers without custom development [14][16]
- The platform's enhanced architecture offers 50% lower latency and improved security for regulated industries by hosting AI models within Salesforce's infrastructure, ensuring sensitive data remains secure [18][19]
Group 5: Industry-Specific Deployments and Pricing
- Salesforce has developed over 200 pre-configured industry actions to expedite AI agent deployment, with significant results reported by clients such as OpenTable and Grupo Falabella [21][22]
- The company has introduced flexible pricing models, including unlimited usage licenses for employee-facing agents and per-action pricing that scales with actual AI work performed [22]
Group 6: Future of Work and Competitive Landscape
- The rise of AI agents is transforming enterprise operations, with new roles emerging for managing these digital employees, indicating a shift in how work is organized [23][24]
- As competition intensifies among major technology firms, Salesforce emphasizes its integration advantages, allowing for comprehensive tracking of work cycles within the enterprise ecosystem [24]
Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)
AI Engineer· 2025-06-19 02:04
Workshop Overview
- The workshop focuses on building AI agents using Amazon's agent technologies [1]
- Participants will gain hands-on experience in building sophisticated AI agents [1]
- The workshop is 2 hours long [1]
Technologies Highlighted
- Amazon Nova Act is used for reliable web navigation [1]
- Model Context Protocol (MCP) connects agents to external data sources and APIs [1]
- Amazon Bedrock Agents orchestrates complex workflows [1]
Skills Acquired
- Participants will learn to build agents that can navigate the web like humans [1]
- Participants will learn to perform complex multi-step tasks [1]
- Participants will learn to leverage specialized tools through natural language commands [1]
How 11x Rebuilt Their Alice Agent: From ReAct to Multi-Agent with LangGraph | LangChain Interrupt
LangChain· 2025-06-16 16:36
Company Overview
- 11x is building digital workers, including Alice, an AI SDR, and Julian, an AI voice agent [1]
- The company relocated from London to San Francisco and rebuilt its core product, Alice, from the ground up [2]
Alice Rebuild & Vision
- The rebuild of Alice was driven by the belief that agents are the future [3]
- The new vision for Alice centers on seven agentic capabilities, including chat-based interaction, knowledge base training via document uploads, AI-driven lead sourcing, deep lead research, personalized emails, automatic handling of inbound messages, and self-learning [11][12][13]
Development Process & Tech Stack
- The rebuild of Alice 2 took only 3 months from the first commit to migrating the last business customer [3][14]
- The company chose a vanilla tech stack and leveraged vendors like LangChain to move quickly [15][16][17]
- LangChain was chosen as a key partner due to its AI dev tools, agent framework, cloud hosting, observability, TypeScript support, and customer support [18][19]
Agent Architecture Evolution
- The company experimented with three different architectures for campaign creation: ReAct, workflow, and multi-agent systems [21]
- The final architecture was a multi-agent system with a supervisor and specialized sub-agents for research, positioning, LinkedIn messaging, and email writing [44][45][46]
Results & Future Plans
- Alice 2 went live in January and has sourced close to 2 million leads and sent close to 3 million messages [52]
- Alice 2 has generated about 21,000 replies, with a reply rate of around 2%, on par with a human SDR [52]
- Future plans include integrating Alice and Julian, implementing self-learning, and exploring new technologies like computer use, memory, and reinforcement learning [53][54]
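The supervisor-plus-sub-agents shape described in this summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not 11x's implementation and not the LangGraph API: every function name here is hypothetical, the sub-agents are stubs that return strings where a real system would call a model, and the supervisor routes with a fixed rule where a real one would likely ask an LLM to choose the next step.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]


# Hypothetical sub-agents: each reads the shared state and adds one field.
def research_agent(state: State) -> State:
    return {**state, "research": f"background notes on {state['lead']}"}


def positioning_agent(state: State) -> State:
    return {**state, "positioning": f"value proposition based on: {state['research']}"}


def linkedin_agent(state: State) -> State:
    return {**state, "linkedin": f"LinkedIn note using: {state['positioning']}"}


def email_agent(state: State) -> State:
    return {**state, "email": f"email draft using: {state['positioning']}"}


SUB_AGENTS: Dict[str, Callable[[State], State]] = {
    "research": research_agent,
    "positioning": positioning_agent,
    "linkedin": linkedin_agent,
    "email": email_agent,
}


def supervisor(state: State) -> Optional[str]:
    # Deterministic stand-in for an LLM router: pick the first sub-agent
    # whose output is still missing, or None when the campaign is complete.
    for name in SUB_AGENTS:
        if name not in state:
            return name
    return None


def run_campaign(lead: str) -> Tuple[State, List[str]]:
    # Control always returns to the supervisor between sub-agent calls.
    state: State = {"lead": lead}
    trace: List[str] = []
    while (name := supervisor(state)) is not None:
        state = SUB_AGENTS[name](state)
        trace.append(name)
    return state, trace
```

Calling `run_campaign("Acme")` returns the final state together with the order in which the supervisor dispatched the sub-agents. Keeping routing in one place is what distinguishes this shape from a single ReAct loop with many tools.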