How to Build Trustworthy AI — Allie Howe
AI Engineer· 2025-06-16 20:29
Core Concept
- Trustworthy AI is defined as the combination of AI Security and AI Safety, both crucial for AI systems [1]

Key Strategies
- Building trustworthy AI requires product and engineering teams to collaborate on AI that is aligned, explainable, and secure [1]
- MLSecOps, AI Red Teaming, and AI Runtime Security are three focus areas that contribute to achieving both AI Security and AI Safety [1]

Resources for Implementation
- Modelscan (https://github.com/protectai/modelscan) is a resource for MLSecOps (see the sketch after this list) [1]
- PyRIT (https://azure.github.io/PyRIT/) and Microsoft's AI Red Teaming Lessons eBook (https://ashy-coast-00aeb501e.6.azurestaticapps.net/MS_AIRT_Lessons_eBook.pdf) are resources for AI Red Teaming [1]
- Pillar Security (https://www.pillar.security/solutionsai-detection) and Noma Security (https://noma.security/) offer resources for AI Runtime Security [1]

Demonstrating Trust
- Vanta (https://www.vanta.com/collection/trust/what-is-a-trust-center) provides resources for showcasing Trustworthy AI to customers and prospects [1]
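Modelscan belongs to the MLSecOps bucket because common model serialization formats can execute code at load time. Below is a minimal sketch of that risk class and a naive opcode-level check, not modelscan's actual implementation; the real tool scans many formats far more thoroughly.

```python
# Illustrative sketch (not modelscan itself): why model files need scanning.
# Python pickle, a common model serialization format, can run arbitrary code
# on load via __reduce__ -- the class of attack scanners like modelscan flag.
import io
import pickle
import pickletools

class EvilPayload:
    def __reduce__(self):
        # On unpickling, this would execute a shell command.
        import os
        return (os.system, ("echo pwned",))

blob = pickle.dumps(EvilPayload())

# Naive check: walk the pickle opcodes and flag GLOBAL/STACK_GLOBAL imports,
# which a weights-only model file rarely needs.
suspicious = [
    (op.name, arg)
    for op, arg, _ in pickletools.genops(io.BytesIO(blob))
    if op.name in ("GLOBAL", "STACK_GLOBAL")
]
print(suspicious)  # non-empty: the file imports callables; do NOT pickle.loads it
```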
Windsurf everywhere, doing everything, all at once - Kevin Hou, Windsurf
AI Engineer· 2025-06-16 19:59
Core Functionality & Vision
- Windsurf is rapidly growing, with key features like web search, MCP support, auto-generated memories, and parallel agents [1]
- The company's core philosophy centers on creating a shared timeline between humans and AI for intuitive interaction [1]
- The vision is for Windsurf to integrate with developer tools such as Google Docs, Figma, GitHub, Notion, and Linear [1]
- Windsurf aims to expand beyond coding to tasks like interacting with third-party services and writing design documents [1]
- The goal is a nearly autonomous AI that assists developers in the background [1]

Model & Benchmarking
- SWE-1 is introduced as a new software engineering model trained for entire workflows [1]
- An End-to-End Task Benchmark and a Conversational SWE Task Benchmark showcase Windsurf's results [1]
- A data flywheel drives Windsurf's continuous improvement through a feedback loop [1]

Future Outlook
- The industry emphasizes the harmony of model, data, and application for building successful AI products by 2025 [1]
Claude Code & the evolution of agentic coding - Boris Cherny, Anthropic
AI Engineer· 2025-06-16 19:54
A ten-thousand-foot view of the coding space, the UX of coding, and the Claude Code team's approach.

About Boris Cherny: Created Claude Code. Member of Technical Staff @Anthropic. Prev: Principal Engineer @Meta, Architect @Coatue. Author of O'Reilly's Programming TypeScript.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter ...
Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri
AI Engineer· 2025-06-11 16:57
My name is Sarmad, and today I want to talk about building effective agents with the Model Context Protocol, or MCP. A lot has changed in the last year, especially as far as agent development is concerned. I think 2025 is the year of agents, and things like MCP make agent design simpler and more robust than ever before. So I want to talk about what the agent tech stack looks like in 2025. The second thing is that a lot of MCP servers today are just one-to-one mappings of existing REST API servic ...
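Since the talk is about exposing agents as MCP servers with mcp-agent, here is a rough illustration of the server side of the protocol: a minimal MCP server using the official Python SDK's FastMCP helper. The summarize tool is invented for illustration, and mcp-agent's own API differs from this plain-SDK sketch.

```python
# Minimal MCP server sketch using the official Python SDK (pip install "mcp").
# The summarize tool is a made-up example, not part of mcp-agent.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-agent")

@mcp.tool()
def summarize(text: str) -> str:
    """Return a crude one-line summary of the given text."""
    first_sentence = text.split(".")[0].strip()
    return first_sentence[:200]

if __name__ == "__main__":
    # Serves the tool over stdio so an MCP client can discover and call it.
    mcp.run()
```

An agent exposed this way looks to a client like any other MCP server, which is what lets agents compose: one agent can treat another agent as just another tool provider.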
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
AI Engineer· 2025-06-11 15:40
AI Safety Benchmarks Analysis
- The paper analyzes AI safety benchmarks, a novel approach, as no prior work has examined the semantic extent covered by these benchmarks [13]
- The research identifies six primary harm categories within AI safety benchmarks, noting varying coverage and breadth across different benchmarks [54]
- Semantic coverage gaps exist across recent benchmarks and will evolve as the definition of harms changes [55]

Methodology and Framework
- The study introduces an optimized clustering configuration framework applicable to other benchmarks with similar topical use or LLM applications, demonstrating its scalability [55]
- The framework involves appending benchmarks into a single dataset, cleaning the data, and iteratively developing unsupervised learning clusters to identify harms [17][18]
- The methodology uses embedding models (such as MiniLM and MPNet), dimensionality reduction techniques (t-SNE and UMAP), and distance metrics (Euclidean and Mahalanobis) to optimize cluster separation (a sketch of this pipeline follows below) [18][19][41]

Evaluation and Insights
- Plotting the semantic space offers a transparent evaluation approach, providing more actionable insights than traditional ROUGE and BLEU scores [56]
- The research highlights the potential for bias in clusters and variations in prompt lengths across different benchmarks [51]
- The study acknowledges limitations, including methodological constraints, information loss from dimensionality reduction, biases in embedding models, and the limited scope of analyzed benchmarks [51][52]

Future Research Directions
- Future research could explore harm benchmarks across diverse cultural contexts and investigate prompt-response relationships [53]
- Applying the methodology to domain-specific datasets could further reveal differences and insights [53]
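The embed → reduce → cluster pipeline described above can be sketched in a few lines. This is a minimal illustration under assumed defaults, not the paper's exact configuration; its embedding models, distance metrics, and cluster counts are tuned across many runs.

```python
# Sketch of the embed -> reduce -> cluster pipeline the summary describes.
# Assumed deps: pip install sentence-transformers umap-learn scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import umap

# Toy stand-ins for prompts pooled from several safety benchmarks.
prompts = [
    "How do I pick a lock?",
    "Write a phishing email targeting bank customers.",
    "Explain how vaccines work.",
    "Describe how to make a dangerous chemical at home.",
    "What are good study habits for exams?",
    "Generate an insult about a protected group.",
]

# 1. Embed with a small sentence-transformer (the paper mentions MiniLM and MPNet).
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(prompts)

# 2. Reduce dimensionality (the paper compares t-SNE and UMAP; UMAP shown here).
#    n_neighbors is tiny only because this toy dataset is tiny.
reduced = umap.UMAP(n_components=2, n_neighbors=3, random_state=42).fit_transform(embeddings)

# 3. Cluster and score separation; the framework iterates over configurations
#    like this to optimize separation (the paper surfaced six harm categories).
labels = KMeans(n_clusters=2, n_init="auto", random_state=42).fit_predict(reduced)
print("silhouette:", silhouette_score(reduced, labels))
```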
Just do it. (let your tools think for themselves) - Robert Chandler
AI Engineer· 2025-06-10 17:30
Hi, I'm Robert. I'm the co-founder and CTO at Wordware, and at Wordware I've personally helped hundreds of teams build reliable AI agents. I'm here to share a few of the insights we've gained, especially when it comes to tools: really agentic MCPs, and giving your tools time to think. Before I worked on LLMs and agents, I used to work on self-driving cars, so building highly reliable systems is in my blood. So, here we go. The promise of agents is automated systems that can take ...
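The title's idea, giving tools time to think, amounts to letting a tool run its own reasoning pass instead of dumping raw data back to the main agent. Below is a rough sketch of that pattern as we read it; call_llm, ThinkingTool, and the deploy example are all invented for illustration, not Wordware's implementation.

```python
# Rough sketch of "let your tools think for themselves": the tool runs its own
# LLM pass over raw data and returns a conclusion, instead of making the main
# agent interpret everything. call_llm stands in for any completion API.
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stub for a real completion call (OpenAI, Anthropic, etc.)."""
    return "All checks passed; the deploy looks safe."

@dataclass
class ThinkingTool:
    """Wraps a raw data source with an internal reasoning step."""
    name: str

    def fetch(self, query: str) -> str:
        # Placeholder data source, e.g. logs or an API response.
        return "deploy log: 0 errors, 2 warnings"

    def run(self, query: str) -> str:
        raw = self.fetch(query)
        return call_llm(
            f"Given this data:\n{raw}\n"
            f"Answer the question concisely: {query}"
        )

print(ThinkingTool(name="deploy_checker").run("Is the deploy safe?"))
```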
Supercharging developer workflow with Amazon Q Developer - Vikash Agrawal
AI Engineer· 2025-06-10 17:30
Amazon Q Developer Overview
- Amazon Q Developer is an AI coding assistant designed to integrate into various stages of the Software Development Life Cycle (SDLC) [2][5]
- It supports multiple IDEs (VS Code, IntelliJ, Eclipse), the CLI, and GitHub, aiming for a seamless experience across different developer preferences [5][15][22]
- Users can start using Amazon Q Developer in the command line or IDE without requiring an AWS account [5]

Key Features and Use Cases
- Amazon Q Developer can assist in the planning phase by providing best practices and suggesting robust, production-ready approaches [30]
- It can generate code, unit tests, and documentation, streamlining the development process [16][19]
- The tool facilitates debugging by analyzing logs and identifying root causes of errors within the AWS console [26]
- Amazon Q Developer supports feature development through context-aware feature building and bug fixing [18]
- It offers security scans and monitoring within GitHub, enabling code review and automated fixes in pull requests [23]

SDLC Integration
- Amazon Q Developer aids in deploying applications to cloud services by generating SAM scripts [25]
- It helps maintain and modernize applications by providing insights into application performance and potential issues [4][26]

Best Practices and Considerations
- The industry emphasizes the importance of pre-planning, including infrastructure considerations, to avoid production issues [29][30]
- Prompt engineering is highlighted as a crucial skill for developers to effectively use generative AI and obtain the desired outputs [31]
Beyond Conversation: Why Documents Transform Natural Language into Code - Filip Kozera
AI Engineer· 2025-06-10 17:30
Hi, I'm Filip, and I'm the CEO at Wordware. Today I want to talk to you about what sucks about chat-based interfaces, how documents can actually solve those issues, and how they lead to background agents that do tasks for you in the background. So first, let's start with the problems with chat-based systems. When I interact with Claude or OpenAI, it all feels very ephemeral. I often end up creating workflows for myself using projects or just copy-pasting stuff, and in that way when ...
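The documents-over-chat argument lends itself to a small sketch: a document whose sections compile into agent steps, with earlier outputs flowing into later prompts. Everything below is hypothetical; the "## step:" syntax and run_llm stub are invented for illustration and are not Wordware's actual format or engine.

```python
# Hypothetical sketch of "documents as programs": each section of a document
# becomes one step of a background agent run, and earlier steps' outputs are
# substituted into later prompts. Syntax and run_llm are invented.
import re

def run_llm(prompt: str) -> str:
    """Stub for a real completion call."""
    return f"<output for: {prompt[:40]}...>"

doc = """
## step: research
Find three recent articles about MCP.

## step: summarize
Summarize the findings from {research} in two sentences.
"""

results: dict[str, str] = {}
for match in re.finditer(r"## step: (\w+)\n(.+?)(?=\n## step:|\Z)", doc, re.S):
    name, body = match.group(1), match.group(2).strip()
    # The document reads linearly but executes as a pipeline.
    prompt = body.format(**results) if results else body
    results[name] = run_llm(prompt)

print(results)
```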
Break It 'Til You Make It: Building the Self-Improving Stack for AI Agents - Aparna Dhinakaran
AI Engineer· 2025-06-10 17:30
Hey everyone, my name is Aparna. I'm one of the founders of Arize, and today we're going to talk about agent evaluation. At Arize, we build development tools to help teams build agents and take them all the way to production. We focus on everything from evaluation to observability, monitoring, and tracing your application, so you can see every single step your application took. Let me tell you a little bit about why we got into this, and then I'll jump into some concrete tips about how we think abou ...
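Step-level agent evaluation of the kind described here typically pairs a trace of spans with an LLM-as-judge pass over each one. The sketch below shows only the generic shape of that loop; judge_llm is a stub, and none of this is Arize's actual API.

```python
# Generic sketch of step-level agent evaluation with an LLM-as-judge:
# score every traced step so failures are localized, not just end-to-end.
from dataclasses import dataclass

@dataclass
class Span:
    """One traced step of an agent run: what it received and what it produced."""
    name: str
    input: str
    output: str

def judge_llm(prompt: str) -> str:
    """Stub for a real completion call returning 'pass' or 'fail'."""
    return "pass"

def evaluate(trace: list[Span]) -> dict[str, str]:
    verdicts = {}
    for span in trace:
        verdicts[span.name] = judge_llm(
            f"Step '{span.name}' received:\n{span.input}\n"
            f"and produced:\n{span.output}\n"
            "Is the output correct and relevant? Answer pass or fail."
        )
    return verdicts

trace = [
    Span("router", "What's the refund policy?", "route=docs_search"),
    Span("docs_search", "refund policy", "Refunds accepted within 30 days."),
]
print(evaluate(trace))  # e.g. {'router': 'pass', 'docs_search': 'pass'}
```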
Why Bolt.new Won and Most DevTools AI Pivots Failed - Victoria Melnikova
AI Engineer· 2025-06-10 17:30
Challenges & Initial State
- StackBlitz faced near failure, with practically zero revenue and flat growth after 7 years of development and millions in investment [1]
- Investors initially doubted browser-based technology and web containers, viewing them as a dead end [3]

Turnaround & Growth
- Bolt.new's launch brought explosive growth, reaching roughly $20 million ARR in just two months [1]
- 18 months later, the company anticipates surpassing $100 million ARR (Annual Recurring Revenue) for Bolt [2]
- The success was not overnight, but the result of 7 years of development [2]

AI Integration Strategy
- Simply adding a chatbot interface is often ineffective due to user fatigue and underwhelming results [4]
- Attempting to AI-power everything leads to direct competition with industry giants like OpenAI and Anthropic, which is difficult for smaller teams with limited resources [5]
- Waiting for perfect AI is not practical; companies must build tooling iteratively [6]
- The biggest mistake is neglecting user feedback; companies should prioritize user interaction to avoid building unnecessary features [6][7]

Keys to Success
- Identify the company's unique competitive advantage, not just product features, such as domain expertise, data, distribution, or infrastructure [7][8]
- Determine how AI can amplify this advantage, focusing on what becomes possible when AI meets the company's unique capabilities [8][9]
- Create a new category by reinventing the user flow, rather than fitting into existing workflows [10]
- StackBlitz's secret was not unique AI (everyone has access to the same models), but its unique competitive advantage [10]
- Reinventing the user flow and betting everything on it was crucial for StackBlitz [11]