Veo 3 for Developers - Paige Bailey
AI Engineer· 2025-06-17 18:35
Model Overview
- Google DeepMind's Veo 3 is a state-of-the-art video generation model [1]
- Veo 3 generates video with synchronized audio from text and image prompts [1]
- Veo 3 understands intricate details, maintains coherence, and simulates realistic physics and camera movements [1]
Capabilities & Features
- Veo 3 offers advanced capabilities like semantic context rendering and cinematic control [1]
- Veo 3 can generate dialogue, sound effects, and music [1]
Accessibility & Integration
- Veo 3 is accessible via Vertex AI (preview) [1]
- Developers can integrate Veo 3 into their workflows or test it in the Gemini App, Flow, and via the Gemini APIs on Google Cloud [1]
Potential Applications
- Veo 3 empowers innovation in filmmaking, game development, and education [1]
Good design hasn’t changed with AI - John Pham
AI Engineer· 2025-06-17 04:15
Bad designs are still bad. AI doesn't make them good. The novelty of AI makes bad things tolerable, but only for a short time. Building great designs and experiences with AI rests on the same first principles as before AI. When people use software, they want it to feel responsive, safe, accessible, and delightful. We'll go over the big and small details that go into software that people want to use, not software they are forced to use. About John Pham I'm John Pham, an engineer and a self-taught designer. I seek the dopamine hits of buildin ...
Building AI Products That Actually Work - Ben Hylak, Sid Bendre
AI Engineer· 2025-06-17 03:50
You've made the demo. How do you make the product? A lot of AI products don't actually work. Even worse, a lot of the techniques being advertised for making AI products better don't work either. We'll cover the challenges + techniques we've seen actually work in the real world. About Ben Hylak Ben Hylak is co-founder at Raindrop, building Sentry for AI products. He was previously a designer at Apple for 4 years, building the Apple Vision Pro. About Sid Bendre Sid Bendre is the co-founder of Oleve, a company ...
Model Maxxing: RFT, DPO, SFT with OpenAI — Ilan Bigio, OpenAI
AI Engineer· 2025-06-17 03:49
AI Model Fine-Tuning and Prompt Engineering
- Workshop covers SFT, DPO, RFT, prompt engineering/optimization, and agent scaffolding [1]
OpenAI Expertise
- Ilan Bigio, a founding member of OpenAI's Developer Experience team, leads technical development for Swarm, the precursor to the Agents SDK [1]
- Ilan Bigio contributed to Codex CLI and created the AI phone ordering demo showcased at DevDay 2024 [1]
- Ilan Bigio partnered with companies like Cursor, Khan Academy, and Klarna to shape their AI products [1]
AI Application and Development
- Ilan Bigio created ShellAI, an open-source, AI-powered terminal assistant [1]
- OpenAI provides in-depth technical guides on topics like Function Calling, Latency Optimization, and Agent Orchestration [1]
Educational Background
- Ilan Bigio designed and taught courses at Brown [1]
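Of the techniques named in the workshop title, DPO has a particularly compact objective. The sketch below is an illustrative pure-Python rendering of the standard DPO preference loss on per-sequence log-probabilities; it is not code from the workshop, and the function name and example numbers are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen / rejected
    responses under the trained policy and the frozen reference model.
    """
    # Implicit rewards: how much more each model favors a response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy's preference
    # for the chosen response over the rejected one grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that prefers the chosen response more strongly than the
# reference does incurs a lower loss than a neutral policy.
better = dpo_loss(-3.0, -9.0, -5.0, -6.0)   # positive margin
neutral = dpo_loss(-5.0, -6.0, -5.0, -6.0)  # zero margin -> log 2
```

With a zero margin the loss is exactly log 2, which makes the function easy to sanity-check before wiring it into a training loop.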
Safety and security for code executing agents - Fouad Matin
AI Engineer· 2025-06-17 00:09
Code is the lingua franca for both software engineers and highly capable AI models. As we give agents the ability to build, test, and run code that they generate, the command line becomes their canvas—and their attack surface. This keynote explores what it takes to bring code-executing agents from research to real-world deployment while maintaining control and security. We’ll cover how terminals offer AI an ideal interface, why they’re deceptively risky, and what it means to embed security, guardrails, and ...
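One baseline guardrail for code-executing agents is to run generated code in a separate, isolated process with a hard wall-clock timeout. The sketch below is a generic illustration of that idea, not the speaker's approach; `run_untrusted` and its parameters are hypothetical names, and real deployments would layer containers, filesystem restrictions, and syscall filtering on top.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[int, str]:
    """Run agent-generated Python in a child process with a timeout.

    This is only one layer of defense: it bounds runtime and keeps the
    agent's code out of the host interpreter, but does not by itself
    restrict filesystem or network access.
    """
    try:
        # -I: isolated mode (ignores environment variables and the
        # user's site-packages), reducing ambient authority.
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # Long-running or looping code is killed rather than trusted.
        return -1, ""

rc, out = run_untrusted("print(2 + 2)")
```

The timeout path matters as much as the happy path: an agent that emits an accidental infinite loop should degrade into an error result, not a hung terminal.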
How to Build Trustworthy AI — Allie Howe
AI Engineer· 2025-06-16 20:29
Core Concept
- Trustworthy AI is defined as the combination of AI Security and AI Safety, crucial for AI systems [1]
Key Strategies
- Building trustworthy AI requires product and engineering teams to collaborate on AI that is aligned, explainable, and secure [1]
- MLSecOps, AI Red Teaming, and AI Runtime Security are three focus areas that contribute to achieving both AI Security and AI Safety [1]
Resources for Implementation
- Modelscan (https://github.com/protectai/modelscan) is a resource for MLSecOps [1]
- PyRIT (https://azure.github.io/PyRIT/) and Microsoft's AI Red Teaming Lessons eBook (https://ashy-coast-00aeb501e.6.azurestaticapps.net/MS_AIRT_Lessons_eBook.pdf) are resources for AI Red Teaming [1]
- Pillar Security (https://www.pillar.security/solutionsai-detection) and Noma Security (https://noma.security/) offer resources for AI Runtime Security [1]
Demonstrating Trust
- Vanta (https://www.vanta.com/collection/trust/what-is-a-trust-center) provides resources for showcasing Trustworthy AI to customers and prospects [1]
Windsurf everywhere, doing everything, all at once - Kevin Hou, Windsurf
AI Engineer· 2025-06-16 19:59
Core Functionality & Vision
- Windsurf is rapidly growing with key features like web search, MCP support, auto-generated memories, and parallel agents [1]
- The company's core philosophy centers on creating a shared timeline between humans and AI for intuitive interaction [1]
- The vision is for Windsurf to integrate with various developer tools such as Google Docs, Figma, GitHub, Notion, and Linear [1]
- Windsurf aims to expand beyond coding to include tasks like interacting with third-party services and writing design documents [1]
- The goal is to create a nearly autonomous AI that assists developers in the background [1]
Model & Benchmarking
- SWE-1 is introduced as a new software engineering model trained for entire workflows [1]
- End-to-End Task Benchmark and Conversational SWE Task Benchmark showcase Windsurf's results [1]
- A data flywheel drives Windsurf's continuous improvement through a feedback loop [1]
Future Outlook
- The industry emphasizes the harmony of model, data, and application for building successful AI products by 2025 [1]
Claude Code & the evolution of agentic coding - Boris Cherny, Anthropic
AI Engineer· 2025-06-16 19:54
A ten-thousand-foot view of the coding space, the UX of coding, and the Claude Code team's approach About Boris Cherny Created Claude Code. Member of Technical Staff @Anthropic. Prev: Principal Engineer @Meta, Architect @Coatue. Author, O'Reilly's Programming TypeScript. Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter ...
Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri
AI Engineer· 2025-06-11 16:57
My name is Sarmad, and today I want to talk about building effective agents with the Model Context Protocol, or MCP. A lot has changed in the last year, especially as far as agent development is concerned. I think 2025 is the year of agents, and things like MCP make agent design simpler and more robust than ever before. So I want to talk about what the agent tech stack looks like in 2025. The second thing is that a lot of MCP servers today are just one-to-one mappings of existing REST API servic ...
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
AI Engineer· 2025-06-11 15:40
AI Safety Benchmarks Analysis
- The paper analyzes AI safety benchmarks, a novel approach as no prior work has examined the semantic extent covered by these benchmarks [13]
- The research identifies six primary harm categories within AI safety benchmarks, noting varying coverage and breadth across different benchmarks [54]
- Semantic coverage gaps exist across recent benchmarks and will evolve as the definition of harms changes [55]
Methodology and Framework
- The study introduces an optimized clustering configuration framework applicable to other benchmarks with similar topical use or LLM applications, demonstrating its scalability [55]
- The framework involves appending benchmarks into a single dataset, cleaning data, and iteratively developing unsupervised learning clusters to identify harms [17][18]
- The methodology uses embedding models (like MiniLM and MPNet), dimensionality reduction techniques (t-SNE and UMAP), and distance metrics (Euclidean and Mahalanobis) to optimize cluster separation [18][19][41]
Evaluation and Insights
- Plotting semantic space offers a transparent evaluation approach, providing more actionable insights compared to traditional ROUGE and BLEU scores [56]
- The research highlights the potential for bias in clusters and variations in prompt lengths across different benchmarks [51]
- The study acknowledges limitations including methodological constraints, dimensionality reduction information loss, biases in embedding models, and the limited scope of analyzed benchmarks [51][52]
Future Research Directions
- Future research could explore harm benchmarks across diverse cultural contexts and investigate prompt-response relationships [53]
- Applying the methodology to domain-specific datasets could further reveal differences and insights [53]
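The "optimize cluster separation" step in the methodology can be illustrated with a toy silhouette computation over Euclidean distance. Everything below (the points standing in for embedded prompts, the `silhouette` helper, and its names) is an illustrative sketch, not the paper's implementation or data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_dist(p, cluster, metric):
    """Mean distance from p to the other points in a cluster."""
    others = [q for q in cluster if q != p]
    return sum(metric(p, q) for q in others) / len(others)

def silhouette(clusters, metric=euclidean):
    """Mean silhouette score: near 1.0 means tight, well-separated
    clusters; near 0 or negative means clusters overlap."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            a = mean_dist(p, cluster, metric)  # intra-cluster cohesion
            b = min(mean_dist(p, other, metric)  # nearest other cluster
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy 2-D stand-ins for embedded prompts from two harm categories.
separated = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
overlapping = [[(0, 0), (0, 1), (1, 0)], [(1, 1), (0.5, 0.5), (1.5, 0.5)]]
```

In the paper's setting, the same kind of separation score would be compared across embedding models, reduction techniques, and distance metrics to pick the clustering configuration; here it simply distinguishes the well-separated toy clusters from the overlapping ones.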