AI Engineer
Search documents
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy
AI Engineer· 2025-07-15 17:05
Benchmarks as Memes in AI - Benchmarks are presented as memes that shape AI development, influencing what models are trained and tested on [1][3][8] - The AI industry faces a problem of benchmark saturation, as models become too good at existing benchmarks, diminishing their value [5][6] - There's an opportunity for individuals to create new benchmarks that define what AI models should excel at, shaping the future of AI capabilities [7][13] The Lifecycle and Impact of Benchmarks - The typical benchmark lifecycle involves an idea spreading, becoming a meme, and eventually being saturated as models train on it [8] - Benchmarks can have unintended consequences, such as reinforcing biases if not designed thoughtfully, as seen with the Chat-GPT thumbs-up/thumbs-down benchmarking [14] - The industry should focus on creating benchmarks that empower people and promote agency, rather than treating them as mere data points [16] Qualities of Effective Benchmarks - Great benchmarks should be multifaceted, rewarding creativity, accessible to both small and large models, generative, evolutionary, and experiential [17][18][19] - The industry needs more "squishy," non-static benchmarks for areas like ethics, society, and art, requiring subject matter expertise [34][35] - Benchmarks can be used to build trust in AI by allowing people to define goals, provide feedback, and see AI improve, fostering a sense of importance and control [37] AI Diplomacy Benchmark - AI Diplomacy is presented as an example of a benchmark that mimics real-world situations, testing models' abilities to negotiate, form alliances, and betray each other [20][22][23] - The AI Diplomacy benchmark revealed interesting personality traits in different models, such as 03 being a schemer and Claude models being naively optimistic [24][25][30] - The AI Diplomacy benchmark highlighted the importance of social aspects and convincing others, with models like Llama performing well due to their social skills [31]
Small AI Teams with Huge Impact — Vikas Paruchuri, Datalab
AI Engineer· 2025-07-15 17:05
Company Growth & Strategy - Datal Lab achieved seven-figure ARR and trained state-of-the-art models with a team of three [1] - The company has grown in revenue 5x since January [2] - Customers include tier one AI labs, universities, Fortune 500 companies, and AI startups [3] - The company's philosophy is to hire less than 15 generalists and fill in the edges with AI and internal tooling [11][12] Team Building & Productivity - Headcount does not equal productivity [3] - The company aims to maintain a "golden period" of alignment and productivity indefinitely [10][11] - The company prioritizes hiring senior generalists with maturity and the ability to solve problems independently [21][22] - The company emphasizes in-person work for small teams to facilitate fast collaboration and tight feedback loops [23] Technology & Processes - The company reuses components aggressively and keeps technology simple, avoiding fancy front-end frameworks [23][24] - The company minimizes bureaucracy and emphasizes high trust and continuous discussions [25] - The company uses AI to automate low-leverage tasks, allowing the team to focus on higher-level work [20]
Tiny Teams — Grant Lee, Gamma
AI Engineer· 2025-07-15 17:04
Company Vision & Product - Gamma aims to revolutionize content creation and sharing, positioning itself as an alternative to traditional tools like PowerPoint [1] - The company focuses on a content-first approach, simplifying the design and formatting process [3] - Gamma's goal is to provide tools that foster imagination and facilitate the sharing of ideas [4][5] Team Structure & Management - Gamma emphasizes a flat organizational structure, moving away from traditional hierarchies [5][6][7] - The company promotes the "rise of the generalist," valuing employees with diverse skill sets and adaptability [8][10] - Gamma utilizes the "player coach" model, where leaders contribute to both management and hands-on work [8][16] Scaling & Culture - Gamma has reached over 50 million users with a team of 30 [7] - The company prioritizes brand and culture from the beginning, viewing them as interconnected [24] - Gamma invests in maintaining a strong company culture through a living culture deck and regular all-hands meetings [26][29] Hiring Practices - Gamma seeks individuals who are continuous learners and effective teachers [15][16] - The company assesses candidates for "high agency" by exploring their problem-solving approaches and depth of understanding [40][41] - Gamma utilizes work trials to ensure a good fit between the company and new hires [46]
Building a 10 person unicorn - Max Brodeur-Urbas, Gumloop
AI Engineer· 2025-07-15 17:03
Company Overview & Growth Strategy - Gum Loop, founded a year and a half ago, focuses on workflow automation and has scaled to nine people after raising a Series A as a team of two [1][9] - The company emphasizes product-led growth (PLG), relying on inbound interest rather than outbound sales, which contributes to rapid scaling [11][12] - Gum Loop's customers include large companies like Instacart and Shopify, with Shopify rolling out the product company-wide [10][11] Hiring & Team Culture - Gum Loop prioritizes hiring exceptional individuals and maintains a small team to enable faster movement and minimize meetings [9][10][16] - The company uses "work trials" to assess candidates, integrating them into the team for several days to evaluate fit [16][21] - Gum Loop fosters a culture of rapid iteration, challenging the team to ship features quickly, while also emphasizing fun and team-building activities like company retreats [31][32][33][34] Internal Operations & Automation - Gum Loop minimizes meetings to allow employees deep focus time for building product [22][23][24] - The company automates internal processes using its own product, Gum Loop, to improve efficiency [26][27][29] - Gum Loop uses AI chatbot data to inform product decisions, automating tasks that would otherwise consume significant employee time [29]
Using OSS models to build AI apps with millions of users — Hassan El Mghari
AI Engineer· 2025-07-15 17:02
AI App Building - Key Takeaways - The barrier to building AI apps has lowered dramatically due to AI tools and groundbreaking models [7][9] - Simplicity in app architecture is crucial for rapid development and validation, often involving a single API call [21][22] - UI design is paramount, consuming approximately 80% of development time, significantly impacting user adoption [35] - Incorporating the latest AI models can increase the potential for virality [38][39] - Launching early and iterating based on user feedback is essential for de-risking projects [40] Tech Stack & Resources - Together AI provides an inference API for querying open source models and offers dedicated instances for fine-tuning [5][6][22] - The presenter uses Nextjs and Typescript as the full-stack framework, Neon as the serverless Postgres host, and Clerk for authentication [22][23] - Open source projects can often secure sponsorships or free credits from AI companies and database providers [50][51] App Development Process - Ideation involves maintaining a running list of ideas and prioritizing the top five [26][27] - Naming should focus on short, memorable names with available domain names [28] - Design involves sketching or prototyping the app's workflow and user interface [29] - The initial build should focus on the simplest possible working version with minimal API endpoints [30]
Bolt.new: How we scaled $0-20m ARR in 60 days, with 15 people — Eric Simons, Bolt
AI Engineer· 2025-07-15 17:01
Company Growth & Strategy - The company's ARR (Annual Recurring Revenue) was $0.7 million over seven years before launching Bolt [4] - The company doubled its ARR after launching Bolt [7] - The company emphasizes a small team with more context per head to increase agency and speed [13][14] - The company believes in taking many shots on goal to find product market fit, similar to an enterprise sales pipeline [15][16] Team & Culture - The company values a shared set of core values: low ego, high trust, user obsession, grit, and resilience [19][20] - The company focuses on saving the right things and prioritizing high-impact areas, accepting that some fires will have to burn [22][24] - The company encourages independent thinking and avoiding the hive mind mentality [29][32] Customer Engagement & Support - The company runs weekly office hour sessions to engage with the community and show progress [33][34] - The company uses AI tools like Parel Help's SAM to automate 90% of support tickets [36][37] - The company emphasizes community building and creating spaces for users to learn from each other [38][39] - The company is running a Guinness World Record-breaking hackathon with 80,000+ participants for product testing and community support [39][40]
Survive the AI Knife Fight: Building Products That Win — Brian Balfour, Reforge
AI Engineer· 2025-07-14 18:59
Industry Landscape & Challenges - The tech industry is experiencing intense competition with rapid product launches and well-funded startups across various software categories [1] - Companies are collapsing in months rather than years due to the competitive pressure [1] - Foundational shifts in technology are happening on a monthly basis [4] - The key question for companies is "What do I build and why will it win?" [1] AI Product Strategy - Companies should treat AI like a series of Lego blocks, assembling differentiated AI features and products by integrating available AI capabilities with their data and functionality [12] - Competitive advantage comes from unique data, functionality, and understanding of unmet customer needs, not the AI itself [13] - Data provides context to the AI model to generate a unique output, with uniqueness stemming from real-time, user-specific, domain-specific, and human judgment data [16][17] - Functionality determines how the AI behaves and gives the AI product superpowers through specialized workflows, unique algorithms, business rules, and integrations [18] Granola Case Study - Granola, an AI notetaker, gained 40% market attention and $50 million in funding by focusing on helping users take better notes rather than replacing the entire note-taking process [21][22][24] - Granola assembled Lego blocks by using off-the-shelf AI capabilities (Deepgram for transcription, Anthropic and OpenAI for other functionalities) and combining user notes with transcriptions to enhance notes [25][26]
Automating Escrow with USDC and AI - Corey Cooper, Circle
AI Engineer· 2025-07-14 14:30
Circle & USDC Overview - Circle, a fintech company established in 2013, issues stablecoins and is backed by financial service industry pillars like BlackRock and Fidelity [6] - Circle's USDC and EURC are fully reserved 100% with fiat and short-term treasuries in a bank account, ensuring trust and transparency [7] - Since inception, Circle has settled over 26 trillion in transactions across roughly 20 different blockchains [8] - Circle acquired Hashnote, enabling liquidation from a money market into USDC 24/7, 365 days a week [9][10] USDC Programmability & Features - USDC is designed as an internet-native dollar, enhancing programmability and transferability for global transactions in seconds [16][17] - USDC smart contracts include features like allow lists and block lists to protect users from malicious actors [24][25] - The "spend on behalf" feature allows businesses to delegate spending with caps from a USDC wallet balance, scalable to tens of thousands of users [26][28] - USDC contract functions include balance of, total supply, allowance, transfer, transfer from, and approve, enabling innovative experiences [36][37] AI & USDC Integration - Combining USDC with AI enables verification of workflows for escrow agreements and instant settlement [3] - USDC is suitable for agents due to near-instant settlement, built-in verification, 24/7 availability, and programmability [39][43] - Circle's escrow agent application uses Circle Wallets and Circle Contracts API to provision wallets and deploy smart contracts [45] - The escrow process involves parsing agreement details using AI, creating a smart contract, depositing USDC, and verifying task completion with AI before releasing funds [47][57]
How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS - Ishan Anand
AI Engineer· 2025-07-13 17:30
Core Technology & Architecture - The workshop focuses on a GPT-2 inference implementation in Vanilla JS, providing a foundation for understanding modern AI systems like ChatGPT, Claude, DeepSeek, and Llama [1] - It covers key concepts such as converting raw text into tokens, representing semantic meaning through vector embeddings, training neural networks through gradient descent, and generating text with sampling algorithms [1] Educational Focus & Target Audience - The workshop is designed for web developers entering the field of ML and AI, aiming to provide a "missing AI degree" in two hours [1] - Participants will gain an intuitive understanding of how Transformers work, applicable to LLM-powered projects [1] Speaker Expertise - Ishan Anand, an AI consultant and technology executive, specializes in Generative AI and LLMs, and created "Spreadsheets-are-all-you-need" [1] - He has a background as former CTO and co-founder of Layer0 (acquired by Edgio) and VP of Product Management for Edgio, with expertise in web performance, edge computing, and AI/ML [1]
[Workshop] AI Pipelines and Agents in Pure TypeScript with Mastra.ai — Nick Nisi, Zack Proser
AI Engineer· 2025-07-12 16:00
Overview - Mastra.ai is a TypeScript framework designed to streamline the development of agentic AI systems, offering an alternative to traditional approaches using LangChain and vector databases [1] - The workshop aims to equip participants with the skills to develop scalable AI-driven internal tools based on sound software engineering principles [1] Technical Aspects - Participants will learn to build structured AI workflows with composable tools and reliable control [1] - The session covers Mastra installation, running a local MCP server, defining tools and agents in TypeScript, and using the Mastra playground [1] - Practical examples include RAG setups and tool-chaining agents [1] Application - The framework enables the creation of internal AI assistants capable of handling requests like data cleaning, email drafting, and document summarization with minimal code [1] Speakers - Nick Nisi is an elite software engineer with expertise in open source web development [1] - Zachary Proser builds AI systems and shares his learnings through sample applications, technical guides, and real-world lessons [1]