360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
AI Engineer· 2025-07-16 17:59
Model Building and Training
- LinkedIn leverages large language models (LLMs) for personalization and ranking tasks, aiming to use one model for all tasks [2][3]
- The process involves converting user information into prompts, a method called "promptification" [8]
- LinkedIn builds a large foundation model, Brew XL, with 150 billion parameters, then distills it into smaller, more efficient models, such as a 3B model, for production [12]
- Distillation from a large model is more effective than training a small model from scratch [14]
- Increasing data, model size (up to 8x22B), and context length can improve model performance, but longer contexts may require model adjustments [17][18][19]

Model Performance and Generalization
- The model improves performance for cold-start users, with a growing gap over production models as the number of interactions decreases [21]
- The model generalizes to new domains, performing on par with or better than task-specific production models on out-of-domain tasks [23]

Model Serving and Optimization
- LinkedIn focuses on model sparsification, pruning, and quantization to improve throughput and reduce latency in production [26]
- Gradual pruning with distillation is more effective than aggressive pruning, minimizing information loss [29][30]
- Mixed precision, using FP8 for activations and model parameters but FP32 for the LM head, is crucial for maintaining prediction precision [31][32]
- Sparsifying attention scores can reduce latency by scoring multiple recommended items in one pass without the items attending to each other [34][35]
- LinkedIn achieved a 7x reduction in latency and a 30x increase in throughput per GPU through these optimization techniques [36]
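The "promptification" step above can be sketched as a simple template function. Note this is a minimal illustration of the idea only: the field names and prompt layout below are hypothetical, not LinkedIn's actual schema.

```python
def promptify(member, candidates):
    """Turn a member's profile and interaction history into one ranking prompt.

    Hypothetical sketch of "promptification": the keys ("headline",
    "history", "title", "action") and the prompt wording are illustrative.
    """
    history = "\n".join(
        f"- {item['title']}: {item['action']}" for item in member["history"]
    )
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        f"Member headline: {member['headline']}\n"
        f"Past interactions:\n{history}\n"
        f"Rank these items for the member:\n{listing}"
    )

member = {
    "headline": "ML engineer interested in recommender systems",
    "history": [
        {"title": "Intro to LLM ranking", "action": "liked"},
        {"title": "Feed ads deep dive", "action": "skipped"},
    ],
}
prompt = promptify(member, ["RecSys 2025 recap", "Gardening tips"])
print(prompt)
```

The appeal of this framing is that every personalization task becomes text-in/text-out, so one foundation model can serve all of them.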
What We Learned from Using LLMs in Pinterest — Mukuntha Narayanan, Han Wang, Pinterest
AI Engineer· 2025-07-16 17:58
Hi everyone. Thanks for joining the talk today. We're super excited to be here and share some of the learnings we have from integrating LLMs into Pinterest search. My name is Han, and today I'll be presenting with Mukuntha; we're both machine learning engineers on the search relevance team at Pinterest. To start with a brief introduction to Pinterest: Pinterest is a visual discovery platform where Pinners can come to find inspiration to create a life they love. And there are ...
RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai
AI Engineer· 2025-07-16 16:18
Large Language Models Evolution
- Scaling laws show that increasing compute, data, and parameters improves Transformer model performance, and this generalizes to other domains [2][3]
- Performance keeps improving as models scale, reflected in solve rates on medium-difficulty math problems, especially when models are prompted to show a chain of thought [5][7]
- Reinforcement learning from human feedback makes models better at following instructions, enabling applications such as chatbots [10][11]

Inference Time Optimization
- Generating multiple responses and taking a majority vote (self-consistency) improves performance at inference time [15]
- Sequentially revising earlier responses significantly improves performance, especially in domains where answers can be verified, such as math and coding [16][17]
- In domains with verifiable answers, scaling inference-time compute translates into intelligence [19]

Reinforcement Learning for Autonomous Coding
- Reinforcement learning is the next scaling frontier, particularly in domains where outputs can be verified automatically [24]
- The era of experience will build superintelligent systems through reinforcement learning, especially in domains with automatic verification [25]
- Autonomous coding is an excellent domain for scaling reinforcement learning because its outputs can be verified [30][31]

Challenges in Scaling Reinforcement Learning
- Scaling reinforcement learning is harder than scaling LLMs because it requires multiple model replicas plus interleaved training and inference loops [29]
- Designing the reward function for the reward model is a challenge in reinforcement learning [29][30]

Reflection's Mission
- Reflection is committed to building superintelligence, with autonomous coding as its root problem [33]
- The Reflection team consists of 35 pioneers with seminal work in LLMs and reinforcement learning [33]
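The self-consistency idea above (sample several responses, keep the majority answer) fits in a few lines. The toy sampler below stands in for an LLM and is purely illustrative:

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    """Sample n candidate answers and return the most common one
    (majority voting, i.e. self-consistency decoding)."""
    answers = [sample_fn(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for an LLM: yields a fixed sequence of sampled answers.
pool = iter(["42", "41", "42", "42", "40"])
def toy_sampler(prompt):
    return next(pool)

best = self_consistency(toy_sampler, "What is 6 * 7?", n=5)
print(best)  # -> "42", the majority answer among the five samples
```

This is why the technique pays off most in verifiable domains: when answers can be checked (math, code), agreement among samples is a strong signal of correctness.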
Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan
AI Engineer· 2025-07-16 15:00
Industry Trend
- Recommendation and search systems have been significantly impacted by advances in language modeling, evolving from Word2vec to GRUs, Transformers, and BERT [1]
- The emergence of large language models (LLMs) is driving innovation in model architecture, scalable system designs, and customer experiences within recommendation and search systems [1]
- The industry is exploring real-world implementations and measurable outcomes of LLMs in recommendation and search systems [1]

Technological Advancement
- LLM-driven techniques are expected to shape the future of content discovery and intelligent search [1]
- Amazon is building recommendation systems and AI-powered products using ML/AI [1]
OpenAI's Sean Grove: Code is NOT all you do
AI Engineer· 2025-07-16 07:00
it feels tangible and real, but it's sort of underselling the job that each of you does. Code is sort of 10 to 20% of the value that you bring. The other 80 to 90% is in structured communication, and this is going to be different for everyone, but a process typically looks something like: you talk to users in order to understand their challenges. You distill these stories down and then ideate about how to solve these problems. What is the goal that you want to achieve? You plan ways to achieve those goa ...
OpenAI's Sean Grove: Everything is a Spec: The Universal Language of Intent
AI Engineer· 2025-07-16 00:01
Core Concept
- The industry emphasizes that specifications are a universal concept applicable across various fields, including programming, product management, and lawmaking [1]
- The industry views prompt engineering as a form of specification writing, aligning AI models with intentions and values [1]

Benefits of Specifications
- Specifications enable faster and safer product development and deployment [2]
- Specifications allow for broader contributions from various roles, blurring the lines between traditional roles like PM, lawmaker, engineer, marketer, and programmer [2]
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy
AI Engineer· 2025-07-15 17:05
Benchmarks as Memes in AI
- Benchmarks are presented as memes that shape AI development, influencing what models are trained and tested on [1][3][8]
- The AI industry faces a problem of benchmark saturation: as models become too good at existing benchmarks, their value diminishes [5][6]
- There's an opportunity for individuals to create new benchmarks that define what AI models should excel at, shaping the future of AI capabilities [7][13]

The Lifecycle and Impact of Benchmarks
- The typical benchmark lifecycle involves an idea spreading, becoming a meme, and eventually being saturated as models train on it [8]
- Benchmarks can have unintended consequences, such as reinforcing biases if not designed thoughtfully, as seen with ChatGPT's thumbs-up/thumbs-down feedback [14]
- The industry should focus on creating benchmarks that empower people and promote agency, rather than treating them as mere data points [16]

Qualities of Effective Benchmarks
- Great benchmarks should be multifaceted, rewarding creativity, accessible to both small and large models, generative, evolutionary, and experiential [17][18][19]
- The industry needs more "squishy," non-static benchmarks for areas like ethics, society, and art, requiring subject matter expertise [34][35]
- Benchmarks can build trust in AI by letting people define goals, provide feedback, and see AI improve, fostering a sense of importance and control [37]

AI Diplomacy Benchmark
- AI Diplomacy is presented as an example of a benchmark that mimics real-world situations, testing models' abilities to negotiate, form alliances, and betray each other [20][22][23]
- The AI Diplomacy benchmark revealed distinct personality traits in different models, such as o3 being a schemer and Claude models being naively optimistic [24][25][30]
- The AI Diplomacy benchmark highlighted the importance of social aspects and convincing others, with models like Llama performing well due to their social skills [31]
Small AI Teams with Huge Impact — Vikas Paruchuri, Datalab
AI Engineer· 2025-07-15 17:05
Company Growth & Strategy
- Datalab achieved seven-figure ARR and trained state-of-the-art models with a team of three [1]
- The company has grown revenue 5x since January [2]
- Customers include tier-one AI labs, universities, Fortune 500 companies, and AI startups [3]
- The company's philosophy is to hire fewer than 15 generalists and fill in the edges with AI and internal tooling [11][12]

Team Building & Productivity
- Headcount does not equal productivity [3]
- The company aims to maintain a "golden period" of alignment and productivity indefinitely [10][11]
- The company prioritizes hiring senior generalists with maturity and the ability to solve problems independently [21][22]
- The company emphasizes in-person work for small teams to facilitate fast collaboration and tight feedback loops [23]

Technology & Processes
- The company reuses components aggressively and keeps technology simple, avoiding fancy front-end frameworks [23][24]
- The company minimizes bureaucracy and emphasizes high trust and continuous discussions [25]
- The company uses AI to automate low-leverage tasks, allowing the team to focus on higher-level work [20]
Tiny Teams — Grant Lee, Gamma
AI Engineer· 2025-07-15 17:04
Company Vision & Product
- Gamma aims to revolutionize content creation and sharing, positioning itself as an alternative to traditional tools like PowerPoint [1]
- The company focuses on a content-first approach, simplifying the design and formatting process [3]
- Gamma's goal is to provide tools that foster imagination and facilitate the sharing of ideas [4][5]

Team Structure & Management
- Gamma emphasizes a flat organizational structure, moving away from traditional hierarchies [5][6][7]
- The company promotes the "rise of the generalist," valuing employees with diverse skill sets and adaptability [8][10]
- Gamma utilizes the "player coach" model, where leaders contribute to both management and hands-on work [8][16]

Scaling & Culture
- Gamma has reached over 50 million users with a team of 30 [7]
- The company prioritizes brand and culture from the beginning, viewing them as interconnected [24]
- Gamma invests in maintaining a strong company culture through a living culture deck and regular all-hands meetings [26][29]

Hiring Practices
- Gamma seeks individuals who are continuous learners and effective teachers [15][16]
- The company assesses candidates for "high agency" by exploring their problem-solving approaches and depth of understanding [40][41]
- Gamma utilizes work trials to ensure a good fit between the company and new hires [46]
Building a 10 person unicorn - Max Brodeur-Urbas, Gumloop
AI Engineer· 2025-07-15 17:03
Company Overview & Growth Strategy
- Gumloop, founded a year and a half ago, focuses on workflow automation and has scaled to nine people after raising a Series A as a team of two [1][9]
- The company emphasizes product-led growth (PLG), relying on inbound interest rather than outbound sales, which contributes to rapid scaling [11][12]
- Gumloop's customers include large companies like Instacart and Shopify, with Shopify rolling out the product company-wide [10][11]

Hiring & Team Culture
- Gumloop prioritizes hiring exceptional individuals and maintains a small team to enable faster movement and minimize meetings [9][10][16]
- The company uses "work trials" to assess candidates, integrating them into the team for several days to evaluate fit [16][21]
- Gumloop fosters a culture of rapid iteration, challenging the team to ship features quickly, while also emphasizing fun and team-building activities like company retreats [31][32][33][34]

Internal Operations & Automation
- Gumloop minimizes meetings to give employees deep focus time for building product [22][23][24]
- The company automates internal processes using its own product, Gumloop, to improve efficiency [26][27][29]
- Gumloop uses AI chatbot data to inform product decisions, automating tasks that would otherwise consume significant employee time [29]