Build a Prompt Learning Loop - SallyAnn DeLucia & Fuad Ali, Arize
AI Engineer · 2026-01-06 17:30
[music] Hey everyone, gonna get started here. Thanks so much for joining us today. Um, I'm Sally. I'm a director at Arize. I'm going to be walking you through prompt learning. Uh, we're actually going to be building an optimization loop for part of the workshop. Um, I come from a technical background and started off in data science before I made my way over to product. Uh, I do still like to be touching code today. I think one of my favorite projects that I work on is building our own ag ...
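The optimization loop mentioned in the talk can be sketched in a few lines: score a prompt against a labeled eval set, ask an optimizer (often another LLM) to propose a revision, and keep the revision only if it scores better. This is a minimal, hypothetical sketch, not Arize's implementation; `run_model` and `propose_revision` are assumed callables supplied by the user.

```python
# Minimal prompt-learning loop sketch (hypothetical, not the Arize workshop code):
# greedy hill climbing on eval-set accuracy.

def evaluate(prompt, dataset, run_model):
    """Fraction of (input, expected) pairs the model answers correctly."""
    correct = sum(run_model(prompt, x) == y for x, y in dataset)
    return correct / len(dataset)

def prompt_learning_loop(prompt, dataset, run_model, propose_revision, steps=5):
    """Iteratively revise `prompt`, keeping only revisions that improve accuracy."""
    best_score = evaluate(prompt, dataset, run_model)
    for _ in range(steps):
        candidate = propose_revision(prompt, dataset)  # e.g. an LLM critique/rewrite
        score = evaluate(candidate, dataset, run_model)
        if score > best_score:  # accept only strict improvements
            prompt, best_score = candidate, score
    return prompt, best_score
```

In practice `propose_revision` would feed the prompt plus its failing examples to a critic model, which is what makes the loop "learning" rather than random search.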
Accuracy cut in half: large models' visual abilities "break down" as soon as they leave everyday life
36Kr · 2025-12-09 06:59
Core Insights
- The EgoCross project focuses on evaluating cross-domain first-person video question answering, revealing the limitations of existing MLLMs in specialized fields such as surgery, industry, extreme sports, and animal perspectives [1][3][4]

Group 1: Project Overview
- EgoCross is the first cross-domain EgocentricQA benchmark, covering four high-value professional fields and containing nearly 1,000 high-quality QA pairs [3][9]
- The project provides both closed (CloseQA) and open (OpenQA) evaluation formats, addressing a significant gap in the assessment of models in these specialized areas [3][9]

Group 2: Model Evaluation
- Eight mainstream MLLMs were tested, revealing that even the best-performing models had a CloseQA accuracy below 55% and OpenQA accuracy below 35% in cross-domain scenarios [4][9]
- The study found that reinforcement learning (RL) methods could significantly improve performance, with an average increase of 22% in accuracy [10][16]

Group 3: Task and Domain Challenges
- The research highlights the significant domain shift between everyday activities and specialized fields, with models performing well in daily tasks but struggling in professional contexts [8][9]
- The study identified that prediction tasks showed a more severe decline in performance compared to basic identification tasks [13][16]

Group 4: Improvement Strategies
- Three improvement methods were explored: prompt learning, supervised fine-tuning (SFT), and reinforcement learning (RL), with RL showing the most substantial performance gains [15][16]
- The findings suggest that current models have limitations in generalization, indicating a need for further development to create more capable multimodal systems [16]
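The two evaluation formats above can be illustrated with a small scoring sketch: CloseQA is graded by exact match against one of the given options, while OpenQA needs a judge (often an LLM grader) to accept free-form answers. The field names and scoring rules here are assumptions for illustration, not the EgoCross implementation.

```python
# Hypothetical sketch of CloseQA vs. OpenQA scoring on an EgocentricQA-style
# benchmark; `predict` and `judge` are user-supplied callables.

def close_qa_accuracy(examples, predict):
    """CloseQA: the model picks one of the provided options; exact match counts."""
    hits = sum(predict(ex["question"], ex["options"]) == ex["answer"]
               for ex in examples)
    return hits / len(examples)

def open_qa_accuracy(examples, predict, judge):
    """OpenQA: free-form answers, graded by a judge function (e.g. an LLM grader)."""
    hits = sum(judge(predict(ex["question"], None), ex["answer"])
               for ex in examples)
    return hits / len(examples)
```

Open-ended grading is the harder problem, which is one reason OpenQA scores in the study sit well below CloseQA scores.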