海天瑞声 CEO 李科 (Li Ke): The Data Industry Is Shifting from Labor-Intensive to Technology- and Knowledge-Intensive

Group 1
- The forum's core viewpoint is that the integration of data and AI serves as a dual engine driving innovation and growth in the intelligent era [1][2]
- Large model development currently faces a "data wall" dilemma: the contribution of unlabeled data to model performance is diminishing, which calls for a shift from expert-driven data science to quantitative and self-evolving stages [1]
- A practical example shared at the forum indicates that filtering high-quality data out of a large dataset can improve model accuracy: using only the 20% of high-quality data selected from 10 billion tokens yielded a reported 1.7% improvement on domain question-answering tasks [1]

Group 2
- Data quality is crucial, with a focus on both human and machine experience to enhance large model performance [2]
- The global AI data industry is shifting from a labor-intensive model to a technology- and knowledge-intensive one, and real-world applications show how high-quality data can benefit various industries [2]
- High-quality datasets should meet the VALID² criteria (vitality, authenticity, large sample size, completeness, diversity, high knowledge density), which implies a systematic reconstruction of methodologies, infrastructure, and industry ecology [2]
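
The filtering example in Group 1 (keeping roughly 20% of a 10-billion-token corpus, i.e. about 2 billion tokens) can be illustrated with a minimal sketch. The source does not describe how quality was measured or how the subset was chosen, so the `Doc` type, the `quality` score, and the `select_top_fraction` helper below are all hypothetical names introduced for illustration. The sketch simply ranks documents by an assumed score and keeps the best ones until a token budget is reached.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    n_tokens: int
    quality: float  # hypothetical quality score; higher is better (assumption)


def select_top_fraction(docs: list[Doc], fraction: float = 0.2) -> list[Doc]:
    """Keep the highest-scoring documents within `fraction` of the total tokens.

    Illustrative only: the source does not specify the selection method.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Token budget, e.g. 20% of the corpus size.
    budget = fraction * sum(d.n_tokens for d in docs)

    kept: list[Doc] = []
    used = 0
    # Walk the documents from best to worst score and stop at the first
    # document that would push the kept subset over the budget.
    for doc in sorted(docs, key=lambda d: d.quality, reverse=True):
        if used + doc.n_tokens > budget:
            break
        kept.append(doc)
        used += doc.n_tokens
    return kept
```

A usage example under the same assumptions: given ten 100-token documents, `select_top_fraction(docs, fraction=0.2)` returns the two highest-scoring ones, which together hold 200 of the 1,000 tokens. In practice the quality score would come from whatever scoring method the pipeline uses, which the source leaves unspecified.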