CRASE5 for ACT Writing: Technical Report

Investment Rating
- The report does not explicitly provide an investment rating for the industry or company.

Core Insights
- The CRASE5 scoring engine was developed to enhance automated scoring capabilities for ACT writing essays, incorporating new functionalities such as detecting off-topic essays and providing confidence levels in scoring [3][4]
- The report aims to validate the performance of the CRASE5 models against the previous CRASE+ models, ensuring that the new models maintain similar scoring accuracy and reliability [4][30]
- The CRASE5 engine was trained using a dataset of approximately 14,000 hand-scored essays, ensuring a representative sample for effective model training and validation [12][18]

Summary by Sections

Introduction
- The CRASE scoring engine has been operational since October 2022 for various ACT programs, with CRASE5 set to be used starting September 2025 [1][3]

Background: Automated Scoring and CRASE5
- Automated scoring utilizes algorithms to replicate human scoring behavior, with CRASE being a long-standing system since 2007, now enhanced for broader applications [6][7]

Methods for Engine Training and Validation
- The training sample consisted of essays from multiple ACT administrations, ensuring diversity and representativeness in the data used for model training [12][14]
- The validation process involved comparing the new CRASE5 scores with those from human raters to assess accuracy and reliability [30][32]

Results for Engine Training and Validation
- The CRASE5 models demonstrated comparable performance to the original CRASE+ models, with agreement rates exceeding ACT's operational thresholds [32][41]
- Distributional metrics for the first writing domain showed that the mean scores and standard deviations were consistent across raters and the CRASE5 engine [31][33]
- The report includes detailed metrics for various writing domains, confirming that CRASE5 meets or exceeds the required standards for operational use [42][68]

Baseline Results on the 1–6 Scale
- The CRASE5 models achieved high exact agreement rates (over 71%) with human raters, indicating strong reliability [32][41]
- The quadratic weighted kappa (QWK) values for CRASE5 were above the industry standard of 0.70, supporting its operational viability [27][32]

Baseline Results on the 2–12 Scale
- The CRASE5 models on the 2–12 scale showed promising distributional metrics, with a QWK of 0.91, suggesting suitability for operational use [55][59]
- Exact agreement rates for the 2–12 scale were lower than for the 1–6 scale, but still within acceptable ranges for stakeholders to consider [53][55]
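
The two agreement statistics cited above, exact agreement and quadratic weighted kappa (QWK), can be illustrated with a minimal Python sketch. This is not the CRASE5 implementation or ACT's evaluation code. The function names and the toy scores in the usage note are hypothetical; only the 0.70 QWK threshold comes from the summary above.

```python
from typing import Sequence


def exact_agreement(human: Sequence[int], engine: Sequence[int]) -> float:
    """Fraction of essays where the engine score equals the human score."""
    if len(human) != len(engine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for h, e in zip(human, engine) if h == e)
    return matches / len(human)


def quadratic_weighted_kappa(
    human: Sequence[int],
    engine: Sequence[int],
    min_score: int,
    max_score: int,
) -> float:
    """Cohen's kappa with quadratic disagreement weights.

    Larger score gaps are penalised more heavily (by the squared gap),
    and the result is corrected for the agreement expected by chance.
    """
    if len(human) != len(engine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")

    n = max_score - min_score + 1  # number of score categories
    total = len(human)

    # Observed confusion matrix: rows are human scores, columns engine scores.
    observed = [[0] * n for _ in range(n)]
    for h, e in zip(human, engine):
        observed[h - min_score][e - min_score] += 1

    # Marginal counts for each rater.
    human_hist = [sum(row) for row in observed]
    engine_hist = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = human_hist[i] * engine_hist[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters gave every essay the same single score; this sketch
        # treats that as perfect agreement (an assumption, not a standard).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def meets_qwk_threshold(qwk: float, threshold: float = 0.70) -> bool:
    """Compare a QWK value against the 0.70 threshold cited in the summary."""
    return qwk >= threshold
```

As a usage example with made-up scores on a 1–4 scale, `exact_agreement([1, 2, 3, 4], [1, 2, 3, 3])` gives 0.75 and `quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3], 1, 4)` gives 0.875. QWK is computed relative to the number of score categories, so values on the 1–6 and 2–12 scales are not directly interchangeable with exact agreement rates, which tend to drop as the scale gets finer.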