Comparing Multiclass AUC ROC Methods


  • Dylan Miller, MacEwan University


Statistical models are commonly used to predict the outcomes of events in a wide variety of fields, such as health, finance, and business. Evaluation metrics are used to assess the effectiveness of these predictive models. One classification evaluation metric, the receiver operating characteristic (ROC) curve, has several useful properties: it is threshold agnostic, and it can manage class imbalance, where the outcomes are not equally represented. Despite the usefulness of the ROC curve, there is no standard approach for extending the curve to multiclass problems. The purpose of this project was to evaluate multiclass ROC curve implementations under various underlying class proportions and degrees of separation. The methods we evaluated include the Macro, Micro, and Weighted averages of one-versus-rest comparisons, as well as the Hand and Till (HT) method. We compared the methods on simulated data with balanced, unbalanced, and strongly unbalanced class proportions, in combination with no separation, small separation, and large separation between classes. We found that the methods differed significantly when the class proportions were either unbalanced or strongly unbalanced and the class distributions had either small or large separation (n = 100, p < 0.01). Pairwise comparisons found that HT and Macro were significantly different from Micro and Weighted (n = 100, p < 0.01). This study demonstrates that some of the AUC ROC methods differ depending on the class proportions and underlying distributions. The findings from this project may help practitioners select the most appropriate method according to their goals.
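The four methods compared above can be illustrated with a minimal NumPy sketch. This is not the project's actual implementation; it assumes labels are integers 0..K-1, every class appears in the data, and scores are per-class probabilities. The binary building block is the Mann-Whitney formulation of AUC, the one-versus-rest averages follow the usual Macro/Micro/Weighted definitions, and the Hand and Till value is the average of the pairwise AUCs from their 2001 paper.

```python
import numpy as np

def binary_auc(pos_scores, neg_scores):
    """Binary AUC as the Mann-Whitney probability that a randomly chosen
    positive outscores a randomly chosen negative (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum()
    ties = (pos == neg).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def multiclass_auc(y_true, y_prob, method="macro"):
    """y_true: (n,) integer labels 0..K-1; y_prob: (n, K) class scores.
    Assumes every class occurs at least once in y_true."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    K = y_prob.shape[1]
    if method == "hand_till":
        # Hand & Till: average A_hat(i, j) = (A(i|j) + A(j|i)) / 2 over all pairs.
        total, pairs = 0.0, 0
        for i in range(K):
            for j in range(i + 1, K):
                a_ij = binary_auc(y_prob[y_true == i, i], y_prob[y_true == j, i])
                a_ji = binary_auc(y_prob[y_true == j, j], y_prob[y_true == i, j])
                total += 0.5 * (a_ij + a_ji)
                pairs += 1
        return total / pairs
    # One-versus-rest AUC for each class: class k against all other classes.
    aucs = np.array([
        binary_auc(y_prob[y_true == k, k], y_prob[y_true != k, k])
        for k in range(K)
    ])
    if method == "macro":          # unweighted mean of per-class AUCs
        return aucs.mean()
    if method == "weighted":       # mean weighted by class prevalence
        w = np.bincount(y_true, minlength=K) / len(y_true)
        return float(w @ aucs)
    if method == "micro":          # pool every (sample, class) pair into one binary task
        onehot = np.eye(K)[y_true].astype(bool)
        return binary_auc(y_prob[onehot], y_prob[~onehot])
    raise ValueError(f"unknown method: {method}")

# A perfect classifier scores 1.0 under every method.
y = np.array([0, 0, 1, 1, 2, 2])
p = np.eye(3)[y]
for m in ("macro", "micro", "weighted", "hand_till"):
    print(m, multiclass_auc(y, p, m))  # each prints 1.0
```

Note that with balanced class proportions the Weighted average reduces to the Macro average, which is consistent with the study finding differences between the methods only once the proportions become unbalanced.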

Department: Computer Science 

Faculty Mentor: Dr. Wanhua Su




