The ROC curve and the area under it (AUC) are widely used in logistic regression and other classification methods for model comparison and feature selection. The ROC curve shows the trade-off between sensitivity and specificity. The paper by Gary King warns of the dangers of using logistic regression for rare events and proposes a penalized likelihood estimator. In PROC LOGISTIC, the FIRTH option of the MODEL statement implements this penalty concept.
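As a minimal sketch, requesting the penalized fit only takes the FIRTH option on the MODEL statement. The example below assumes the same sedan/hybrid subset of SASHELP.CARS that is built later in this post. The output data set name PE_FIRTH is my own choice, used only to keep the estimates for comparison with an ordinary maximum likelihood fit.

```sas
/* Subset of SASHELP.CARS: 3 hybrids and 262 sedans */
data rare;
   set sashelp.cars;
   where Type in ("Sedan", "Hybrid");
run;

/* FIRTH replaces ordinary maximum likelihood with
   Firth's penalized likelihood; estimates are saved via ODS */
ods output ParameterEstimates = pe_firth;
proc logistic data = rare;
   model Type(event='Hybrid') = Weight Wheelbase Invoice / firth;
run;
```

Running the same PROC LOGISTIC step without `/ firth` and comparing the two parameter tables shows how strongly the penalty shrinks the estimates when only three events are available.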
When the event in the response variable is rare, the ROC curve is driven by only a handful of minority-class observations. The true positive rate can only move in coarse steps, while the false positive rate is computed over the large majority class and barely reacts to extra false positives. As a result, the curve provides little information for model diagnosis. For example, I construct a subset of SASHELP.CARS in which the response variable Type includes 3 hybrid cars and 262 sedans. I then use the regressors Weight, Wheelbase, and Invoice to predict whether a car is a hybrid or a sedan. The logistic regression gives an AUC of 0.9109, which is a fairly high value. However, the model is still ill-fitted and needs tuning, since the classification table shows that the sensitivity is zero.
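data rare;   /* subset: 3 hybrids and 262 sedans */
   set sashelp.cars;
   where Type in ("Sedan", "Hybrid");
run;
proc freq data = rare; tables Type; run;
proc logistic data = rare;
   model Type(event='Hybrid') = Weight Wheelbase Invoice
         / ctable pprob = (0.01 0.05) pevent = (0.5 0.05);   /* classification table */
run;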
For each prior event probability (Prob Event) and cut-off probability (Prob Level), the classification table reports Correct Event, Correct Non-Event, Incorrect Event, Incorrect Non-Event, Accuracy, Sensitivity, Specificity, False POS, and False NEG.
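To see why accuracy hides the problem, it helps to write the rates in terms of the counts. CE, CNE, IE, and INE below are just my shorthand for Correct Event, Correct Non-Event, Incorrect Event, and Incorrect Non-Event. IE counts non-events predicted as events, and INE counts events predicted as non-events. With 3 events and 262 non-events (N = 265), the count-based rates are:

```latex
\text{Sensitivity} = \frac{CE}{CE + INE} \in \left\{0, \tfrac{1}{3}, \tfrac{2}{3}, 1\right\},
\qquad
\text{Specificity} = \frac{CNE}{CNE + IE}

\text{Accuracy} = \frac{CE + CNE}{N}
  = \frac{3}{265}\,\text{Sensitivity} + \frac{262}{265}\,\text{Specificity}
```

Because the weight on sensitivity is only about 1%, a model that never predicts a hybrid still reaches roughly 99% accuracy.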
When the ROC curve no longer helps with PROC LOGISTIC, there seem to be three ways to increase the desired sensitivity or improve the ROC curve.
Lower the cut-off probability
In the example above, lowering the cut-off probability to 0.01 significantly increases the sensitivity. However, this comes at the cost of a drastic loss of specificity.
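One way to see the trade-off is to scan a grid of cut-offs in a single run and save the classification table. The sketch below does this with PPROB= and CTABLE. The grid values and the output data set name CTAB are illustrative choices, not recommendations. Without PEVENT=, the prior event probability defaults to the sample proportion of hybrids.

```sas
data rare;
   set sashelp.cars;
   where Type in ("Sedan", "Hybrid");
run;

/* Save the classification table; one row per cut-off in PPROB= */
ods output Classification = ctab;
proc logistic data = rare;
   model Type(event='Hybrid') = Weight Wheelbase Invoice
         / ctable pprob = (0.005 0.01 0.02 0.05 0.1 0.2 0.5);
run;

proc print data = ctab;
run;
```

Reading down the Sensitivity and Specificity columns shows how quickly specificity falls as the cut-off is lowered toward 0.01.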
Up-sampling or down-sampling
Imbalanced classes in the response variable can be adjusted by unequal weighting, such as up-sampling or down-sampling. Down-sampling is easy to carry out with stratified sampling. Up-sampling is more appropriate for this case, because down-sampling to a balanced design would discard almost all of the 262 sedans to match the 3 hybrids. However, up-sampling may require over-sampling techniques in SAS.
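A minimal sketch of both ideas follows. Down-sampling uses PROC SURVEYSELECT with Type as the stratum. Up-sampling uses plain replication of the hybrid rows rather than a synthetic over-sampling method. The sample size of 30 sedans, the replication factor of 80, and the seed are arbitrary values chosen for illustration.

```sas
data rare;
   set sashelp.cars;
   where Type in ("Sedan", "Hybrid");
run;

/* SURVEYSELECT expects the input sorted by the STRATA variable */
proc sort data = rare;
   by Type;
run;

/* Down-sampling: keep all 3 hybrids and draw 30 sedans at random.
   Strata are ordered Hybrid, Sedan, so SAMPSIZE=(3 30) follows that order. */
proc surveyselect data = rare out = down method = srs
                  sampsize = (3 30) seed = 12345;
   strata Type;
run;

/* Up-sampling by simple replication: write each hybrid row 80 times */
data up;
   set rare;
   if Type = "Hybrid" then do copy = 1 to 80;
      output;
   end;
   else output;
   drop copy;
run;

/* Refit on the rebalanced data (here the up-sampled set) */
proc logistic data = up;
   model Type(event='Hybrid') = Weight Wheelbase Invoice;
run;
```

Replication only reweights the three hybrids. It adds no new information, and it makes the reported standard errors and p-values overly optimistic, so the refit is best used to shift the classification rule rather than to support inference.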
Use a different criterion, such as the F1 score
For rare-event classification, the most important measures should be sensitivity and precision (the share of predicted events that are actual events). Accuracy is less useful because it combines sensitivity and specificity and is dominated by the majority class. By contrast, the F1 score, F1 = 2 × precision × sensitivity / (precision + sensitivity), is the harmonic mean of sensitivity and precision. This makes it a better candidate to replace AUC.
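PROC LOGISTIC does not, as far as I know, print an F1 column in its classification table. The sketch below therefore computes F1 by hand from the predicted probabilities for a few cut-offs. It uses the equivalent form F1 = 2TP / (2TP + FP + FN). The cut-off grid and the data set names PRED, CUTOFFS, and F1 are my own. The predictions are in-sample (resubstitution), so they are optimistic and will not exactly match the CTABLE figures, which I understand to be bias-adjusted.

```sas
data rare;
   set sashelp.cars;
   where Type in ("Sedan", "Hybrid");
run;

/* phat = predicted probability that Type = 'Hybrid' */
proc logistic data = rare;
   model Type(event='Hybrid') = Weight Wheelbase Invoice;
   output out = pred p = phat;
run;

/* Illustrative grid of cut-off probabilities */
data cutoffs;
   do cutoff = 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5;
      output;
   end;
run;

/* Confusion counts for every cut-off (boolean expressions sum to counts) */
proc sql;
   create table f1 as
   select c.cutoff,
          sum(p.Type =  'Hybrid' and p.phat >= c.cutoff) as tp,
          sum(p.Type ne 'Hybrid' and p.phat >= c.cutoff) as fp,
          sum(p.Type =  'Hybrid' and p.phat <  c.cutoff) as fn
   from pred as p, cutoffs as c
   where p.phat is not missing
   group by c.cutoff;
quit;

data f1;
   set f1;
   sensitivity = tp / (tp + fn);                      /* recall; tp + fn = 3 */
   if tp + fp > 0 then precision = tp / (tp + fp);   /* missing if nothing flagged */
   f1 = 2 * tp / (2 * tp + fp + fn);                  /* equals 0 when tp = 0 */
run;

proc print data = f1 noobs;
run;
```

The cut-off with the largest F1 is a candidate operating point. With only three events, though, the ranking can flip when a single hybrid changes sides.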