# Introduction to ROC and AUC, and How to Compute the AUC

The original post appeared on my blog:

http://alexkong.net/2013/06/introduction-to-auc-and-roc/

(A related introduction: http://bubblexc.com/y2011/148/.) This post briefly introduces the characteristics of ROC and AUC and then, in more depth, discusses how to draw an ROC curve and how to compute the AUC.

# The ROC Curve

As an example ROC curve shows, the x-axis of an ROC curve is the false positive rate (FPR) and the y-axis is the true positive rate (TPR). Writing TP, FP, TN, FN for the four confusion-matrix counts, the two rates are defined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
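As a minimal sketch of these two definitions (the function name and the example counts are mine, not the post's):

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from confusion-matrix counts.

    TPR = TP / (TP + FN): fraction of actual positives predicted positive.
    FPR = FP / (FP + TN): fraction of actual negatives predicted positive.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr

# Example: 8 of 10 positives are caught, 3 of 20 negatives are false alarms.
print(tpr_fpr(tp=8, fn=2, fp=3, tn=17))  # (0.8, 0.15)
```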

Wikipedia's definition of the ROC curve:

In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied.

# How to Draw an ROC Curve

Next, we use each "score" value as the threshold, from high to low. When the probability that a test sample is positive is greater than or equal to the threshold, we predict it as positive; otherwise, as negative. For example, consider 20 test samples sorted by score: if the fourth sample's score is 0.6 and we take 0.6 as the threshold, then samples 1 through 4 are predicted positive (their scores are all greater than or equal to 0.6), and the rest are predicted negative. Each choice of threshold yields one pair of FPR and TPR values, i.e., one point on the ROC curve. In this way we obtain 20 (FPR, TPR) pairs in total, which, plotted together, form the ROC curve.
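The sweep described above can be sketched as follows (the labels and scores here are hypothetical stand-ins for the post's 20-sample table): sort the samples by score, descend through the scores as thresholds, and collect one (FPR, TPR) point per threshold.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points by sweeping the threshold over every score.

    labels: 1 for positive, 0 for negative; scores: classifier outputs.
    Assumes all scores are distinct, as in the post's example.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Visit samples from highest score to lowest; each step lowers the
    # threshold just enough to flip one more sample to "positive".
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]  # threshold above the highest score
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
print(roc_points(labels, scores))
```

The first point (0, 0) corresponds to a threshold above every score (nothing predicted positive) and the last point (1, 1) to a threshold below every score (everything predicted positive).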

When we set the threshold to 1 and to 0, we get the two endpoints of the ROC curve, (0, 0) and (1, 1), respectively. Connecting all the (FPR, TPR) pairs yields the ROC curve; the more thresholds we use, the smoother the curve.

In fact, we do not necessarily need the probability that each test sample is positive; any "score" the classifier assigns to a test sample will do (the score need not lie in the interval (0, 1)). The higher the score, the more confident the classifier is that the sample is positive, and each score is used as a threshold in turn. Personally, I find it easiest to think of the score as something that could be mapped to a probability.

# Computing the AUC

AUC (Area Under Curve) is defined as the area under the ROC curve; clearly this area cannot exceed 1. Since the ROC curve generally lies above the line y = x, the AUC in practice ranges between 0.5 and 1. The AUC is used as an evaluation metric because ROC curves alone often do not clearly show which classifier is better, whereas as a single number it does: the classifier with the larger AUC is the better one.
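One straightforward way to compute this area (a minimal sketch, not the post's code): sort the (FPR, TPR) points by FPR and sum trapezoids between consecutive points.

```python
def auc_trapezoid(points):
    """Area under the ROC curve via the trapezoidal rule.

    points: (FPR, TPR) pairs; should run from (0, 0) to (1, 1).
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A perfect classifier's curve goes straight up, then across: AUC = 1.
print(auc_trapezoid([(0, 0), (0, 1), (1, 1)]))  # 1.0
# The diagonal y = x (random guessing) gives AUC = 0.5.
print(auc_trapezoid([(0, 0), (1, 1)]))  # 0.5
```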

Off-the-shelf implementations are also available, for example in scikit-learn (http://scikit-learn.org/stable/).

# Why Use the ROC Curve

In the figure above, (a) and (c) are ROC curves, while (b) and (d) are Precision-Recall curves. Panels (a) and (b) show results on the original test set (with a balanced distribution of positive and negative samples); panels (c) and (d) show the same classifiers after the number of negative samples in the test set is increased tenfold. It is obvious that the ROC curves keep essentially their original shape, while the Precision-Recall curves change dramatically.
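A small numeric illustration of this point (the counts here are hypothetical, not taken from the figures): scaling the negative class tenfold leaves TPR and FPR unchanged, while precision drops sharply.

```python
# Balanced test set: 100 positives, 100 negatives.
tp, fn, fp, tn = 80, 20, 10, 90
tpr = tp / (tp + fn)          # 0.8
fpr = fp / (fp + tn)          # 0.1
prec = tp / (tp + fp)         # ~0.889

# Multiply the negatives by 10. The classifier behaves the same on each
# sample, so FP and TN scale by 10 while TP and FN are untouched.
fp10, tn10 = fp * 10, tn * 10
tpr10 = tp / (tp + fn)        # 0.8  (unchanged)
fpr10 = fp10 / (fp10 + tn10)  # 0.1  (unchanged)
prec10 = tp / (tp + fp10)     # ~0.444 (roughly halved)

print(tpr, fpr, round(prec, 3))
print(tpr10, fpr10, round(prec10, 3))
```

Since ROC curves are built entirely from TPR and FPR, they are insensitive to class imbalance; Precision-Recall curves, which depend on precision, are not.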

http://en.wikipedia.org/wiki/File:Roccurves.png


Original: https://www.cnblogs.com/zhizhan/p/6002209.html
Author: 止战F
Title: ROC和AUC介绍以及如何计算AUC！！
