Plotting ROC and PR curves for multiclass classification with sklearn decision trees / random forests

There is a blog post / paper explaining the underlying principle (optional reading): "Why can a Decision Tree produce an ROC curve?"

1. Data: first binarize (one-hot encode) the multiclass labels

from sklearn.preprocessing import label_binarize

# Binarize the test labels into an (n_samples, n_classes) indicator matrix
y_test = label_binarize(y_test, classes=[0, 1, 2, 3, 4])
n_classes = y_test.shape[1]
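As a quick illustration of what `label_binarize` produces (toy labels of my own, not the article's data):

```python
from sklearn.preprocessing import label_binarize

# Toy labels, just to show the indicator matrix that binarization produces
y_demo = [0, 2, 4, 1]
onehot = label_binarize(y_demo, classes=[0, 1, 2, 3, 4])
print(onehot)
# [[1 0 0 0 0]
#  [0 0 1 0 0]
#  [0 0 0 0 1]
#  [0 1 0 0 0]]
```

Each row is one sample; the single 1 marks that sample's class, which is what lets the per-class curves below treat each column as a binary problem.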

2. Build the model; note the use of the one-vs-rest wrapper (OneVsRestClassifier)


from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Wrap the tree in OvR so predict_proba yields one score column per class
dtc = OneVsRestClassifier(DecisionTreeClassifier(criterion="gini",
                                                 min_samples_leaf=3, max_depth=15))

clf = dtc.fit(X=X_train, y=y_train)

y_score = clf.predict_proba(X_test)
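A minimal runnable sketch of this fit/score step, using a synthetic 5-class dataset from `make_classification` (the dataset and split here are assumptions standing in for the article's data); it confirms that `predict_proba` on the OvR wrapper returns one score column per class:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 5-class dataset standing in for the article's data
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = OneVsRestClassifier(DecisionTreeClassifier(criterion="gini",
                                                 min_samples_leaf=3,
                                                 max_depth=15)).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)
print(y_score.shape)  # (125, 5): one probability column per class
```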

3. Plotting

Plot 1: the PR curve

from sklearn.metrics import precision_recall_curve, average_precision_score
import matplotlib.pyplot as plt

precision = dict()
recall = dict()
average_precision = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    average_precision[i] = average_precision_score(y_test[:, i], y_score[:, i])

# Raveling the indicator matrix pools every (sample, class) decision,
# so this is the micro-averaged curve
precision["micro"], recall["micro"], _ = precision_recall_curve(y_test.ravel(),
                                                                y_score.ravel())
average_precision["micro"] = average_precision_score(y_test, y_score,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.2f}'
      .format(average_precision["micro"]))

plt.figure()

plt.step(recall['micro'], precision['micro'], where='post')

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title(
    'Average precision score, micro-averaged over all classes: AP={0:0.3f}'
        .format(average_precision["micro"]))

Plot 2: the ROC curve (the area under each per-class curve is that class's AUC)

from sklearn.metrics import roc_curve, auc
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

plt.figure()
lw = 1
colors = ['blue', 'red', 'green', 'black', 'yellow']
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.3f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([-0.05, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic for multi-class data')
plt.legend(loc="lower right")
plt.show()

Unlike the approach above, you can also compute the AUC value directly.

Rather than combining roc_curve() and auc() by hand, roc_auc_score() is usually the more convenient choice. Note that the two routes can disagree; see "sklearn: why do roc_auc_score() and auc() give different results?" (cnblogs).

from sklearn.metrics import roc_auc_score

# y_test here must be the binarized indicator matrix from step 1
y_score = clf.predict_proba(X_test)
print(f"AUC={roc_auc_score(y_test, y_score, average='micro')}")


Reference:

Original: https://blog.csdn.net/weixin_43469047/article/details/114707688
Author: 小白tree
Title: sklearn决策树/随机森林多分类绘制ROC和PR曲线

