There is a blog post explaining the underlying principle (optional reading): "Why can a Decision Tree draw an ROC curve?"
1. Data: one-hot (binarize) the multiclass labels first
from sklearn.preprocessing import label_binarize
y_test = label_binarize(y_test, classes=[0, 1, 2, 3, 4])
n_classes = y_test.shape[1]
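As a quick illustration of what label_binarize does (a toy example of my own, not the post's data):

```python
from sklearn.preprocessing import label_binarize

# Each label becomes a one-hot row; column order follows `classes`.
y = [0, 2, 1, 2]
Y = label_binarize(y, classes=[0, 1, 2])
print(Y)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
print(Y.shape[1])  # n_classes = 3
```

Each column of the result is then a per-class binary problem, which is what the per-class PR and ROC curves below are computed on.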
2. Build the model; note the use of the one-vs-rest wrapper (OneVsRestClassifier)
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
dtc = OneVsRestClassifier(DecisionTreeClassifier(criterion="gini",
                                                 min_samples_leaf=3,
                                                 max_depth=15))
clf = dtc.fit(X=X_train, y=y_train)
y_score = clf.predict_proba(X_test)
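A self-contained sketch of the fit/score step, using iris as a stand-in for the post's (unspecified) dataset; the split and variable names are my assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_test = label_binarize(y_test, classes=[0, 1, 2])  # one-hot the test labels
n_classes = y_test.shape[1]

# One binary tree is fit per class; predict_proba stacks the per-class scores
# column-wise, matching the columns of the binarized y_test.
clf = OneVsRestClassifier(
    DecisionTreeClassifier(criterion="gini", min_samples_leaf=3, max_depth=15)
).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)
print(y_score.shape)  # (n_test_samples, 3)
```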
3. Plotting
Plot 1: the PR curve
from sklearn.metrics import average_precision_score, precision_recall_curve
import matplotlib.pyplot as plt

precision = dict()
recall = dict()
average_precision = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    average_precision[i] = average_precision_score(y_test[:, i], y_score[:, i])
# ravel() pools every (sample, class) cell into one big binary problem,
# so this curve is the micro-average, not the macro-average.
precision["micro"], recall["micro"], _ = precision_recall_curve(y_test.ravel(),
                                                                y_score.ravel())
average_precision["micro"] = average_precision_score(y_test, y_score,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.2f}'
      .format(average_precision["micro"]))
iter_ = 1  # subplot slot in the 2x3 figure grid
plt.subplot(2, 3, iter_)
iter_ += 1
plt.step(recall['micro'], precision['micro'], where='post')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title(
    'Average precision score, micro-averaged over all classes: AP={0:0.3f}'
    .format(average_precision["micro"]))
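On a tiny binary example (the numbers are my own), precision_recall_curve and average_precision_score behave like this:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# precision/recall are reported at each score threshold, recall decreasing.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
ap = average_precision_score(y_true, y_scores)
print(ap)  # ≈ 0.83
```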
Plot 2: the ROC curve (the area under each per-class curve is that class's AUC)
from sklearn.metrics import roc_curve, auc
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
plt.subplot(2, 3, iter_)
iter_ += 1
lw = 1
colors = ['blue', 'red', 'green', 'black', 'yellow']
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.3f})'.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([-0.05, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic for multi-class data')
plt.legend(loc="lower right")
plt.show()
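roc_curve and auc on a minimal binary example (values are mine):

```python
from sklearn.metrics import auc, roc_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# fpr/tpr trace the ROC curve as the decision threshold sweeps downward;
# auc() integrates that curve with the trapezoidal rule.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(auc(fpr, tpr))  # 0.75
```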
Unlike the above, the AUC can also be computed directly:
It is sometimes claimed that auc() is on its way out; going forward roc_auc_score is the more convenient choice, but note the two are not interchangeable — see "sklearn: why do roc_auc_score() and auc() give different results?" (cnblogs).
from sklearn.metrics import roc_auc_score
y_score = clf.predict_proba(X_test)
print(f"AUC={roc_auc_score(y_test, y_score, average='micro')}")
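To see how roc_auc_score relates to auc(): the micro-averaged AUC from roc_auc_score matches auc() applied to the curve built from the raveled indicator arrays. A sketch with made-up multilabel scores:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_score = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.5, 0.2],
                    [0.2, 0.3, 0.5],
                    [0.6, 0.3, 0.1]])

# Micro-average: treat every (sample, class) cell as one binary decision.
fpr, tpr, _ = roc_curve(y_test.ravel(), y_score.ravel())
micro_auc_from_curve = auc(fpr, tpr)
micro_auc_direct = roc_auc_score(y_test, y_score, average="micro")
print(micro_auc_from_curve, micro_auc_direct)  # the two values agree
```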
References:
- "Why can a Decision Tree draw an ROC curve?" (the principle)
- Official docs: "Precision-Recall"
- "How to get the ROC curve of a decision tree?"
- "Plotting a multiclass ROC curve for a decision tree classifier" (see the answers under that forum question)
- "ROC curves and AUC for multiclass problems"
- Official docs: sklearn.metrics.roc_curve
- "sklearn: why do roc_auc_score() and auc() give different results?" (cnblogs) (the difference)
Original: https://blog.csdn.net/weixin_43469047/article/details/114707688
Author: 小白tree
Title: sklearn决策树/随机森林多分类绘制ROC和PR曲线