KeyError: "None of [Int64Index([…], dtype='int64', length=739)] are in the [columns]"

June 19, 2023, 10:51 AM • Artificial Intelligence • 74 reads

KeyError: "None of [Int64Index([ 0, 1, 2, 3, 4, 6, 7, 8, 9, 10,\n …\n 907, 908, 910, 911, 912, 914, 916, 917, 920, 923],\n dtype='int64', length=739)] are in the [columns]"

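Problem:

The error is raised inside the cross-validation loop in the code that follows, at the call classifier.fit(X[train], y[train]). At that point X is still a pandas DataFrame, so X[train] is interpreted as a lookup of column labels rather than a selection of rows.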


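# Imports this excerpt relies on. plot_roc_curve exists only in scikit-learn < 1.2.
# features_train, target_train, df_in, pipeline_optimizer, plot_radiomics and
# plot_confusion_matrix are not defined in this excerpt.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.metrics import (plot_roc_curve, accuracy_score,
                             classification_report, confusion_matrix)
from xgboost import XGBClassifier

validation_test_result_pd = pd.DataFrame()
X = features_train  # still a pandas DataFrame here
y = target_train    # still a pandas Series here

n_samples, n_features = X.shape

# Classification and ROC analysis
# Run classifier with cross-validation and plot ROC curves

# cv = StratifiedKFold(n_splits=5)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# classifier = LogisticRegression(class_weight="balanced", penalty="l2")
# 'n_estimators': 200, 'min_child_weight': 1, 'max_depth': 5
# classifier = pipeline_optimizer.fitted_pipeline_.steps[0][1]
classifier = XGBClassifier()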

tprs = []
aucs = []

accs = []
auc_mean_list = []
auc_std_list = []
acc_mean_list = []
acc_std_list = []
mean_fpr = np.linspace(0, 1, 100)

rep_folds = []

fig, ax = plt.subplots()
for i, (train, test) in enumerate(cv.split(X, y)):
    #print(test)
    #print(type(test))
    #print(train.shape)
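    # The KeyError is raised here: X is a DataFrame, so X[train] looks for
    # columns labelled with the integers in train instead of selecting rows.
    classifier.fit(X[train], y[train])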
    viz = plot_roc_curve(classifier, X[test], y[test],
                         name='ROC curve of fold {}'.format(i),
                         alpha=0.2, lw=2, ax=ax)

    #print(viz)
    interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
    interp_tpr[0] = 0.0

    tprs.append(interp_tpr)
    aucs.append(viz.roc_auc)
    y_pred = classifier.predict(X[test])
    accs.append(accuracy_score(y[test], y_pred))

    print(str(i))
    print('---------------classification report of fold %d-------------------' % (i))

    y_pred = classifier.predict(X[test])
    print(classification_report(y[test], y_pred))
    #fold confusion matrix
    rep_folds.append(classification_report(y[test], y_pred, output_dict=True,digits=3))
    # Compute confusion matrix
    cnf_matrix = confusion_matrix(y[test], y_pred)
    np.set_printoptions(precision=2)
    #np.set_printoptions(precision=3)

    # Plot non-normalized confusion matrix
    plt.figure()
    # radiomics
    title_raidomics='Radiomics plot of fold_'+str(i)
    plot_radiomics(X[test],y[test],title_raidomics,classifier)
    plt.figure(figsize=(6, 4))
    # confusion matrix
    plot_confusion_matrix(cnf_matrix, classes=[0,1],
                          title='Confusion matrix, without normalization of fold %d ' % (i))

    # Plot normalized confusion matrix
    plt.figure()
    plot_confusion_matrix(cnf_matrix, classes=[0,1], normalize=True,
                          title='Normalized confusion matrix of fold %d ' % (i))

    if i == 0:
        best = viz.roc_auc
        model = classifier
    elif viz.roc_auc>best:
        best = viz.roc_auc
        model = classifier
    else:
        pass

    input_len = len(test)
    to_dca = {'y_true':y[test],'y_pred':y_pred,'probailities':classifier.predict_proba(X[test])[:,1]}
    dca_pd = pd.DataFrame.from_dict(to_dca)
    dca_pd['train_ornot'] = input_len*["fold_" + str(i)]

    #fold_pd = pd.DataFrame(X[test],columns = features_train.columns.tolist())
    #dca_pd = pd.concat([dca_pd,fold_pd],axis = 1)

    train_temp_pd = df_in.iloc[target_train.index.tolist()].reset_index(drop = True)
    index_pd = train_temp_pd.iloc[test].reset_index(drop = True)
    dca_pd = pd.concat([dca_pd,index_pd],axis = 1)

    #fold_value = 'fold'+str(i)
    #dca_pd['validation_fold_or_test'] = [fold_value]*index_pd.shape[0]

    if i == 0:
        validation_test_result_pd = dca_pd
    else:
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
        validation_test_result_pd = pd.concat([validation_test_result_pd, dca_pd], ignore_index=True)

ax.plot([0, 1], [0, 1], linestyle='--', lw=2, color='red',
        label='Chance', alpha=.8)

mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
from sklearn.metrics import auc
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)

mean_acc = np.mean(accs)
std_acc = np.std(accs)
import math
print(len(aucs) == len(mean_tpr))

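std_error = std_auc / math.sqrt(5)
# cv = 5 folds, so there are 5 - 1 = 4 degrees of freedom;
# the two-sided 95% critical value from the t-table is 2.776.
ci = 2.776 * std_error
# ci = 2.262 * std_error  # critical value for 9 degrees of freedom (10 folds)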
print(ci)
lower_bound = mean_auc - ci
upper_bound = mean_auc + ci

# Alternative label showing the mean AUC with its standard deviation:
# ax.plot(mean_fpr, mean_tpr, color='b',
#         label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc, std_auc),
#         lw=2, alpha=.8)
# https://stats.stackexchange.com/questions/100159/confidence-intervals-for-auc-using-cross-validation
ax.plot(mean_fpr, mean_tpr, color='blue',
        label=r'Mean ROC (AUC CI = [%0.3f,%0.3f])' % (lower_bound, upper_bound),
        lw=2.5, alpha=.8)

std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)

# Cross-validation AUC scores
print(aucs)
auc_mean_list.append(mean_auc)
auc_std_list.append(std_auc)
print('{} cv auc is :'.format(pipeline_optimizer.fitted_pipeline_.steps[0][0]), str(mean_auc), str(std_auc))

# Cross-validation accuracy scores
print(accs)
acc_mean_list.append(mean_acc)
acc_std_list.append(std_acc)
print('{} cv acc is :'.format(pipeline_optimizer.fitted_pipeline_.steps[0][0]), str(mean_acc), str(std_acc))

ax.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                label=r'$\pm$ 1 std. dev.')

ax.set(xlim=[-0.05, 1.05], ylim=[-0.05, 1.05],
       title="ROC Curve with 5fold CV")
ax.legend(loc="lower right")
# Save the figure before displaying it.
fig.savefig('{}_5fold.png'.format(pipeline_optimizer.fitted_pipeline_.steps[0][0]), bbox_inches='tight')
plt.show()
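
A side note on the confidence interval in the code above: the multiplier 2.776 is read from a t-table for 5 - 1 = 4 degrees of freedom. As a rough sketch, the same value can be computed with scipy.stats.t.ppf instead of being hard-coded. The per-fold AUC values here are made up for illustration, and treating fold AUCs as independent samples is only an approximation.

```python
import math

import numpy as np
from scipy import stats

aucs = [0.81, 0.78, 0.84, 0.80, 0.79]  # hypothetical per-fold AUCs
n_folds = len(aucs)

# Two-sided 95% critical value with n_folds - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n_folds - 1)

mean_auc = np.mean(aucs)
# Same formula as the code above (np.std with its default ddof=0).
std_error = np.std(aucs) / math.sqrt(n_folds)
ci = t_crit * std_error

print(round(t_crit, 3), mean_auc - ci, mean_auc + ci)
```

Computing the critical value from the number of folds keeps the interval consistent if n_splits is changed later.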

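Solution:

Add the following snippet.

Drop the index of the original data and convert it to NumPy arrays, so that X[train] selects rows by position: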

X = features_train.reset_index(drop=True).values
y = target_train.reset_index(drop=True).values
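
The following is a minimal sketch of why this conversion helps, using a small made-up DataFrame (the column names, values, and index are illustrative assumptions, not the author's data). Indexing a DataFrame with an integer array is treated as a column lookup and raises the KeyError, while the same array selects rows by position once the data is a NumPy array. The exact wording of the message varies across pandas versions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-ins for features_train / target_train.
features_train = pd.DataFrame(
    {"f1": np.arange(10.0), "f2": np.arange(10.0) * 2},
    index=np.arange(100, 110),  # non-default index, e.g. left over from a split
)
target_train = pd.Series([0, 1] * 5, index=features_train.index)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
train, test = next(cv.split(features_train, target_train))

# DataFrame[...] with an integer array looks for COLUMNS named by those integers.
try:
    features_train[train]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# After conversion to NumPy arrays, the same indices select rows by position.
X = features_train.reset_index(drop=True).values
y = target_train.reset_index(drop=True).values
X_train, y_train = X[train], y[train]
print(X_train.shape, y_train.shape)
```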

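Alternatively, keep X and y as pandas objects and select rows by position with .iloc inside the training loop (train_index and test_index play the role of train and test in the loop above):

X_train, X_test = X.iloc[train_index], X.iloc[test_index]
y_train, y_test = y.iloc[train_index], y.iloc[test_index]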
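
As a sketch of how the .iloc variant fits into a complete cross-validation loop, the example below uses randomly generated data and a LogisticRegression model. Both are assumptions chosen to keep the example small; the original code uses XGBClassifier on the author's own features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Hypothetical data; X and y stay as pandas objects with a non-default index.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(40, 3)), columns=["a", "b", "c"],
                 index=np.arange(500, 540))
y = pd.Series((X["a"] + X["b"] > 0).astype(int), index=X.index)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for train_index, test_index in cv.split(X, y):
    # .iloc selects rows by integer position, regardless of the index labels.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    clf = LogisticRegression().fit(X_train, y_train)
    fold_sizes.append((len(X_train), len(X_test)))
    print(len(X_train), len(X_test), clf.score(X_test, y_test))
```

Compared with converting to NumPy arrays, this keeps the column names and the original index on each fold, which can be convenient when fold results are later joined back to the source table.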

Original: https://blog.csdn.net/zhongkeyuanchongqing/article/details/120796789
Author: Data+Science+Insight
Title: KeyError: "None of [Int64Index([…], dtype='int64', length=739)] are in the [columns]"

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/638879/
