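Data Analysis and Classification Prediction on the Kaggle Heart Disease Dataset: Statistical Learning Lab Report
1. Experiment Preparation
The data comes from Kaggle and contains 14 variables (columns) and 303 samples. The variables are described in the table below.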
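| Variable | Description | Values / Range |
|---|---|---|
| target | Presence of heart disease (categorical) | 0 = no, 1 = yes |
| age | Age (continuous) | [29, 77] |
| sex | Sex (categorical) | 1 = male, 0 = female |
| cp | Chest pain type (categorical) | 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic |
| trestbps | Resting blood pressure (continuous, mm Hg) | [94, 200] |
| chol | Serum cholesterol (continuous, mg/dl) | [126, 564] |
| fbs | Fasting blood sugar > 120 mg/dl (categorical) | 1 = true, 0 = false |
| restecg | Resting electrocardiogram result (categorical) | 0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy by Estes' criteria |
| thalach | Maximum heart rate (continuous) | [71, 202] |
| exang | Exercise-induced angina (categorical) | 1 = yes, 0 = no |
| oldpeak | ST depression induced by exercise relative to rest (continuous) | [0, 6.2] |
| slope | Slope of the peak exercise ST segment (categorical) | 1 = upsloping, 2 = flat, 3 = downsloping |
| ca | Number of major vessels (continuous) | [0, 3] |
| thal | Thalassemia, a blood disorder (categorical) | 1 = normal, 2 = fixed defect, 3 = reversible defect |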
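The documented ranges for the continuous variables can be used as a quick sanity check after loading the file. The following is a minimal sketch, not part of the original report: the helper name `out_of_range` and the small demo frame are hypothetical, and the column names are assumed to match the CSV header (for example `chol`).

```python
import pandas as pd

# Ranges copied from the variable table for the continuous columns.
EXPECTED_RANGES = {
    "age": (29, 77),
    "trestbps": (94, 200),
    "chol": (126, 564),
    "thalach": (71, 202),
    "oldpeak": (0, 6.2),
    "ca": (0, 3),
}


def out_of_range(df, ranges=EXPECTED_RANGES):
    """Return {column: count of values outside the documented range}.

    Columns missing from df are skipped. Missing values (NaN) are counted
    as out of range because Series.between() returns False for them.
    """
    report = {}
    for col, (low, high) in ranges.items():
        if col in df.columns:
            inside = df[col].between(low, high)  # inclusive on both ends
            report[col] = int((~inside).sum())
    return report


# Hypothetical demo rows for illustration only; age 120 is deliberately invalid.
demo = pd.DataFrame({"age": [63, 37, 120], "chol": [233, 250, 204]})
report = out_of_range(demo)
print(report)
```

With the real data, `out_of_range(heart_df)` would be expected to report zero for every column if the table matches the file.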
'''
-*- coding: utf-8 -*-
@Author : DouGang
@E-mail : dorza@qq.com
@Software : PyCharm, Python3.6
@Time : 2021-07-24
'''
# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import precision_score,recall_score,f1_score
from sklearn.metrics import precision_recall_curve,roc_curve,average_precision_score,auc
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
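The imports above suggest a workflow of splitting the data, standardizing features, fitting a classifier, and scoring it with cross-validation. As a rough sketch of how these pieces typically fit together (an assumption, since the modelling code is not shown here), the example below uses synthetic data in place of heart.csv and adds `make_pipeline`, which the original does not import, so that scaling is refit inside each cross-validation fold.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the heart disease dataset): 200 samples,
# 5 features, with a binary label derived from the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the samples for a final test, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# on its own training part only.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("CV accuracy per fold:", cv_scores)
print("Held-out accuracy:", test_accuracy)
```

The choice of `n_neighbors=5`, the 80/20 split, and 5-fold cross-validation are illustrative defaults rather than settings taken from the report.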
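2. Data Overview
# Use the SimHei font so that Chinese labels render correctly in matplotlib figures
plt.rcParams['font.sans-serif'] = ['SimHei']
# Load the dataset and inspect its size, first rows, dtypes and summary statistics
heart_df = pd.read_csv("./dataSet/heart.csv")
print(heart_df.shape)
print(heart_df.head())
heart_df.info()  # info() prints directly and returns None, so print() is not needed
print(heart_df.describe())
# Count missing values per column and visualize them as a heatmap
print(heart_df.isnull().sum())
sns.heatmap(heart_df.isnull())
plt.show()
# Pairwise scatter plots of all variables, colored by the target class
sns.pairplot(heart_df, hue='target')
plt.show()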
(303, 14)
age sex cp trestbps chol fbs ... exang oldpeak slope ca thal target
0 63 1 3 145 233 1 ... 0 2.3 0 0 1 1
1 37 1 2 130 250 0 ... 0 3.5 0 0 2 1
2 41 0 1 130 204 0 ... 0 1.4 2 0 2 1
3 56 1 1 120 236 0 ... 0 0.8 2 0 2 1
4 57 0 0 120 354 0 ... 1 0.6 2 0 2 1
[5 rows x 14 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
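The checks in this section (shape, missing values, and the target used as the plotting hue) can also be gathered into one small helper. This is a minimal sketch with a hypothetical function name `quick_summary` and made-up demo rows; it is not code from the original report.

```python
import pandas as pd


def quick_summary(df, target="target"):
    """Collect the basic checks into one dictionary."""
    return {
        "shape": df.shape,
        # Number of missing values in each column.
        "missing_per_column": df.isnull().sum().to_dict(),
        # How many samples fall into each class of the target column.
        "class_counts": df[target].value_counts().to_dict(),
    }


# Made-up rows for illustration only; one cholesterol value is missing.
demo = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, None, 236],
    "target": [1, 1, 0, 0],
})
summary = quick_summary(demo)
print(summary)
```

Calling `quick_summary(heart_df)` on the loaded data would show at a glance whether the two classes are roughly balanced, which matters when interpreting accuracy later.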
Original: https://blog.csdn.net/qq_40605313/article/details/120061220
Author: 中科豆
Title: Data Analysis and Classification Prediction on the Kaggle Heart Disease Dataset: Statistical Learning Lab Report