Data Analysis and Classification Prediction on the Kaggle Heart Disease Dataset - Statistical Learning Lab Report

1. Experiment Preparation

The data come from Kaggle and contain 14 variables (dimensions) and 303 samples. The variables are described in the table below.

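Variable   Description                                                       Values
target     Presence of heart disease (categorical)                           0 = no, 1 = yes
age        Age (continuous)                                                  [29, 77]
sex        Sex (categorical)                                                 1 = male, 0 = female
cp         Chest pain type (categorical)                                     1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic
trestbps   Resting blood pressure (continuous, mm Hg)                        [94, 200]
chol       Serum cholesterol (continuous, mg/dl)                             [126, 564]
fbs        Fasting blood sugar > 120 mg/dl (categorical)                     1 = true, 0 = false
restecg    Resting electrocardiogram result (categorical)                    0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy by Estes' criteria
thalach    Maximum heart rate (continuous)                                   [71, 202]
exang      Exercise-induced angina (categorical)                             1 = yes, 0 = no
oldpeak    ST depression induced by exercise relative to rest (continuous)   [0, 6.2]
slope      Slope of the peak exercise ST segment (categorical)               1 = upsloping, 2 = flat, 3 = downsloping
ca         Number of major vessels (continuous)                              [0, 3]
thal       Thalassemia, a blood disorder (categorical)                       1 = normal, 2 = fixed defect, 3 = reversible defect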
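The split between categorical and continuous variables in the table above matters for preprocessing, since categorical codes are usually one-hot encoded while continuous values are scaled. The following is a minimal sketch of that split, not the author's code: the helper `split_features` and the two-row `demo` frame are hypothetical, the demo values are made up for illustration, and the column name `chol` follows the spelling in the CSV header.

```python
import pandas as pd

# Column groups as listed in the variable table; "ca" is treated as
# continuous because the table labels it that way.
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "thal"]
CONTINUOUS = ["age", "trestbps", "chol", "thalach", "oldpeak", "ca"]
TARGET = "target"


def split_features(df: pd.DataFrame):
    """Return (X, y), with the categorical columns one-hot encoded."""
    X = pd.get_dummies(df[CATEGORICAL + CONTINUOUS], columns=CATEGORICAL)
    y = df[TARGET]
    return X, y


# Hypothetical rows, made up only to show how the helper is called.
demo = pd.DataFrame({
    "age": [60, 40], "sex": [1, 0], "cp": [3, 2], "trestbps": [140, 130],
    "chol": [230, 250], "fbs": [1, 0], "restecg": [0, 1], "thalach": [150, 180],
    "exang": [0, 1], "oldpeak": [2.0, 0.5], "slope": [1, 2], "ca": [0, 1],
    "thal": [1, 2], "target": [1, 0],
})

X, y = split_features(demo)
print(X.columns.tolist())
print(y.tolist())
```

One-hot encoding avoids implying an order between codes such as the chest pain types, at the cost of more columns.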

'''
    -*- coding: utf-8 -*-
    @Author     : DouGang
    @E-mail     : dorza@qq.com
    @Software   : PyCharm, Python3.6
    @Time       : 2021-07-24
'''

Import the required libraries


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import precision_score,recall_score,f1_score
from sklearn.metrics import precision_recall_curve,roc_curve,average_precision_score,auc

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier

from sklearn.linear_model import LogisticRegression

from sklearn.linear_model import SGDClassifier
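The imports above suggest a workflow of standardizing features, splitting the data, cross-validating a classifier, and scoring it with precision, recall and F1. The sketch below shows how these pieces might fit together for the k-nearest-neighbors case. It is an illustration under stated assumptions, not the author's experiment: the data are synthetic (generated with `make_classification` as a stand-in with 303 samples and 13 features), and wrapping the scaler and classifier in `make_pipeline` is a design choice made here so that scaling is refit inside every cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data (assumption): 303 samples, 13 features.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Hold out 20% of the samples, keeping the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The pipeline standardizes the features, then applies KNN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validated accuracy on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the full training split and score on the held-out samples.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
report = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

print("CV accuracy per fold:", cv_scores)
print("Held-out metrics:", report)
```

Swapping `KNeighborsClassifier` for `DecisionTreeClassifier`, `RandomForestClassifier`, `LogisticRegression` or `SGDClassifier` leaves the rest of the sketch unchanged, which makes it easy to compare the models imported above under the same split.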

2. Data Overview

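# Use the SimHei font so that Chinese plot labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']

# Load the dataset and print basic summaries
heart_df = pd.read_csv("./dataSet/heart.csv")
print(heart_df.shape)           # (rows, columns)
print(heart_df.head())          # first five rows
heart_df.info()                 # prints dtypes and non-null counts itself; returns None
print(heart_df.describe())      # summary statistics
print(heart_df.isnull().sum())  # number of missing values per column

# Visualize missing values, then pairwise relationships colored by target
sns.heatmap(heart_df.isnull())
plt.show()
sns.pairplot(heart_df, hue='target')
plt.show()

Output: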
(303, 14)
   age  sex  cp  trestbps  chol  fbs  ...  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1  ...      0      2.3      0   0     1       1
1   37    1   2       130   250    0  ...      0      3.5      0   0     2       1
2   41    0   1       130   204    0  ...      0      1.4      2   0     2       1
3   56    1   1       120   236    0  ...      0      0.8      2   0     2       1
4   57    0   0       120   354    0  ...      1      0.6      2   0     2       1
[5 rows x 14 columns]

RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
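The `head()` printout earlier hides the middle columns behind `...` because pandas limits how many columns it displays. A small sketch of one way to show all 14 columns and to confirm there are no missing values follows. The `demo` frame is hypothetical and holds placeholder values only; with the real data, `heart_df` would take its place.

```python
import pandas as pd

# Hypothetical frame with the dataset's 14 column names and placeholder values.
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
demo = pd.DataFrame([[0] * len(cols), [1] * len(cols)], columns=cols)

# Temporarily lift the column limit and widen the console width so the
# printout does not elide columns; the options revert after the block.
with pd.option_context("display.max_columns", None, "display.width", 200):
    text = repr(demo.head())
print(text)

# Total number of missing cells across the whole frame.
missing_total = int(demo.isnull().sum().sum())
print("missing cells:", missing_total)
```

Using `pd.option_context` rather than `pd.set_option` keeps the change local, so later printouts keep the default display settings.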

Original: https://blog.csdn.net/qq_40605313/article/details/120061220
Author: 中科豆
Title: Data Analysis and Classification Prediction on the Kaggle Heart Disease Dataset - Statistical Learning Lab Report
