[AI Project] - Income Classification Prediction with Machine Learning: Report
Problem
Use 14 features (age, workclass, …, native_country) to predict whether a person's income exceeds 50K. This is a binary classification problem.
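Training set
32561 samples. Each sample has 14 features plus the income label. Of these 15 columns, 6 are continuous and 9 are discrete (the 8 categorical features plus the label).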
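Features:
Age: age;
Workclass: discrete, the type of employment: private, self-employed (not incorporated), self-employed (incorporated), federal government, local government, state government, without pay, never worked;
Fnlwgt: continuous, the census "final weight" (roughly, how many people the record represents);
Education: education level;
Education_num: length of education, as a number;
Marital_status: marital status;
Occupation: occupation;
Relationship: family relationship;
Race: race;
Sex: sex;
Capital_gain: capital gains;
Capital_loss: capital losses;
Hours/week: hours worked per week;
Native_country: country of origin;
Income: income, the label for this problem.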
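The feature list above splits into 6 continuous and 8 categorical columns. The sketch below shows that split and encodes the categorical columns. It assumes the column names are exactly those printed by `train_data.head()` in the experiment section, including the `fnlgwt` spelling. One-hot encoding is done with `pandas.get_dummies`, which is one option among several; the report itself imports `OneHotEncoder` and `LabelEncoder`. The two-row `sample` frame is hypothetical.

```python
import pandas as pd

# Column names as printed by train_data.head(); assumed to match train.csv exactly.
CONTINUOUS = ["Age", "fnlgwt", "Education num",
              "Capital Gain", "Capital Loss", "Hours/Week"]
CATEGORICAL = ["Workclass", "Education", "Marital Status", "Occupation",
               "Relationship", "Race", "Sex", "Native country"]
LABEL = "Income"


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the continuous columns as-is and one-hot encode the categorical ones.

    The label column is deliberately left out of the returned feature matrix.
    """
    dummies = pd.get_dummies(df[CATEGORICAL])  # e.g. "Workclass_State-gov"
    return pd.concat([df[CONTINUOUS], dummies], axis=1)


# Hypothetical sample: row 0 reuses the feature values shown by head();
# row 1 and both Income values are invented for illustration only.
sample = pd.DataFrame({
    "Age": [39, 50],
    "Workclass": ["State-gov", "Private"],
    "fnlgwt": [77516, 100000],
    "Education": ["Bachelors", "Bachelors"],
    "Education num": [13, 13],
    "Marital Status": ["Never-married", "Never-married"],
    "Occupation": ["Adm-clerical", "Adm-clerical"],
    "Relationship": ["Not-in-family", "Not-in-family"],
    "Race": ["White", "White"],
    "Sex": ["Male", "Female"],
    "Capital Gain": [2174, 0],
    "Capital Loss": [0, 0],
    "Hours/Week": [40, 45],
    "Native country": ["United-States", "United-States"],
    "Income": ["<=50K", ">50K"],
})

X = encode_features(sample)
print(X.shape)
```

Each categorical column contributes one indicator column per distinct value, so the width of `X` grows with the number of categories seen in the data.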
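Test set
16281 samples, each with the same 14 features.
For each test sample, predict from its 14 features (age, etc.) whether income exceeds 50K. This is again a binary classification problem.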
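The sketch below shows one possible baseline for this binary task. It assumes the label strings are `>50K` and `<=50K`, possibly with stray spaces or a trailing period, as in the UCI `adult.test` file. The names `to_binary_label` and `build_model` are hypothetical, and so is the two-column toy data. Logistic regression is one of the classifiers the report imports, but this pipeline is not necessarily the setup the report uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def to_binary_label(income: pd.Series) -> pd.Series:
    """Map income strings to 1 (>50K) or 0 (<=50K), tolerating spaces and a trailing '.'."""
    cleaned = income.astype(str).str.strip().str.rstrip(".")
    return (cleaned == ">50K").astype(int)


def build_model(continuous, categorical) -> Pipeline:
    """Scale continuous columns, one-hot encode categorical ones, then fit logistic regression."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), continuous),
        # Ignore categories that appear only at prediction time instead of raising.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    return Pipeline([("pre", preprocess), ("clf", LogisticRegression(max_iter=1000))])


# Hypothetical toy data with two of the 14 features; income depends only on hours.
rng = np.random.default_rng(0)
n = 200
hours = rng.integers(20, 60, size=n)
sex = rng.choice(["Male", "Female"], size=n)
df = pd.DataFrame({
    "Hours/Week": hours,
    "Sex": sex,
    "Income": np.where(hours > 40, ">50K", "<=50K"),
})

features = df[["Hours/Week", "Sex"]]
y = to_binary_label(df["Income"])
model = build_model(["Hours/Week"], ["Sex"])
model.fit(features, y)
print("training accuracy:", model.score(features, y))
```

On the real data, the same pipeline would receive the full continuous and categorical column lists. It would be fit on `train_data` and scored on `test_data`.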
Notes
Some feature values are "?", which marks a missing value. These must be handled before modeling.
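One common way to handle the "?" markers is sketched below. It assumes missing values appear as the literal string `?`, possibly with surrounding spaces, in text columns. The sketch converts them to `NaN` and then fills each affected column with its most frequent value. Dropping the affected rows with `dropna()` is the other usual choice. `fill_missing` is a hypothetical helper, not necessarily the report's own treatment.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame, marker: str = "?") -> pd.DataFrame:
    """Turn `marker` cells into NaN, then fill each affected column with its mode."""
    out = df.copy()
    text_cols = out.select_dtypes(include="object").columns
    # Strip surrounding spaces so both "?" and " ?" are recognised as missing.
    out[text_cols] = out[text_cols].apply(lambda s: s.str.strip())
    out = out.replace(marker, np.nan)
    for col in out.columns:
        if out[col].isna().any():
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical example: one missing Workclass value.
df = pd.DataFrame({
    "Workclass": ["Private", " ?", "Private", "State-gov"],
    "Age": [39, 50, 38, 53],
})
print(fill_missing(df))
```

Mode imputation keeps every row, which matters when "?" cells are spread across many samples. The cost is adding some weight to the most common category.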
Experiments
1. Load the data
```python
# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import missingno  # third-party package for visualizing missing values
import seaborn as sns
from pandas.plotting import scatter_matrix  # was pandas.tools.plotting, removed in newer pandas
from mpl_toolkits.mplot3d import Axes3D

# Feature selection, dimensionality reduction and encoding
from sklearn.feature_selection import RFE, RFECV
from sklearn.svm import SVR
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, label_binarize

# Models
import sklearn.ensemble as ske
from sklearn import datasets, model_selection, tree, preprocessing, metrics, linear_model
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso, SGDClassifier
from sklearn.tree import DecisionTreeClassifier

# Hyperparameter search and evaluation
import scipy.stats as st
from scipy.stats import randint as sp_randint
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import precision_recall_fscore_support, roc_curve, auc

import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the notebook output

# IPython/Jupyter magic: draw figures inline (remove when running as a plain .py script)
%matplotlib inline

# Load the training and test sets
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
train_data.head()
```
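| | Age | Workclass | fnlgwt | Education | Education num | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours/Week | Native country | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | |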
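Before any processing, it can help to confirm that the files loaded as described earlier. That means 32561 training rows, 16281 test rows, the same columns in both files, and a count of "?" cells per column. The helper names below are hypothetical, and the sketch assumes `test.csv` uses the same header as `train.csv`.

```python
import pandas as pd


def question_mark_counts(df: pd.DataFrame) -> pd.Series:
    """Count cells equal to '?' (ignoring surrounding spaces) in each column."""
    counts = {}
    for col in df.columns:
        as_text = df[col].astype(str).str.strip()
        counts[col] = int((as_text == "?").sum())
    return pd.Series(counts)


def check_split(train: pd.DataFrame, test: pd.DataFrame,
                n_train: int = 32561, n_test: int = 16281) -> bool:
    """Return True if both frames have the expected row counts and identical columns."""
    return (len(train) == n_train
            and len(test) == n_test
            and list(train.columns) == list(test.columns))


# Hypothetical three-row frame standing in for train_data / test_data.
toy = pd.DataFrame({"Workclass": ["Private", " ?", "?"], "Age": [39, 50, 38]})
print(question_mark_counts(toy))
print(check_split(toy, toy, n_train=3, n_test=3))
```

On the real frames, `check_split(train_data, test_data)` and `question_mark_counts(train_data)` would run the same checks with the default sizes.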
Original: https://blog.csdn.net/Mind_programmonkey/article/details/121125930
Author: mind_programmonkey