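Data Analysis from Scratch: Kaggle Titanic Project (Part 3)
This section covers sorting with Pandas, arithmetic between DataFrames and columns, and the describe() function.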
import numpy as np
import pandas as pd
df = pd.read_csv("train.csv")
df.head()
     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
2    3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
4    5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S
df1 = pd.DataFrame(np.arange(8).reshape((2, 4)),
                   index=['2', '1'],
                   columns=['a', 'b', 'c', 'd'])
df1
   a  b  c  d
2  0  1  2  3
1  4  5  6  7
# Sort rows by the values in column 'c', ascending
df1.sort_values(by='c', ascending=True)
   a  b  c  d
2  0  1  2  3
1  4  5  6  7
# Sort rows by index label (ascending by default)
df1.sort_index()
   a  b  c  d
1  4  5  6  7
2  0  1  2  3
# Sort columns by label (axis=1), ascending by default
df1.sort_index(axis=1)
   a  b  c  d
2  0  1  2  3
1  4  5  6  7
# Sort columns by label in descending order
df1.sort_index(axis=1, ascending=False)
   d  c  b  a
2  3  2  1  0
1  7  6  5  4
# Sort rows by column 'a', then by column 'c', both descending
df1.sort_values(by=['a', 'c'], ascending=False)
   a  b  c  d
1  4  5  6  7
2  0  1  2  3
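The multi-column sort above applies one direction to every key. As a minimal sketch, assuming you want a different direction per key, `ascending` can also be given as a list with one entry per column in `by`. The `toy` frame and its values below are hypothetical and are not taken from train.csv.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not read from train.csv)
toy = pd.DataFrame(
    {"Fare": [10.0, 10.0, 50.0, 50.0], "Age": [30, 20, 40, 25]},
    index=["w", "x", "y", "z"],
)

# Sort by Fare descending; rows with equal Fare are ordered by Age ascending
result = toy.sort_values(by=["Fare", "Age"], ascending=[False, True])

# Row labels in their sorted order
order = list(result.index)
print(result)
```

The list passed to `ascending` must have the same length as the list passed to `by`.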
# Sort the Titanic data by Fare, then Age, both descending, and show the top 5 rows
df.sort_values(by=['Fare', 'Age'], ascending=False).head(5)
     PassengerId  Survived  Pclass  Name                                Sex     Age   SibSp  Parch  Ticket    Fare      Cabin        Embarked
679  680          1         1       Cardeza, Mr. Thomas Drake Martinez  male    36.0  0      1      PC 17755  512.3292  B51 B53 B55  C
258  259          1         1       Ward, Miss. Anna                    female  35.0  0      0      PC 17755  512.3292  NaN          C
737  738          1         1       Lesurer, Mr. Gustave J              male    35.0  0      0      PC 17755  512.3292  B101         C
438  439          0         1       Fortune, Mr. Mark                   male    64.0  1      4      19950     263.0000  C23 C25 C27  S
341  342          1         1       Fortune, Miss. Alice Elizabeth      female  24.0  3      2      19950     263.0000  C23 C25 C27  S
df_a = pd.DataFrame(np.arange(9.).reshape(3, 3),
                    columns=['a', 'b', 'c'],
                    index=['one', 'two', 'three'])
df_b = pd.DataFrame(np.arange(12.).reshape(4, 3),
                    columns=['a', 'e', 'c'],
                    index=['first', 'one', 'two', 'second'])
df_a
         a    b    c
one    0.0  1.0  2.0
two    3.0  4.0  5.0
three  6.0  7.0  8.0
df_b
          a     e     c
first   0.0   1.0   2.0
one     3.0   4.0   5.0
two     6.0   7.0   8.0
second  9.0  10.0  11.0
# Addition aligns on both row and column labels; any label missing from either frame yields NaN
df_a + df_b
          a   b     c   e
first   NaN NaN   NaN NaN
one     3.0 NaN   7.0 NaN
second  NaN NaN   NaN NaN
three   NaN NaN   NaN NaN
two     9.0 NaN  13.0 NaN
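Most cells in the sum above are NaN because a label has to exist in both frames for `+` to produce a value. One possible way to keep values that exist in only one frame is `DataFrame.add` with `fill_value`. The sketch below rebuilds the same `df_a` and `df_b`; the names `plain` and `filled` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same frames as in the example above
df_a = pd.DataFrame(np.arange(9.).reshape(3, 3),
                    columns=['a', 'b', 'c'],
                    index=['one', 'two', 'three'])
df_b = pd.DataFrame(np.arange(12.).reshape(4, 3),
                    columns=['a', 'e', 'c'],
                    index=['first', 'one', 'two', 'second'])

# Plain addition: NaN wherever a label is missing from either frame
plain = df_a + df_b

# With fill_value=0, a cell missing from exactly one frame is treated as 0;
# a cell missing from both frames (for example row 'first', column 'b') stays NaN
filled = df_a.add(df_b, fill_value=0)
print(filled)
```

Whether treating a missing value as 0 is appropriate depends on the data, so this is a choice to make deliberately rather than a default.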
# Largest number of relatives aboard for any passenger (siblings/spouses plus parents/children)
max(df['SibSp'] + df['Parch'])
10
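The column sum above can also be kept as a Series so that both the maximum and the row holding it are available. This is a minimal sketch on a hypothetical four-row frame, not on train.csv; the names `toy`, `family`, `largest` and `row` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not read from train.csv)
toy = pd.DataFrame({"SibSp": [1, 0, 8, 3], "Parch": [0, 0, 2, 2]})

# Element-wise addition of two columns, aligned on the row index
family = toy["SibSp"] + toy["Parch"]

# Largest combined value and the index label of the row where it occurs
largest = family.max()
row = family.idxmax()
print(largest, row)
```

`Series.max()` is the vectorized counterpart of the built-in `max()` used above, and `idxmax()` returns the label of the first row that reaches the maximum.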
# Summary statistics for every numeric column
df.describe()
       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200
df["Fare"].describe()
count 891.000000
mean 32.204208
std 49.693429
min 0.000000
25% 7.910400
50% 14.454200
75% 31.000000
max 512.329200
Name: Fare, dtype: float64
df["Parch"].describe()
count 891.000000
mean 0.381594
std 0.806057
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 6.000000
Name: Parch, dtype: float64
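The calls above use the default quartiles on numeric columns. As a rough sketch, `describe()` also accepts a `percentiles` argument, and on a text column it reports count, unique, top and freq instead. The `toy` frame below is hypothetical and its values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not read from train.csv)
toy = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Request the 10th and 90th percentiles instead of the default quartiles
fare_stats = toy["Fare"].describe(percentiles=[0.1, 0.9])

# On a text column, describe() summarizes counts and the most frequent value
sex_stats = toy["Sex"].describe()

print(fare_stats)
print(sex_stats)
```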
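Summary: This project is divided into three chapters. Section 1 of this chapter covered sorting with Pandas, arithmetic operations, and the descriptive-statistics function describe(). Comments and discussion are welcome.
End of Chapter 1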
Original: https://blog.csdn.net/weixin_45058606/article/details/121956262
Author: 一个游在的小鱼
Title: Data Analysis from Scratch: Kaggle Titanic Project (Part 3)