Data Analysis from Scratch with a Kaggle Project: Titanic (Part 3)

This section covers how to use Pandas for sorting, for arithmetic between DataFrames, and for summary statistics with describe().


import numpy as np
import pandas as pd
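# Load the Kaggle Titanic training set (train.csv must be in the working directory)
df = pd.read_csv("train.csv")
df.head()  # first five rows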

   PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S


# A small 2x4 frame; note that the row labels are the strings '2' and '1'
df1 = pd.DataFrame(np.arange(8).reshape((2, 4)),
                   index=['2', '1'],
                   columns=['a', 'b', 'c', 'd'])
df1

   a  b  c  d
2  0  1  2  3
1  4  5  6  7


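# Sort rows by the values in column 'c'; it is already ascending (2, 6), so the order does not change
df1.sort_values(by='c', ascending=True)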

   a  b  c  d
2  0  1  2  3
1  4  5  6  7
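Because column 'c' is already in ascending order (2, then 6), the call above returns df1 unchanged. A minimal sketch that rebuilds the same df1 and flips the direction, so that the reordering is actually visible:

```python
import numpy as np
import pandas as pd

# Same toy frame as above: row labels are the strings '2' and '1'
df1 = pd.DataFrame(np.arange(8).reshape((2, 4)),
                   index=['2', '1'],
                   columns=['a', 'b', 'c', 'd'])

# Column 'c' holds 2 (row '2') and 6 (row '1')
print(list(df1.sort_values(by='c', ascending=True).index))  # ['2', '1']  unchanged
desc = df1.sort_values(by='c', ascending=False)
print(list(desc.index))                                      # ['1', '2']  row '1' moves up
```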


# Sort rows by index label; the string '1' comes before '2'
df1.sort_index()

   a  b  c  d
1  4  5  6  7
2  0  1  2  3
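sort_index() compares the labels themselves, and df1's labels are strings, so the order is lexicographic rather than numeric. With single-character labels such as '1' and '2' the two orders agree, but they diverge once labels have different lengths. The Series s below is hypothetical, built only to show the difference:

```python
import pandas as pd

# Hypothetical Series whose index labels are strings
s = pd.Series([10, 20, 30], index=['10', '9', '2'])

# String labels compare character by character, so '10' sorts before '2'
print(list(s.sort_index().index))  # ['10', '2', '9']

# Converting the labels to int first gives numeric order
s_num = pd.Series(s.values, index=[int(label) for label in s.index])
print(list(s_num.sort_index().index))  # [2, 9, 10]
```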


# axis=1 sorts the column labels instead; a, b, c, d is already sorted
df1.sort_index(axis=1)

   a  b  c  d
2  0  1  2  3
1  4  5  6  7


# Sort the column labels in descending order
df1.sort_index(axis=1, ascending=False)

   d  c  b  a
2  3  2  1  0
1  7  6  5  4
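In df1 the columns already read a, b, c, d, so sort_index(axis=1) has nothing to reorder. The sketch below uses a hypothetical frame df2 whose columns start out of order, to show that axis=1 sorts the column labels and that each column's values travel with its label:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns start out of order (same values as df1)
df2 = pd.DataFrame(np.arange(8).reshape((2, 4)),
                   index=['2', '1'],
                   columns=['c', 'a', 'd', 'b'])

# axis=1 sorts the column labels rather than the row labels
asc = df2.sort_index(axis=1)
desc = df2.sort_index(axis=1, ascending=False)

print(list(asc.columns))   # ['a', 'b', 'c', 'd']
print(list(desc.columns))  # ['d', 'c', 'b', 'a']
print(asc['a'].tolist())   # [1, 5]  (values move with their label)
```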


# Sort rows by 'a', then by 'c' to break ties, both descending
df1.sort_values(by=['a', 'c'], ascending=False)

   a  b  c  d
1  4  5  6  7
2  0  1  2  3
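When by is a list, ascending can also be a list of the same length, giving each sort key its own direction. A short sketch on hypothetical data with ties in column 'a':

```python
import pandas as pd

# Hypothetical data: rows x and y share a=2, rows w and z share a=1
df3 = pd.DataFrame({'a': [1, 2, 2, 1],
                    'c': [9, 3, 7, 5]},
                   index=['w', 'x', 'y', 'z'])

# 'a' descending first; within equal 'a', 'c' ascending
out = df3.sort_values(by=['a', 'c'], ascending=[False, True])
print(list(out.index))  # ['x', 'y', 'z', 'w']
```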


# Passengers with the highest fares; ties on Fare are ordered by Age (both descending)
df.sort_values(by=['Fare', 'Age'], ascending=False).head(5)

     PassengerId  Survived  Pclass  Name                                Sex     Age   SibSp  Parch  Ticket    Fare      Cabin        Embarked
679  680          1         1       Cardeza, Mr. Thomas Drake Martinez  male    36.0  0      1      PC 17755  512.3292  B51 B53 B55  C
258  259          1         1       Ward, Miss. Anna                    female  35.0  0      0      PC 17755  512.3292  NaN          C
737  738          1         1       Lesurer, Mr. Gustave J              male    35.0  0      0      PC 17755  512.3292  B101         C
438  439          0         1       Fortune, Mr. Mark                   male    64.0  1      4      19950     263.0000  C23 C25 C27  S
341  342          1         1       Fortune, Miss. Alice Elizabeth      female  24.0  3      2      19950     263.0000  C23 C25 C27  S
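pandas also provides DataFrame.nlargest(n, columns), which its documentation describes as equivalent to sort_values(columns, ascending=False).head(n) but more performant. A sketch comparing the two on a hypothetical five-row frame (the values mimic the tables above, but this is not the real train.csv):

```python
import pandas as pd

# Hypothetical five-row frame; not loaded from train.csv
passengers = pd.DataFrame({
    'Fare': [512.3292, 7.25, 512.3292, 263.0, 71.2833],
    'Age':  [36.0, 22.0, 35.0, 64.0, 38.0],
})

# Full sort, then keep the first three rows
top_sorted = passengers.sort_values(by=['Fare', 'Age'], ascending=False).head(3)

# nlargest selects the same rows, ordered the same way
top_n = passengers.nlargest(3, ['Fare', 'Age'])

print(list(top_sorted.index))  # [0, 2, 3]
print(list(top_n.index))       # [0, 2, 3]
```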


# Two frames that share only some row labels (one, two) and column labels (a, c)
df_a = pd.DataFrame(np.arange(9.).reshape(3, 3),
                    columns=['a', 'b', 'c'],
                    index=['one', 'two', 'three'])
df_b = pd.DataFrame(np.arange(12.).reshape(4, 3),
                    columns=['a', 'e', 'c'],
                    index=['first', 'one', 'two', 'second'])
df_a

         a    b    c
one    0.0  1.0  2.0
two    3.0  4.0  5.0
three  6.0  7.0  8.0

df_b

          a     e     c
first   0.0   1.0   2.0
one     3.0   4.0   5.0
two     6.0   7.0   8.0
second  9.0  10.0  11.0


# Arithmetic aligns on row and column labels; a label missing from either frame gives NaN
df_a + df_b

          a   b     c   e
first   NaN NaN   NaN NaN
one     3.0 NaN   7.0 NaN
second  NaN NaN   NaN NaN
three   NaN NaN   NaN NaN
two     9.0 NaN  13.0 NaN
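The NaN cells appear because + keeps the union of the row and column labels and marks every position that is missing from either frame as NaN. If a missing entry should count as 0 instead, the method form DataFrame.add accepts a fill_value argument; cells missing from both frames still come out as NaN. A sketch that rebuilds df_a and df_b from above:

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame(np.arange(9.).reshape(3, 3),
                    columns=['a', 'b', 'c'],
                    index=['one', 'two', 'three'])
df_b = pd.DataFrame(np.arange(12.).reshape(4, 3),
                    columns=['a', 'e', 'c'],
                    index=['first', 'one', 'two', 'second'])

# fill_value=0 treats an entry present in only one frame as 0 in the other
total = df_a.add(df_b, fill_value=0)

print(total.loc['one', 'a'])    # 3.0  (0.0 + 3.0, present in both)
print(total.loc['three', 'b'])  # 7.0  (only in df_a)
print(total.loc['first', 'e'])  # 1.0  (only in df_b)
print(total.loc['three', 'e'])  # nan  (row only in df_a, column only in df_b)
```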


# Most relatives aboard for a single passenger: siblings/spouses plus parents/children
(df['SibSp'] + df['Parch']).max()
10
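The line above adds the two columns element-wise and then takes the largest value. A minimal sketch on a hypothetical four-row frame using Series.max() and Series.idxmax(); the latter returns the row label of the maximum, which helps locate the passenger:

```python
import pandas as pd

# Hypothetical rows; in the article these two columns come from train.csv
df = pd.DataFrame({'SibSp': [1, 0, 8, 3],
                   'Parch': [0, 0, 2, 2]})

# Element-wise sum: relatives aboard for each passenger
relatives = df['SibSp'] + df['Parch']

print(relatives.max())     # 10
print(relatives.idxmax())  # 2  (row label where the maximum occurs)
```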

# Summary statistics for every numeric column
df.describe()

       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200
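Two details explain the table above: by default describe() summarizes only the numeric columns, which is why Name, Sex, Ticket, Cabin and Embarked are absent, and count excludes missing values, which is why Age shows 714 rather than 891. A sketch on a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: one numeric column with a gap, one text column
frame = pd.DataFrame({'Age': [22.0, np.nan, 26.0, 35.0],
                      'Sex': ['male', 'female', 'female', 'male']})

stats = frame.describe()
print(list(stats.columns))        # ['Age']  (text column skipped by default)
print(stats.loc['count', 'Age'])  # 3.0  (the NaN is not counted)

# On a text column, describe() reports count/unique/top/freq instead
print(list(frame['Sex'].describe().index))  # ['count', 'unique', 'top', 'freq']
```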


# describe() also works on a single column (a Series)
df["Fare"].describe()
count    891.000000
mean      32.204208
std       49.693429
min        0.000000
25%        7.910400
50%       14.454200
75%       31.000000
max      512.329200
Name: Fare, dtype: float64
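By default describe() reports the 25%, 50% and 75% percentiles. The percentiles parameter takes a list of fractions between 0 and 1 to report different ones, and the median is always included. A sketch on five fares copied from the df.head() output above:

```python
import pandas as pd

# Five fares copied from the df.head() output above
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name='Fare')

# Ask for the 10th and 90th percentiles; describe() adds the median (50%) itself
summary = fares.describe(percentiles=[0.1, 0.9])
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
print(summary['50%'])  # 8.05
```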

df["Parch"].describe()
count    891.000000
mean       0.381594
std        0.806057
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        6.000000
Name: Parch, dtype: float64
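In the Parch summary the 25%, 50% and 75% rows are all 0, so at least three quarters of the passengers have Parch equal to 0. When values are more spread out, a percentile row can fall between observed values, because describe() computes these rows with Series.quantile(), whose default is linear interpolation. A minimal sketch on a hypothetical column:

```python
import pandas as pd

# Hypothetical small column
s = pd.Series([1, 2, 3, 4])

desc = s.describe()
# The 25% point sits at position (4 - 1) * 0.25 = 0.75, i.e. 3/4 of the way from 1 to 2
print(desc['25%'])       # 1.75
print(s.quantile(0.25))  # 1.75
```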

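Summary: this project is split into three chapters. Section 1 of this chapter covered sorting, arithmetic, and the summary-statistics function describe() in Pandas. Comments and discussion are welcome.
End of Chapter 1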

Original: https://blog.csdn.net/weixin_45058606/article/details/121956262
Author: 一个游在的小鱼
Title: Data Analysis from Scratch with a Kaggle Project: Titanic (Part 3)
