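This article is long, so bear with it. If you can't, bookmark it; you will need it sooner or later.
To make lookup easy, here is the table of contents first, so you can jump straight to any topic.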
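- How to create a Series from a list or dictionary
  - Create a Series from a list
  - Create a Series with the name parameter
  - Create a Series from a shorthand list
  - Create a Series from a dictionary
- How to create a Series with NumPy functions
- How to get the index and values of a Series
- How to specify the index when creating a Series
- How to get the size and shape of a Series
- How to get the first or last rows of a Series
  - Head()
  - Tail()
  - Take()
  - Get a subset of a Series with slicing
- How to create a DataFrame
- How to set the index and column information of a DataFrame
- How to rename the columns of a DataFrame
- How to select or filter DataFrame rows based on column values
- Filter multiple DataFrame rows with "isin"
- Iterate over the rows and columns of a DataFrame
- How to delete DataFrame columns by name or index
- Add new columns to a DataFrame
- How to get the list of column headers from a DataFrame
- How to generate a random DataFrame
- How to select multiple columns of a DataFrame
- How to convert a dictionary to a DataFrame
- Slicing with loc
- Check whether a DataFrame is empty
- Specify the index and column names when creating a DataFrame
- Slicing with iloc
- The difference between iloc and loc
- Create an empty DataFrame with a time index
- How to change the order of DataFrame columns
- Check the data types of DataFrame columns
- Change the data type of specific DataFrame columns
- How to convert a column's data type to DateTime
- Convert DataFrame columns from floats to ints
- How to convert a dates column to DateTime
- Add two DataFrames
- Append an extra row to the end of a DataFrame
- Add a new row at a specific index
- How to add rows with a for loop
- Add a row at the top of a DataFrame
- How to add rows to a DataFrame dynamically
- Insert a row at any position
- Add rows to a DataFrame with a timestamp index
- Fill missing values for different rows
- append, concat and combine_first examples
- Get the mean of rows and columns
- Compute the sum of rows and columns
- Concatenate two columns
- Filter rows that contain a string
- Filter rows whose index contains a string
- Filter rows containing specific string values with the AND operator
- Find all rows that contain a string
- Create a column whose value depends on which string a row contains
- Count the rows in each pandas group
- Check whether a string is in a DataFrame
- Get the unique values of a DataFrame column
- Count the distinct values of a DataFrame column
- Drop rows with duplicate index values
- Drop rows with duplicate values in certain columns
- Get a value from a DataFrame cell
- Get a scalar cell value with a conditional index
- Set the value of a specific DataFrame cell
- Get a cell value from a DataFrame row
- Replace values in a DataFrame column with a dictionary
- Count the values of one column grouped by another column
- Handle missing values in a DataFrame
- Drop rows that contain any missing data
- Drop DataFrame columns that contain missing data
- Sort index values in descending order
- Sort columns in descending order
- Rank DataFrame elements with the rank method
- Set an index on multiple columns
- Define a period index and columns for a DataFrame
- Specify an index column when importing a CSV
- Write a DataFrame to CSV
- Read specific columns of a CSV file with pandas
- Get the list of columns in a CSV with pandas
- Find the row with the largest value in a column
- Complex conditional selection with the query method
- Check whether a column exists in pandas
- Find the n smallest and n largest values of a specific column
- Find the minimum and maximum of every column in a DataFrame
- Find the index labels where the minimum and maximum values occur
- Compute the cumulative product and cumulative sum of DataFrame columns
- Summary statistics
- Find the mean, median and mode of a DataFrame
- Measure the variance and standard deviation of DataFrame columns
- Compute the covariance between DataFrame columns
- Compute the correlation between two DataFrame objects in pandas
- Compute the percentage change for each cell of a DataFrame column
- Forward-fill and backward-fill missing values in DataFrame columns
- Stacking with a non-hierarchical index in pandas
- Unstacking with a hierarchical index in pandas
- Get table data from an HTML page with pandas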
1. How to create a Series from a list or dictionary
import pandas as pd
ser1 = pd.Series([1.5, 2.5, 3, 4.5, 5.0, 6])
print(ser1)
Output:
0 1.5
1 2.5
2 3.0
3 4.5
4 5.0
5 6.0
dtype: float64
import pandas as pd
ser2 = pd.Series(["India", "Canada", "Germany"], name="Countries")
print(ser2)
Output:
0 India
1 Canada
2 Germany
Name: Countries, dtype: object
import pandas as pd
ser3 = pd.Series(["A"]*4)
print(ser3)
Output:
0 A
1 A
2 A
3 A
dtype: object
import pandas as pd
ser4 = pd.Series({"India": "New Delhi",
"Japan": "Tokyo",
"UK": "London"})
print(ser4)
Output:
India New Delhi
Japan Tokyo
UK London
dtype: object
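When a dictionary is passed together with `index=`, pandas uses that list to choose and order the keys. A minimal sketch (the extra label "France" is illustrative and deliberately absent from the dictionary):

```python
import pandas as pd

capitals = {"India": "New Delhi", "Japan": "Tokyo", "UK": "London"}

# index= selects dictionary keys in the given order; a label with no
# matching key ("France") gets NaN instead of raising an error.
ser5 = pd.Series(capitals, index=["UK", "India", "France"])
print(ser5)
```

This is a convenient way to align a dictionary to a fixed set of labels.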
2. How to create a Series with NumPy functions
import pandas as pd
import numpy as np
ser1 = pd.Series(np.linspace(1, 10, 5))
print(ser1)
ser2 = pd.Series(np.random.normal(size=5))
print(ser2)
Output:
0 1.00
1 3.25
2 5.50
3 7.75
4 10.00
dtype: float64
0 -1.694452
1 -1.570006
2 1.713794
3 0.338292
4 0.803511
dtype: float64
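The normal-distribution values above change on every run. If repeatable output is needed, a seeded NumPy `Generator` is one option; this sketch also uses `np.arange`, which the original does not show:

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same numbers.
rng = np.random.default_rng(seed=42)
ser_a = pd.Series(rng.normal(size=5))
rng = np.random.default_rng(seed=42)
ser_b = pd.Series(rng.normal(size=5))

# np.arange(start, stop, step) excludes the stop value.
ser_c = pd.Series(np.arange(0, 10, 2))
print(ser_c.tolist())  # [0, 2, 4, 6, 8]
```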
3. How to get the index and values of a Series
import pandas as pd
import numpy as np
ser1 = pd.Series({"India": "New Delhi",
"Japan": "Tokyo",
"UK": "London"})
print(ser1.values)
print(ser1.index)
print("\n")
ser2 = pd.Series(np.random.normal(size=5))
print(ser2.index)
print(ser2.values)
Output:
['New Delhi' 'Tokyo' 'London']
Index(['India', 'Japan', 'UK'], dtype='object')
RangeIndex(start=0, stop=5, step=1)
[ 0.66265478 -0.72222211 0.3608642 1.40955436 1.3096732 ]
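`values` returns a NumPy array; the pandas documentation recommends `to_numpy()` for the same purpose. A short sketch of turning labels and values into plain Python objects:

```python
import pandas as pd

ser1 = pd.Series({"India": "New Delhi", "Japan": "Tokyo", "UK": "London"})

arr = ser1.to_numpy()          # NumPy array of the values
labels = ser1.index.tolist()   # index labels as a Python list
pairs = list(ser1.items())     # (label, value) tuples
print(labels)
print(pairs)
```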
4. How to specify the index when creating a Series
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(ser1)
Output:
IND India
CAN Canada
AUS Australia
JAP Japan
GER Germany
FRA France
dtype: object
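With a custom index, values can be looked up by label or by position. A short sketch reusing the country codes above:

```python
import pandas as pd

values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)

# .loc looks values up by label, .iloc by integer position.
by_label = ser1.loc["JAP"]
by_position = ser1.iloc[3]
# A list of labels returns a smaller Series in the requested order.
subset = ser1.loc[["FRA", "IND"]]
print(by_label, by_position)
print(subset)
```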
5. How to get the size and shape of a Series
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(len(ser1))
print(ser1.shape)
print(ser1.size)
Output:
6
(6,)
6
6. How to get the first or last rows of a Series
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Head()-----")
print(ser1.head())
print("\n\n-----Head(2)-----")
print(ser1.head(2))
Output:
IND India
CAN Canada
dtype: object
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Tail()-----")
print(ser1.tail())
print("\n\n-----Tail(2)-----")
print(ser1.tail(2))
Output:
GER Germany
FRA France
dtype: object
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Take()-----")
print(ser1.take([2, 4, 5]))
Output:
AUS Australia
GER Germany
FRA France
dtype: object
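The table of contents also lists taking a subset of a Series with slicing. A minimal sketch with the same country Series; label slices include both endpoints, while positional slices exclude the end:

```python
import pandas as pd

values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)

# Positional slice: the end position is excluded, as with Python lists.
first_three = ser1.iloc[0:3]
# Label slice: both endpoint labels are included.
can_to_jap = ser1.loc["CAN":"JAP"]
# Negative positions count from the end, like tail(2).
last_two = ser1.iloc[-2:]
print(first_three, can_to_jap, last_two, sep="\n\n")
```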
42. Get the mean of rows and columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
df['Mean Basket'] = df.mean(axis=1)
df.loc['Mean Fruit'] = df.mean()
print(df)
Output:
Apple Orange Banana Pear Mean Basket
Basket1 10.000000 20.0 30.0 40.000000 25.0
Basket2 7.000000 14.0 21.0 28.000000 17.5
Basket3 5.000000 5.0 0.0 0.000000 2.5
Mean Fruit 7.333333 13.0 17.0 22.666667 15.0
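`mean()` skips missing values by default. A sketch with a hypothetical NaN in the Apple column shows the effect of `skipna`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Apple": [10, 7, np.nan], "Orange": [20, 14, 5]},
                  index=["Basket1", "Basket2", "Basket3"])

col_means = df.mean()              # NaN is skipped by default (skipna=True)
row_means = df.mean(axis=1)        # average across each row
strict = df.mean(skipna=False)     # NaN propagates instead of being skipped
print(col_means, row_means, strict, sep="\n\n")
```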
43. Compute the sum of rows and columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
df['Sum Basket'] = df.sum(axis=1)
df.loc['Sum Fruit'] = df.sum()
print(df)
Output:
Apple Orange Banana Pear Sum Basket
Basket1 10 20 30 40 100
Basket2 7 14 21 28 70
Basket3 5 5 0 0 10
Sum Fruit 22 39 51 68 180
44. Concatenate two columns
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 21
df.loc[2, 'Name'] = 'Sunny'
df.loc[2, 'Age'] = 22
df.loc[3, 'Name'] = 'Mark'
df.loc[3, 'Age'] = 25
df.loc[4, 'Name'] = 'Taylor'
df.loc[4, 'Age'] = 28
print('\n------------ BEFORE ----------------\n')
print(df)
df['Employee'] = df['Name'].map(str) + ' - ' + df['Age'].map(str)
df = df.reindex(['Employee'], axis=1)
print('\n------------ AFTER ----------------\n')
print(df)
Output:
Employee
1 Rocky - 21
2 Sunny - 22
3 Mark - 25
4 Taylor - 28
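An equivalent sketch using `astype(str)` and `Series.str.cat`, on a trimmed copy of the data above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rocky', 'Sunny', 'Mark'], 'Age': [21, 22, 25]})
# astype(str) converts the ages to text; str.cat joins the two columns
# element by element with the given separator.
df['Employee'] = df['Name'].str.cat(df['Age'].astype(str), sep=' - ')
print(df['Employee'].tolist())
```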
45. Filter rows that contain a string
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
'1986-06-01', '1983-06-04', '1990-03-07',
'1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
'Christina', 'Cornelia'])
print(df)
print("\n---- Filter with State contains TX ----\n")
df1 = df[df['State'].str.contains("TX")]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Nick 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Dean 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
DateOfBirth State
Nick 1999-05-12 TX
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
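By default `str.contains` treats the pattern as a regular expression, is case-sensitive, and yields NaN for missing values. A sketch with illustrative data (a lowercase 'tx' and a None) showing the relevant parameters, plus the same accessor on the index, which the table of contents lists as a separate topic:

```python
import pandas as pd

df = pd.DataFrame({'State': ['NY', 'tx', 'FL', None]},
                  index=['Jane', 'Nick', 'Aaron', 'Dean'])

# case=False ignores letter case; na=False counts missing values as "no match",
# so the mask holds only True/False; regex=False treats 'TX' literally.
mask = df['State'].str.contains('TX', case=False, na=False, regex=False)
print(df[mask])

# The same .str accessor works on the index labels.
by_index = df[df.index.str.contains('an')]
print(by_index)
```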
47. Filter rows containing specific string values with the AND operator
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
'1986-06-01', '1983-06-04', '1990-03-07',
'1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane',
'Christina', 'Cornelia'])
print(df)
print("\n---- Filter DataFrame using & ----\n")
df.index = df.index.astype('str')
df1 = df[df.index.str.contains('ane') & df['State'].str.contains("TX")]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Frane 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
DateOfBirth State
Pane 1999-05-12 TX
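Each condition is a boolean Series, combined with `&` (AND), `|` (OR) and `~` (NOT). A sketch with illustrative data that contrasts the three:

```python
import pandas as pd

df = pd.DataFrame({'State': ['NY', 'TX', 'AK', 'TX', 'FL']},
                  index=['Jane', 'Pane', 'Frane', 'Cornelia', 'Aaron'])

name_match = df.index.to_series().str.contains('ane')
state_match = df['State'] == 'TX'

both = df[name_match & state_match]        # AND: both conditions hold
either = df[name_match | state_match]      # OR: at least one holds
neither = df[~(name_match | state_match)]  # NOT: negate the combined mask
print(both, either, neither, sep="\n\n")
```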
49. Create a column whose value depends on which string a row contains
import pandas as pd
import numpy as np
df = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Accountant', 'Statistician',
'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
'2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
# pd.np was removed in pandas 2.0; call NumPy's where() directly.
df['Department'] = np.where(df.Occupation.str.contains("Chemist"), "Science",
    np.where(df.Occupation.str.contains("Statistician"), "Economics",
    np.where(df.Occupation.str.contains("Programmer"), "Computer", "General")))
print(df)
Output:
EmpCode Name Occupation Date Of Join Age Department
0 Emp001 John Chemist 2018-01-25 23 Science
1 Emp002 Doe Accountant 2018-01-26 24 General
2 Emp003 William Statistician 2018-01-26 34 Economics
3 Emp004 Spark Statistician 2018-02-26 29 Economics
4 Emp005 Mark Programmer 2018-03-16 40 Computer
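Nested `np.where` calls become hard to read as categories are added. `np.select` takes parallel lists of conditions and results instead; this sketch is not the original author's code, only an equivalent formulation on the Occupation values above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Occupation': ['Chemist', 'Accountant', 'Statistician',
                                  'Statistician', 'Programmer']})
occ = df['Occupation']

conditions = [occ.str.contains('Chemist'),
              occ.str.contains('Statistician'),
              occ.str.contains('Programmer')]
choices = ['Science', 'Economics', 'Computer']

# np.select uses the choice of the first condition that is True,
# and the default when none match.
df['Department'] = np.select(conditions, choices, default='General')
print(df)
```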
50. Count the rows in each pandas group
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0],
[6, 6, 6, 6], [8, 8, 8, 8], [5, 5, 0, 0]],
columns=['Apple', 'Orange', 'Rice', 'Oil'],
index=['Basket1', 'Basket2', 'Basket3',
'Basket4', 'Basket5', 'Basket6'])
print(df)
print("\n ----------------------------- \n")
print(df[['Apple', 'Orange', 'Rice', 'Oil']]
      .groupby(['Apple']).agg(['mean', 'count']))
Output:
Apple Orange Rice Oil
Basket1 10 20 30 40
Basket2 7 14 21 28
Basket3 5 5 0 0
Basket4 6 6 6 6
Basket5 8 8 8 8
Basket6 5 5 0 0
-----------------------------
Orange Rice Oil
mean count mean count mean count
Apple
5 5 2 0 2 0 2
6 6 1 6 1 6 1
7 14 1 21 1 28 1
8 8 1 8 1 8 1
10 20 1 30 1 40 1
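`agg(['mean', 'count'])` counts non-missing values per column. When only the number of rows per group is needed, `size()` is more direct; the two differ once a value is missing (the NaN below is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 5, 6, 8, 5],
                   'Orange': [20, 14, 5, 6, 8, np.nan]})

sizes = df.groupby('Apple').size()              # rows per group, NaN included
counts = df.groupby('Apple')['Orange'].count()  # non-missing Orange values per group
print(sizes, counts, sep="\n\n")
```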
51. Check whether a string is in a DataFrame
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
'1986-06-01', '1983-06-04', '1990-03-07',
'1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane',
'Christina', 'Cornelia'])
if df['State'].str.contains('TX').any():
    print("TX is there")
Output:
TX is there
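The check above looks at one column only and also matches substrings. A sketch of an exact-match search across the whole DataFrame, plus a common pitfall with `in`:

```python
import pandas as pd

df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12'],
                   'State': ['NY', 'TX']},
                  index=['Jane', 'Pane'])

found_anywhere = bool(df.isin(['TX']).any().any())  # exact match in any cell
found_in_column = 'TX' in df['State'].to_numpy()    # search the column's values
in_index = 'TX' in df['State']                      # 'in' on a Series checks index labels
print(found_anywhere, found_in_column, in_index)
```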
52. Get the unique values of a DataFrame column
import pandas as pd
df = pd.DataFrame({'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
'Christina', 'Cornelia'])
print(df)
print("\n----------------\n")
print(df["State"].unique())
Output:
State
Jane NY
Nick TX
Aaron FL
Penelope AL
Dean AK
Christina TX
Cornelia TX
['NY' 'TX' 'FL' 'AL' 'AK']
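Alongside `unique()`, the table of contents lists counting distinct values. A short sketch using the same State data:

```python
import pandas as pd

states = pd.Series(['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX'], name='State')

uniq = states.unique()          # distinct values in order of first appearance
n_distinct = states.nunique()   # how many distinct values there are
freq = states.value_counts()    # occurrences per value, most frequent first
print(uniq, n_distinct, freq, sep="\n\n")
```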
62. Handle missing values in a DataFrame
Output:
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
Basket3 5 NaN NaN NaN
Apple Orange Banana Pear
Basket1 True True True True
Basket2 True True True True
Basket3 True False False False
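The DataFrame and True/False table above are consistent with the three-basket data that has an incomplete last row, followed by `notnull()`. A sketch that reproduces them and also counts missing values per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, np.nan, np.nan, np.nan]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])

present = df.notnull()        # True where a value exists
missing = df.isnull().sum()   # number of missing values per column
print(present, missing, sep="\n\n")
```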
63. Drop rows that contain any missing data
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5,]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame ---------\n")
print(df)
print("\n--------- Use of dropna() ---------\n")
print(df.dropna())
Output:
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
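`dropna()` also accepts parameters that control which rows are dropped. A sketch with illustrative NaN positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 5],
                   'Orange': [20, 14, np.nan],
                   'Pear': [40, np.nan, 12]},
                  index=['Basket1', 'Basket2', 'Basket3'])

keep_if_pear = df.dropna(subset=['Pear'])  # only Pear decides which rows go
complete = df.dropna(thresh=3)             # keep rows with at least 3 non-missing values
not_all_missing = df.dropna(how='all')     # drop a row only if every value is NaN
print(keep_if_pear, complete, not_all_missing, sep="\n\n")
```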
64. Drop DataFrame columns that contain missing data
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5,]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame ---------\n")
print(df)
print("\n--------- Drop Columns ---------\n")
# axis=1 drops columns; pandas 2.x no longer accepts the axis positionally.
print(df.dropna(axis=1))
Output:
Apple
Basket1 10
Basket2 7
Basket3 5
65. Sort index values in descending order
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
'1986-06-01', '1983-06-04', '1990-03-07',
'1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane',
'Christina', 'Cornelia'])
print(df.sort_index(ascending=False))
Output:
DateOfBirth State
Penelope 1986-06-01 AL
Pane 1999-05-12 TX
Jane 1986-11-11 NY
Frane 1983-06-04 AK
Cornelia 1999-07-09 TX
Christina 1990-03-07 TX
Aaron 1976-01-01 FL
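`sort_index` orders by row labels; to order by column contents, `sort_values` is the counterpart. A sketch with a subset of the data above, sorting by two keys in different directions:

```python
import pandas as pd

df = pd.DataFrame({'State': ['NY', 'TX', 'FL', 'TX'],
                   'DateOfBirth': ['1986-11-11', '1999-05-12',
                                   '1976-01-01', '1990-03-07']},
                  index=['Jane', 'Pane', 'Aaron', 'Christina'])

# State A to Z, then DateOfBirth newest first within each state.
# ISO date strings sort correctly as plain text.
ordered = df.sort_values(['State', 'DateOfBirth'], ascending=[True, False])
print(ordered)
```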
66. Sort columns in descending order
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician',
'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
'2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print(employees.sort_index(axis=1, ascending=False))
Output:
Occupation Name EmpCode Date Of Join Age
0 Chemist John Emp001 2018-01-25 23
1 Statistician Doe Emp002 2018-01-26 24
2 Statistician William Emp003 2018-01-26 34
3 Statistician Spark Emp004 2018-02-26 29
4 Programmer Mark Emp005 2018-03-16 40
67. Rank DataFrame elements with the rank method
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame Values--------\n")
print(df)
print("\n--------- DataFrame Values by Rank--------\n")
print(df.rank())
Output:
Apple Orange Banana Pear
Basket1 3.0 3.0 3.0 3.0
Basket2 2.0 2.0 2.0 2.0
Basket3 1.0 1.0 1.0 1.0
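The example above has no ties. When values repeat, the `method` parameter decides how ranks are assigned; a sketch with illustrative values:

```python
import pandas as pd

scores = pd.Series([7, 5, 7, 10],
                   index=['Basket1', 'Basket2', 'Basket3', 'Basket4'])

avg = scores.rank()  # ties share the average of their ranks (method='average')
# method='dense' gives ties the same rank without gaps; ascending=False ranks largest first.
dense_desc = scores.rank(method='dense', ascending=False)
print(avg.tolist(), dense_desc.tolist())
```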
68. Set an index on multiple columns
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician',
'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
'2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print("\n --------- Before Index ----------- \n")
print(employees)
print("\n --------- Multiple Indexing ----------- \n")
print(employees.set_index(['Occupation', 'Age']))
Output:
EmpCode Name Date Of Join
Occupation Age
Chemist 23 Emp001 John 2018-01-25
Statistician 24 Emp002 Doe 2018-01-26
34 Emp003 William 2018-01-26
29 Emp004 Spark 2018-02-26
Programmer 40 Emp005 Mark 2018-03-16
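Once a MultiIndex is set, rows can be selected by the outer level or by a full key tuple. A sketch with a subset of the employee data:

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Age': [23, 24, 34, 29, 40]})

# Sorting the MultiIndex keeps label-based lookups efficient.
indexed = employees.set_index(['Occupation', 'Age']).sort_index()

statisticians = indexed.loc['Statistician']  # all rows under one outer label
one_row = indexed.loc[('Statistician', 34)]  # one full (Occupation, Age) key
flat = indexed.reset_index()                 # index levels back to columns
print(statisticians)
```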
69. Define a period index and columns for a DataFrame
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
pidx = pd.period_range('2015-01-01', periods=6)
df = pd.DataFrame(values, index=pidx, columns=['Country'])
print(df)
Output:
Country
2015-01-01 India
2015-01-02 Canada
2015-01-03 Australia
2015-01-04 Japan
2015-01-05 Germany
2015-01-06 France
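Each element of a PeriodIndex represents a span of time rather than a single instant. A sketch with monthly periods (the `freq='M'` choice is illustrative):

```python
import pandas as pd

pidx = pd.period_range('2015-01', periods=3, freq='M')
df = pd.DataFrame({'Country': ['India', 'Canada', 'Australia']}, index=pidx)

first = df.index[0]
print(first.start_time, first.end_time)  # bounds of the January 2015 period
ts = df.to_timestamp()                   # PeriodIndex -> DatetimeIndex (period starts)
print(ts.index)
```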
70. Specify an index column when importing a CSV
import pandas as pd
df = pd.read_csv('test.csv', index_col="DateTime")
print(df)
Output:
Wheat Rice Oil
DateTime
10/10/2016 10.500 12.500 16.500
10/11/2016 11.250 12.750 17.150
10/12/2016 10.000 13.150 15.500
10/13/2016 12.000 14.500 16.100
10/14/2016 13.000 14.825 15.600
10/15/2016 13.075 15.465 15.315
10/16/2016 13.650 16.105 15.030
10/17/2016 14.225 16.745 14.745
10/18/2016 14.800 17.385 14.460
10/19/2016 15.375 18.025 14.175
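With `index_col` alone, the DateTime index above stays as strings. A sketch that also parses it into dates, using a hypothetical in-memory stand-in for test.csv built from the first two rows shown:

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv.
csv_text = """DateTime,Wheat,Rice,Oil
10/10/2016,10.5,12.5,16.5
10/11/2016,11.25,12.75,17.15
"""

# parse_dates=True converts the index column into datetimes (month first by default).
df = pd.read_csv(io.StringIO(csv_text), index_col="DateTime", parse_dates=True)
print(df.index)
```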
71. Write a DataFrame to CSV
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
'1986-06-01', '1983-06-04', '1990-03-07',
'1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane',
'Christina', 'Cornelia'])
df.to_csv('test.csv', encoding='utf-8', index=True)
Output:
Check the local file test.csv.
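If `to_csv()` is called without a path, it returns the CSV text instead of writing a file, which is handy for checking the result. A sketch with two rows of the data above:

```python
import io
import pandas as pd

df = pd.DataFrame({'State': ['NY', 'TX']}, index=['Jane', 'Pane'])

text = df.to_csv(index=True)  # no path given, so the CSV is returned as a string
print(text)

# Reading it back: index_col=0 restores the first column as the index.
back = pd.read_csv(io.StringIO(text), index_col=0)
```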
72. Read specific columns of a CSV file with pandas
import pandas as pd
df = pd.read_csv("test.csv", usecols = ['Wheat','Oil'])
print(df)
73. Get the list of columns in a CSV with pandas
Output:
['DateTime', 'Wheat', 'Rice', 'Oil']
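The code for this item is not shown in the source. A minimal sketch that yields the list above, assuming test.csv has the header `DateTime,Wheat,Rice,Oil`; an in-memory stand-in is used so the snippet runs on its own (pass the real path instead):

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv.
csv_text = "DateTime,Wheat,Rice,Oil\n10/10/2016,10.5,12.5,16.5\n"

# nrows=0 reads only the header row, which stays cheap for large files.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
print(columns)
```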
74. Find the row with the largest value in a column
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
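# .ix was removed in pandas 1.0; .loc selects the row by its label.
print(df.loc[df['Apple'].idxmax()])
Output:
Apple 55
Orange 15
Banana 8
Pear 12
Name: Basket3, dtype: int64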
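`idxmax()` returns only the first label holding the maximum. If several rows tie, a boolean mask keeps them all; a sketch with illustrative tied values:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 55, 55], 'Orange': [20, 14, 15]},
                  index=['Basket1', 'Basket2', 'Basket3'])

first_max = df.loc[df['Apple'].idxmax()]        # only the first label with the max
all_max = df[df['Apple'] == df['Apple'].max()]  # every row that ties for the max
print(first_max, all_max, sep="\n\n")
```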
75. Complex conditional selection with the query method
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
print(df)
print("\n ----------- Filter data using query method ------------- \n")
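# The rest of the original condition was lost; "Orange < 20" is an assumed second clause.
# query() already returns the matching rows, so the removed .ix wrapper is unnecessary.
df1 = df.query('Apple > 50 & Orange < 20')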
print(df1)
Output:
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 7 14 21 28
Basket3 55 15 8 12
----------- Filter data using query method -------------
Apple Orange Banana Pear
Basket3 55 15 8 12
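`query()` can also reference Python variables with `@` and column names containing spaces with backticks. A sketch; the `Unit Price` column and the `limit` variable are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 55], 'Orange': [20, 14, 15],
                   'Unit Price': [1.5, 2.0, 0.8]},
                  index=['Basket1', 'Basket2', 'Basket3'])

limit = 50
# @limit refers to the local Python variable; 'and' works like '&'.
big = df.query('Apple > @limit and Orange < 20')
# Column names with spaces must be wrapped in backticks.
cheap = df.query('`Unit Price` < 1.6')
print(big, cheap, sep="\n\n")
```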
76. Check whether a column exists in pandas
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3'])
if 'Apple' in df.columns:
    print("Yes")
else:
    print("No")
if set(['Apple', 'Orange']).issubset(df.columns):
    print("Yes")
else:
    print("No")
Output:
Yes
Yes
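When checking several columns, it is often useful to know which ones are missing. A sketch; 'Kiwi' is a made-up column name:

```python
import pandas as pd

df = pd.DataFrame(columns=['Apple', 'Orange', 'Banana', 'Pear'])
wanted = ['Apple', 'Kiwi', 'Pear']   # 'Kiwi' is deliberately absent

missing = [c for c in wanted if c not in df.columns]  # which names are absent
mask = df.columns.isin(wanted)                        # per-column membership
kiwi = df.get('Kiwi')                                 # None instead of a KeyError
print(missing, mask)
```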
77. Find the n smallest and n largest values of a specific column
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- nsmallest -----------\n")
print(df.nsmallest(2, ['Apple']))
print("\n----------- nlargest -----------\n")
print(df.nlargest(2, ['Apple']))
Output:
Apple Orange Banana Pear
Basket3 55 15 8 12
Basket4 15 14 1 8
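When values tie, `keep=` controls what `nsmallest`/`nlargest` return. In the Apple column above, Basket2 and Basket5 both hold 7; a sketch:

```python
import pandas as pd

apples = pd.Series([10, 7, 55, 15, 7, 5],
                   index=['Basket1', 'Basket2', 'Basket3',
                          'Basket4', 'Basket5', 'Basket6'])

first_only = apples.nsmallest(2)              # keep='first': earlier tie wins
with_ties = apples.nsmallest(2, keep='all')   # every tied value, even beyond n
print(first_only, with_ties, sep="\n\n")
```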
78. Find the minimum and maximum of every column in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- Minimum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].min())
print("\n----------- Maximum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].max())
Output:
Apple 55
Orange 20
Banana 30
Pear 40
dtype: int64
79. Find the index labels where the minimum and maximum values occur
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- Minimum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].idxmin())
print("\n----------- Maximum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].idxmax())
Output:
Apple Basket3
Orange Basket1
Banana Basket1
Pear Basket1
dtype: object
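`idxmin()`/`idxmax()` work per column. To locate the single largest or smallest cell in the whole DataFrame, one approach is to stack it first; a sketch with a small illustrative table:

```python
import pandas as pd

df = pd.DataFrame([[10, 20], [55, 15], [7, 1]],
                  columns=['Apple', 'Orange'],
                  index=['Basket1', 'Basket2', 'Basket3'])

# stack() turns the table into one Series keyed by (row, column),
# so idxmax()/idxmin() return the location of the overall extreme value.
where_max = df.stack().idxmax()
where_min = df.stack().idxmin()
# Series.argmax() gives the integer position instead of the label.
pos = df['Apple'].argmax()
print(where_max, where_min, pos)
```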
80. Compute the cumulative product and cumulative sum of DataFrame columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- Cumulative Product -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].cumprod())
print("\n----------- Cumulative Sum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].cumsum())
Output:
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 17 34 51 68
Basket3 72 49 59 80
Basket4 87 63 60 88
Basket5 94 64 61 96
Basket6 99 68 70 98
81. Summary statistics
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- Describe DataFrame -----------\n")
print(df.describe())
print("\n----------- Describe Column -----------\n")
print(df[['Apple']].describe())
Output:
Apple
count 6.000000
mean 16.500000
std 19.180719
min 5.000000
25% 7.000000
50% 8.500000
75% 13.750000
max 55.000000
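`describe()` accepts a few options; a sketch with the Apple values above and an illustrative text column:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 55, 15, 7, 5],
                   'State': ['NY', 'TX', 'TX', 'FL', 'TX', 'AK']})

# percentiles= chooses the reported quantiles; the median (50%) is always added.
apple_stats = df['Apple'].describe(percentiles=[0.1, 0.9])
# include='all' also summarises text columns with count, unique, top and freq.
everything = df.describe(include='all')
print(apple_stats, everything, sep="\n\n")
```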
82. Find the mean, median and mode of a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
[15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n----------- Calculate Mean -----------\n")
print(df.mean())
print("\n----------- Calculate Median -----------\n")
print(df.median())
print("\n----------- Calculate Mode -----------\n")
print(df.mode())
Output:
Apple 8.5
Orange 14.0
Banana 8.5
Pear 10.0
dtype: float64
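`mode()` returns a DataFrame rather than a Series because a column can have several equally common values. A sketch with an illustrative 'Kiwi' column that has two modes:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 55, 15, 7, 5],
                   'Kiwi': [1, 1, 2, 2, 3, 4]})

# Kiwi has two modes, so the result has two rows;
# Apple has a single mode and is padded with NaN in the second row.
modes = df.mode()
print(modes)
```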
83. Measure the variance and standard deviation of DataFrame columns
Output:
Apple 367.900000
Orange 52.666667
Banana 134.266667
Pear 211.866667
dtype: float64
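The variance figures above match `df.var()` on the six-basket DataFrame used from section 77 onward (for example, the Apple values 10, 7, 55, 15, 7, 5 give 367.9). A sketch of variance and standard deviation for that column, including the `ddof` parameter:

```python
import pandas as pd

apples = pd.Series([10, 7, 55, 15, 7, 5])  # Apple column of the six-basket data

sample_var = apples.var()            # ddof=1: divides by n - 1 (pandas default)
population_var = apples.var(ddof=0)  # divides by n
std = apples.std()                   # square root of the sample variance
print(sample_var, population_var, std)
```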
84. Compute the covariance between DataFrame columns
Output:
Apple Orange Banana Pear
Apple 367.9 47.600000 -40.200000 -35.000000
Orange 47.6 52.666667 54.333333 77.866667
Banana -40.2 54.333333 134.266667 154.933333
Pear -35.0 77.866667 154.933333 211.866667
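The covariance matrix above is consistent with `df.cov()` on the same six-basket data (its diagonal equals the variances). A sketch that reproduces it:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
                         'Basket5', 'Basket6'])

cov = df.cov()                                  # sample covariance of every column pair
apple_orange = df['Apple'].cov(df['Orange'])    # a single pair from two Series
print(cov, apple_orange, sep="\n\n")
```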
85. Compute the correlation between two DataFrame objects in pandas
Output:
Apple Orange Banana Pear
Apple 1.000000 0.341959 -0.180874 -0.125364
Orange 0.341959 1.000000 0.646122 0.737144
Banana -0.180874 0.646122 1.000000 0.918606
Pear -0.125364 0.737144 0.918606 1.000000
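The matrix above is consistent with `df.corr()` on the six-basket data. For correlation between two separate DataFrame objects, `corrwith()` pairs up same-named columns; in this sketch the second DataFrame is a hypothetical linear transform of the first:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
                         'Basket5', 'Basket6'])

corr = df.corr()   # Pearson correlation between every pair of columns

other = df * 2 + 1             # hypothetical second DataFrame (linear transform of df)
paired = df.corrwith(other)    # correlates same-named columns: all 1.0 here
print(corr.round(6), paired, sep="\n\n")
```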
86. Compute the percentage change for each cell of a DataFrame column
Output:
Apple
Basket1 NaN
Basket2 -0.300000
Basket3 6.857143
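The values above (NaN, -0.3, 6.857143) match `pct_change()` on the Apple values 10, 7, 55 from the three-basket data in sections 74 to 76. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 7, 55]},
                  index=['Basket1', 'Basket2', 'Basket3'])

change = df.pct_change()                     # (current - previous) / previous
change2 = df['Apple'].pct_change(periods=2)  # compare with two rows earlier
print(change, change2, sep="\n\n")
```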
87. Forward-fill and backward-fill missing values in DataFrame columns
Output:
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 NaN
Basket2 NaN NaN NaN NaN
Basket3 15.0 8.0 12.0 NaN
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 NaN NaN
Basket6 5.0 4.0 1.0 NaN
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 8.0
Basket2 15.0 8.0 12.0 8.0
Basket3 15.0 8.0 12.0 8.0
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 1.0 NaN
Basket6 5.0 4.0 1.0 NaN
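The two tables above are consistent with the DataFrame from section 88 before and after a backward fill (for example, Basket1's missing Pear takes Basket4's 8). A sketch with smaller illustrative data; recent pandas versions deprecate `fillna(method=...)` in favour of `ffill()` and `bfill()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Apple': [10, np.nan, 15],
                   'Pear': [np.nan, np.nan, 8]},
                  index=['Basket1', 'Basket2', 'Basket3'])

forward = df.ffill()          # carry the last valid value downwards
backward = df.bfill()         # pull the next valid value upwards
limited = df.bfill(limit=1)   # fill at most one consecutive gap
print(forward, backward, limited, sep="\n\n")
```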
88. Stacking with a non-hierarchical index in pandas
import pandas as pd
df = pd.DataFrame([[10, 30, 40], [], [15, 8, 12],
[15, 14, 1, 8], [7, 8], [5, 4, 1]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n------ DataFrame-----\n")
print(df)
print("\n------ Stacking DataFrame -----\n")
print(df.stack(level=-1))
Output:
Basket1 Apple 10.0
Orange 30.0
Banana 40.0
Basket3 Apple 15.0
Orange 8.0
Banana 12.0
Basket4 Apple 15.0
Orange 14.0
Banana 1.0
Pear 8.0
Basket5 Apple 7.0
Orange 8.0
Basket6 Apple 5.0
Orange 4.0
Banana 1.0
dtype: float64
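`stack()` and `unstack()` undo each other when no values are missing. A sketch with a small complete table:

```python
import pandas as pd

df = pd.DataFrame({'Apple': [10, 15], 'Orange': [30, 14]},
                  index=['Basket1', 'Basket4'])

stacked = df.stack()          # Series keyed by (row label, column label)
restored = stacked.unstack()  # back to the original wide layout
print(stacked)
```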
89. Unstacking with a hierarchical index in pandas
import pandas as pd
df = pd.DataFrame([[10, 30, 40], [], [15, 8, 12],
[15, 14, 1, 8], [7, 8], [5, 4, 1]],
columns=['Apple', 'Orange', 'Banana', 'Pear'],
index=['Basket1', 'Basket2', 'Basket3', 'Basket4',
'Basket5', 'Basket6'])
print("\n------ DataFrame-----\n")
print(df)
print("\n------ Unstacking DataFrame -----\n")
print(df.unstack(level=-1))
Output:
Apple Basket1 10.0
Basket2 NaN
Basket3 15.0
Basket4 15.0
Basket5 7.0
Basket6 5.0
Orange Basket1 30.0
Basket2 NaN
Basket3 8.0
Basket4 14.0
Basket5 8.0
Basket6 4.0
Banana Basket1 40.0
Basket2 NaN
Basket3 12.0
Basket4 1.0
Basket5 NaN
Basket6 1.0
Pear Basket1 NaN
Basket2 NaN
Basket3 NaN
Basket4 8.0
Basket5 NaN
Basket6 NaN
dtype: float64
90. Get table data from an HTML page with pandas
import pandas as pd
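# read_html returns a list of DataFrames, one per <table> on the page.
# "url" is a placeholder; an HTML parser (lxml, or BeautifulSoup4 with html5lib) must be installed.
tables = pd.read_html("url")
df = tables[0]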
Original: https://blog.csdn.net/weixin_45263818/article/details/120952459
Author: 陌路啉
Title: 90个Pandas案例 (90 Pandas Examples)
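Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/739165/
Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.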