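Pandas is a powerful toolkit for analyzing structured data, built on NumPy. It is one of the key factors that make Python a powerful and efficient environment for data analysis.
(1) A powerful set of tools for analyzing and manipulating large structured data sets
(2) Built on NumPy, which provides high-performance matrix operations
(3) Provides a large number of functions and methods for processing data quickly and conveniently
(4) Used in data mining and data analysis
(5) Provides data-cleaning functionality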
10.1 Hierarchical Indexing
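import numpy as np
import pandas as pd
# Series with a two-level (hierarchical) index
data = pd.Series(np.random.randn(9),
                 index=[['a','a','a','b','b','c','c','d','d'],
                        [1,2,3,3,1,2,2,2,3]])
data['b':'c']        # slice on the outer level
data.loc[['b','d']]  # select outer labels 'b' and 'd'
data.loc[:,2]        # select inner label 2 across all outer labels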
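Because the series above holds random numbers, the effect of each selection is easier to check with fixed values. The following is a minimal sketch of the same three selections; the series `s` and its values are hypothetical and do not come from the original example.

```python
import pandas as pd

# Hypothetical two-level index with fixed values so the results are reproducible
s = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=[['a', 'a', 'b', 'b', 'c', 'c'],
           [1, 2, 1, 2, 1, 2]],
)

outer_slice = s['a':'b']        # rows whose outer label lies between 'a' and 'b'
outer_pick = s.loc[['a', 'c']]  # rows whose outer label is 'a' or 'c'
inner_pick = s.loc[:, 2]        # rows whose inner label is 2; the inner level is dropped

print(outer_slice)
print(outer_pick)
print(inner_pick)
```

Selecting on the inner level returns a series indexed only by the outer labels, which is why `inner_pick` has a single-level index.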
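frame = pd.DataFrame({'a':range(7),'b':range(7,0,-1),
                      'c':['one','one','one','two','two','two','two'],
                      'd':[0,1,2,0,1,2,3]})
frame
# Move columns 'c' and 'd' into a two-level row index
frame2 = frame.set_index(['c','d'])
frame2
# drop=False keeps 'c' and 'd' as ordinary columns as well
frame.set_index(['c','d'],drop=False)
# reset_index moves the index levels back into columns
frame2.reset_index()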
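10.2 Joining Data
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None): joins the rows of different DataFrames on one or more keys, similar to a database join.
- left: the DataFrame on the left side of the merge
- right: the DataFrame on the right side of the merge
- how: the join type, one of 'inner' (the default), 'outer', 'left', or 'right'
- on: the column name(s) to join on; they must exist in both DataFrames. If on is not given and no other key is specified, the intersection of the column names in left and right is used as the join key
- left_on: the column(s) of the left DataFrame to use as the join key
- right_on: the column(s) of the right DataFrame to use as the join key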
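The parameters listed above can be sketched as follows. This is a minimal illustration, assuming two hypothetical frames (`staff` and `pay`, which are not part of the original article) whose key columns have different names and only partly overlap, so that each `how` value gives a different row count.

```python
import pandas as pd

# Hypothetical frames: key columns are named differently and share only ids 2 and 3
staff = pd.DataFrame({'staff_id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cy']})
pay = pd.DataFrame({'emp_id': [2, 3, 4], 'salary': [50, 60, 70]})

results = {}
for how in ['inner', 'outer', 'left', 'right']:
    # left_on / right_on name the key column on each side
    merged = pd.merge(staff, pay, how=how, left_on='staff_id', right_on='emp_id')
    results[how] = merged
    print(how, len(merged))
```

Rows without a match on the other side are kept only by the join types that preserve that side, and their missing fields are filled with NaN.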
left = pd.DataFrame({'key':['K0','K1','K2','K3'],
                     'A':['A0','A1','A2','A3'],
                     'B':['B0','B1','B2','B3']})
right = pd.DataFrame({'key':['K0','K1','K2','K3'],
                      'C':['C0','C1','C2','C3'],
                      'D':['D0','D1','D2','D3']})
left
right
# With no on argument, merge joins on the columns the two frames share (here 'key')
pd.merge(left,right)
# The same join with the key named explicitly
pd.merge(left,right,on='key')
# Both frames have a non-key column named 'data'; suffixes tells them apart in the result
df_obj1 = pd.DataFrame({'key':['b','b','a','c','a','a','b'],
                        'data':np.random.randint(0,10,7)})
df_obj2 = pd.DataFrame({'key':['a','b','d'],
                        'data':np.random.randint(0,10,3)})
print(pd.merge(df_obj1,df_obj2,on='key',suffixes=('_left','_right')))
# Join the 'key' column of the left frame against the index of the right frame
df_obj1 = pd.DataFrame({'key':['b','b','a','c','a','a','b'],
                        'data1':np.random.randint(0,10,7)})
df_obj2 = pd.DataFrame({'data2':np.random.randint(0,10,3)},index=['a','b','d'])
print(pd.merge(df_obj1,df_obj2,left_on='key',right_index=True))
left2 = pd.DataFrame([[1.,2.],[3.,4.],[5.,6.]],index = ['a','c','e'],columns=['语文','数学'])
right2 = pd.DataFrame([[7.,8.],[9.,10.],[11.,12.],[13.,14.]],index=['b','c','d','e'],columns=['英语','综合'])
# join aligns on the index; how='outer' keeps the union of the index labels
left2.join(right2,how='outer')
Concatenation joins multiple objects together along an axis.
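arr1 = np.random.randint(0,10,(3,4))
arr2 = np.random.randint(0,10,(3,4))
# NumPy: concatenate stacks along axis 0 (rows) by default
print(np.concatenate([arr1,arr2]))
# axis=1 places the arrays side by side
print(np.concatenate([arr1,arr2],axis=1))
df1 = pd.DataFrame(np.arange(6).reshape(3,2),index=list('abc'),columns=['one','two'])
df2 = pd.DataFrame(np.arange(4).reshape(2,2),index=list('ac'),columns=['one','two'])
# pandas: concat stacks rows by default and keeps the original index labels
pd.concat([df1,df2])
# axis=1 aligns on the index and places the frames side by side
pd.concat([df1,df2],axis=1)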
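The two `pd.concat` calls above treat the index differently, which is easy to miss when only the output tables are displayed. The sketch below rebuilds the same `df1` and `df2` and inspects the results; the variable names `rows` and `cols` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=list('abc'), columns=['one', 'two'])
df2 = pd.DataFrame(np.arange(4).reshape(2, 2), index=list('ac'), columns=['one', 'two'])

# Row-wise: index labels are kept as they are, so 'a' and 'c' appear twice
rows = pd.concat([df1, df2])

# Column-wise: rows are aligned on the index; df2 has no row 'b', so those cells are NaN
cols = pd.concat([df1, df2], axis=1)

print(rows.index.tolist())
print(cols)
```

If the repeated index labels in `rows` are not wanted, passing `ignore_index=True` to `pd.concat` renumbers the result.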
10.3 Reshaping
(1) stack rotates the column index into the row index, producing a hierarchical index
(2) DataFrame -> Series
data = pd.DataFrame(np.arange(6).reshape((2,3)),
                    index = pd.Index(['老王','小刘'],name='姓名'),
                    columns = pd.Index(['语文','数学','英语'],name='科目'))
data
r = data.stack()
print(r)
print(type(r))
(1) unstack expands a hierarchical index
(2) Series -> DataFrame
(3) By default it operates on the innermost level (level=-1); set level to choose which index level to unstack
r.unstack()                # the innermost level ('科目') becomes the columns
r.unstack(level = '姓名')   # the '姓名' level becomes the columns instead
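The effect of the `level` argument can be checked on a small table with fixed values. This is a minimal sketch; the frame `scores` and its labels (`r1`, `c1`, and so on) are hypothetical stand-ins for the table used above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 table with named row and column indexes
scores = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=pd.Index(['r1', 'r2'], name='row'),
    columns=pd.Index(['c1', 'c2', 'c3'], name='col'),
)

stacked = scores.stack()                 # Series indexed by (row, col)
by_inner = stacked.unstack()             # level=-1: 'col' moves back to the columns
by_outer = stacked.unstack(level='row')  # 'row' moves to the columns instead

print(by_inner)
print(by_outer)
```

Unstacking the inner level restores the original layout, while unstacking the outer level yields its transpose.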
a1 = pd.Series(np.arange(4),index=list('abcd'))
a2 = pd.Series([4,5,6],index=list('cde'))
# keys adds an outer index level that labels each input
s1 = pd.concat([a1,a2],keys=['data1','data2'])
type(s1)
s1.unstack()   # labels missing from one input become NaN
s1.unstack().stack()
10.4 Pivoting
df3 = pd.DataFrame({'date':['2018-11-22','2018-11-22','2018-11-23','2018-11-23','2018-11-24'],
                    'class':['a','b','b','c','c'],
                    'values':[5,3,2,6,1]},columns=['date','class','values'])
df3
# pandas 2.0 and later require keyword arguments for pivot
df3.pivot(index='date',columns='class',values='values')
# A similar reshaping with set_index and unstack
df3.set_index(['date','class']).unstack('class')
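The relationship between `pivot` and the `set_index` plus `unstack` combination can be checked directly. The sketch below rebuilds `df3` and compares the two results; the names `wide` and `wide2` are introduced here for illustration, and selecting the `'values'` column before unstacking is a choice made so that both results have single-level columns.

```python
import pandas as pd

df3 = pd.DataFrame({
    'date': ['2018-11-22', '2018-11-22', '2018-11-23', '2018-11-23', '2018-11-24'],
    'class': ['a', 'b', 'b', 'c', 'c'],
    'values': [5, 3, 2, 6, 1],
})

# One row per date, one column per class; missing combinations become NaN
wide = df3.pivot(index='date', columns='class', values='values')

# Same table built by hand: index on both keys, then move 'class' to the columns
wide2 = df3.set_index(['date', 'class'])['values'].unstack('class')

print(wide)
print(wide.equals(wide2))
```

Note that `pivot` raises an error when an index and column pair occurs more than once, so it suits data where each pair is unique, as it is here.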
Original: https://blog.csdn.net/zkx990121/article/details/121741194
Author: 刚入门的小仙女
Title: 3-10 Pandas的数据规整