Basic DataFrame Usage in Python

Importing the libraries

import pandas as pd
import numpy as np

Official pandas documentation: https://pandas.pydata.org/pandas-docs/stable

1.1 Creating from a dictionary

data={"one":np.random.randn(4),"two":np.linspace(1,4,4),"three":['zhangsan','李四',999,0.1]}
df=pd.DataFrame(data,index=[1,2,3,4])
df

        one  two     three
1  1.087681  1.0  zhangsan
2 -1.702143  2.0      李四
3  0.153510  3.0       999
4 -0.206473  4.0       0.1

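  • If no index is given when df is created, pandas uses a default integer index that starts at 0 with a step of 1.
  • Each column has its own dtype, so different columns can hold different data types and a single row can therefore mix several types; a column whose values are of mixed types (such as three above) gets the object dtype.
  • After df has been created its index can be changed, usually with one of three functions: set_index, reset_index, reindex.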
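A minimal sketch of the first two points, using a small hypothetical frame instead of the random data above: without an explicit index pandas creates a RangeIndex 0, 1, 2 and so on, and each column gets its own dtype.

```python
import pandas as pd

# Hypothetical data: a float column and a column mixing a str with an int
df = pd.DataFrame({"one": [1.5, -0.3], "three": ["zhangsan", 999]})

print(df.index)   # no index given: RangeIndex starting at 0 with step 1
print(df.dtypes)  # one dtype per column; the mixed column is object
```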

df.set_index(['one'],drop=False)

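                one  two     three
one
 1.087681  1.087681  1.0  zhangsan
-1.702143 -1.702143  2.0      李四
 0.153510  0.153510  3.0       999
-0.206473 -0.206473  4.0       0.1

With drop=False the column one stays a data column in addition to becoming the index (the default drop=True would remove it). set_index returns a new DataFrame and leaves df unchanged.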

df.index=['a','b','c','d']
df.reset_index(drop=True)

The drop parameter defaults to False, which keeps the old index as a data column; with drop=True the old index is discarded.
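The sketch below, on a hypothetical three-row frame, contrasts the two drop settings of reset_index and also shows reindex, the third function listed above, which the text does not demonstrate. reindex does not relabel existing rows; it conforms df to a new list of labels and fills labels that did not exist before with NaN (or with fill_value).

```python
import pandas as pd

df = pd.DataFrame({"one": [0.5, -1.2, 2.0]}, index=["a", "b", "c"])

kept = df.reset_index()              # drop=False: old labels become a column "index"
dropped = df.reset_index(drop=True)  # drop=True: old labels are discarded

# reindex: rows follow the new label list; the unknown label "x" gets NaN
reidx = df.reindex(["c", "a", "x"])
reidx0 = df.reindex(["c", "a", "x"], fill_value=0)  # fill new labels with 0
print(kept, dropped, reidx, sep="\n\n")
```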

1.2 Creating from an array

data=np.random.randn(6,4)
df=pd.DataFrame(data,columns=list('ABCD'),index=[1,2,'a','b','2006-10-1','第六行'])
df

                  A         B         C         D
1         -1.468370  0.148891 -0.463374  0.201060
2          0.483324 -0.518111 -1.238487  0.949822
a         -0.909020 -0.111295 -0.833337  1.560756
b         -0.034978  0.036142 -1.613437  2.040824
2006-10-1 -0.312716 -2.481801 -0.607386 -0.763655
第六行     0.300200 -0.618587  0.187557  0.416438

1.3 Creating an empty DataFrame

pd.DataFrame(columns=('id','name','grade','class'))
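An empty frame only has column labels. One common way to fill it, sketched here with hypothetical values, is setting with enlargement: assigning to a new label with loc appends a row. For many rows it is usually faster to collect them in a list first and build the DataFrame once.

```python
import pandas as pd

df = pd.DataFrame(columns=('id', 'name', 'grade', 'class'))
print(df.empty, df.shape)  # True (0, 4)

# A label that does not exist yet appends a row (hypothetical sample values)
df.loc[0] = [1, 'zhangsan', 90, 'A']
df.loc[1] = [2, '李四', 85, 'B']
print(df)
```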

2.1 Selecting columns

df[['A','B','D']]
df.loc[:,['A','B','D']]

                  A         B         D
1         -1.468370  0.148891  0.201060
2          0.483324 -0.518111  0.949822
a         -0.909020 -0.111295  1.560756
b         -0.034978  0.036142  2.040824
2006-10-1 -0.312716 -2.481801 -0.763655
第六行     0.300200 -0.618587  0.416438

PS: df['A'] and df[['A']] both read column A (the first column), but they return different data structures:

  • type(df['A']): pandas.core.series.Series
  • type(df[['A']]): pandas.core.frame.DataFrame
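A quick check of the difference, on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

s = df['A']    # a single label gives a Series (1-D)
f = df[['A']]  # a list with one label gives a DataFrame (2-D, one column)

print(type(s).__name__, s.shape)  # Series (3,)
print(type(f).__name__, f.shape)  # DataFrame (3, 1)
```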
Columns can also be selected by position with iloc; here every column from position 2 onward:
df.iloc[:,2:]

                  C         D
1         -0.463374  0.201060
2         -1.238487  0.949822
a         -0.833337  1.560756
b         -1.613437  2.040824
2006-10-1 -0.607386 -0.763655
第六行     0.187557  0.416438

2.2 Selecting rows

df.loc[['a','第六行']]

              A         B         C         D
a      -0.90902 -0.111295 -0.833337  1.560756
第六行  0.30020 -0.618587  0.187557  0.416438

df.iloc[1:3]

          A         B         C         D
2  0.483324 -0.518111 -1.238487  0.949822
a -0.909020 -0.111295 -0.833337  1.560756
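loc and iloc treat slices differently: a label slice includes its end label, while a positional slice excludes its end position, which is why df.iloc[1:3] above returns only the rows at positions 1 and 2. A minimal sketch with hypothetical labels:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40]}, index=['w', 'x', 'y', 'z'])

by_label = df.loc['x':'z']  # label slice: the end label 'z' IS included
by_pos = df.iloc[1:3]       # position slice: end position 3 is NOT included
picked = df.loc[['w', 'z']] # a list of labels picks arbitrary rows
```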

2.3 Selecting cells


df.B['a']
-0.11129495036792952
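The same cell can be reached in several ways. The sketch below uses a hypothetical frame and also shows at and iat, the pandas accessors for a single scalar by label and by position, which the text does not cover:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, 2.5], 'B': [-0.1, 0.7]}, index=['a', 'b'])

v1 = df.B['a']         # column by attribute, then row label
v2 = df.loc['a', 'B']  # row label and column label in one call
v3 = df.at['a', 'B']   # scalar access by labels
v4 = df.iloc[0, 1]     # row position 0, column position 1
v5 = df.iat[0, 1]      # scalar access by positions
```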
  • Read a single cell: df.loc[row][col] or df.loc[row,col]
  • Read several columns of one row:
df.loc[row][[col1,col2]]
df.loc[row,[col1,col2]]
df.loc[row][firstCol:endCol]
df.loc[row,firstCol:endCol]
df.loc['a'][['B','C']]
df.loc['a','A':'C']
A   -0.909020
B   -0.111295
C   -0.833337
Name: a, dtype: float64
  • Read one column of several rows:
df.loc[[row1,row2]][col]
df.loc[[row1,row2]].col
df.loc[[row1,row2],col]
df.loc[[1,'a']]['C']
df.loc[[1,'a']].D
df.loc[[1,'a'],'A']
1   -1.46837
a   -0.90902
Name: A, dtype: float64
  • Read several rows and columns:
df.loc[[row1,row2],[col1,col2]]
df.loc[[row1,row2]][[col1,col2]]
df.loc[[row1,row3],firstCol:endCol]
df.loc[firstRow:endRow,firstCol:endCol]
print(df.loc[[1,'a'],['A','C']])
print(df.loc[[1,'a']][['A','C']])
df.loc[1:'a','A':'C']
         A         C
1 -1.46837 -0.463374
a -0.90902 -0.833337
         A         C
1 -1.46837 -0.463374
a -0.90902 -0.833337

          A         B         C
1 -1.468370  0.148891 -0.463374
2  0.483324 -0.518111 -1.238487
a -0.909020 -0.111295 -0.833337

  • Read a single cell by position with iloc:
df.iloc[rowNo].col
df.iloc[rowNo][col]
df.iloc[rowNo,colNo]

PS: df.iloc[rowNo,col] is not supported, because iloc accepts only integer positions, not labels.

print(df.iloc[1].B)
print(df.iloc[1]['B'])
df.iloc[1,1]
-0.5181113354542651
-0.5181113354542651
-0.5181113354542651
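  • Read several columns of one row:
df.iloc[rowNo,firstColNo:endColNo]
df.iloc[rowNo][[col1,col2]]
df.iloc[rowNo][firstCol:endCol]

PS: df.iloc[rowNo,[col1,col2]] with column labels is not supported; iloc needs column positions, e.g. df.iloc[rowNo,[colNo1,colNo2]].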

print(df.iloc[1,1:])
df.iloc[1][['A','C']]
df.iloc[1]['A':'C']
B   -0.518111
C   -1.238487
D    0.949822
Name: 2, dtype: float64

A    0.483324
B   -0.518111
C   -1.238487
Name: 2, dtype: float64
  • Read one column of several rows:
df.iloc[[rowNo1,rowNo2],colNo]
df.iloc[firstRowNo:endRowNo,colNo]
df.iloc[[rowNo1,rowNo2]][col]
df.iloc[firstRowNo:endRowNo][col]
print(df.iloc[[0,2],1])
print(df.iloc[:2,1])
print(df.iloc[[0,2]]['B'])
df.iloc[0:2]['B']
1    0.148891
a   -0.111295
Name: B, dtype: float64
1    0.148891
2   -0.518111
Name: B, dtype: float64
1    0.148891
a   -0.111295
Name: B, dtype: float64
1    0.148891
2   -0.518111
Name: B, dtype: float64
  • Read several rows and columns:
df.iloc[firstRowNo:endRowNo,firstColNo:endColNo]
df.iloc[[rowNo1,rowNo2],[colNo1,colNo2]]
df.iloc[firstRowNo:endRowNo][[col1,col2]]
print(df.iloc[1:4,1:4])
print(df.iloc[[1,3],[1,3]])
df.iloc[1:4][['A','C']]
          B         C         D
2 -0.518111 -1.238487  0.949822
a -0.111295 -0.833337  1.560756
b  0.036142 -1.613437  2.040824
          B         D
2 -0.518111  0.949822
b  0.036142  2.040824

          A         C
2  0.483324 -1.238487
a -0.909020 -0.833337
b -0.034978 -1.613437
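As the PS notes above say, iloc does not take labels. When row positions and column labels have to be combined, one option, sketched here on a hypothetical 3x4 frame, is to translate one side first: df.index[...] turns positions into labels for loc, and Index.get_indexer turns labels into positions for iloc. Both give the same result without chained indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'),
                  index=['x', 'y', 'z'])

# Row positions -> row labels, so everything goes through loc
a = df.loc[df.index[[0, 2]], ['B', 'D']]

# Column labels -> column positions, so everything goes through iloc
cols = df.columns.get_indexer(['B', 'D'])
b = df.iloc[[0, 2], cols]
print(a)
```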

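3.1 Assigning by column

df.col=colList/colValue
df[col]=colList/colValue
e.g. df.A=[1,2,3,4,5,6] or df['A']=0. A list must contain one value per row, while a scalar is broadcast to the whole column. The attribute form df.col only works for a column that already exists; use df[col] to create a new column.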
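A sketch of these column assignment rules on a hypothetical frame, including the case where attribute assignment fails to create a column:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

df['A'] = 0          # a scalar is broadcast to every row
df.A = [7, 8, 9]     # attribute assignment works for an existing column
df['B'] = [4, 5, 6]  # df[col] also creates new columns (length must match)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas emits a UserWarning here
    df.C = [1, 2, 3]  # only sets a Python attribute; no column C is created
print(df)
```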

3.2 Assigning by row

df.loc[row]=rowList/rowValue
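A minimal sketch on a hypothetical frame: a list supplies one value per column, a scalar fills the whole row, and a label that does not exist yet appends a new row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])

df.loc['r1'] = [10, 30]  # one value per column, in column order
df.loc['r2'] = 0         # a scalar fills every column of the row
df.loc['r3'] = [5, 6]    # a new label appends a row at the end
print(df)
```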

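3.3 Assigning to multiple rows and columns

df.loc[[row1,row2],[col1,col2]]=value/valueList
df.iloc[[rowNo1,rowNo2],[colNo1,colNo2]]=value/valueList

Avoid the chained form df.iloc[[rowNo1,rowNo2]][[col1,col2]]=value/valueList: the first [] returns a temporary copy, so the assignment never reaches df (pandas warns about chained assignment). Convert the row positions to labels instead: df.loc[df.index[[rowNo1,rowNo2]],[col1,col2]]=value/valueList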
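The difference between the chained form and a single indexer call, sketched on a hypothetical frame (warnings are silenced only to keep the output clean):

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=['x', 'y', 'z'])

# Chained form: df.iloc[[0, 1]] is a temporary copy, so df is not modified
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about chained assignment
    df.iloc[[0, 1]][['A', 'B']] = 99
unchanged = df.copy()

# Single loc call: translate the row positions to labels first
df.loc[df.index[[0, 1]], ['A', 'B']] = 99

# Single iloc call: translate the column labels to positions first
df.iloc[[2], df.columns.get_indexer(['C'])] = 0
print(df)
```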

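4.1 Inserting at an arbitrary position

  • Insert a column
df.insert(loc,column,value)
loc: the position (column number) to insert at
column: the column name
value: the values (a scalar, or one value per row)
insert modifies df in place and returns None.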
df.insert(1,'E',[1,2,3,4,5,6])
df

                  A  E         B         C         D
1         -1.468370  1  0.148891 -0.463374  0.201060
2          0.483324  2 -0.518111 -1.238487  0.949822
a         -0.909020  3 -0.111295 -0.833337  1.560756
b         -0.034978  4  0.036142 -1.613437  2.040824
2006-10-1 -0.312716  5 -2.481801 -0.607386 -0.763655
第六行     0.300200  6 -0.618587  0.187557  0.416438
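A short sketch of insert on a hypothetical frame: it changes the frame in place and returns None, and by default it refuses a column name that already exists unless allow_duplicates=True is passed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

ret = df.insert(1, 'E', [9, 8])  # loc=1: E lands between A and B, in place

try:
    df.insert(0, 'E', 0)         # E already exists, so this raises ValueError
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

df.insert(0, 'E', 0, allow_duplicates=True)  # explicitly allow a second E
print(df)
```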

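  • Insert a row (shown for a frame whose columns are one, two, three, like the one in 1.1)
row=[111,222,333]
df.loc[5]=row    # label 5 does not exist yet, so a new row is appended at the end
df.iloc[1]=row   # iloc cannot add rows; this overwrites the existing row at position 1

Neither form inserts in the middle: loc with a new label always appends, and with an existing label it overwrites that row.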
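pandas has no single method for inserting a row in the middle. A common workaround, shown as a hypothetical helper insert_row (not a pandas API), splits the frame at the target position and concatenates the pieces around a one-row frame. ignore_index=True renumbers the result, so the original index labels are lost.

```python
import pandas as pd

def insert_row(df, pos, row):
    """Hypothetical helper: return a copy of df with `row` (a dict of
    column -> value) placed before position `pos`."""
    new = pd.DataFrame([row], columns=df.columns)
    # Split at pos, put the one-row frame in between, renumber the index
    return pd.concat([df.iloc[:pos], new, df.iloc[pos:]], ignore_index=True)

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})
out = insert_row(df, 1, {'one': 111, 'two': 222, 'three': 333})
print(out)
```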

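4.2 Appending at the end (combining tables)

  • Side by side (axis=1): rows are aligned on their index labels; by default this is an outer join (join='outer'), so labels missing on one side get NaN
pd.concat([df,df1],axis=1)
  • Stacked (axis=0): the rows of df2 are placed below df and columns are matched by name, similar to SQL UNION ALL (duplicate rows are kept)
result = pd.concat([df,df2],axis=0)
OR (pandas < 2.0 only; DataFrame.append was deprecated in 1.4 and removed in 2.0)
result = df.append(df2)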
data1=np.random.randn(2,4)
df1=pd.DataFrame(data1,columns=list('EFGH'))
df1

          E         F         G         H
0  0.257269 -0.314755  2.065818 -1.248936
1  1.609680 -0.942137  0.385073  0.180102

pd.concat([df,df1],axis=1)

The output below comes from a separate run, so its random values do not match the tables above:

                  A         B         C         D         E         F         G         H
0               NaN       NaN       NaN       NaN  0.602251 -0.122802 -0.064065 -0.747197
1         -1.178092  0.789967  0.029976 -0.303941  0.341069  0.190793  0.073840 -1.395045
2         -0.407526  0.500577 -2.187063 -0.810440       NaN       NaN       NaN       NaN
2006-10-1 -0.726548  2.500259 -0.771230  1.037939       NaN       NaN       NaN       NaN
a          0.204947  1.284060 -0.501803  0.260510       NaN       NaN       NaN       NaN
b          0.549687  1.205848  0.206498  2.123772       NaN       NaN       NaN       NaN
第六行    -0.882526 -1.044842  0.627114  0.596011       NaN       NaN       NaN       NaN
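The join type of axis=1 concatenation can be chosen with join, and DataFrame.join gives an actual left join on the index. A sketch with two hypothetical frames whose indexes only partly overlap:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
right = pd.DataFrame({'E': [3, 4]}, index=['y', 'z'])

outer = pd.concat([left, right], axis=1)                # default: union of labels
inner = pd.concat([left, right], axis=1, join='inner')  # only labels in both
left_join = left.join(right)                            # how='left' by default
print(outer, inner, left_join, sep="\n\n")
```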

data2=np.random.randn(2,5)
df2=pd.DataFrame(data2,columns=list('AEBCD'))
df2

          A         E         B         C         D
0 -0.285998 -2.167751 -0.760016  1.094251 -0.996360
1 -0.328405  0.248779 -0.023582  2.035276  0.039617

dfo = pd.concat([df,df2],axis=0)
dfo

                  A         E         B         C         D
1         -1.468370  1.000000  0.148891 -0.463374  0.201060
2          0.483324  2.000000 -0.518111 -1.238487  0.949822
a         -0.909020  3.000000 -0.111295 -0.833337  1.560756
b         -0.034978  4.000000  0.036142 -1.613437  2.040824
2006-10-1 -0.312716  5.000000 -2.481801 -0.607386 -0.763655
第六行     0.300200  6.000000 -0.618587  0.187557  0.416438
0         -0.285998 -2.167751 -0.760016  1.094251 -0.996360
1         -0.328405  0.248779 -0.023582  2.035276  0.039617

df.append(df2)

On pandas versions before 2.0 this returns the same eight rows as dfo above; on pandas 2.0 and later DataFrame.append no longer exists, so use pd.concat.
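Stacking with axis=0 keeps the original index labels, so labels can repeat; that is how dfo above ends up with the label 1 twice. ignore_index=True builds a fresh 0-based index instead. A sketch with hypothetical frames whose columns only partly match:

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[1, 2])
bottom = pd.DataFrame({'A': [5], 'C': [6]}, index=[1])

stacked = pd.concat([top, bottom], axis=0)                # label 1 appears twice
renumbered = pd.concat([top, bottom], ignore_index=True)  # index 0, 1, 2
print(stacked)  # missing columns are filled with NaN
```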

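5.1 Deleting rows and columns

df.drop(labels, axis=0, level=None, inplace=False)
  • labels: the labels of the rows or columns to delete
  • axis: 0 deletes rows, 1 deletes columns; the default is 0
  • inplace: whether to modify df itself (the call then returns None); with the default False a new DataFrame is returned and df is left unchanged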
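A minimal sketch of these parameters on a hypothetical frame; the columns= keyword is an alternative to axis=1:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

no_b = df.drop('B', axis=1)       # same as df.drop(columns='B'); new frame
no_x = df.drop(['x'])             # axis=0 is the default: drop rows by label
ret = df.drop('z', inplace=True)  # modifies df itself and returns None
```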

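5.2 Removing duplicates

df.drop_duplicates(subset=['A','B'],keep='first',inplace=True)
  • subset: the columns used to identify duplicates; the default None uses all columns
  • keep: one of three values, 'first', 'last' or False; the default is 'first':
  • 'first': keep the first occurrence and delete the later duplicates.
  • 'last': keep the last occurrence and delete the earlier duplicates.
  • False: delete every row that has a duplicate.
  • inplace: whether to modify df itself; otherwise a new DataFrame is returned

drop_duplicates also removes the index labels of the deleted rows, which leaves gaps in the index; reset_index(drop=True) renumbers it.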
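The three keep options side by side, on a hypothetical frame where rows 0, 1 and 3 share the same (A, B) pair:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1],
                   'B': ['p', 'p', 'q', 'p'],
                   'C': [10, 20, 30, 40]})

first = df.drop_duplicates(subset=['A', 'B'], keep='first')      # rows 0, 2
last = df.drop_duplicates(subset=['A', 'B'], keep='last')        # rows 2, 3
unique_only = df.drop_duplicates(subset=['A', 'B'], keep=False)  # row 2

clean = first.reset_index(drop=True)  # close the gap left in the index
```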

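5.3 Renaming columns

  • Method 1: rename specific columns with rename
df2=df1.rename(columns={'name':'stu_name','class':'stu_class'})
(or df1.rename(columns={'name':'stu_name','class':'stu_class'},inplace=True) to modify df1 itself; with inplace=True the call returns None, so do not assign its result)

  • Method 2: replace all column names at once
df1.columns=['id','stu_name','stu_class']
(the list must contain exactly one name per column)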
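Both methods sketched on a hypothetical df1 with the column names used above. rename ignores keys that are not column names, while assigning to columns checks that the number of names matches.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1], 'name': ['zhangsan'], 'class': ['A']})

df2 = df1.rename(columns={'name': 'stu_name', 'class': 'stu_class'})  # new frame
ret = df1.rename(columns={'name': 'stu_name'}, inplace=True)  # returns None

try:
    df1.columns = ['id', 'stu_name']  # 2 names for 3 columns
    length_checked = False
except ValueError:
    length_checked = True

df1.columns = ['id', 'stu_name', 'stu_class']  # one name per column
```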
Examples of drop from 5.1, applied to dfo from 4.2:
dfo.drop(['A','B'],axis=1,inplace=True)
dfo

                  E         C         D
1          1.000000 -0.463374  0.201060
2          2.000000 -1.238487  0.949822
a          3.000000 -0.833337  1.560756
b          4.000000 -1.613437  2.040824
2006-10-1  5.000000 -0.607386 -0.763655
第六行     6.000000  0.187557  0.416438
0         -2.167751  1.094251 -0.996360
1          0.248779  2.035276  0.039617

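dfo.drop([1,2],axis=0)

dfo contains the label 1 twice (once from df and once from df2), so both rows labelled 1 are removed. The output below still shows columns A and B, so it was produced before the in-place column drop above was run:

                  A         E         B         C         D
a         -0.909020  3.000000 -0.111295 -0.833337  1.560756
b         -0.034978  4.000000  0.036142 -1.613437  2.040824
2006-10-1 -0.312716  5.000000 -2.481801 -0.607386 -0.763655
第六行     0.300200  6.000000 -0.618587  0.187557  0.416438
0         -0.285998 -2.167751 -0.760016  1.094251 -0.996360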


Original: https://blog.csdn.net/doraemonjtr/article/details/125611147
Author: doraemonjtr
Title: Basic DataFrame Usage in Python

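Original articles are protected by copyright. Please cite the source when reposting: https://www.johngo689.com/739131/

Reposted articles are protected by the original author's copyright. Please credit the original author when reposting.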
