1. Reading and saving DataFrames
1.1 Exporting a DataFrame to CSV
xlsx_file.to_csv('F:/XXX/XXX.csv', encoding="utf-8-sig",header=True)
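As a minimal sketch of the export step, the round trip below writes a small DataFrame to a temporary CSV file and reads it back. The DataFrame contents and the file name example.csv are made up for illustration; index=False is an extra choice here so that the row index is not written as a column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, not taken from the original notes
df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'example.csv')
    # utf-8-sig writes a BOM so that Excel detects the encoding correctly
    df.to_csv(csv_path, encoding='utf-8-sig', header=True, index=False)
    # Read the file back with the same encoding
    loaded = pd.read_csv(csv_path, encoding='utf-8-sig')

print(loaded)
```

Without index=False, the index would be saved as an unnamed first column and would need index_col=0 when reading the file back.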
1.2 Reading an xlsx file with pandas
xlsx_file = pd.read_excel(xlsx_file_name, sheet_name='Sheet1')
1.3 Creating a DataFrame
A DataFrame can be created by reading a CSV or xlsx file, and it can also be built directly from an array (a list of lists).
import pandas as pd
data_list = [[6,10,3],[1,5,4],[1,2,4],[1,15,24],[1,0,2],[3,7,9],[2,8,5]]
df = pd.DataFrame(data_list,columns=['A','B','C'])
df.index = ['G','H','I','J','K','L','M']
print(df)
2. DataFrame operations
2.1 Getting the number of rows and columns of a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(24).reshape(6,4), columns=['A', 'B', 'C', 'D'])
row_nums = df.shape[0]
col_nums = df.columns.size
print(row_nums)
print(col_nums)
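The same counts can also be read in one step from df.shape, which is a (rows, columns) tuple. A small sketch using the same data as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=['A', 'B', 'C', 'D'])

# shape is a (row count, column count) tuple, so it can be unpacked directly
row_nums, col_nums = df.shape
print(row_nums, col_nums)

# len() gives the same numbers
print(len(df), len(df.columns))
```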
2.2 Deleting rows and columns from a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(24).reshape(6,4), columns=['A', 'B', 'C', 'D'])
print(df)
df1 = df.drop(axis=0, index = 1, inplace=False)
print(df1)
df2 = df.drop(axis=0, index = [1,2,4], inplace=False)
print(df2)
df3 = df.drop(axis=1, columns = ['A','D'], inplace=False)
print(df3)
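When deleting rows, make sure every index label you pass actually exists. A very subtle error looks like this: after concat with ignore_index=False the index is 0, 1, 2, 0, 1, 2, so the label 3 does not exist and drop raises a KeyError: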
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(12).reshape(3,4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.arange(12).reshape(3,4), columns=['A', 'B', 'C', 'D'])
new_df = pd.concat([df1,df2], ignore_index=False)
print(new_df)
# Label 3 is not in the index, so this line raises KeyError
df3 = new_df.drop(axis=0, index = 3, inplace=False)
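One possible way around this pitfall, sketched below, is to renumber the rows with reset_index before dropping by position-like labels. The variable names renumbered and dropped_both are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
new_df = pd.concat([df1, df2], ignore_index=False)
print(list(new_df.index))  # the labels repeat: 0, 1, 2, 0, 1, 2

# Dropping a label that does not exist raises KeyError
try:
    new_df.drop(index=3)
except KeyError as exc:
    print('KeyError:', exc)

# Dropping a duplicated label removes every row that carries it
dropped_both = new_df.drop(index=1)

# Renumber the rows first, then label 3 refers to the fourth row
renumbered = new_df.reset_index(drop=True)
df3 = renumbered.drop(index=3)
print(df3)
```

Passing ignore_index=True to concat in the first place gives the same renumbered index.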
2.3 Sorting a DataFrame
A DataFrame can be sorted by its row or column labels, or by the values in a row or column. To sort by values, use the sort_values function.
import pandas as pd
data_list = [[6,10,3],[1,5,4],[1,2,4],[1,15,24],[1,0,36],[3,7,9],[2,8,5]]
df = pd.DataFrame(data_list,columns=['A','B','C'])
df.index = ['G','H','I','J','K','L','M']
df.sort_values(by='A',axis=0,ascending=False,inplace=True)
print(df)
df = pd.DataFrame(data_list,columns=['A','B','C'])
df.index = ['G','H','I','J','K','L','M']
df_data_order = df.sort_values(by=['A','B'],ascending=[True,True])
print(df_data_order)
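Sorting by row or column labels, as opposed to values, can be done with sort_index. The sketch below uses a small made-up DataFrame whose labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical data with unsorted row and column labels
data_list = [[6, 10, 3], [1, 5, 4], [1, 2, 4]]
df = pd.DataFrame(data_list, columns=['C', 'A', 'B'], index=['M', 'G', 'K'])

# axis=0 sorts the rows by their index labels
by_row_label = df.sort_index(axis=0)
print(by_row_label)

# axis=1 sorts the columns by their names, here in descending order
by_col_label = df.sort_index(axis=1, ascending=False)
print(by_col_label)
```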
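Often, for a DataFrame that uses the default row numbers as its index, sorting leaves those row numbers out of order. In that case the index can be reset with the reset_index function.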
import pandas as pd
data = [['a','3'],['b','1'],['c','2']]
df = pd.DataFrame(data)
df = df.sort_values(by = 1,axis = 0,ascending = False)
print(df)
df = df.reset_index(drop=True)
print(df)
2.4 Combining DataFrames
Several functions can combine DataFrames, such as merge and concat.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(12).reshape(3,4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.arange(12).reshape(3,4), columns=['A', 'B', 'C', 'D'])
new_df = pd.concat([df1,df2], ignore_index=True)
print(new_df)
new_df = pd.concat([df1,df2], ignore_index=False)
print(new_df)
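The example above covers concat only. For merge, which joins two DataFrames on a shared key column, a minimal sketch could look like the following. The column names key, left_value and right_value are invented for this illustration.

```python
import pandas as pd

# Two hypothetical tables that share the column 'key'
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_value': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_value': [20, 30, 40]})

# Inner join: keep only the keys present in both tables
inner = pd.merge(left, right, on='key', how='inner')
print(inner)

# Outer join: keep every key, filling the gaps with NaN
outer = pd.merge(left, right, on='key', how='outer')
print(outer)
```

concat stacks frames along an axis, while merge aligns rows by key values, so the two are not interchangeable.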
2.5 Filtering DataFrame data
import pandas as pd
data_list = [['拖动',10,3],[1,5,4],['拖动',2,4],[1,15,24],['滑动',0,2],[3,7,9],[2,8,5]]
df = pd.DataFrame(data_list,columns=['A','B','C'])
print(df)
# Keep the rows where column A is not '拖动'
df1 = df[~(df['A']=='拖动')]
df1 = df1.reset_index(drop=True)
print(df1)
# Keep the rows where column A is '拖动' or '滑动'
df1 = df[df['A'].isin(['拖动','滑动'])]
df1 = df1.reset_index(drop=True)
print(df1)
df = pd.DataFrame(data_list,columns=['A','B','C'])
df = df[['A','B']]
print(df)
String filtering can also be done with the str.contains method of a column. It allows substring search and also supports regular expressions.
import pandas as pd
data_list = [['拖动',10,3],[1,5,4],['拖动',2,4],[1,15,24],['滑动',0,2],[3,7,9],[2,8,5]]
df = pd.DataFrame(data_list,columns=['A','B','C'])
print(df)
df=df[(df['A'].str.contains('动') == True)]
df = df.reset_index(drop=True)
print(df)
df = pd.DataFrame(data_list,columns=['A','B','C'])
pattern = r'.*?'
df=df[(df['A'].str.contains(pattern) == True)]
print(df)
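Because column A mixes strings and integers, str.contains yields a missing value for the non-string rows, which is why the code above compares the result with True. A sketch of an alternative, assuming the na=False argument is acceptable for your data, treats those rows as non-matching directly. The pattern ^拖 is chosen here only as an example of a regular expression.

```python
import pandas as pd

data_list = [['拖动', 10, 3], [1, 5, 4], ['拖动', 2, 4], [1, 15, 24], ['滑动', 0, 2]]
df = pd.DataFrame(data_list, columns=['A', 'B', 'C'])

# Regex: values that start with '拖'; na=False counts non-string rows as no match
mask = df['A'].str.contains(r'^拖', regex=True, na=False)
dragged = df[mask].reset_index(drop=True)
print(dragged)
```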
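2.6 Handling NaN in a DataFrame
Parameters of dropna:
axis: default 0, which drops rows; 1 drops columns
how: {'any', 'all'}, default 'any', which drops every row that contains a missing value; 'all' drops only the rows in which every value is missing
thresh: int, keeps only the rows with at least that many non-null values
subset: limits the missing-value check to the listed columns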
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, 10], 'C': [10, 25, 15]})
print(df)
df = df.dropna(axis=0, how='any')
print(df)
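The example above uses how='any' only. The other parameters can be sketched as follows, on a slightly larger made-up DataFrame with one row that is entirely missing. The result names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 3 is entirely NaN
df = pd.DataFrame({
    'A': [np.nan, 1, 2, np.nan],
    'B': [10, np.nan, 10, np.nan],
    'C': [10, 25, 15, np.nan],
})

# how='all': drop only the rows in which every value is missing
all_missing_removed = df.dropna(axis=0, how='all')
print(all_missing_removed)

# thresh=3: keep only the rows with at least 3 non-null values
at_least_three = df.dropna(axis=0, thresh=3)
print(at_least_three)

# subset=['A']: look for missing values in column A only
subset_a = df.dropna(axis=0, subset=['A'])
print(subset_a)
```

Note that how and thresh are passed in separate calls here; recent pandas versions do not accept both in the same call.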
Original: https://blog.csdn.net/log_zhan/article/details/126424177
Author: log_zhan
Title: Python库使用笔记—Dataframe