数据读取
可以读取excel,csv等:
df = pd.read_excel("Name.xlsx")
df = pd.read_csv("Name.csv")
在读入Excel的时候,加入 index_col=0
参数可以避免出现多余的一列
Dataframe 的创建
使用字典创建
dic = {'name1':[1,2,3],'name2':[4,5,6]}
df = pd.DataFrame(dic)
>>>
name1 name2
0 1 4
1 2 5
2 3 6
指定行列和数据来创建
df = pd.DataFrame(columns=["name2","name1"],index=['A','B','C'],
data={'name1':[1,2,3],'name2':[4,5,6]})
>>>
name2 name1
A 4 1
B 5 2
C 6 3
数据类型转换
df = df.astype(str)
取得dataframe其中的数据
假设有如下dataframe
df = pd.read_csv('Info.csv')
print(df)
>>>
name Age Sex
0 A 40 "male"
2 B 23 "male"
1 C 61 "female"
注意,这里的索引是0,2,1不是0,1,2。这对我们后面讲解iloc和loc有作用
条件获取
print(df[df['Sex']=='male'])
>>>
name Age Sex
0 A 40 "male"
2 B 23 "male"
loc,iloc
loc: pd.loc[name_of_index,name_of_column]
:也就是根据行和列的名字来定位
iloc: pd.iloc[rank_of_index,rank_of_column]
:也就是更加行和列的次序(第几行第几列)来定位
print(df.loc[1,'name'])
>>> C
print(df.iloc[1,0])
>>> B
切片(只能获取行)
格式: df[start:end:step]
print(df[0:3:2])
>>>
name Age Sex
0 A 40 male
1 C 61 female
dataframe中数据的删除
假设有如下dataframe
df = pd.read_csv('Info.csv')
print(df)
>>>
name Age Sex
0 A 40 "male"
2 B 23 "male"
1 C 61 "female"
`py
df.drop(columns=['Sex'], inplace=True)
print(df)
>>>
name Age
0 A 40
2 B 23
1 C 61
删除行
df.drop(index=[1], inplace=True)
print(df)
>>>
name Age Sex
0 A 40 male
2 B 23 male
由于传入index和column的参数是一个列表,因此可以批量删除
Dataframe的合并
假设有如下dataframe
df = pd.read_csv('Info.csv')
df2 = pd.read_csv('Info2.csv')
print(df)
print(df2)
>>>
name Age Sex
0 A 40 "male"
2 B 23 "male"
1 C 61 "female"
>>>
name Age Sex
0 AA 44 "female"
1 BB 31 "female"
concat函数
- 简单的增加行数(df2就是df后面要增加的行):
print(pd.concat([df2,df],ignore_index=True))
>>>
name Age Sex
0 AA 44 female
1 BB 31 female
2 A 40 male
3 B 23 male
4 C 61 female
(这里如果不ignore_index的话,索引就会变成0,1,0,2,1)
- 简单的增加列数(把df2放在df1右边然后拼上):
print(pd.concat([df,df2],axis=1))
>>>
name Age Sex name Age Sex
0 A 40 male AA 44.0 female
1 C 61 female BB 31.0 female
2 B 23 male NaN NaN NaN
其中如果发现拼上不能凑成一个长方形,就把空出来的地方补上NaN,就像上面一样
- dataframe可以直接添加列,不需要使用函数
df['Location']=['Asia', 'Asia', 'Asia']
print(df)
>>>
name Age Sex Location
0 A 40 male Asia
2 B 23 male Asia
1 C 61 female Asia
merge函数
- 按照某列的值来合并(比如按照身份证号码整合一群人的信息)
假设有如下dataframe:
print(df)
print(df2)
>>>
name Age Sex
0 A 40 male
2 B 23 male
1 C 61 female
Name Hobby Talent
0 A literature writing
1 B chemistry energetic
2 C sports strong will
按照名字合并:
print(pd.merge(df, df2, left_on='name', right_on='Name'))
>>>
name Age Sex Name Hobby Talent
0 A 40 male A literature writing
1 B 23 male B chemistry energetic
2 C 61 female C sports strong will
如果需要可以将多出来的Name行删掉
merge详解:
https://blog.csdn.net/weixin_39639919/article/details/110970746?utm_medium=distribute.pc_relevant.none-task-blog-2defaultbaidujs_baidulandingword~default-0.pc_relevant_paycolumn_v3&spm=1001.2101.3001.4242.1&utm_relevant_index=3
Dataframe缺失值的填充
; Dataframe排序
假设有如下dataframe
df = pd.read_csv('Info.csv')
print(df)
>>>
name Age Sex
0 A 40 "male"
2 B 23 "male"
1 C 61 "female"
依照年龄排序
df.sort_values(by='Age', ascending=True, inplace=True)
print(df)
>>>
name Age Sex
2 B 23 male
0 A 40 male
1 C 61 female
其中, ascending=True
是升序,否则是降序
dataframe apply
df['ExtraScore'] = df['Nationality'].apply(lambda x : 5 if x != '汉' else 0)
查看dataframe的大致情况
; 可视化
matplotlib库
导入模块: import matplotlib.pyplot as plt
基本操作
创建画布: fig = plt.figure()
添加子图: axes = fig.add_subplot(a, b, c)
:定义fig中一共有a行b列个子图,axes是这些图中的第c个。画统计图需要在子图里面进行。如果只想画一个图,就可以让a,b,c都等于1
给子图添加横纵坐标标题: axes.set_xlabel('x'),axes.set_ylabel('y')
给子图添加标题: axes.set_title('title)
显示画布(必须要加,不然显示不出来): plt.show()
使用条件判定
单变量:直方图,散点图,饼图:x为变量,y为值的分布
双变量:箱线图,散点图:x为变量,y为变量
多变量:散点图,堆叠图:x为变量,y为变量,颜色,大小等也可以表示变量
添加线图
axes.plot(X_data, Y_data)
添加直方图
axes.hist(data,bins=a)
,也就是将data分成a段
散点图
axes.scatter(x=data1,y=data2,s=size,c=color,alpha=0.5)
:以data1为横轴,data2为纵轴,点的大小为size,颜色为color,透明度为0.5画散点图
箱线图
axes.boxplot([data1,data2],labels=['label1','label2'])
:将数据分成data1和data2两组,分别赋予label1和label2这两个标签名
饼图
pies.pie(x=, labels=, autopct=,counterclock=)
例子
import pandas as pd
import matplotlib.pyplot as plt
tips = pd.read_csv("seaborndata/tips.csv")
fig = plt.figure(figsize=(14, 14))
axes1 = fig.add_subplot(3, 2, 1)
axes1.hist(tips['total_bill'], bins=10)
axes1.set_title("Histogram of Total Bill")
axes1.set_xlabel("Total Bill")
axes1.set_ylabel("Frequency")
scatter_plot = fig.add_subplot(3, 2, 2)
scatter_plot.scatter(tips['total_bill'], tips['tip'])
scatter_plot.set_title("Scatterplot of Total Bill VS Tip")
scatter_plot.set_xlabel("Total Bill")
scatter_plot.set_ylabel("Tip")
box_plot = fig.add_subplot(3, 2, 3)
box_plot.boxplot([tips[tips['sex'] == 'Female']['tip'], tips[tips['sex'] == 'Male']['tip']], labels=['Female', 'Male'])
box_plot.set_xlabel('Sex')
box_plot.set_ylabel('Tip')
box_plot.set_title("Boxplot of Tips by Sex")
def sexcode(sex):
if sex == 'Female':
return 2
return 1
tips['sex_color'] = tips['sex'].apply(sexcode)
scatter_plot_color = fig.add_subplot(3, 2, 4)
scatter_plot_color.scatter(x=tips['total_bill'], y=tips['tip'], s=tips['size']*10, c=tips['sex_color'], alpha=0.5)
scatter_plot_color.set_title('Total Bill with color')
scatter_plot_color.set_xlabel("Total Bill")
scatter_plot_color.set_ylabel("Tip")
pies = fig.add_subplot(3, 2, 5)
pies.pie(x=[1, 23, 3.4, 14.4], labels=['sampleA', 'sampleB', 'sampleC', 'sampleD'], autopct='%1.1f%%',counterclock=False)
fig.show()
Original: https://blog.csdn.net/weixin_43907802/article/details/122426927
Author: AndrewMe8211
Title: python pandas笔记
原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/738563/
转载文章受原作者版权保护。转载请注明原作者出处!