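Common pandas Methods (1): Reindexing, Dropping Entries by Axis, Selection and Filtering, Automatic Alignment and Function Application, Statistics and Sorting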

August 22, 2023, 9:17 AM • Python • 75 views

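Note: this blog post is based on Python 3 and pandas 1.3.5.

This article introduces the common basic usage of pandas in five parts: reindexing, dropping entries by axis, selection and filtering, automatic alignment in simple arithmetic together with function application, and statistics and sorting. Code examples are included.
[Note: every part is organized around the two basic pandas data structures, Series and DataFrame.]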

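Formula 1: Series.reindex(new_index_list, method="ffill"/"bfill")
Formula 2: DataFrame.reindex(index=new_row_labels, method="ffill"/"bfill", columns=new_column_labels)
[Note 1: reindex returns a new object (a copy), not a view of the original.]
[Note 2: if a label in the new index has no counterpart in the original Series or DataFrame, the corresponding entries are filled with NaN.]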

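Let us create a Series and a DataFrame, then reindex each of them.

The code is as follows:

import pandas as pd

# A Series and a DataFrame with the default integer labels
a,b = pd.Series([1,2,3,4]), pd.DataFrame([[1,2,3,4],[5,6,7,8]])
print(("Original Series:\n{}").format(a))
print(("Original DataFrame:\n{}").format(b))

# Labels that are absent from the original (5, 7, "b") produce NaN entries
a = a.reindex([1,3,5,7])
b = b.reindex(index = [1,"b"], columns = [1,3,5,7])
print(("\nReindexed Series:\n{}").format(a))
print(("Reindexed DataFrame:\n{}").format(b))

The result is as follows:

Original Series:
0    1
1    2
2    3
3    4
dtype: int64
Original DataFrame:
   0  1  2  3
0  1  2  3  4
1  5  6  7  8

Reindexed Series:
1    2.0
3    4.0
5    NaN
7    NaN
dtype: float64
Reindexed DataFrame: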
     1    3   5   7
1  6.0  8.0 NaN NaN
b  NaN  NaN NaN NaN

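Of course, the method parameter can forward-fill ("ffill") or backward-fill ("bfill") the NaN values introduced by reindexing, provided the original index is monotonically increasing or decreasing; we do not go into detail here.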
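
As a minimal sketch of the method parameter, the example below reindexes a small Series whose labels (0, 2, 4) and variable names are chosen here purely for illustration; they do not come from the original article.

```python
import pandas as pd

# Hypothetical Series with gaps between its labels
s = pd.Series([10, 20, 30], index=[0, 2, 4])

# Forward fill: a new label takes the value of the closest earlier label
filled_forward = s.reindex(range(6), method="ffill")

# Backward fill: a new label takes the value of the closest later label;
# label 5 has no later label, so it stays NaN
filled_backward = s.reindex(range(6), method="bfill")

print(filled_forward)
print(filled_backward)
```

With forward fill every new label finds an earlier value, so no NaN remains; with backward fill the last label has nothing after it and remains missing.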

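In addition, the three assignments below rename the existing labels in order of appearance instead of reindexing. Because renaming only relabels entries that already exist, it never produces missing labels (and therefore no NaN); the new array must have the same length as the axis it replaces.

Series.index = new_index_array
DataFrame.index = new_row_label_array
DataFrame.columns = new_column_label_array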
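
The following sketch illustrates renaming by assignment. The labels "a", "b", "x", and "y" are hypothetical values picked for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Relabel rows and columns in order of appearance; the data is untouched
df.index = ["a", "b"]
df.columns = ["x", "y"]
print(df)

# A new label array of the wrong length is rejected with a ValueError
try:
    df.index = ["only_one"]
except ValueError as exc:
    print("Length mismatch:", exc)
```

Unlike reindex, no value moves or disappears here: the entry that was at row 1, column 1 is now reachable as df.loc["b", "y"].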

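Formula 1: Series.drop(labels_to_drop, inplace=True)
Formula 2: DataFrame.drop(row_labels_to_drop, axis=0, inplace=True)
Formula 3: DataFrame.drop(column_labels_to_drop, axis=1, inplace=True)
[Note: with inplace=True, drop modifies the Series or DataFrame itself and returns None; with the default inplace=False, the original is left unchanged and a new object is returned.]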

The code is as follows:

import pandas as pd

a,b = pd.Series([1,2,3,4]), pd.DataFrame([[1,2,3,4],[5,6,7,8]])
print(("Original Series:\n{}").format(a))
print(("Original DataFrame:\n{}").format(b))
# Drop the entry labeled 2 from the Series in place
a.drop([2], inplace = True)
print(("\nSeries after drop:\n{}").format(a))
# axis = 0 drops rows by label
b.drop(1, axis = 0, inplace = True)
print(("\nDataFrame after dropping a row label:\n{}").format(b))
# axis = 1 drops columns by label
b.drop([2,3], axis = 1, inplace = True)
print(("\nDataFrame after dropping column labels:\n{}").format(b))

The result is as follows:

Original Series:
0    1
1    2
2    3
3    4
dtype: int64
Original DataFrame:
   0  1  2  3
0  1  2  3  4
1  5  6  7  8

Series after drop:
0    1
1    2
3    4
dtype: int64

DataFrame after dropping a row label:
   0  1  2  3
0  1  2  3  4

DataFrame after dropping column labels:
   0  1
0  1  2
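
For comparison, here is a short sketch of drop with and without inplace=True. The variable names trimmed and result are hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]])

# Default (inplace=False): df is unchanged and a new DataFrame is returned
trimmed = df.drop([2, 3], axis=1)
print(trimmed.shape, df.shape)

# inplace=True: df itself loses the row and the call returns None
result = df.drop(1, axis=0, inplace=True)
print(result, df.shape)
```

Assigning the return value of an inplace=True call to a variable is a common slip, since that variable then holds None rather than the data.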

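Selection means slicing, which comes in two main forms: slicing by position and slicing by label.
Slicing with []:

Formula 1: new_var = Series[position or label], returns the element at that position or label
Formula 2: new_var = Series[position1:position2] or Series[label1:label2], returns a Series; a label slice includes both endpoints, while a position slice excludes the end position
Formula 3: new_var = DataFrame[row_position1:row_position2], returns the rows from row_position1 up to, but not including, row_position2
Formula 4: new_var = DataFrame[[column_label1, column_label2]], returns the columns column_label1 and column_label2

[Note: Formula 4 cannot select a contiguous range of columns. A slice such as column_label1:column_label2 inside [] is interpreted as a row slice, so it either fails or returns rows rather than the intended columns.]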

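The code is as follows:

import pandas as pd

a,b = pd.Series([1,2,3,4], index = ["a","b","c","d"]), pd.DataFrame([[1,2,3,4],[5,6,7,8]], index = ["a","b"])
print(("Original Series:\n{}").format(a))
print(("Original DataFrame:\n{}").format(b))
# Integer key used as a position; this works on pandas 1.3.5, but newer
# pandas releases deprecate it in favor of a.iloc[0]
a_0 = a[0]
print(("First element of the Series:\n{}").format(a_0))
# Label slice: both endpoints "b" and "d" are included
a_1_3 = a["b":"d"]
print(("Second through fourth elements of the Series:\n{}").format(a_1_3))
# Positional row slice: the end position is excluded
b_0 = b[:1]
print(("First row of the DataFrame:\n{}").format(b_0))
# A list inside [] selects columns by label
b_1_3 = b[[1,3]]
print(("Columns labeled 1 and 3 of the DataFrame (the second and fourth columns):\n{}").format(b_1_3))

The result is as follows:

Original Series:
a    1
b    2
c    3
d    4
dtype: int64
Original DataFrame:
   0  1  2  3
a  1  2  3  4
b  5  6  7  8
First element of the Series:
1
Second through fourth elements of the Series:
b    2
c    3
d    4
dtype: int64
First row of the DataFrame:
   0  1  2  3
a  1  2  3  4
Columns labeled 1 and 3 of the DataFrame (the second and fourth columns):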
   1  3
a  2  4
b  6  8

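The output matches our expectations.
We can also slice rows and columns with loc and iloc:
Formula 5: DataFrame.loc[row_label1:row_label2, column_label1:column_label2]
Formula 6: DataFrame.iloc[row_position1:row_position2, column_position1:column_position2]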

The code is as follows:

import pandas as pd

c = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]], index = ["a","b","c"],columns = ["one","two","three","four"])
print(("Original DataFrame:\n{}").format(c))
# loc slices by label and includes both endpoints
c_0 = c.loc["a":"b","two":"four"]
print(("DataFrame sliced with loc:\n{}").format(c_0))
# iloc slices by position and excludes the end position
c_1 = c.iloc[:2,3:]
print(("DataFrame sliced with iloc:\n{}").format(c_1))

The result is as follows:

Original DataFrame:
   one  two  three  four
a    1    2      3     4
b    5    6      7     8
c    9    0     11    12
DataFrame sliced with loc:
   two  three  four
a    2      3     4
b    6      7     8
DataFrame sliced with iloc:
   four
a     4
b     8
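
The endpoint rules are easy to mix up, so the sketch below contrasts them on the same DataFrame. The variable names by_label, by_position, and rows_only are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 11, 12]],
                  index=["a", "b", "c"],
                  columns=["one", "two", "three", "four"])

# loc: label slices include both endpoints, giving 2 rows and 2 columns
by_label = df.loc["a":"b", "one":"two"]

# iloc: position slices exclude the end position, giving 1 row and 1 column
by_position = df.iloc[0:1, 0:1]

# A slice directly inside [] selects rows, never columns
rows_only = df["a":"b"]

print(by_label.shape, by_position.shape, rows_only.shape)
```

When a contiguous range of columns is needed, loc (by label) or iloc (by position) is the tool to reach for, because plain [] treats any slice as a row selection.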

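Next, we introduce filtering.
Formula 1: new_var = Series[boolean Series condition]; new_var is a copy of the matching entries of the original Series
Formula 2: new_var = DataFrame[boolean condition on a column]; new_var is a copy of the matching rows of the original DataFrame
The code is as follows: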

import pandas as pd

a, c = pd.Series([1,2,3]),pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]], index = ["a","b","c"],columns = ["one","two","three","four"])
print(("Original Series:\n{}").format(a))
print(("Original DataFrame:\n{}").format(c))
# Keep only the entries for which the condition is True
a = a[a<2]
# Keep only the rows whose value in column "one" is greater than 5
c = c[c["one"]>5]
print(("Filtered Series:\n{}").format(a))
print(("Filtered DataFrame:\n{}").format(c))

The result is as follows:

Original Series:
0    1
1    2
2    3
dtype: int64
Original DataFrame:
   one  two  three  four
a    1    2      3     4
b    5    6      7     8
c    9    0     11    12
Filtered Series:
0    1
dtype: int64
Filtered DataFrame:
   one  two  three  four
c    9    0     11    12
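
Conditions can also be combined. The sketch below is one way to do so with the element-wise operators & (and) and | (or); the thresholds and variable names are assumptions made for this illustration, not part of the original article.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 11, 12]],
                  index=["a", "b", "c"],
                  columns=["one", "two", "three", "four"])

# Each condition needs its own parentheses because & and | bind tightly
both = df[(df["one"] > 1) & (df["two"] > 0)]      # only row "b" qualifies
either = df[(df["one"] > 5) | (df["four"] < 5)]   # rows "a" and "c" qualify

# An explicit copy makes it clear that edits do not reach the original
independent = both.copy()
independent.loc["b", "one"] = 100
print(df.loc["b", "one"])
```

Python's keywords and/or do not work here, since they try to reduce a whole Series to a single truth value.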

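When two DataFrames or two Series are added or subtracted directly, their entries are aligned automatically by label. A label that exists in only one of the operands yields NaN at that label in the result.

For example, we create two DataFrames and add them: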

import pandas as pd

# The two frames share only the row label 0
a, c = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]], index = [0,"b","c"]), pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]])
print(a)
print(c)

d = a+c
print(("Sum of the two DataFrames:\n{}").format(d))

The result is as follows:

   0  1   2   3
0  1  2   3   4
b  5  6   7   8
c  9  0  11  12
   0  1   2   3
0  1  2   3   4
1  5  6   7   8
2  9  0  11  12
Sum of the two DataFrames:
     0    1    2    3
0  2.0  4.0  6.0  8.0
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN

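However, the methods add (addition), sub (subtraction), div (division), floordiv (floor division), mul (multiplication), and pow (exponentiation) accept a fill_value parameter. When an entry is missing in exactly one of the two operands, fill_value is substituted for it before the operation is carried out.

For example, we add the same two DataFrames again, this time treating missing entries as 0.
The code is as follows: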

import pandas as pd

a, c = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]], index = [0,"b","c"]), pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]])
print(a)
print(c)

# Rows present in only one frame are added to 0 instead of becoming NaN
d = a.add(c, fill_value = 0)
print(("Sum of the two DataFrames with fill_value:\n{}").format(d))

The result is as follows:

   0  1   2   3
0  1  2   3   4
b  5  6   7   8
c  9  0  11  12
   0  1   2   3
0  1  2   3   4
1  5  6   7   8
2  9  0  11  12
Sum of the two DataFrames with fill_value:
     0    1     2     3
0  2.0  4.0   6.0   8.0
1  5.0  6.0   7.0   8.0
2  9.0  0.0  11.0  12.0
b  5.0  6.0   7.0   8.0
c  9.0  0.0  11.0  12.0

Of course, we can also multiply the same two DataFrames:

import pandas as pd

a, c = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]], index = [0,"b","c"]), pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,0,11,12]])
print(a)
print(c)

# Rows present in only one frame are multiplied by 0, so they become 0
d = a.mul(c, fill_value = 0)
print(("Product of the two DataFrames with fill_value:\n{}").format(d))

The result is as follows:

   0  1   2   3
0  1  2   3   4
b  5  6   7   8
c  9  0  11  12
   0  1   2   3
0  1  2   3   4
1  5  6   7   8
2  9  0  11  12
Product of the two DataFrames with fill_value:
     0    1    2     3
0  1.0  4.0  9.0  16.0
1  0.0  0.0  0.0   0.0
2  0.0  0.0  0.0   0.0
b  0.0  0.0  0.0   0.0
c  0.0  0.0  0.0   0.0
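
One detail the examples above do not exercise: fill_value only helps when a cell is missing in one operand. The sketch below uses two hypothetical single-cell DataFrames (left and right, with made-up labels) that share neither a row nor a column, so some cells of the result are missing on both sides.

```python
import pandas as pd

# Hypothetical frames with no row or column label in common
left = pd.DataFrame({"x": [1.0]}, index=["a"])
right = pd.DataFrame({"y": [2.0]}, index=["b"])

total = left.add(right, fill_value=0)
print(total)
```

Here the cells ("a", "x") and ("b", "y") each exist in one operand and are combined with 0, while ("a", "y") and ("b", "x") exist in neither operand and therefore stay NaN.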

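Function application uses DataFrame.apply(function, axis="columns"/"index").
The axis parameter can also be given as 0 or 1: 0 means "index" (the function receives each column in turn) and 1 means "columns" (the function receives each row in turn).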

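import pandas as pd
a = pd.DataFrame([[1,2,3],[5,6,7]], index = ["a","c"],columns = [11,22,33])
print(a)

# axis = "columns": f1 receives each row and returns a transformed row
f1 = lambda x:x**2 + 3
a1 = a.apply(f1, axis = "columns")

# axis = "index": f2 receives each column and reduces it to a single value
f2 = lambda x:max(x) - 1
a2 = a.apply(f2, axis = "index")
print("\n")
print(a1)
print("\n")
print(a2)

The result is as follows: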

   11  22  33
a   1   2   3
c   5   6   7

   11  22  33
a   4   7  12
c  28  39  52

11    4
22    5
33    6
dtype: int64
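
Since the axis argument is a frequent source of confusion, the sketch below applies the same reducing function along both axes. The helper name spread is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [5, 6, 7]], index=["a", "c"], columns=[11, 22, 33])

# Hypothetical helper: the range (max minus min) of whatever Series it receives
spread = lambda s: s.max() - s.min()

# axis=0 ("index"): one result per column, labeled by the column labels
per_column = df.apply(spread, axis=0)

# axis=1 ("columns"): one result per row, labeled by the row labels
per_row = df.apply(spread, axis=1)

print(per_column)
print(per_row)
```

A quick way to check which axis was used is to look at the labels of the result: column labels mean the function saw columns, row labels mean it saw rows.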

In addition, NumPy's element-wise functions also work on a DataFrame:

import numpy as np
a3 = np.power(a,3)
print(a3)

The result is as follows:

    11   22   33
a    1    8   27
c  125  216  343

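For a statistical summary, use the describe method.
To test whether each element of a DataFrame (or of one of its rows or columns) is in a given list, use the isin method.
To count how many times each distinct value occurs in a row or column, use the value_counts() method.
To sort, use sort_values(by=list_of_column_labels_to_sort_by, ascending=False/True).

We now use isin, value_counts, and sort_values in turn.
The code is as follows: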

import pandas as pd
a = pd.DataFrame([[1,2,3],[4,5,6],[1,2,6]], columns = [11,22,33])

# Element-wise membership test against the target list
b = a.isin([1,3,6,8])
print(("Whether each element is in the target list [1,3,6,8]:\n{}").format(b))

# Number of occurrences of each distinct value in column 22
c = a[22].value_counts()
print(("Counts of each distinct value in column 22:\n{}").format(c))

# Sort by column 11 first, then by column 33 (ascending by default)
d = a.sort_values(by = [11,33])
print(("DataFrame sorted in ascending order by columns 11 and 33:\n{}").format(d))

The result is as follows:

Whether each element is in the target list [1,3,6,8]:
      11     22    33
0   True  False  True
1  False  False  True
2   True  False  True
Counts of each distinct value in column 22:
2    2
5    1
Name: 22, dtype: int64
DataFrame sorted in ascending order by columns 11 and 33:
   11  22  33
0   1   2   3
2   1   2   6
1   4   5   6
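
The describe method is mentioned above but not demonstrated, so here is a brief sketch on the same data, together with a descending sort. The variable names summary and descending are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 6]], columns=[11, 22, 33])

# One summary column per numeric column: count, mean, std, min,
# the 25%/50%/75% quantiles, and max
summary = df.describe()
print(summary)

# ascending=False reverses the order for every column listed in by
descending = df.sort_values(by=[11, 33], ascending=False)
print(descending)
```

Individual statistics can be read from the summary by label, for example summary.loc["mean", 11] for the mean of column 11.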

In closing, this article has covered the basic data-processing operations for DataFrame and Series. I hope it helps with your learning.

Original: https://blog.csdn.net/dylan_young/article/details/122397631
Author: Efred.D
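Title: Common pandas Methods (1): Reindexing, Dropping Entries by Axis, Selection and Filtering, Automatic Alignment and Function Application, Statistics and Sorting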

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/756521/
