10 Minutes to pandas (Chinese-English), Part 02

August 6, 2023, 3:44 PM • Python • 42 views

This part covers the following sections:
4. Missing data
5. Operations
6. Merge

4. Missing data

pandas primarily uses the value np.nan to represent missing data. By default, it is not included in computations. See the Missing Data section.

Reindexing allows you to change, add, or delete the index on a specified axis. This returns a copy of the data:

import numpy as np
import pandas as pd
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1

df

                   A         B         C         D    F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0
2013-01-05 -0.420209  2.178601  2.552643  0.733452  4.0
2013-01-06  0.450958  1.065650  0.171798  0.701391  5.0

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1

                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  NaN
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  NaN
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  NaN
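
As a small, deterministic illustration of the reindexing step, the sketch below uses a made-up frame named `small` (not part of the original data). It shows that labels absent from the original are filled with NaN, that the `fill_value` parameter of `reindex` can supply a different default, and that the original frame is left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame used only for this illustration.
small = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])

# Reindex to a new set of row labels and add a new column "E".
# Labels that did not exist before ("w" and "E") are filled with NaN.
re1 = small.reindex(index=["x", "y", "w"], columns=["A", "E"])

# fill_value replaces the default NaN for newly introduced labels.
re2 = small.reindex(index=["x", "y", "w"], fill_value=0.0)

print(re1)
print(re2)
print(small)  # the original frame still has rows x, y, z
```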

To drop any rows that have missing data:

df1.dropna(how="any")

Empty DataFrame
Columns: [A, B, C, D, F, E]
Index: []
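
The result above is empty because every row of df1 has a NaN in column E. As a rough sketch of the alternatives, the example below uses a hypothetical frame `toy` to compare `how="any"` with the `how="all"` and `subset` options of `dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "E" is entirely missing, "F" is partly missing.
toy = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "F": [np.nan, 1.0, 2.0], "E": [np.nan, np.nan, np.nan]}
)

any_dropped = toy.dropna(how="any")   # drops every row: each has a NaN in "E"
all_dropped = toy.dropna(how="all")   # drops only rows where all values are NaN
f_dropped = toy.dropna(subset=["F"])  # looks only at column "F" when deciding

print(len(any_dropped), len(all_dropped), list(f_dropped.index))
```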

Filling missing data:

df1.fillna(value=5)

                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  5.0  5.0
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  5.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  5.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  5.0
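
A single scalar is not the only option for `fillna`. The sketch below, on a hypothetical frame `toy`, contrasts a scalar fill with a dict that maps column names to per-column fill values; the specific fill values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in both columns.
toy = pd.DataFrame({"F": [np.nan, 1.0], "E": [np.nan, np.nan]})

# A scalar fills every missing cell with the same value.
filled_scalar = toy.fillna(value=5)

# A dict maps column names to per-column fill values.
filled_by_column = toy.fillna(value={"F": 0.0, "E": -1.0})

print(filled_scalar)
print(filled_by_column)
```

Both calls return new objects, so `toy` itself still contains its NaN values afterwards.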

To get the boolean mask where values are NaN:

pd.isna(df1)

                A      B      C      D      F     E
2013-01-01  False  False  False  False   True  True
2013-01-02  False  False  False  False  False  True
2013-01-03  False  False  False  False  False  True
2013-01-04  False  False  False  False  False  True
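
To make the statement that NaN is excluded from computations by default more concrete, here is a minimal sketch on a made-up Series named `values`. It also shows the `skipna=False` switch, which makes the missing value propagate instead.

```python
import numpy as np
import pandas as pd

# Hypothetical three-element Series with one missing value.
values = pd.Series([1.0, np.nan, 3.0])

total = values.sum()                     # NaN is skipped: 1.0 + 3.0
average = values.mean()                  # mean over the two non-missing values
strict_total = values.sum(skipna=False)  # NaN propagates when skipna is disabled
missing_count = values.isna().sum()      # number of missing entries

print(total, average, strict_total, missing_count)
```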

5. Operations

See the Basic section on Binary Ops.

Operations in general exclude missing data.

Performing a descriptive statistic:

df.mean()

A   -0.291148
B    0.336694
C    0.430981
D    0.860392
F    3.000000
dtype: float64

Same operation on the other axis:

df.mean(axis=1)

2013-01-01    0.191630
2013-01-02   -0.114052
2013-01-03    0.071200
2013-01-04   -0.257770
2013-01-05    0.466199
2013-01-06    0.878283
Freq: D, dtype: float64
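
Because the frame above is random, the axis semantics can be easier to check on fixed numbers. The following sketch uses a hypothetical two-by-two frame `toy`: the default `axis=0` yields one mean per column, `axis=1` yields one mean per row, and in both cases the NaN is skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in column "B".
toy = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, np.nan]})

col_means = toy.mean()        # axis=0 (default): one value per column
row_means = toy.mean(axis=1)  # axis=1: one value per row

print(col_means)
print(row_means)
```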

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension:

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s

2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64

df.sub(s, axis="index")

                   A         B         C         D    F
2013-01-01       NaN       NaN       NaN       NaN  NaN
2013-01-02       NaN       NaN       NaN       NaN  NaN
2013-01-03  0.003910 -0.281546 -1.082483  1.182944  1.0
2013-01-04 -5.222158 -3.509435 -3.367156 -2.147842  0.0
2013-01-05 -5.420209 -2.821399 -2.447357 -4.266548 -1.0
2013-01-06       NaN       NaN       NaN       NaN  NaN
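
The alignment and broadcasting behaviour can be sketched on a smaller, hypothetical example. Here `toy` and `offsets` are made-up names: the Series is matched against the row labels, its value is subtracted from every column in the matching row, and rows with a NaN or no matching label come out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame and a Series that covers only some of its rows.
toy = pd.DataFrame(
    {"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]}, index=["x", "y", "z"]
)
offsets = pd.Series([1.0, np.nan], index=["x", "y"])  # no entry for "z"

# axis="index" aligns the Series labels with the row labels, then
# broadcasts the matched value across all columns of that row.
result = toy.sub(offsets, axis="index")

print(result)
```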

Apply
Applying functions to the data:

df.apply(np.cumsum)

                   A         B         C         D     F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771   NaN
2013-01-02 -0.559387 -1.433107  0.311082  0.692408   1.0
2013-01-03  0.444523 -0.714653  0.228599  2.875352   3.0
2013-01-04 -1.777635 -1.224088 -0.138557  3.727510   6.0
2013-01-05 -2.197844  0.954513  2.414086  4.460962  10.0
2013-01-06 -1.746887  2.020164  2.585884  5.162353  15.0

df.apply(lambda x: x.max() - x.min())

A    3.226068
B    3.221415
C    2.919799
D    2.442716
F    4.000000
dtype: float64

df.apply(lambda x: x.max() - x.min(), axis=1)

2013-01-01    1.487163
2013-01-02    1.744011
2013-01-03    2.265428
2013-01-04    5.222158
2013-01-05    4.420209
2013-01-06    4.828202
Freq: D, dtype: float64
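
As a quick check of what the function passed to `apply` receives, the sketch below uses a hypothetical integer frame `toy`. With the default axis the lambda is called once per column; with `axis=1` it is called once per row.

```python
import pandas as pd

# Hypothetical toy frame with small integers so the ranges are easy to verify.
toy = pd.DataFrame({"A": [1, 5], "B": [2, 3]})

# Default axis=0: the lambda receives each column as a Series.
col_range = toy.apply(lambda col: col.max() - col.min())

# axis=1: the lambda receives each row as a Series.
row_range = toy.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range)  # A: 5 - 1, B: 3 - 2
print(row_range)  # row 0: 2 - 1, row 1: 5 - 3
```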

Histogramming
See more at Histogramming and Discretization.

s = pd.Series(np.random.randint(0, 7, size=10))
s

0    5
1    2
2    6
3    6
4    4
5    1
6    2
7    3
8    1
9    2
dtype: int64

s.value_counts()

2    3
6    2
1    2
5    1
4    1
3    1
dtype: int64
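
Since the Series above is random, here is a reproducible sketch on a fixed, made-up Series `rolls`. It also shows the `normalize=True` option of `value_counts`, which returns relative frequencies instead of raw counts.

```python
import pandas as pd

# Hypothetical fixed data so the counts can be verified by hand.
rolls = pd.Series([2, 6, 2, 1, 6, 2])

counts = rolls.value_counts()                # sorted by count, descending
shares = rolls.value_counts(normalize=True)  # counts divided by the total

print(counts)
print(shares)
```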

String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

type(s)

pandas.core.series.Series
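
The remark about regular expressions can be illustrated with `str.contains`, one of the methods where the pattern is a regex by default. This is a minimal sketch on a made-up Series `words`; passing `regex=False` switches to literal substring matching.

```python
import pandas as pd

# Hypothetical strings chosen so that "." behaves differently in each mode.
words = pd.Series(["a.b", "axb"])

# By default the pattern is a regular expression, so "." matches any character.
regex_match = words.str.contains("a.b")

# regex=False treats the pattern as a literal substring.
literal_match = words.str.contains("a.b", regex=False)

print(regex_match.tolist())
print(literal_match.tolist())
```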

6. Merge

6.1 Concat
pandas provides various facilities for easily combining Series and DataFrame objects, with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.

See the Merging section.

Concatenating pandas objects together with concat():

df = pd.DataFrame(np.random.randn(10, 4))
df

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218

pieces = [df[:3], df[3:7], df[7:]]
pieces

[          0         1         2         3
 0  0.488970  1.237504 -1.640805 -0.672117
 1  0.390873  0.906830  0.260662  0.119989
 2 -0.854710 -0.535410  1.641878  0.321487,
           0         1         2         3
 3 -0.134780  0.555554  1.024371 -0.103164
 4 -1.241929 -0.116488 -0.922242 -2.066726
 5 -0.432397  2.018692 -0.536801  0.074576
 6  1.452204 -0.587196  0.918798  1.192130,
           0         1         2         3
 7  0.819954  0.224358 -0.022698 -0.745293
 8  0.266344 -0.321944  1.251543  0.603333
 9 -0.491671  0.278449  0.194751  1.056218]

pieces[0]

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487

pd.concat(pieces)
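
Concatenating the pieces in their original order should reproduce the frame they were sliced from. The sketch below checks this on a small, hypothetical frame (`frame` and `parts` are illustrative names) and also shows the `ignore_index=True` option, which discards the old row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x2 frame; row i holds the values 2*i and 2*i + 1.
frame = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["a", "b"])
parts = [frame[:2], frame[2:4], frame[4:]]

# Row labels are preserved, so the result matches the original frame.
rebuilt = pd.concat(parts)

# Concatenating in reverse order with ignore_index=True gives a fresh 0..5 index.
renumbered = pd.concat(parts[::-1], ignore_index=True)

print(rebuilt.equals(frame))
print(renumbered)
```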

Note:
Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.
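
The recommendation in the note can be sketched as follows. The record fields `id` and `square` are made up for illustration; the point is only that rows are collected in a plain Python list and the DataFrame is constructed once at the end.

```python
import pandas as pd

# Hypothetical records; in practice these might come from a loop or a parser.
records = []
for i in range(3):
    records.append({"id": i, "square": i * i})

# Build the DataFrame once from the complete list, rather than
# growing it row by row, which would copy the data on each step.
frame = pd.DataFrame(records)

print(frame)
```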

SQL-style merges. See the Database style joining section.

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})

left

   key  lval
0  foo     1
1  foo     2

right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
right

   key  rval
0  foo     4
1  foo     5

pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

pd.merge(left, right)

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Another example:

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  bar     2     5
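
The examples above use the default inner join, where both frames share the same keys. As a hedged sketch of what happens when the keys only partly overlap, the code below uses hypothetical frames `left_toy` and `right_toy` together with the `how` and `indicator` parameters of `pd.merge`.

```python
import pandas as pd

# Hypothetical frames: "foo" appears on both sides, "bar" and "baz" on one side only.
left_toy = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right_toy = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})

# Default how="inner": only keys present in both frames are kept.
inner = pd.merge(left_toy, right_toy, on="key")

# how="outer" keeps every key; indicator=True adds a "_merge" column
# that records which side each row came from.
outer = pd.merge(left_toy, right_toy, on="key", how="outer", indicator=True)

print(inner)
print(outer)
```

In the outer result, values that have no counterpart on the other side are filled with NaN.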

Original: https://blog.csdn.net/u012338969/article/details/124575624
Author: 雪龙无敌
Title: pandas10minnutes_中英对照02

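Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/737863/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.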
