pandas10minnutes_中英对照02

This part covers the following sections:
4. Missing data
5. Operations
6. Merge

4. Missing data

pandas primarily uses the value np.nan to represent missing data. It is by default not included in computations. See the Missing Data section.

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:

import numpy as np
import pandas as pd
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1
df

                   A         B         C         D    F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0
2013-01-05 -0.420209  2.178601  2.552643  0.733452  4.0
2013-01-06  0.450958  1.065650  0.171798  0.701391  5.0

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1

                   A         B         C         D    F   E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0 NaN
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0 NaN
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0 NaN
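
Because NaN is skipped by default, reductions over a column with missing values only use the observed entries. A quick illustrative check with the df1 shown above (column F holds NaN, 1.0, 2.0, 3.0):

df1["F"].sum()    # 6.0 -- the NaN on 2013-01-01 is simply skipped
df1["F"].count()  # 3   -- counts only the non-missing values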

To drop any rows that have missing data:

df1.dropna(how="any")

Empty DataFrame
Columns: [A, B, C, D, F, E]
Index: []
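
Every row is dropped here because each row of df1 contains at least one NaN: column E was added by the reindex and never filled. A quick check, for illustration:

df1["E"].isna().all()   # True -- E is entirely missing
df1.isna().any(axis=1)  # True for every row, so how="any" removes them all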

Filling missing data:

df1.fillna(value=5)

                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  5.0  5.0
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  5.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  5.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  5.0
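
fillna also accepts a dict mapping column names to fill values, so each column can be filled differently. A minimal sketch with illustrative fill values:

# fill E with 0 and F with the mean of the observed F values
df1.fillna(value={"E": 0, "F": df1["F"].mean()})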

To get the boolean mask where values are nan:

pd.isna(df1)

                A      B      C      D      F     E
2013-01-01  False  False  False  False   True  True
2013-01-02  False  False  False  False  False  True
2013-01-03  False  False  False  False  False  True
2013-01-04  False  False  False  False  False  True
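
The mask is usually reduced straight away, for example to count missing values; a short sketch:

df1.isna().sum()        # number of missing values per column
df1.isna().sum().sum()  # total number of missing values in the frame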

5. Operations

See the Basic section on Binary Ops.

Operations in general exclude missing data.

Performing a descriptive statistic:

df.mean()
A   -0.291148
B    0.336694
C    0.430981
D    0.860392
F    3.000000
dtype: float64
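
Column F shows the effect of skipping missing data: its mean is 3.0, the mean of 1 through 5. Passing skipna=False includes the NaN and the result itself becomes NaN; a small illustrative check:

df["F"].mean()              # 3.0
df["F"].mean(skipna=False)  # nan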

Same operation on the other axis:

df.mean(1)
2013-01-01   -0.168403
2013-01-02    0.136921
2013-01-03    1.164565
2013-01-04    0.150682
2013-01-05    1.808897
2013-01-06    1.477959
Freq: D, dtype: float64

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension:

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64
df.sub(s, axis="index")

                   A         B         C         D    F
2013-01-01       NaN       NaN       NaN       NaN  NaN
2013-01-02       NaN       NaN       NaN       NaN  NaN
2013-01-03  0.003910 -0.281546 -1.082483  1.182944  1.0
2013-01-04 -5.222158 -3.509435 -3.367156 -2.147842  0.0
2013-01-05 -5.420209 -2.821399 -2.447357 -4.266548 -1.0
2013-01-06       NaN       NaN       NaN       NaN  NaN
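
The axis argument also works the other way around: a Series labelled by the columns is broadcast down the rows. A small sketch with made-up per-column offsets:

offsets = pd.Series([1, 2, 3, 4, 5], index=list("ABCDF"))  # hypothetical offsets
df.sub(offsets, axis="columns")  # subtract offsets["A"] from column A, and so on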

Apply

Applying functions to the data:

df.apply(np.cumsum)

                   A         B         C         D     F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771   NaN
2013-01-02 -0.559387 -1.433107  0.311082  0.692408   1.0
2013-01-03  0.444523 -0.714653  0.228599  2.875352   3.0
2013-01-04 -1.777635 -1.224088 -0.138557  3.727510   6.0
2013-01-05 -2.197844  0.954513  2.414086  4.460962  10.0
2013-01-06 -1.746887  2.020164  2.585884  5.162353  15.0

df.apply(lambda x: x.max() - x.min())
A    3.226068
B    3.221415
C    2.919799
D    2.442716
F    4.000000
dtype: float64
df.apply(lambda x: x.max() - x.min(), axis=1)
2013-01-01    1.487163
2013-01-02    1.744011
2013-01-03    2.265428
2013-01-04    5.222158
2013-01-05    4.420209
2013-01-06    4.828202
Freq: D, dtype: float64

Histogramming

See more at Histogramming and Discretization.

s = pd.Series(np.random.randint(0, 7, size=10))
s
0    5
1    2
2    6
3    6
4    4
5    1
6    2
7    3
8    1
9    2
dtype: int64
s.value_counts()
2    3
6    2
1    2
5    1
4    1
3    1
dtype: int64
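
Discretization is typically done with pd.cut (equal-width bins) or pd.qcut (quantile bins). A minimal sketch on the same Series:

bins = pd.cut(s, bins=3)  # label each value with one of three equal-width intervals
bins.value_counts()       # a coarse histogram of s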

String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
type(s)
pandas.core.series.Series
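
The regular-expression behaviour can be seen with str.contains; note that the missing element propagates as NaN unless na= is supplied. An illustrative sketch on the same Series:

s.str.contains("a")            # the pattern "a" is treated as a regular expression
s.str.contains("a", na=False)  # report the missing element as False instead of NaN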

6. Merge

pandas provides various facilities for easily combining together Series and DataFrame objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.

See the Merging section.

6.1 Concat

Concatenating pandas objects together with concat():

df = pd.DataFrame(np.random.randn(10, 4))
df

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218

pieces = [df[:3], df[3:7], df[7:]]
pieces
[          0         1         2         3
 0  0.488970  1.237504 -1.640805 -0.672117
 1  0.390873  0.906830  0.260662  0.119989
 2 -0.854710 -0.535410  1.641878  0.321487,
           0         1         2         3
 3 -0.134780  0.555554  1.024371 -0.103164
 4 -1.241929 -0.116488 -0.922242 -2.066726
 5 -0.432397  2.018692 -0.536801  0.074576
 6  1.452204 -0.587196  0.918798  1.192130,
           0         1         2         3
 7  0.819954  0.224358 -0.022698 -0.745293
 8  0.266344 -0.321944  1.251543  0.603333
 9 -0.491671  0.278449  0.194751  1.056218]
pieces[0]

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487

pd.concat(pieces)

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218
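
concat can also combine objects side by side: with axis=1 the pieces are aligned on the index, and since their indexes do not overlap the off-diagonal blocks are filled with NaN. A brief sketch:

# 10 rows by 12 columns; each piece occupies its own block of rows
pd.concat(pieces, axis=1)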

Note:
Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.

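A minimal sketch of the recommended pattern, using a few hypothetical records, next to the row-by-row approach it replaces:

# preferred: collect the records first, then construct the DataFrame once
records = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
df_fast = pd.DataFrame(records)

# discouraged: growing a frame one row at a time copies the data on every step
df_slow = pd.DataFrame(records[:1])
for rec in records[1:]:
    df_slow = pd.concat([df_slow, pd.DataFrame([rec])], ignore_index=True)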

SQL style merges. See the Database style joining section.

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})

left

   key  lval
0  foo     1
1  foo     2

right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
right

   key  rval
0  foo     4
1  foo     5

pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

pd.merge(left, right)

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Another example that can be given is:

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  bar     2     5
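
merge performs an inner join by default; the how argument selects left, right or outer joins, and keys without a partner are filled with NaN. A short sketch with hypothetical frames:

left2 = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right2 = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})
# "bar" and "baz" have no match, so the outer join fills the missing side with NaN
pd.merge(left2, right2, on="key", how="outer")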

Original: https://blog.csdn.net/u012338969/article/details/124575624
Author: 雪龙无敌
Title: pandas10minnutes_中英对照02
