pandas10minnutes_中英对照02

This part covers the following sections:
4. Missing data
5. Operations
6. Merge

4. Missing data

pandas primarily uses the value np.nan to represent missing data. It is by default not included in computations. See the Missing Data section.

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:

import numpy as np
import pandas as pd
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1
df

                   A         B         C         D    F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0
2013-01-05 -0.420209  2.178601  2.552643  0.733452  4.0
2013-01-06  0.450958  1.065650  0.171798  0.701391  5.0

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1

                   A         B         C         D    F   E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0 NaN
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0 NaN
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0 NaN
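
Because NaN is skipped by default, reductions over a column with missing values only use the observed entries. A quick illustrative check with the df1 shown above (column F holds NaN, 1.0, 2.0, 3.0):

df1["F"].sum()    # 6.0 -- the NaN on 2013-01-01 is simply skipped
df1["F"].count()  # 3   -- counts only the non-missing values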

To drop any rows that have missing data:

df1.dropna(how="any")

Empty DataFrame
Columns: [A, B, C, D, F, E]
Index: []
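
Every row is dropped here because each row of df1 contains at least one NaN: column E was added by the reindex and never filled. A quick check, for illustration:

df1["E"].isna().all()   # True -- E is entirely missing
df1.isna().any(axis=1)  # True for every row, so how="any" removes them all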

Filling missing data:

df1.fillna(value=5)

                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  5.0  5.0
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  5.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  5.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  5.0
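
fillna also accepts a dict mapping column names to fill values, so each column can be filled differently. A minimal sketch with illustrative fill values:

# fill E with 0 and F with the mean of the observed F values
df1.fillna(value={"E": 0, "F": df1["F"].mean()})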

To get the boolean mask where values are nan:

pd.isna(df1)

                A      B      C      D      F     E
2013-01-01  False  False  False  False   True  True
2013-01-02  False  False  False  False  False  True
2013-01-03  False  False  False  False  False  True
2013-01-04  False  False  False  False  False  True
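
The mask is usually reduced straight away, for example to count missing values; a short sketch:

df1.isna().sum()        # number of missing values per column
df1.isna().sum().sum()  # total number of missing values in the frame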

5. Operations

See the Basic section on Binary Ops.

Operations in general exclude missing data.

Performing a descriptive statistic:

df.mean()
A   -0.291148
B    0.336694
C    0.430981
D    0.860392
F    3.000000
dtype: float64
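
Column F shows the effect of skipping missing data: its mean is 3.0, the mean of 1 through 5. Passing skipna=False includes the NaN and the result itself becomes NaN; a small illustrative check:

df["F"].mean()              # 3.0
df["F"].mean(skipna=False)  # nan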

Same operation on the other axis:

df.mean(1)
2013-01-01   -0.168403
2013-01-02    0.136921
2013-01-03    1.164565
2013-01-04    0.150682
2013-01-05    1.808897
2013-01-06    1.477959
Freq: D, dtype: float64

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension:

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64
df.sub(s, axis="index")

                   A         B         C         D    F
2013-01-01       NaN       NaN       NaN       NaN  NaN
2013-01-02       NaN       NaN       NaN       NaN  NaN
2013-01-03  0.003910 -0.281546 -1.082483  1.182944  1.0
2013-01-04 -5.222158 -3.509435 -3.367156 -2.147842  0.0
2013-01-05 -5.420209 -2.821399 -2.447357 -4.266548 -1.0
2013-01-06       NaN       NaN       NaN       NaN  NaN
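
The axis argument also works the other way around: a Series labelled by the columns is broadcast down the rows. A small sketch with made-up per-column offsets:

offsets = pd.Series([1, 2, 3, 4, 5], index=list("ABCDF"))  # hypothetical offsets
df.sub(offsets, axis="columns")  # subtract offsets["A"] from column A, and so on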

Apply

Applying functions to the data:

df.apply(np.cumsum)

                   A         B         C         D     F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771   NaN
2013-01-02 -0.559387 -1.433107  0.311082  0.692408   1.0
2013-01-03  0.444523 -0.714653  0.228599  2.875352   3.0
2013-01-04 -1.777635 -1.224088 -0.138557  3.727510   6.0
2013-01-05 -2.197844  0.954513  2.414086  4.460962  10.0
2013-01-06 -1.746887  2.020164  2.585884  5.162353  15.0

df.apply(lambda x: x.max() - x.min())
A    3.226068
B    3.221415
C    2.919799
D    2.442716
F    4.000000
dtype: float64
df.apply(lambda x: x.max() - x.min(), axis=1)
2013-01-01    1.487163
2013-01-02    1.744011
2013-01-03    2.265428
2013-01-04    5.222158
2013-01-05    4.420209
2013-01-06    4.828202
Freq: D, dtype: float64

Histogramming

See more at Histogramming and Discretization.

s = pd.Series(np.random.randint(0, 7, size=10))
s
0    5
1    2
2    6
3    6
4    4
5    1
6    2
7    3
8    1
9    2
dtype: int64
s.value_counts()
2    3
6    2
1    2
5    1
4    1
3    1
dtype: int64
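
Discretization is typically done with pd.cut (equal-width bins) or pd.qcut (quantile bins). A minimal sketch on the same Series:

bins = pd.cut(s, bins=3)  # label each value with one of three equal-width intervals
bins.value_counts()       # a coarse histogram of s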

String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
type(s)
pandas.core.series.Series
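
The regular-expression behaviour can be seen with str.contains; note that the missing element propagates as NaN unless na= is supplied. An illustrative sketch on the same Series:

s.str.contains("a")            # the pattern "a" is treated as a regular expression
s.str.contains("a", na=False)  # report the missing element as False instead of NaN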

6. Merge

pandas provides various facilities for easily combining together Series and DataFrame objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.

See the Merging section.

6.1 Concat

Concatenating pandas objects together with concat():

df = pd.DataFrame(np.random.randn(10, 4))
df

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218

pieces = [df[:3], df[3:7], df[7:]]
pieces
[          0         1         2         3
 0  0.488970  1.237504 -1.640805 -0.672117
 1  0.390873  0.906830  0.260662  0.119989
 2 -0.854710 -0.535410  1.641878  0.321487,
           0         1         2         3
 3 -0.134780  0.555554  1.024371 -0.103164
 4 -1.241929 -0.116488 -0.922242 -2.066726
 5 -0.432397  2.018692 -0.536801  0.074576
 6  1.452204 -0.587196  0.918798  1.192130,
           0         1         2         3
 7  0.819954  0.224358 -0.022698 -0.745293
 8  0.266344 -0.321944  1.251543  0.603333
 9 -0.491671  0.278449  0.194751  1.056218]
pieces[0]

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487

pd.concat(pieces)

          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218
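
concat can also combine objects side by side: with axis=1 the pieces are aligned on the index, and since their indexes do not overlap the off-diagonal blocks are filled with NaN. A brief sketch:

# 10 rows by 12 columns; each piece occupies its own block of rows
pd.concat(pieces, axis=1)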

Note:
Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.

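A minimal sketch of the recommended pattern, using a few hypothetical records, next to the row-by-row approach it replaces:

# preferred: collect the records first, then construct the DataFrame once
records = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
df_fast = pd.DataFrame(records)

# discouraged: growing a frame one row at a time copies the data on every step
df_slow = pd.DataFrame(records[:1])
for rec in records[1:]:
    df_slow = pd.concat([df_slow, pd.DataFrame([rec])], ignore_index=True)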

SQL style merges. See the Database style joining section.

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})

left

   key  lval
0  foo     1
1  foo     2

right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
right

   key  rval
0  foo     4
1  foo     5

pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

pd.merge(left, right)

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Another example that can be given is:

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
pd.merge(left, right, on="key")

   key  lval  rval
0  foo     1     4
1  bar     2     5
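
merge performs an inner join by default; the how argument selects left, right or outer joins, and keys without a partner are filled with NaN. A short sketch with hypothetical frames:

left2 = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right2 = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})
# "bar" and "baz" have no match, so the outer join fills the missing side with NaN
pd.merge(left2, right2, on="key", how="outer")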

Original: https://blog.csdn.net/u012338969/article/details/124575624
Author: 雪龙无敌
Title: pandas10minnutes_中英对照02
