A Summary of Missing-Value Handling in Python pandas

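Contents

Checking whether a single value in a DataFrame is missing
Getting the missing-value status of a DataFrame
Keeping rows where a given column is null
Keeping rows that are not null
Counting missing values in each column

+ The simplest option: the .info() function
+ .isna() combined with .sum(), plotted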

import numpy as np
import pandas as pd

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0.
df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

'''
    age   born        name    toy
0   5.0   NaT         Alfred  None
1   6.0   1939-05-27  Batman  Batmobile
2   NaN   1940-04-25          Joker
'''

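Checking whether a single value in a DataFrame is missing

In a float column, pandas stores a missing value as np.nan.
For a single value x, the test x == np.nan cannot be used, because NaN never compares equal to anything, itself included. Use np.isnan(x) instead. pd.isna(x) also works and additionally recognizes None and pd.NaT.

df['age'][2] == np.nan  False
df['age'][2] is np.nan  False
np.isnan(df['age'][2])  True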
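
To make the comparison above concrete, here is a minimal sketch that runs the scalar checks on a few kinds of value. The helper name `is_missing_scalar` is hypothetical and exists only for this example; it wraps `pd.isna`, since `np.isnan` raises `TypeError` on non-numeric input such as None.

```python
import numpy as np
import pandas as pd


def is_missing_scalar(x):
    """Hypothetical helper: True if x is NaN, None, or NaT."""
    return bool(pd.isna(x))


nan_value = float("nan")

# NaN never compares equal to anything, so == always yields False.
equal_to_nan = (nan_value == np.nan)

# np.isnan works for floating-point values ...
numeric_check = bool(np.isnan(nan_value))

# ... but rejects non-numeric objects such as None.
try:
    np.isnan(None)
    none_supported = True
except TypeError:
    none_supported = False

# pd.isna handles NaN, None, and NaT; an empty string is not missing.
results = [is_missing_scalar(v) for v in (nan_value, None, pd.NaT, 5.0, "")]
print(equal_to_nan, numeric_check, none_supported, results)
```

Note that the empty string is reported as present, which matters for the `name` column in the sample data.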

Getting the missing-value status of a DataFrame

df.isna()

    age     born    name    toy
0   False   True    False   True
1   False   False   False   False
2   True    False   False   False
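
In the output above, the empty string in the `name` column (row 2) is reported as False, because `.isna()` only flags NaN, None, and NaT. The sketch below shows one possible way to treat blank strings as missing. That treatment is an assumption made for this example and is not something the original article does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [5, 6, np.nan],
    "name": ["Alfred", "Batman", ""],
    "toy": [None, "Batmobile", "Joker"],
})

# isna() flags NaN and None, but not the empty string.
plain_flags = df.isna()

# Assumption for this sketch: blank or whitespace-only names count as missing.
blank_name = df["name"].str.strip().eq("")
cleaned = df.assign(name=df["name"].mask(blank_name))
cleaned_flags = cleaned.isna()

print(plain_flags)
print(cleaned_flags)
```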

Keeping rows where a given column is null

This can be done by combining .loc[] with .isna().
First, look at the documentation of df.loc:


.loc[] is primarily label based, but may also be used with a
boolean array.

Allowed inputs are:

- A single label, e.g. 5 or 'a', (note that 5 is
  interpreted as a *label* of the index, and **never** as an
  integer position along the index).

- A list or array of labels, e.g. ['a', 'b', 'c'].

- A slice object with labels, e.g. 'a':'f'.

  .. warning:: Note that contrary to usual python slices, **both** the
      start and the stop are included

- A boolean array of the same length as the axis being sliced,
  e.g. [True, False, True].

- An alignable boolean Series. The index of the key will be aligned before
  masking.

- An alignable Index. The Index of the returned selection will be the input.

- A  function with one argument (the calling Series or
  DataFrame) and that returns valid output for indexing (one of the above)

According to the "alignable boolean Series" entry in the list of allowed inputs above, all that is needed is a boolean Series marking which rows are null in the column of interest.

The DataFrame returned by .isna() provides exactly that: select the relevant column from it and pass the result to .loc[].


df.loc[ df.isna()['age'] ]

    age   born        name    toy
2   NaN   1940-04-25          Joker
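
The same selection can be written by calling `.isna()` on the column itself, which avoids building the full boolean DataFrame first. The sketch below also extends the idea to several columns. The choice of `age` and `toy` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [5, 6, np.nan],
    "name": ["Alfred", "Batman", ""],
    "toy": [None, "Batmobile", "Joker"],
})

# Boolean Series for one column; selects the same rows as df.loc[df.isna()['age']].
age_missing = df.loc[df["age"].isna()]

# Rows where any of the listed columns is null.
any_missing = df.loc[df[["age", "toy"]].isna().any(axis=1)]

# Rows where all of the listed columns are null (none in this data).
all_missing = df.loc[df[["age", "toy"]].isna().all(axis=1)]

print(age_missing.index.tolist())
print(any_missing.index.tolist())
print(all_missing.index.tolist())
```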

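Keeping rows that are not null

Use .dropna() directly. With subset=['age'], only rows whose age is null are dropped; nulls in other columns are left alone.

df.dropna(subset=['age'])

    age   born        name    toy
0   5.0   NaT         Alfred  None
1   6.0   1939-05-27  Batman  Batmobile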
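
For comparison, here is a short sketch of a few other ways to call `.dropna()` on the same sample data, along with the equivalent boolean-mask form. The variable names are illustrative only. `.dropna()` returns a new DataFrame and leaves `df` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [5, 6, np.nan],
    "born": [pd.NaT, pd.Timestamp("1939-05-27"), pd.Timestamp("1940-04-25")],
    "name": ["Alfred", "Batman", ""],
    "toy": [None, "Batmobile", "Joker"],
})

# Drop rows whose 'age' is null; other columns are ignored.
by_age = df.dropna(subset=["age"])

# Equivalent selection with a boolean mask.
by_mask = df.loc[df["age"].notna()]

# Default behaviour: drop a row if any column is null.
complete_rows = df.dropna()

# Keep rows that have at least 3 non-null values.
mostly_filled = df.dropna(thresh=3)

print(by_age.index.tolist(), complete_rows.index.tolist(), mostly_filled.index.tolist())
```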

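Counting missing values in each column

Exploratory data analysis (EDA) starts with a first look at the data, and the extent of missing values is an important part of that picture.

The simplest option: the .info() function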

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
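
The table of contents also lists `.isna()` combined with `.sum()`. A minimal sketch of that per-column count on the sample data follows, under the assumption that a count and a fraction per column are what is wanted. The `summary` frame and its column names are made up for this example, and the plotting step is left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [5, 6, np.nan],
    "born": [pd.NaT, pd.Timestamp("1939-05-27"), pd.Timestamp("1940-04-25")],
    "name": ["Alfred", "Batman", ""],
    "toy": [None, "Batmobile", "Joker"],
})

# True counts as 1, so summing the boolean frame gives nulls per column.
missing_count = df.isna().sum()

# The mean of the boolean frame is the fraction of nulls per column.
missing_share = df.isna().mean()

# Illustrative summary table; the column names are arbitrary.
summary = pd.DataFrame({"missing": missing_count, "share": missing_share})
print(summary)
```

If matplotlib is installed, `missing_count.plot.bar()` would draw these counts as a bar chart.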

Original: https://blog.csdn.net/MavPig/article/details/115743654
Author: Maverick pig
Title: 对Python-pandas中缺失值问题的归纳



