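Table of Contents
- Checking a single DataFrame value for missingness
- Getting the missing-value map of a DataFrame
- Keeping rows where a given column is null
- Keeping rows that are not null
- Counting missing values per column
  + The simplest option: .info()
  + .isna() combined with .sum()
  + Plotting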
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))
'''
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
'''
```
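Why does `age` print as 5.0 rather than 5? A minimal check, assuming a recent pandas version: NaN is itself a float, so an integer column that contains a NaN is stored as float64, while the `born` column marks its gap with NaT inside a datetime column.

```python
import numpy as np
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

# Each column stores "missing" in its own way
print(df.dtypes)

# Plain integers get an integer dtype; one NaN upcasts the column to float64
print(pd.Series([5, 6]).dtype)
print(pd.Series([5, 6, np.nan]).dtype)   # float64
```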
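Checking a single DataFrame value for missingness
In a float column such as `age`, a missing value is stored as `np.nan`. For a single value `x`, you cannot test `x == np.nan`, because NaN compares unequal to everything, itself included; use `np.isnan(x)` instead.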
```python
df['age'][2] == np.nan    # False
df['age'][2] is np.nan    # False: the element is a NumPy float64 NaN, not the np.nan object
np.isnan(df['age'][2])    # True
```
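`np.isnan()` only works on numbers. A minimal sketch comparing it with `pd.isna()`, pandas' general missing-value test (not used in the original text), which also recognizes `None` and `NaT`, while an empty string still counts as a real value:

```python
import numpy as np
import pandas as pd

x = np.float64('nan')    # the kind of value df['age'][2] returns

print(x == x)            # False: NaN is not equal even to itself
print(np.isnan(x))       # True

# pd.isna covers NaN, None and NaT; '' is a real (empty) string, not missing
for value in [np.nan, None, pd.NaT, '']:
    print(repr(value), pd.isna(value))

# np.isnan has no rule for None, so it raises instead of answering
try:
    np.isnan(None)
except TypeError as exc:
    print('np.isnan(None) fails:', exc)
```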
Getting the missing-value map of a DataFrame
```python
df.isna()
'''
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
'''
```
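The boolean frame from `.isna()` can be collapsed further. A minimal sketch, assuming the same `df`: `.notna()` is its exact complement, and `.any(axis=1)` flags every row that has at least one missing value. Note that the empty `name` in row 2 is not flagged; only its `age` is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

mask = df.isna()        # True where a value is missing
present = df.notna()    # the element-wise complement of isna()
print((mask == ~present).all().all())   # True

# Row-wise summary: does the row contain any missing value?
rows_with_gaps = mask.any(axis=1)
print(rows_with_gaps.tolist())          # [True, False, True]
```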
Keeping rows where a given column is null
Combine `.loc[]` with `.isna()`.
First, look at how the `df.loc` docstring describes it:
`.loc[]` is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

- A single label, e.g. `5` or `'a'` (note that `5` is interpreted as a *label* of the index, and **never** as an integer position along the index).
- A list or array of labels, e.g. `['a', 'b', 'c']`.
- A slice object with labels, e.g. `'a':'f'`. Warning: contrary to usual Python slices, **both** the start and the stop are included.
- A boolean array of the same length as the axis being sliced, e.g. `[True, False, True]`.
- An alignable boolean Series. The index of the key will be aligned before masking.
- An alignable Index. The Index of the returned selection will be the input.
- A `callable` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).
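The allowed inputs are easiest to see side by side. A minimal sketch on a small hypothetical Series `s` with string labels (not from the original post), exercising most of them, including the inclusive label slice and the label-based alignment of a boolean Series key:

```python
import pandas as pd

# Hypothetical example Series, used only for illustration
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s.loc['b'])                                   # single label -> 20
print(s.loc[['a', 'c']].tolist())                   # list of labels -> [10, 30]
print(s.loc['a':'c'].tolist())                      # label slice, stop included -> [10, 20, 30]
print(s.loc[[True, False, True, False]].tolist())   # boolean list -> [10, 30]

# Alignable boolean Series: matched by index label, not by position
key = pd.Series([True, False, True, False], index=['d', 'c', 'b', 'a'])
print(s.loc[key].tolist())                          # 'b' and 'd' are True -> [20, 40]

# Callable: receives s and returns a boolean mask
print(s.loc[lambda ser: ser > 25].tolist())         # [30, 40]
```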
Of these allowed inputs, the alignable boolean Series is the one we need: all it takes is a boolean Series for the column whose null values we want to find.
Since `.isna()` returns a boolean DataFrame, select the relevant column from it and pass that to `.loc[]`:
```python
df.loc[df.isna()['age']]
'''
   age       born name    toy
2  NaN 1940-04-25       Joker
'''
```
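`df.isna()['age']` computes the mask for every column and then keeps one. A minimal sketch of two more direct variants, assuming the same `df`: call `.isna()` on the column itself, or use the callable form listed in the `.loc` docstring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

# Same rows as df.loc[df.isna()['age']], but only the age column is checked
only_missing_age = df.loc[df['age'].isna()]
print(only_missing_age)

# Callable form: the lambda receives df and returns the boolean mask
same = df.loc[lambda d: d['age'].isna()]
print(same.equals(only_missing_age))   # True
```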
Keeping rows that are not null
Just use `.dropna()`:
```python
df.dropna(subset=['age'])
'''
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
'''
```
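`subset` is only one way to scope `.dropna()`. A minimal sketch of related variants, assuming the same `df`: with no arguments any missing value drops the row, `thresh` keeps rows that have at least that many non-missing values, and a `.notna()` mask reproduces `dropna(subset=['age'])`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

all_present = df.dropna()                 # drop a row if any column is missing
age_present = df[df['age'].notna()]       # mask equivalent of dropna(subset=['age'])
at_least_three = df.dropna(thresh=3)      # keep rows with >= 3 non-missing values

print(all_present.index.tolist())         # [1]
print(age_present.index.tolist())         # [0, 1]
print(at_least_three.index.tolist())      # [1, 2]
```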
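Counting missing values per column
In exploratory data analysis (EDA) you first need an overview of the data, and how much of it is missing is one of the most important things to check.
The simplest option is `.info()`: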
```python
df.info()
'''
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
'''
```
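For an explicit per-column count, summing the boolean frame from `.isna()` works because `True` counts as 1. A minimal sketch, assuming the same `df`; the `.mean()` line, which gives the fraction of missing rows per column, is an addition not taken from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(age=[5, 6, np.nan],
                       born=[pd.NaT, pd.Timestamp('1939-05-27'),
                             pd.Timestamp('1940-04-25')],
                       name=['Alfred', 'Batman', ''],
                       toy=[None, 'Batmobile', 'Joker']))

missing_counts = df.isna().sum()     # missing values per column
missing_share = df.isna().mean()     # fraction of rows missing per column
non_null = df.notna().sum()          # what df.info() reports as "Non-Null Count"

print(missing_counts)
print(missing_share.round(3))
print(non_null)
```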
Original: https://blog.csdn.net/MavPig/article/details/115743654
Author: Maverick pig
Title: A Summary of Missing-Value Issues in Python pandas
Related reading
Title: Python type conversion with astype: converting DataFrame column types
Converting DataFrame column types with `astype`:
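```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': '1'}, {'col1': 'b', 'col2': '2'}])
print(df.dtypes)

# Convert the string column col2 to integers
df['col2'] = df['col2'].astype('int')
print('-----------')
print(df.dtypes)

# Convert col2 again, this time to 64-bit floats
df['col2'] = df['col2'].astype('float64')
print('-----------')
print(df.dtypes)
```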
Output:
```
col1    object
col2    object
dtype: object
col1     object
col2    float64
dtype: object
```
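`astype` interacts with missing values, which ties back to the first article. A minimal sketch under a recent pandas version; the nullable `'Int64'` dtype and `pd.to_numeric(..., errors='coerce')` are standard pandas features but are not mentioned in the original post. A plain NumPy integer dtype cannot hold NaN, so that cast raises, while `'Int64'` (capital I) keeps the gap as `<NA>`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b'], 'col2': ['1', '2']})

# astype also accepts a {column: dtype} mapping
converted = df.astype({'col2': 'int64'})
print(converted.dtypes)

# int64 has no representation for NaN, so this cast raises a ValueError
s = pd.Series([1.0, np.nan])
try:
    s.astype('int64')
except ValueError as exc:
    print('cannot cast:', exc)

# pandas' nullable integer dtype keeps the missing value as <NA>
nullable = s.astype('Int64')
print(nullable)

# Strings that are not numbers: errors='coerce' turns them into NaN
raw = pd.Series(['1', '2', 'x'])
numbers = pd.to_numeric(raw, errors='coerce')
print(numbers)
```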
Note: NumPy data types

| Data type | Description |
|---|---|
| bool_ | Boolean (True or False) stored as a byte |
| int_ | Default integer type (same as C long; normally either int64 or int32) |
| intc | Identical to C int (normally int32 or int64) |
| intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2147483648 to 2147483647) |
| int64 | Integer (-9223372036854775808 to 9223372036854775807) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 4294967295) |
| uint64 | Unsigned integer (0 to 18446744073709551615) |
| float_ | Shorthand for float64 (removed in NumPy 2.0; use float64) |
| float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| complex_ | Shorthand for complex128 (removed in NumPy 2.0; use complex128) |
| complex64 | Complex number, represented by two 32-bit floats (real and imaginary components) |
| complex128 | Complex number, represented by two 64-bit floats (real and imaginary components) |
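The ranges and bit layouts in the table can be queried from NumPy directly instead of memorized. A minimal sketch using `np.iinfo`, `np.finfo` and `dtype.itemsize`:

```python
import numpy as np

# Integer ranges
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Float layouts: total bits, exponent bits, mantissa bits
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.nexp, info.nmant)

# Storage sizes in bytes
print(np.dtype(np.bool_).itemsize)        # 1: stored as a byte
print(np.dtype(np.complex64).itemsize)    # 8: two 32-bit floats
print(np.dtype(np.complex128).itemsize)   # 16: two 64-bit floats
```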
Original: https://blog.csdn.net/weixin_39616222/article/details/113674191
Author: weixin_39616222
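Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/298557/
Reposted articles remain protected by the original author's copyright. When reposting, please credit the original author.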