DataFrame Comparison and Missing-Value Comparison

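Pandas uses the NumPy NaN object (np.nan) to represent missing values. It is a special value that is not equal to itself:
import numpy as np
np.nan == np.nan
# False
Python's None object, by contrast, is equal to itself:
None == None
# True
Every comparison with np.nan returns False, except "not equal":
9 > np.nan
# False
5 != np.nan
# True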
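Because equality can never match np.nan, detecting a missing value needs a dedicated predicate. The following is a minimal sketch of the common options; the variable names are illustrative only and do not come from the original article.

```python
import math

import numpy as np
import pandas as pd

value = np.nan

# Equality cannot detect NaN, because NaN is not equal to itself.
found_by_equality = (value == np.nan)

# Dedicated predicates do detect it.
found_by_math = math.isnan(value)
found_by_numpy = bool(np.isnan(value))
found_by_pandas = bool(pd.isna(value))

# pd.isna also treats None as missing, whereas np.isnan(None) raises TypeError.
none_is_missing = bool(pd.isna(None))

print(found_by_equality, found_by_math, found_by_numpy, found_by_pandas, none_is_missing)
```

In pandas code, pd.isna is usually the most convenient of the three, since it accepts scalars, Series, and DataFrames alike.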
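Compare the values of college_ugds_ with .0019 (shown here for the first five rows); the result is a boolean DataFrame:
import pandas as pd
college = pd.read_csv('data/college.csv', index_col='INSTNM')
college_ugds_ = college.filter(like='UGDS_')
college_ugds_.head() == .0019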

                                     UGDS_WHITE  UGDS_BLACK  UGDS_HISP  UGDS_ASIAN  UGDS_AIAN  UGDS_NHPI  UGDS_2MOR  UGDS_NRA  UGDS_UNKN
INSTNM
Alabama A & M University                  False       False      False        True      False       True      False     False      False
University of Alabama at Birmingham       False       False      False       False      False      False      False     False      False
Amridge University                        False       False      False       False      False      False      False     False      False
University of Alabama in Huntsville       False       False      False       False      False      False      False     False      False
Alabama State University                  False       False      False        True      False      False      False     False      False

Compare a DataFrame with another DataFrame (here, with itself):
college_self_compare = college_ugds_ == college_ugds_
college_self_compare.head()

                                     UGDS_WHITE  UGDS_BLACK  UGDS_HISP  UGDS_ASIAN  UGDS_AIAN  UGDS_NHPI  UGDS_2MOR  UGDS_NRA  UGDS_UNKN
INSTNM
Alabama A & M University                   True        True       True        True       True       True       True      True       True
University of Alabama at Birmingham        True        True       True        True       True       True       True      True       True
Amridge University                         True        True       True        True       True       True       True      True       True
University of Alabama in Huntsville        True        True       True        True       True       True       True      True       True
Alabama State University                   True        True       True        True       True       True       True      True       True

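Use all() to check whether every value in each column is True. Every column comes back False, even though the DataFrame was compared with itself, because missing values are not equal to each other.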
college_self_compare.all()

UGDS_WHITE    False
UGDS_BLACK    False
UGDS_HISP     False
UGDS_ASIAN    False
UGDS_AIAN     False
UGDS_NHPI     False
UGDS_2MOR     False
UGDS_NRA      False
UGDS_UNKN     False
dtype: bool
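The effect is easier to see on a tiny frame. The sketch below uses a hypothetical two-column DataFrame (not the college data) in which only column "a" contains a missing value, so only that column fails the self-comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for college_ugds_; the values are made up.
df = pd.DataFrame(
    {"a": [0.1, np.nan, 0.3], "b": [0.5, 0.6, 0.7]},
    index=["x", "y", "z"],
)

# Element-wise self-comparison: False exactly where a value is NaN.
self_compare = df == df

# Per-column check: a column is True only if it has no NaN at all.
column_all = self_compare.all()

# Negating the self-comparison gives the same mask as isnull().
missing_mask = ~self_compare

print(column_all)
print(missing_mask.equals(df.isnull()))
```

In other words, a False from df == df marks a missing value rather than a real difference between the two operands.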

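It is tempting to count missing values by comparing with == np.nan and summing the result, but since nothing is equal to np.nan, every count is 0: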
(college_ugds_ == np.nan).sum()

UGDS_WHITE    0
UGDS_BLACK    0
UGDS_HISP     0
UGDS_ASIAN    0
UGDS_AIAN     0
UGDS_NHPI     0
UGDS_2MOR     0
UGDS_NRA      0
UGDS_UNKN     0
dtype: int64

The primary way to count missing values is the isnull method:
college_ugds_.isnull().sum()

UGDS_WHITE    661
UGDS_BLACK    661
UGDS_HISP     661
UGDS_ASIAN    661
UGDS_AIAN     661
UGDS_NHPI     661
UGDS_2MOR     661
UGDS_NRA      661
UGDS_UNKN     661
dtype: int64
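The two counting approaches can be contrasted side by side. This is a small sketch on a hypothetical DataFrame (the column names and values are invented for illustration), assuming only standard pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has two missing values, column "b" has one.
df = pd.DataFrame(
    {"a": [0.1, np.nan, 0.3, np.nan], "b": [0.5, 0.6, np.nan, 0.8]}
)

# Comparing with == np.nan never matches, so every count is 0.
eq_count = (df == np.nan).sum()

# isnull() marks the missing cells; summing the booleans counts them.
null_count = df.isnull().sum()

# Chaining a second sum() gives the total for the whole DataFrame.
total_missing = int(df.isnull().sum().sum())

# any().any() answers "is anything missing at all?".
has_missing = bool(df.isnull().any().any())

print(eq_count.tolist(), null_count.tolist(), total_missing, has_missing)
```

isna() is an alias of isnull(), so either spelling gives the same result.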

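The most direct way to compare two whole DataFrames is the equals() method, which treats missing values in the same position as equal. In unit tests, assert_frame_equal from pandas.testing makes the same kind of check: it returns None when the frames match and raises an AssertionError otherwise: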
from pandas.testing import assert_frame_equal
assert_frame_equal(college_ugds_, college_ugds_)
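To make the difference between these whole-frame checks and the == operator concrete, here is a minimal sketch on two hypothetical DataFrames named left and right (names and values are assumptions for illustration).

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Two hypothetical frames with identical contents, including one NaN.
left = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
right = left.copy()

# Reducing the == result reports a difference, because NaN != NaN.
elementwise_all = bool((left == right).all().all())

# equals() treats NaN in the same position as equal.
same_frame = left.equals(right)

# After a real change, equals() is False and assert_frame_equal raises.
right.loc[0, "b"] = 99.0
same_after_change = left.equals(right)
try:
    assert_frame_equal(left, right)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(elementwise_all, same_frame, same_after_change, mismatch_detected)
```

equals() suits a plain yes/no answer in application code, while assert_frame_equal fits test suites, where the raised error describes the mismatch.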
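The eq() method works like the == operator and compares element by element; it is not the same as equals(), which returns a single boolean for the whole DataFrame: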
college_ugds_.eq(.0019).head()

                                     UGDS_WHITE  UGDS_BLACK  UGDS_HISP  UGDS_ASIAN  UGDS_AIAN  UGDS_NHPI  UGDS_2MOR  UGDS_NRA  UGDS_UNKN
INSTNM
Alabama A & M University                  False       False      False        True      False       True      False     False      False
University of Alabama at Birmingham       False       False      False       False      False      False      False     False      False
Amridge University                        False       False      False       False      False      False      False     False      False
University of Alabama in Huntsville       False       False      False       False      False      False      False     False      False
Alabama State University                  False       False      False        True      False      False      False     False      False
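One practical reason to reach for eq() instead of == is its axis parameter, which controls how a Series is aligned against the DataFrame. The sketch below uses a hypothetical two-column frame; the data and the Series are made up for illustration.

```python
import pandas as pd

# Hypothetical data in which .0019 appears once per column.
df = pd.DataFrame({"a": [0.0019, 0.5], "b": [0.25, 0.0019]})

# With a scalar, eq() and == give identical boolean DataFrames.
by_operator = df == 0.0019
by_method = df.eq(0.0019)

# With a Series, eq() aligns on column labels by default,
# so each column is compared with its own target value.
per_column = df.eq(pd.Series({"a": 0.0019, "b": 0.25}))

# axis="index" aligns the Series with the row labels instead,
# so each row is compared with its own target value.
per_row = df.eq(pd.Series([0.0019, 0.5]), axis="index")

print(by_operator.equals(by_method))
print(per_column)
print(per_row)
```

The other comparison methods (ne, lt, le, gt, ge) accept the same axis argument.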

Original: https://blog.csdn.net/weixin_48135624/article/details/113828860
Author: 缘 源 园
Title: DataFrame Comparison and Missing-Value Comparison
