pandas DataFrame usage: viewing and selecting data

August 17, 2023, 8:30 PM • Python • 107 views

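1. Use .head() to view the first rows of a DataFrame

2. Use .tail() to view the last rows of a DataFrame

3. Use .describe() to view summary statistics of a DataFrame

4. Use .T to view the transpose of a DataFrame

5. at: get a value by row label and column label

6. iat: get a value by row position and column position

7. loc: index rows mainly by row label

8. iloc: index rows mainly by row position

9. ix: index rows by row label or row position

10. Use boolean indexing to view rows that meet a condition

11. Use sample() to view random rows

12. Use isin() to check whether values meet a condition

13. Use .shape to view the number of rows and columns

14. Use .info() to view the index, data types, and memory information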

Before trying the individual APIs, create some test data first:

Code:

import numpy as np
import pandas as pd
dict_data={"a":list("abcdef"),"b":list("defghi"),"c":list("ghijkl")}
df=pd.DataFrame.from_dict(dict_data)
df

Output:

Out[1]:

   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i
3  d  g  j
4  e  h  k
5  f  i  l

1. Use .head() to view the first rows of a DataFrame

.head([n]) returns the first n rows; if n is omitted, it defaults to 5.

In [12]: df.head(0)
Out[12]:
In [13]: df.head(1)
Out[13]:
   a  b  c
0  a  d  g
In [16]: df.head(3)
Out[16]:
   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i

2. Use .tail() to view the last rows of a DataFrame

.tail([n]) returns the last n rows; if n is omitted, it defaults to 5.

In [18]: df.tail(0)
Out[18]:
In [19]: df.tail(1)
Out[19]:
   a  b  c
5  f  i  l
In [20]: df.tail(3)
Out[20]:
   a  b  c
3  d  g  j
4  e  h  k
5  f  i  l
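The two methods can be exercised together. The following is a minimal sketch that rebuilds the article's test DataFrame; the variable names are illustrative. It checks the default of 5 rows and shows that a negative n returns everything except the last (for head) or first (for tail) |n| rows.

```python
import pandas as pd

# Rebuild the 6-row test DataFrame used in this article.
df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

first_default = df.head()         # n defaults to 5
last_default = df.tail()          # n defaults to 5
all_but_last_two = df.head(-2)    # negative n: drop the last 2 rows
all_but_first_two = df.tail(-2)   # negative n: drop the first 2 rows

print(len(first_default), len(last_default))
print(all_but_last_two.index.tolist())
print(all_but_first_two.index.tolist())
```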

3. Use .describe() to view summary statistics of a DataFrame

The help text for .describe is as follows:

Help on function describe in module pandas.core.generic:

describe(self: 'FrameOrSeries', percentiles=None, include=None, exclude=None, datetime_is_numeric=False) -> 'FrameOrSeries'
    Generate descriptive statistics.

    Descriptive statistics include those that summarize the central
    tendency, dispersion and shape of a
    dataset's distribution, excluding NaN values.

    Analyzes both numeric and object series, as well
    as DataFrame column sets of mixed data types. The output
    will vary depending on what is provided. Refer to the notes
    below for more detail.

    Parameters
    ----------
    percentiles : list-like of numbers, optional
        The percentiles to include in the output. All should
        fall between 0 and 1. The default is
        [.25, .5, .75], which returns the 25th, 50th, and
        75th percentiles.

    include : 'all', list-like of dtypes or None (default), optional
        A white list of data types to include in the result. Ignored
        for Series. Here are the options:

        - 'all' : All columns of the input will be included in the output.

        - A list-like of dtypes : Limits the results to the
          provided data types.

          To limit the result to numeric types submit
          numpy.number. To limit it instead to object columns submit
          the numpy.object data type. Strings
          can also be used in the style of
          select_dtypes (e.g. df.describe(include=['O'])). To
          select pandas categorical columns, use 'category'
        - None (default) : The result will include all numeric columns.

    exclude : list-like of dtypes or None (default), optional,
        A black list of data types to omit from the result. Ignored
        for Series. Here are the options:

        - A list-like of dtypes : Excludes the provided data types
          from the result. To exclude numeric types submit
          numpy.number. To exclude object columns submit the data
          type numpy.object. Strings can also be used in the style of
          select_dtypes (e.g. df.describe(include=['O'])). To
          exclude pandas categorical columns, use 'category'
        - None (default) : The result will exclude nothing.

    datetime_is_numeric : bool, default False
        Whether to treat datetime dtypes as numeric. This affects statistics
        calculated for the column. For DataFrame input, this also
        controls whether datetime columns are included by default.

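By default, .describe() summarizes numeric columns. Every column in this df has object dtype, so the call falls back to the object summary (count, unique, top, freq):

In [21]: df.describe()
Out[21]:
        a  b  c
count   6  6  6
unique  6  6  6
top     a  e  j
freq    1  1  1

You can also pass include=object to get statistics for the non-numeric columns.

For example, with the current data,

the two calls give two different results.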
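To make the difference between the default call and include=object concrete, here is a small sketch. The mixed DataFrame and its column names (score, grade) are hypothetical and not taken from the article; the grade column is forced to object dtype so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical mixed-dtype data: one numeric column and one object column.
mixed = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0],
    "grade": pd.Series(list("abca"), dtype=object),
})

numeric_stats = mixed.describe()                # numeric columns only
object_stats = mixed.describe(include=object)   # object columns only
all_stats = mixed.describe(include="all")       # every column; NaN where a statistic does not apply

print(numeric_stats)
print(object_stats)
print(all_stats)
```

The numeric summary has rows such as mean and std, while the object summary has count, unique, top and freq, which matches what the article's all-string df produces.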

4. Use .T to view the transpose of a DataFrame

In [24]: df.T

Out[24]:
   0  1  2  3  4  5
a  a  b  c  d  e  f
b  d  e  f  g  h  i
c  g  h  i  j  k  l
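As a quick check of what transposing does to the labels, the sketch below (variable names are illustrative) rebuilds the test DataFrame and confirms that the shape flips, the column names become the index, and the old index becomes the column labels.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})
transposed = df.T

print(df.shape, transposed.shape)      # rows and columns swap
print(transposed.index.tolist())       # old column names become the index
print(transposed.columns.tolist())     # old index becomes the column labels
print(transposed.loc["b", 2])          # same cell as df.loc[2, "b"]
```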

5. at: get a value by row label and column label

First, create the data:

import pandas as pd
dict_data={"X":list("abcdef"),"Y":list("defghi"),"Z":list("ghijkl")}
df=pd.DataFrame.from_dict(dict_data)
df.index=["A","B","C","D","E","F"]
df

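This produces the following DataFrame:

   X  Y  Z
A  a  d  g
B  b  e  h
C  c  f  i
D  d  g  j
E  e  h  k
F  f  i  l

The usage is simple, so at and iat are run together here.

# Value at row A, column X; both labels must be given, otherwise an error is raised
print(df.at["A","X"])
# Value at row position 2, column position 2 (the third row and third column, since positions start at 0)
print(df.iat[2,2])

Output: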

a
i

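6. iat: get a value by row position and column position

See the example in section 5. Note that at locates a single element by row label and column label, whereas iat locates it by row position and column position.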
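The contrast between label-based and position-based access can be sketched as follows. The DataFrame matches the one built above; the assigned values "z" and "q" are arbitrary illustrations. Both accessors address exactly one cell and can also be used to assign to it.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

by_label = df.at["C", "Z"]     # row label "C", column label "Z"
by_position = df.iat[2, 2]     # third row, third column (0-based positions)
print(by_label, by_position)   # both address the same cell

# Both accessors can also assign a single value in place.
df.at["A", "X"] = "z"
df.iat[5, 0] = "q"
print(df.at["A", "X"], df.at["F", "X"])
```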

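7. loc: index rows mainly by row label

The current df is the same one created in the at section.

loc is straightforward; the code below speaks for itself.

# Specify a row label and a column label, used the same way as at
print(df.loc["A","X"],"\n","*"*20)

# A full slice works, which at cannot do
print(df.loc[:,"X"],"\n","*"*20)

# Slice starting from a given row
print(df.loc["B":,"X"],"\n","*"*20)

# Select a single row
print(df.loc["B",:],"\n","*"*20)

# Same effect as the previous line
print(df.loc["B"],"\n","*"*20)

Output: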

a
 ********************
A    a
B    b
C    c
D    d
E    e
F    f
Name: X, dtype: object
 ********************
B    b
C    c
D    d
E    e
F    f
Name: X, dtype: object
 ********************
X    b
Y    e
Z    h
Name: B, dtype: object
 ********************
X    b
Y    e
Z    h
Name: B, dtype: object
 ********************
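Two loc behaviours that the calls above do not show are worth a short sketch (variable names are illustrative): a label slice includes both endpoints, and lists of labels select arbitrary rows and columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

rows_b_to_d = df.loc["B":"D", "X"]         # label slice: both "B" and "D" are included
picked = df.loc[["A", "F"], ["X", "Z"]]    # lists of labels pick arbitrary rows and columns

print(rows_b_to_d.tolist())
print(picked)
```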

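8. iloc: index rows mainly by row position

The current df is the same one created in the at section.

Usage is very similar to loc, so the code is shown without further comment.

# Specify a row position and a column position, used the same way as loc
print(df.iloc[0,0],"\n","*"*20)

# A full slice also works
print(df.iloc[:,0],"\n","*"*20)

# Slice starting from a given row
print(df.iloc[1:,0],"\n","*"*20)

# Select a single row
print(df.iloc[1,:],"\n","*"*20)

# Same effect as the previous line
print(df.iloc[1],"\n","*"*20)

Output: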

a
 ********************
A    a
B    b
C    c
D    d
E    e
F    f
Name: X, dtype: object
 ********************
B    b
C    c
D    d
E    e
F    f
Name: X, dtype: object
 ********************
X    b
Y    e
Z    h
Name: B, dtype: object
 ********************
X    b
Y    e
Z    h
Name: B, dtype: object
 ********************
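One difference from loc deserves a short sketch: positional slices follow normal Python rules, so the stop position is excluded, and negative positions count from the end. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

rows_1_to_3 = df.iloc[1:4, 0]       # positions 1, 2, 3; the stop position 4 is excluded
last_row = df.iloc[-1]              # negative positions count from the end
corners = df.iloc[[0, 5], [0, 2]]   # lists of positions pick arbitrary rows and columns

print(rows_1_to_3.tolist())
print(last_row.tolist())
print(corners)
```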

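9. ix: index rows by row label or row position

ix was a hybrid of loc and iloc, but it has been deprecated and was removed in pandas 1.0. Frankly, I welcome the removal: everything ix could do can be done with loc and iloc as shown above, so it is not covered further here.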
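For code that relied on ix to mix a row position with a column label, one possible migration is sketched below. This is only an illustration of two common patterns, not the only way to do it: either translate the position into a label for loc, or translate the label into a position for iloc.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

# Goal: second row (position 1), column labelled "Y".
via_loc = df.loc[df.index[1], "Y"]               # position -> label, then loc
via_iloc = df.iloc[1, df.columns.get_loc("Y")]   # label -> position, then iloc

print(via_loc, via_iloc)
```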

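10. Use boolean indexing to view rows that meet a condition

The current df is the same one created in the at section.

Indexing with df[...] returns the rows that satisfy the filter condition; & is logical AND and | is logical OR.

# Select the rows that satisfy both conditions:
# 1) the character in column Y has an ASCII code greater than that of "g"
# 2) the character in column Z has an ASCII code less than that of "x"
print(df[(df["Y"].str.lower()>"g")&(df["Z"].str.lower()<"x")],"\n","*"*20)
# (the statement that printed the second table below is truncated in the source)

The output is as expected:

   X  Y  Z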
E  e  h  k
F  f  i  l
 ********************
   X  Y  Z
A  a  d  g
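The following sketch shows AND, OR and NOT on the same data. The OR and NOT conditions are illustrative choices of mine and are not the truncated statement from the article. Each comparison needs its own parentheses because & and | bind more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

both = df[(df["Y"] > "g") & (df["Z"] < "x")]       # rows where both conditions hold
either = df[(df["X"] == "a") | (df["Y"] == "i")]   # rows where at least one condition holds
negated = df[~(df["Y"] > "g")]                     # ~ inverts a boolean mask

print(both)
print(either)
print(negated)
```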

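11. Use sample() to view random rows

Syntax: DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

Parameters:

n: optional integer; the number of rows to return at random.
frac: optional float; the fraction of rows to return. It cannot be used together with n.
replace: boolean, default False. If True, rows are sampled with replacement.
weights: optional; a str or an ndarray-like. The default None gives every row equal probability.
random_state: optional; an int or a numpy.random.RandomState. If an int is given, it is used as the seed of the random number generator.
axis: optional int or str; 0 or 'index' samples rows, 1 or 'columns' samples columns.

Only the simplest usage is shown here.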

print("*"*20)
print(df.sample())
print("*"*20)
print(df.sample())
print("*"*20)
print(df.sample())
print("*"*20)
print(df.sample())

The output differs on every run:

********************
   X  Y  Z
A  a  d  g
********************
   X  Y  Z
F  f  i  l
********************
   X  Y  Z
A  a  d  g
********************
   X  Y  Z
B  b  e  h
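The other parameters described above can be tried in a short sketch. The seed value 0 and the sample sizes are arbitrary choices for illustration. Fixing random_state makes a draw repeatable, frac selects a proportion of the rows, and replace=True allows drawing more rows than the DataFrame has.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

one_row = df.sample()                          # n defaults to 1
three_rows = df.sample(n=3, random_state=0)    # fixed seed: repeatable draw
same_three = df.sample(n=3, random_state=0)
half = df.sample(frac=0.5, random_state=0)     # 50% of the 6 rows
with_replacement = df.sample(n=10, replace=True, random_state=0)  # n may exceed len(df)

print(three_rows.equals(same_three))
print(len(one_row), len(half), len(with_replacement))
```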

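12. Use isin() to check whether values meet a condition

Syntax: DataFrame.isin(values). values can be a DataFrame or a single column of data.

# An entire DataFrame can be compared
df2=df.copy()
print(id(df)) # id is printed to show that the two DataFrames are now separate objects in memory
print(id(df2))
df["G"]=list("MKLHGF")
df2.isin(df)

Output:

      X     Y     Z
A  True  True  True
B  True  True  True
C  True  True  True
D  True  True  True
E  True  True  True
F  True  True  True

Compare against a single column:

# Check the values in column G
df.G.isin(list("MKLHGF"))

Output: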

A    True
B    True
C    True
D    True
E    True
F    True
Name: G, dtype: bool
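In practice isin() is mostly used to build a filter mask. The sketch below uses illustrative value lists: it filters rows by a list of allowed values and also passes a dict to give each column its own allowed values (columns missing from the dict come back as all False).

```python
import pandas as pd

df = pd.DataFrame(
    {"X": list("abcdef"), "Y": list("defghi"), "Z": list("ghijkl")},
    index=list("ABCDEF"),
)

mask = df["X"].isin(["a", "c", "z"])    # "z" is simply not found
selected = df[mask]                     # keep only the matching rows

per_column = df.isin({"Y": ["d"], "Z": ["l"]})   # dict: allowed values per column

print(selected)
print(per_column)
```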

13. Use .shape to view the number of rows and columns

This one is trivial:

In [23]: df.shape

Out[23]:

(6, 3)
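Because .shape is a plain tuple, it can be unpacked directly. A minimal sketch, with illustrative variable names, that also shows the related len() and .size values:

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

n_rows, n_cols = df.shape     # (rows, columns)
print(n_rows, n_cols)
print(len(df))                # number of rows
print(len(df.columns))        # number of columns
print(df.size)                # total number of cells: rows * columns
```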

14. Use .info() to view the index, data types, and memory information

In [21]: df.info()

RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
# Column Non-Null Count Dtype
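.info() prints its report rather than returning it. If the text is needed as a string, one option is the buf parameter, sketched below with an in-memory buffer; memory_usage(deep=True) is shown alongside as a way to get per-column byte counts. The variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

buffer = io.StringIO()
df.info(buf=buffer)            # write the report into the buffer instead of stdout
report = buffer.getvalue()
print(report)

per_column_bytes = df.memory_usage(deep=True)   # bytes for the index and each column
print(per_column_bytes)
```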

Original: https://blog.csdn.net/u010701274/article/details/121252117
Author: 江南野栀子
Title: pandas DataFrame 用法–查看和选择数据 (pandas DataFrame usage: viewing and selecting data)

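Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/752412/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source.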
