Pandas Module - Manipulating Data (3) - Iteration

August 17, 2023, 9:22 AM • Python • 50 reads

3. Iterating over a DataFrame

3.1 Iterating by row: DataFrame.iterrows()

3.1.1 DataFrame.iterrows() syntax

3.1.2 DataFrame.iterrows() examples

3.2 Iterating by row: DataFrame.itertuples()

3.2.1 DataFrame.itertuples() syntax

3.2.2 DataFrame.itertuples() examples

3.3 Iterating by column: DataFrame.iteritems()

3.3.1 DataFrame.iteritems() syntax

3.3.2 DataFrame.iteritems() examples

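pandas.DataFrame offers the following three iteration methods:

iterrows(): iterates by row, yielding each row of the DataFrame as an (index, data) pair. Elements can be accessed with data[column_name] or data.column_name.
itertuples(): iterates by row, yielding each row as a tuple (a namedtuple by default). Elements can be accessed with data[position] or data.column_name, but not with data[column_name]. It is faster than iterrows().
iteritems(): iterates by column, yielding each column as a (label, content) pair. Elements can be accessed with content[index]. This method was deprecated in pandas 1.5 and removed in pandas 2.0; DataFrame.items() is the equivalent method.
3. Iterating over a DataFrame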

First, prepare the data:

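import pandas as pd
import pymysql

# Connect to a local MySQL database (credentials as used in this example)
conn = pymysql.connect(host="127.0.0.1", user="root", password="wxf123", database="ivydb")
# Load the whole human table into a DataFrame
data = pd.read_sql("SELECT * FROM human;", con=conn)
data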

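The resulting DataFrame has rows labelled 0 through 9 and the columns id, title, age, location, and comment.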
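For readers without access to the MySQL database, a small stand-in frame can be built by hand. This is only a sketch: the three rows are copied from the output shown later in this article, the comment column is assumed to hold plain strings, and the variable name data is reused so that the later snippets can run against it. The real human table has more rows.

```python
import pandas as pd

# Stand-in for the human table; values copied from the output shown later in the article.
# Assumption: the comment column is stored as plain strings.
data = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "title": ["Teacher", "NewMan", "Policeman"],
        "age": [36, 3, 33],
        "location": ["Beijing", "Shanghai", "Beijing"],
        "comment": ["1982-01-01", "1983-02-01", "1984-05-09"],
    }
)
print(data.shape)
print(list(data.columns))
```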

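3.1 Iterating by row: DataFrame.iterrows()

3.1.1 DataFrame.iterrows() syntax

First, DataFrame.iterrows() takes no parameters.

Second, DataFrame.iterrows() returns an iterator of (index, data) pairs, where index is the row label and data is the row's contents as a Series. Because the return value is an iterator (a generator), the pairs can also be read one at a time with next().

Third, a specific element of data can be read with data[column_name].

Finally, note that you should never modify what you are iterating over. Doing so is not guaranteed to work in all cases: depending on the data types, the iterator returns a copy rather than a view, and writing to a copy has no effect on the DataFrame.

In short, I recommend not writing to the data during any iteration.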
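The two points above (reading with next() and the pitfall of writing inside the loop) can be sketched as follows. The two-row frame is a made-up example, not the article's data. It mixes string and integer columns so that each yielded row is a copy.

```python
import pandas as pd

# Hypothetical two-row frame with mixed dtypes (str and int).
df = pd.DataFrame({"title": ["Teacher", "NewMan"], "age": [36, 3]})

# iterrows() returns a generator, so next() pulls one (index, Series) pair at a time.
rows = df.iterrows()
first_index, first_row = next(rows)

# Writing to the yielded Series changes only that Series: with mixed dtypes
# the row is a copy, so the DataFrame itself is left untouched.
for _, row in df.iterrows():
    row["age"] = 0
ages_after_loop = df["age"].tolist()

# To actually change the data, assign to the DataFrame outside of any iteration.
df["age"] = df["age"] + 1
ages_after_update = df["age"].tolist()

print(first_index, first_row["title"], ages_after_loop, ages_after_update)
```

The loop leaves the age column as it was, while the vectorized assignment after the loop does change it.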

Help on method iterrows in module pandas.core.frame:

iterrows() -> 'Iterable[Tuple[Label, Series]]' method of pandas.core.frame.DataFrame instance
    Iterate over DataFrame rows as (index, Series) pairs.

    Yields
    ------
    index : label or tuple of label
        The index of the row. A tuple for a MultiIndex.

    data : Series
        The data of the row as a Series.

    See Also
    --------
    DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.

    DataFrame.items : Iterate over (column name, Series) pairs.

    Notes
    -----
    1. Because ``iterrows`` returns a Series for each row,
       it does **not** preserve dtypes across the rows (dtypes are
       preserved across columns for DataFrames). For example,

       >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
       >>> row = next(df.iterrows())[1]
       >>> row
       int      1.0
       float    1.5
       Name: 0, dtype: float64
       >>> print(row['int'].dtype)
       float64
       >>> print(df['int'].dtype)
       int64

       To preserve dtypes while iterating over the rows, it is better
       to use :meth:`itertuples` which returns namedtuples of the values
       and which is generally faster than ``iterrows``.

    2. You should **never modify** something you are iterating over.

       This is not guaranteed to work in all cases. Depending on the
       data types, the iterator returns a copy and not a view, and writing
       to it will have no effect.

3.1.2 DataFrame.iterrows() examples

Example code, using the familiar for loop:

for rowname,row in data.iterrows():
    print("*"*50)
    print(rowname)
    print(type(row))
    print(row)

The output is shown below. Each iteration prints the row label, the type of the row, and the row data:

**************************************************
0
<class 'pandas.core.series.Series'>
id                   1
title          Teacher
age                 36
location       Beijing
comment     1982-01-01
Name: 0, dtype: object
**************************************************
1
<class 'pandas.core.series.Series'>
id                   2
title           NewMan
age                  3
location      Shanghai
comment     1983-02-01
Name: 1, dtype: object
**************************************************
2
<class 'pandas.core.series.Series'>
id                   3
title        Policeman
age                 33
location       Beijing
comment     1984-05-09
Name: 2, dtype: object

......................................................

9
<class 'pandas.core.series.Series'>
id                  10
title           Singer
age                 22
location       Nanjing
comment     1982-01-01
Name: 9, dtype: object

There are two ways to read a single element. The first is row.column_name:

print(row.id)
print(row.title)
print(row.age)
print(row.location)
print(row.comment)
print(row.name)

This prints the five column values of the row, followed by the row label (row.name).

The second way is row[column_name]:

print(row["id"])
print(row["title"])
print(row["age"])
print(row["location"])
print(row["comment"])
# print(row["name"])  # KeyError: the row label is not a column; read it with row.name instead

This prints the same five column values as above.
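The difference between the two access styles matters most when a column name collides with an existing Series attribute. The sketch below uses a made-up frame with a hypothetical column called name (the article's table has no such column) to show that row["name"] and row.name then mean different things.

```python
import pandas as pd

# Hypothetical frame: the "name" column is invented to illustrate the clash.
df = pd.DataFrame({"id": [1], "title": ["Teacher"], "name": ["Alice"]})
index, row = next(df.iterrows())

by_key = row["title"]        # key access
by_attr = row.title          # attribute access, same value
column_value = row["name"]   # the value stored in the column called "name"
row_label = row.name         # the row label, because Series.name takes precedence

print(by_key, by_attr, column_value, row_label)
```

For this reason row[column_name] is the safer choice when column names are not known in advance.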

3.2 Iterating by row: DataFrame.itertuples()

itertuples() also iterates by row and, like iterrows(), returns an iterator. It yields each row of the DataFrame as a tuple. Most importantly, it is faster than iterrows().
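The speed claim can be checked with a rough timing sketch such as the one below. The benchmark frame, its size, and the helper function names are all assumptions made for illustration. Absolute timings depend on the machine and the pandas version, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 2,000 rows of two integer columns.
df = pd.DataFrame({"a": np.arange(2000), "b": np.arange(2000) * 2})


def total_with_iterrows(frame):
    # Each row is materialised as a Series, which is relatively expensive.
    return sum(row["a"] + row["b"] for _, row in frame.iterrows())


def total_with_itertuples(frame):
    # Each row is a lightweight namedtuple.
    return sum(row.a + row.b for row in frame.itertuples(index=False))


iterrows_seconds = timeit.timeit(lambda: total_with_iterrows(df), number=3)
itertuples_seconds = timeit.timeit(lambda: total_with_itertuples(df), number=3)
print(f"iterrows:   {iterrows_seconds:.4f} s")
print(f"itertuples: {itertuples_seconds:.4f} s")
```

Both helpers compute the same total, so only the iteration cost differs between the two measurements.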

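3.2.1 DataFrame.itertuples() syntax

itertuples(index: 'bool' = True, name: 'Optional[str]' = 'Pandas')

First, unlike iterrows(), itertuples() takes two parameters.

index: bool, default True. Whether each returned row includes the index as its first element; if False, the index is omitted.

name: a string or None, default "Pandas". The name of the returned namedtuples; if None, regular tuples are returned instead.

Second, by default itertuples() yields rows of type 'pandas.core.frame.Pandas', which is a namedtuple (a tuple subclass).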

Help on method itertuples in module pandas.core.frame:

itertuples(index: 'bool' = True, name: 'Optional[str]' = 'Pandas') method of pandas.core.frame.DataFrame instance
Iterate over DataFrame rows as namedtuples.

Parameters
----------
index : bool, default True
    If True, return the index as the first element of the tuple.

name : str or None, default "Pandas"
    The name of the returned namedtuples or None to return regular
    tuples.

Returns
-------
iterator
    An object to iterate over namedtuples for each row in the
    DataFrame with the first field possibly being the index and
    following fields being the column values.

See Also
--------
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)
    pairs.

DataFrame.items : Iterate over (column name, Series) pairs.

Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.

On python versions < 3.7 regular tuples are returned for DataFrames
with a large number of columns (>254).

Examples
--------
>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index
as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded
namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
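The note in the help text about renamed columns is easy to overlook. The sketch below uses a made-up frame whose column names ("my col" and "_hidden") are chosen only to trigger the renaming. It shows that such columns become positional field names.

```python
import pandas as pd

# Hypothetical frame: "my col" is not a valid identifier and "_hidden" starts with an underscore.
df = pd.DataFrame({"valid": [1], "my col": [2], "_hidden": [3]})

row = next(df.itertuples())
fields = row._fields  # field names of the namedtuple, in order

print(fields)
print(row._2, row[3])  # renamed fields are reachable by positional name or by position
```

When column names like these are possible, reading by position or renaming the columns beforehand avoids surprises.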

3.2.2 DataFrame.itertuples() examples

Here I simplify the data so that the results are easier to read.

1) index and name both left at their defaults

for row in data.itertuples():
    print("*"*50)
    print(row)
    print(type(row))

In the output, each row includes the index, and the type reported by type() is 'pandas.core.frame.Pandas'.

To read specific elements, proceed as follows:

print(row.id)
print(row.title)
print(row.age)
print(row.location)
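# print(row.name)  # AttributeError: a namedtuple row has no name attribute
print(row.index)  # the built-in tuple method index, not the row label
print(row.Index)  # the row label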

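This prints the four field values, then the built-in tuple method index (row.index is not the row label), and finally the row label itself (row.Index).

In addition, because itertuples() yields tuples, elements cannot be read with row[column_name].

They can, however, be read by position with row[column_no]: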

print(row[0:3])

This prints the first three elements of the row: the index, id, and title.
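The access options for a namedtuple row can be gathered into one short sketch. The single row below reuses the values of the row labelled 3 from the simplified data (id 4, CodingMan, age 32). The getattr() and _asdict() calls are standard namedtuple features, shown here as a workaround when the column name is only known as a string.

```python
import pandas as pd

# One row taken from the simplified data used in this section (row label 3).
df = pd.DataFrame({"id": [4], "title": ["CodingMan"], "age": [32]}, index=[3])
row = next(df.itertuples())

by_attr = row.title                 # attribute access
by_position = row[2]                # position 0 is the index, so title sits at position 2
by_getattr = getattr(row, "title")  # useful when the column name is held in a variable
as_dict = row._asdict()             # name-based lookup through a dict
first_three = row[0:3]              # slicing returns a plain tuple

print(by_attr, by_position, by_getattr, as_dict["age"], first_three)
```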

2) With index=False and name="NewPandas"

for row in data.itertuples(index=False, name="NewPandas"):
    print("*"*50)
    print(row)
    print(type(row))

In the output, the rows no longer include the index, and the type reported by type() is 'pandas.core.frame.NewPandas'.
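The remaining option, name=None, is worth a brief sketch as well. Under that setting itertuples() yields regular tuples, which can be unpacked directly in the for statement. The two-row frame below reuses values from the article's data. The variable names row_id and title are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"id": [2, 3], "title": ["NewMan", "Policeman"]})

# With name=None each row is a plain tuple rather than a namedtuple.
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)

# Plain tuples unpack directly into loop variables.
labels = [f"{row_id}:{title}" for row_id, title in df.itertuples(index=False, name=None)]
print(labels)
```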

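3.3 Iterating by column: DataFrame.iteritems()

DataFrame.iteritems() iterates over the columns of a DataFrame. It was deprecated in pandas 1.5 and removed in pandas 2.0; DataFrame.items() is the equivalent method and should be used in current versions.

3.3.1 DataFrame.iteritems() syntax

First, iteritems() takes no parameters.

Second, iteritems() yields (label, content) pairs. Individual elements can be read with content[index_label]; attribute-style access (content.label) only works when the row labels are strings that are valid Python identifiers.

Finally, the help text is as follows: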

Help on method iteritems in module pandas.core.frame:

iteritems() -> 'Iterable[Tuple[Label, Series]]' method of pandas.core.frame.DataFrame instance
    Iterate over (column name, Series) pairs.

    Iterates over the DataFrame columns, returning a tuple with
    the column name and the content as a Series.

    Yields
    ------
    label : object
        The column names for the DataFrame being iterated over.

    content : Series
        The column entries belonging to each label, as a Series.

    See Also
    --------
    DataFrame.iterrows : Iterate over DataFrame rows as
        (index, Series) pairs.

    DataFrame.itertuples : Iterate over DataFrame rows as namedtuples
        of the values.

    Examples
    --------
    >>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
    ...                   'population': [1864, 22000, 80000]},
    ...                   index=['panda', 'polar', 'koala'])
    >>> df
            species   population
    panda   bear      1864
    polar   bear      22000
    koala   marsupial 80000
    >>> for label, content in df.items():
    ...     print(f'label: {label}')
    ...     print(f'content: {content}', sep='\n')
    ...

    label: species
    content:
    panda         bear
    polar         bear
    koala    marsupial
    Name: species, dtype: object
    label: population
    content:
    panda     1864
    polar    22000
    koala    80000
    Name: population, dtype: int64

3.3.2 DataFrame.iteritems() examples

Example code, using the familiar for loop:

for columnname, column in data.items():  # iteritems() was removed in pandas 2.0; items() is equivalent
    print("*"*50)
    print(columnname)
    print(type(columnname))
    print(column)
    print(type(column))

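The output is shown below. Each iteration prints the column name, its type, the column data, and the type of the column data: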

**************************************************
id
<class 'str'>
1    2
2    3
3    4
Name: id, dtype: int64
<class 'pandas.core.series.Series'>
**************************************************
title
<class 'str'>
1       NewMan
2    Policeman
3    CodingMan
Name: title, dtype: object
<class 'pandas.core.series.Series'>
**************************************************
age
<class 'str'>
1     3
2    33
3    32
Name: age, dtype: int64
<class 'pandas.core.series.Series'>
**************************************************
location
<class 'str'>
1    Shanghai
2     Beijing
3     Nanjing
Name: location, dtype: object
<class 'pandas.core.series.Series'>

Because the returned content (column in the code above) is a Series, its elements are read in the same way as for any other Series.
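As a minimal sketch of reading elements from each column, the snippet below uses DataFrame.items(), which works on current pandas versions, on a frame that reuses the id and age values from the output above. The summary dictionary is only an illustrative container.

```python
import pandas as pd

# Values and row labels taken from the simplified data shown above.
df = pd.DataFrame({"id": [2, 3, 4], "age": [3, 33, 32]}, index=[1, 2, 3])

summary = {}
for label, content in df.items():
    # content is a Series: .loc reads by row label, .iloc reads by position.
    summary[label] = (content.loc[2], content.iloc[0], content.max())

print(summary)
```

Using .loc and .iloc explicitly avoids ambiguity when the row labels are integers, as they are here.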

Original: https://blog.csdn.net/u010701274/article/details/121768096
Author: 江南野栀子
Title: Pandas Module - Manipulating Data (3) - Iteration
