Pandas Module - Manipulating Data (6) - Using Custom Functions with DataFrame

July 18, 2023, 2:28 AM • Artificial Intelligence • 41 reads

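6. Using custom functions with DataFrame

6.1 Functions that operate on the whole DataFrame: .pipe()

6.1.1 .pipe() syntax

6.1.2 .pipe() examples

6.2 Functions that operate on rows or columns: .apply()

6.2.1 .apply() syntax

6.2.2 .apply() examples

6.3 Functions that operate on single elements: .applymap()

6.3.1 .applymap() syntax

6.3.2 .applymap() examples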

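To apply a custom function, or a function from another library, to a pandas DataFrame object, there are three methods:

Functions that operate on the whole DataFrame: pipe()
Functions that operate on rows or columns: apply()
Functions that operate on single elements: applymap()

First, prepare some data: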

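import pandas as pd

# Three columns of single-character strings, six rows each
dict_data = {"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")}
df = pd.DataFrame.from_dict(dict_data)
df

This gives the following result:

   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i
3  d  g  j
4  e  h  k
5  f  i  l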

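6.1 Functions that operate on the whole DataFrame: .pipe()

6.1.1 .pipe() syntax

Syntax: DataFrame.pipe(func, *args, **kwargs)

Parameters:

func: a function to apply to the Series/DataFrame; args and kwargs are passed on to this function.
args: optional positional arguments, given as an iterable such as a tuple or a list.
kwargs: optional keyword arguments, given as a mapping (dictionary) of keyword names to values.

Return value: whatever func returns.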

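Note that .pipe() itself neither copies nor modifies the DataFrame: it simply calls func(df, *args, **kwargs). The original is left unchanged as long as func does not modify its argument in place.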
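The calling convention described above can be checked with a small sketch. The names df_demo, add_suffix and count_cells are made up for illustration and are not part of the original article. It shows that df.pipe(func, ...) gives the same result as calling func(df, ...) directly, and that the return value need not be a DataFrame at all.

```python
import pandas as pd

# Hypothetical demo data, not the df used in the rest of the article
df_demo = pd.DataFrame({"a": list("abc"), "b": list("def")})

def add_suffix(frame, suffix, sep="-"):
    # Builds and returns a new DataFrame; the input is not modified
    return frame + sep + suffix

piped = df_demo.pipe(add_suffix, "x", sep="_")   # positional and keyword args are forwarded
direct = add_suffix(df_demo, "x", sep="_")       # the equivalent direct call

def count_cells(frame):
    # A func may return anything; here a plain integer
    return frame.size

n_cells = df_demo.pipe(count_cells)

print(piped)
print(piped.equals(direct))
print(n_cells)
```

Because add_suffix builds a new object instead of updating frame in place, df_demo keeps its original values after the call.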

Help on method pipe in module pandas.core.generic:

pipe(func, *args, **kwargs) method of pandas.core.frame.DataFrame instance
    Apply func(self, *args, **kwargs).

    Parameters
    ----------
    func : function
        Function to apply to the Series/DataFrame.

        ``args``, and ``kwargs`` are passed into ``func``.

        Alternatively a ``(callable, data_keyword)`` tuple where
        ``data_keyword`` is a string indicating the keyword of
        ``callable`` that expects the Series/DataFrame.

    args : iterable, optional
        Positional arguments passed into ``func``.

    kwargs : mapping, optional
        A dictionary of keyword arguments passed into ``func``.

    Returns
    -------
    object : the return type of ``func``.

    See Also
    --------
    DataFrame.apply : Apply a function along input axis of DataFrame.

    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Series.map : Apply a mapping correspondence on a
        :class:`~pandas.Series`.

    Notes
    -----
    Use .pipe when chaining together functions that expect
    Series, DataFrames or GroupBy objects. Instead of writing

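6.1.2 .pipe() examples

Using .pipe() is straightforward; the following code shows the idea:

def f(dataframe, *args, **kwargs):
    # Append each positional argument, then each key/value pair, to every element.
    # Use `x = x + ...` rather than `x += ...`: augmented assignment on a
    # DataFrame updates it in place and would change the caller's df.
    for tmparg in args:
        dataframe = dataframe + "-" + str(tmparg)
    for tmpkey, tmpvalue in kwargs.items():
        dataframe = dataframe + "-" + str(tmpkey) + str(tmpvalue)
    return dataframe

print(df)
params_tuple = tuple(range(3))     # positional arguments: 0, 1, 2
params_dict = {"A": 1, "B": 2}     # keyword arguments: A=1, B=2
df.pipe(f, *params_tuple, **params_dict)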

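The result is as follows: every element has the suffix -0-1-2-A1-B2 appended, so for example "a" becomes "a-0-1-2-A1-B2".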
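The help text above also mentions chaining and a (callable, data_keyword) tuple form. The following is a minimal sketch of both, using made-up helpers to_upper and tag (they are assumptions for illustration, not functions from pandas or from the original article). In the tuple form, the DataFrame is handed to the function through the named keyword instead of as the first positional argument.

```python
import pandas as pd

# Hypothetical demo data
df_demo = pd.DataFrame({"a": list("abc"), "b": list("def")})

def to_upper(frame):
    # Upper-case every string, one column at a time
    return frame.apply(lambda col: col.str.upper())

def tag(prefix, data=None):
    # The DataFrame arrives through the keyword argument `data`
    return prefix + data

# Chained calls read left to right; the tuple tells pipe to pass the
# DataFrame as tag(..., data=<DataFrame>) rather than as the first argument
chained = df_demo.pipe(to_upper).pipe((tag, "data"), "#")

print(chained)
```

Written without .pipe(), the same computation would be the nested call tag("#", data=to_upper(df_demo)), which is harder to read as the chain grows.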

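6.2 Functions that operate on rows or columns: .apply()

6.2.1 .apply() syntax

Syntax: DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

Parameters:

func: a function applied to each column or each row.
axis: {0 or 'index', 1 or 'columns'}, default 0. With 0 or 'index', func is applied to each column; with 1 or 'columns', func is applied to each row.
raw: bool, default False. Determines whether each row or column is passed as a Series or as an ndarray. With False, each row or column is passed as a Series; with True, it is passed as an ndarray. Choosing True generally pays off in performance only when you are applying a NumPy reduction function.
result_type: {'expand', 'reduce', 'broadcast', None}, default None. This only takes effect when axis=1, that is, when func is applied to each row.
'expand': list-like results are turned into columns.
'reduce': returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
'broadcast': results are broadcast to the original shape of the DataFrame; the original index and columns are retained.
The default behaviour (None) depends on the return value of the applied function: list-like results are returned as a Series of those lists. However, if the applied function returns a Series, these are expanded to columns.
args: a tuple of positional arguments passed to func in addition to the row or column.
kwds: additional keyword arguments passed to func.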

Return value: a Series or a DataFrame, depending on what func returns.
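The effect of result_type is easier to see on a small numeric table. The sketch below uses a made-up DataFrame df_num and a made-up function sum_and_max that returns a list for each row; both are assumptions for illustration.

```python
import pandas as pd

# Hypothetical numeric demo data
df_num = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

def sum_and_max(row):
    # Returns a list-like result (two values) for each row
    return [row.sum(), row.max()]

# Default (None): a Series whose elements are the returned lists
as_series = df_num.apply(sum_and_max, axis=1)

# 'expand': each list item becomes its own column (labelled 0 and 1)
expanded = df_num.apply(sum_and_max, axis=1, result_type="expand")

# 'broadcast': results are placed into the original shape,
# keeping the original index and column labels x and y
broadcast = df_num.apply(sum_and_max, axis=1, result_type="broadcast")

print(as_series)
print(expanded)
print(broadcast)
```

Note that 'broadcast' only works here because the returned list has as many items as the DataFrame has columns.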

Help on method apply in module pandas.core.frame:

apply(func, axis=0, raw=False, result_type=None, args=(), **kwds) method of pandas.core.frame.DataFrame instance
    Apply a function along an axis of the DataFrame.

    Objects passed to the function are Series objects whose index is
    either the DataFrame's index (``axis=0``) or the DataFrame's columns
    (``axis=1``). By default (``result_type=None``), the final return type
    is inferred from the return type of the applied function. Otherwise,
    it depends on the result_type argument.

    Parameters
    ----------
    func : function
        Function to apply to each column or row.

    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis along which the function is applied:

        * 0 or 'index': apply function to each column.

        * 1 or 'columns': apply function to each row.

    raw : bool, default False
        Determines if row or column is passed as a Series or ndarray object:

        * ``False`` : passes each row or column as a Series to the
          function.

        * ``True`` : the passed function will receive ndarray objects
          instead.

          If you are just applying a NumPy reduction function this will
          achieve much better performance.

    result_type : {'expand', 'reduce', 'broadcast', None}, default None
        These only act when ``axis=1`` (columns):

        * 'expand' : list-like results will be turned into columns.

        * 'reduce' : returns a Series if possible rather than expanding
          list-like results. This is the opposite of 'expand'.

        * 'broadcast' : results will be broadcast to the original shape
          of the DataFrame, the original index and columns will be
          retained.

        The default behaviour (None) depends on the return value of the
        applied function: list-like results will be returned as a Series
        of those. However if the apply function returns a Series these
        are expanded to columns.

    args : tuple
        Positional arguments to pass to func in addition to the
        array/series.

    **kwds
        Additional keyword arguments to pass as keywords arguments to
        func.

    Returns
    -------
    Series or DataFrame
        Result of applying ``func`` along the given axis of the
        DataFrame.
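To make the raw parameter concrete, here is a small sketch that records the type of object the function receives. The names df_num, seen_types and col_range are made up for illustration. The function only uses .max() and .min(), which exist on both Series and ndarray, so it works in either mode.

```python
import pandas as pd

# Hypothetical numeric demo data
df_num = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

seen_types = []  # records what kind of object func receives

def col_range(values):
    seen_types.append(type(values).__name__)
    # .max() and .min() are available on both Series and ndarray
    return values.max() - values.min()

with_series = df_num.apply(col_range)              # raw=False: each column is a Series
with_ndarray = df_num.apply(col_range, raw=True)   # raw=True: each column is an ndarray

print(seen_types)
print(with_series)
print(with_ndarray)
```

With raw=True the function no longer has access to the index or to Series methods such as .str or .eq, so it is only a good fit for purely numerical work.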

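6.2.2 .apply() examples

6.2.2.1 func

First, note the return value of .apply(): a Series or a DataFrame, depending on what func returns.

The behaviour of func matters a great deal here, especially in comparison with .pipe(). Two different kinds of func illustrate this.

Case 1: func returns a Series

def f(series):
    # Element-wise comparison; returns a boolean Series of the same length
    return series.eq('1.0')

print("*"*30 + "df data" + "*"*30)
print(df)
print("*"*30 + "df.apply(f) data" + "*"*30)
print(df.apply(f))
print("*"*30 + "df data" + "*"*30)
print(df)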

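The result is a DataFrame (all False here, since no element equals '1.0'), and the original data is unaffected.

So far this looks no different from using .pipe().

Case 2: func returns a single value (a scalar)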

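def f2(series):
    # Join all elements of one column into a single string
    res = "-"
    return res.join(series)

print("*"*30 + "df data" + "*"*30)
print(df)
print("*"*30 + "df.apply(f2) data" + "*"*30)
print(df.apply(f2))
print("*"*30 + "df data" + "*"*30)
print(df)

With the df prepared at the start, the result this time is a Series holding one joined string per column: "a-b-c-d-e-f" for column a, "d-e-f-g-h-i" for column b, and "g-h-i-j-k-l" for column c.

Calling df.pipe(f2) instead passes the whole DataFrame to f2. Iterating over a DataFrame yields its column labels, so the result is the single string "a-b-c".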

At this point, it is worth rereading the earlier summary:

Functions that operate on the whole DataFrame: pipe()
Functions that operate on rows or columns: apply()
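The difference can be reproduced side by side in a few lines. This is a minimal sketch with a made-up DataFrame df_demo and function join_items, passing the same function to both methods.

```python
import pandas as pd

# Hypothetical demo data
df_demo = pd.DataFrame({"a": list("abc"), "b": list("def")})

def join_items(obj):
    # Joins whatever iterating over obj yields
    return "-".join(obj)

# apply: join_items is called once per column, each time with a Series of values
per_column = df_demo.apply(join_items)

# pipe: join_items is called once with the whole DataFrame;
# iterating over a DataFrame yields its column labels
whole_frame = df_demo.pipe(join_items)

print(per_column)
print(whole_frame)
```

The same function therefore sees different inputs: column values under apply(), the DataFrame object itself under pipe().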

6.2.2.2 axis

axis: {0 or 'index', 1 or 'columns'}, default 0. With 0 or 'index', func is applied to each column; with 1 or 'columns', func is applied to each row.
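Since the direction of axis is easy to mix up, a short sketch may help. The DataFrame df_num and its row labels are made up for illustration. The index of the result shows which direction was used: column labels for axis=0, row labels for axis=1.

```python
import pandas as pd

# Hypothetical numeric demo data with explicit row labels
df_num = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10, 20, 30]},
    index=["r0", "r1", "r2"],
)

# axis=0 (default): func receives each column, so there is one result per column
col_totals = df_num.apply(lambda col: col.sum())

# axis=1: func receives each row, so there is one result per row
row_totals = df_num.apply(lambda row: row.sum(), axis=1)

print(col_totals)
print(row_totals)
```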

6.2.2.3 args, kwds

These work much as in .pipe(), except that positional arguments are passed as a tuple through the args parameter.
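For completeness, here is a brief sketch of passing extra arguments through .apply(). The function scale and the data df_num are made up for illustration. Extra positional arguments go in the args tuple, while keyword arguments are written directly in the call.

```python
import pandas as pd

# Hypothetical numeric demo data
df_num = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

def scale(col, factor, offset=0):
    # col is one column (a Series); factor and offset come from the apply call
    return col * factor + offset

# args must be a tuple, hence the trailing comma for a single value
scaled = df_num.apply(scale, args=(2,), offset=1)

print(scaled)
```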

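6.3 Functions that operate on single elements: .applymap()

6.3.1 .applymap() syntax

Syntax: DataFrame.applymap(func, na_action: 'Optional[str]' = None)

Parameters:

func: a callable Python function that takes a single value (one element of the DataFrame) and returns a single value (the transformed element).

na_action: optional, one of {None, 'ignore'}, default None. Controls how NaN values are handled: with None, NaN values are passed to func like any other value; with 'ignore', NaN values are propagated to the result without being passed to func.

Return value: note that this is again a DataFrame.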
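One version note: in pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same func and na_action arguments. The sketch below uses a made-up helper, elementwise, that picks whichever method the installed pandas provides; the helper is an assumption for illustration and not part of pandas.

```python
import pandas as pd

# Hypothetical numeric demo data
df_num = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

def elementwise(frame, func, **kwargs):
    # Hypothetical helper: use DataFrame.map where it exists (pandas >= 2.1),
    # otherwise fall back to DataFrame.applymap
    method = frame.map if hasattr(frame, "map") else frame.applymap
    return method(func, **kwargs)

# func is called once per element and returns one value per element
squared = elementwise(df_num, lambda v: v ** 2)

print(squared)
```

The result has the same shape and labels as the input, which is what distinguishes element-wise application from apply().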

Help on method applymap in module pandas.core.frame:

applymap(func, na_action: 'Optional[str]' = None) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Apply a function to a Dataframe elementwise.

    This method applies a function that accepts and returns a scalar
    to every element of a DataFrame.

    Parameters
    ----------
    func : callable
        Python function, returns a single value from a single value.

    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to func.

        .. versionadded:: 1.2

    Returns
    -------
    DataFrame
        Transformed DataFrame.

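6.3.2 .applymap() examples

Here is a suitable func: each transformed element is the original element repeated twice.

def f3(value):
    # For a string, value * 2 repeats it: "a" -> "aa"
    return value * 2

print("*"*30 + "df.applymap(f3) data" + "*"*30)
print(df.applymap(f3))
print("*"*30 + "df data" + "*"*30)
print(df)

The result is a new DataFrame in which every element is doubled (a value "a" becomes "aa"); df itself is unchanged.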

To try out na_action, use the following code:

def f4(value):
    # pd.isnull is an alias of pd.isna, so test for None explicitly first;
    # otherwise the second branch could never be reached.
    if value is None:
        return '++ None ++'
    elif pd.isna(value):
        return '++NaN++'
    else:
        return value * 2

print("*"*30 + "df.applymap(f4, na_action='ignore') data" + "*"*30)
print(df.applymap(f4, na_action='ignore'))
print("*"*30 + "df.applymap(f4, na_action=None) data" + "*"*30)
print(df.applymap(f4, na_action=None))  # the default

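As described in the syntax section, when na_action='ignore' missing values are left as they are and never passed to f4; with the default na_action=None they are passed to f4, which replaces them with its marker strings. The difference only shows up when the DataFrame actually contains missing values.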
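The df prepared at the start has no missing values, so a separate small table makes the difference visible. This is a sketch under stated assumptions: df_nan and describe are made up for illustration, and the code uses DataFrame.map when the installed pandas has it and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data containing missing values
df_nan = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})

def describe(value):
    # Only reached for NaN when na_action is None
    if pd.isna(value):
        return "missing"
    return value * 2

# Use DataFrame.map where available (pandas >= 2.1), else applymap
method = df_nan.map if hasattr(df_nan, "map") else df_nan.applymap

passed = method(describe)                        # default: NaN is passed to describe
skipped = method(describe, na_action="ignore")   # NaN is kept and describe never sees it

print(passed)
print(skipped)
```

In passed the NaN cells become the string "missing", while in skipped they stay NaN and only the real numbers are doubled.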

Original: https://blog.csdn.net/u010701274/article/details/121784065
Author: 江南野栀子
Title: Pandas 模块-操纵数据(6)-DataFrame 使用自定义函数

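Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/699952/

Reposted articles are protected by the original author's copyright. When reposting, please cite the original author and source.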
