Pandas Module - Manipulating Data (6) - Applying Custom Functions to a DataFrame

August 21, 2023, 9:16 AM • Python

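6. Applying Custom Functions to a DataFrame

6.1 Function on the whole DataFrame: .pipe()

6.1.1 .pipe() syntax

6.1.2 .pipe() examples

6.2 Function on rows or columns: .apply()

6.2.1 .apply() syntax

6.2.2 .apply() examples

6.3 Function on single elements: .applymap()

6.3.1 .applymap() syntax

6.3.2 .applymap() examples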

To apply a custom function, or a function from another library, to a pandas DataFrame, there are three methods:

For the whole DataFrame: pipe()
For each row or column: apply()
For each single element: applymap()

First, let's prepare some data:

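import pandas as pd
# Three columns of single-letter strings, six rows
dict_data={"a":list("abcdef"),"b":list("defghi"),"c":list("ghijkl")}
df=pd.DataFrame.from_dict(dict_data)
df  # in a notebook or REPL, the last expression is displayed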

This gives the following DataFrame:

   a  b  c
0  a  d  g
1  b  e  h
2  c  f  i
3  d  g  j
4  e  h  k
5  f  i  l

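6.1 Function on the whole DataFrame: .pipe()

6.1.1 .pipe() syntax

Syntax: DataFrame.pipe(func, *args, **kwargs)

Parameters:

func: the function to apply to the Series/DataFrame; *args and **kwargs are passed on to it. Alternatively, func can be a (callable, data_keyword) tuple, where data_keyword is the name of the parameter of callable that should receive the DataFrame.
args: optional positional arguments passed to func; with the * operator, any iterable such as a tuple or a list can be unpacked into them.
kwargs: optional keyword arguments passed to func, given as a dictionary of keyword/value pairs.

Return value: whatever func returns.

Note that .pipe() itself makes no copy: it simply calls func(df, *args, **kwargs). The original DataFrame stays unchanged only as long as func does not modify its argument in place (for example with += or by assigning a column).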
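A minimal sketch of the two calling forms described above, assuming the same sample df (rebuilt so the snippet runs on its own). add_suffix and label are hypothetical helpers used only for illustration: the first checks that df.pipe(func, ...) is the same call as func(df, ...), the second uses the (callable, data_keyword) tuple form for a function whose DataFrame parameter is not the first one.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

def add_suffix(frame, suffix, sep="-"):
    # Hypothetical helper: returns a NEW DataFrame, frame itself is not modified
    return frame + sep + suffix

# df.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs)
piped = df.pipe(add_suffix, "x", sep="_")
direct = add_suffix(df, "x", sep="_")
print(piped.equals(direct))  # True

def label(prefix, data):
    # Hypothetical helper whose DataFrame parameter ("data") comes second
    return prefix + data

# Tuple form: pipe passes df as the keyword argument data=df
tupled = df.pipe((label, "data"), "p:")
print(tupled.iloc[0, 0])  # p:a
```

Because add_suffix builds a new object, df itself still holds the original letters afterwards.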

Help on method pipe in module pandas.core.generic:

pipe(func, *args, **kwargs) method of pandas.core.frame.DataFrame instance
    Apply func(self, \*args, \*\*kwargs).

    Parameters
    ----------
    func : function
        Function to apply to the Series/DataFrame.

        ``args``, and ``kwargs`` are passed into ``func``.

        Alternatively a (callable, data_keyword) tuple where
        data_keyword is a string indicating the keyword of
        ``callable`` that expects the Series/DataFrame.

    args : iterable, optional
        Positional arguments passed into ``func``.

    kwargs : mapping, optional
        A dictionary of keyword arguments passed into ``func``.

    Returns
    -------
    object : the return type of ``func``.

    See Also
    --------
    DataFrame.apply : Apply a function along input axis of DataFrame.

    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Series.map : Apply a mapping correspondence on a
        :class:`~pandas.Series`.

    Notes
    -----
    Use .pipe when chaining together functions that expect
    Series, DataFrames or GroupBy objects. Instead of writing

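6.1.2 .pipe() examples

The usage of .pipe() is very simple; one look at the code makes it clear.

def f(dataframe,*args,**kwargs):
    # Rebind to a new DataFrame instead of using +=, which would modify the caller's df in place
    for tmparg in args:
        dataframe = dataframe+"-"+str(tmparg)
    for tmpkey,tmpvalue in kwargs.items():
        dataframe = dataframe+"-"+str(tmpkey)+str(tmpvalue)
    return dataframe
print(df)
params_tuple=list(range(3))   # positional arguments 0, 1, 2 (a list works as well as a tuple)
params_dict={"A":1,"B":2}     # keyword arguments
df.pipe(f,*params_tuple,**params_dict)

Running it first prints the original df. The value returned by df.pipe(...) is a new DataFrame in which "-0-1-2-A1-B2" has been appended to every element (for example, "a" becomes "a-0-1-2-A1-B2"), while df itself is unchanged.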
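The Notes in the help text above recommend .pipe() for chaining functions. A minimal sketch, assuming three hypothetical helpers (upper, keep_rows, suffix) that each take and return a DataFrame: the nested call and the .pipe() chain compute the same result, but the chain reads in execution order.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

# Hypothetical helpers; each takes a DataFrame and returns a new one
def upper(frame):
    return frame.apply(lambda col: col.str.upper())

def keep_rows(frame, n):
    return frame.head(n)

def suffix(frame, text):
    return frame + text

# Nested calls read inside-out ...
nested = suffix(keep_rows(upper(df), n=2), text="!")

# ... while a .pipe() chain reads top to bottom, in the order the steps run
chained = (
    df.pipe(upper)
      .pipe(keep_rows, n=2)
      .pipe(suffix, text="!")
)
print(chained)
```

Each step's extra arguments sit next to the function they belong to, which is the readability gain the help text describes.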

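6.2 Function on rows or columns: .apply()

6.2.1 .apply() syntax

Syntax: DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

Parameters:

func: the function applied to each column or each row.
axis: {0 or 'index', 1 or 'columns'}, default 0. With 0 or 'index', func is applied to each column; with 1 or 'columns', func is applied to each row.
raw: bool, default False. Decides whether each row or column is passed to func as a Series or as a NumPy ndarray: with False, each row or column is passed as a Series; with True, func receives ndarray objects instead. In general, True only pays off when you are applying a NumPy function (such as a reduction), where it can give noticeably better performance.
result_type: {'expand', 'reduce', 'broadcast', None}, default None. Only takes effect when axis=1, i.e. when func is applied to each row.
'expand': list-like results are turned into columns.
'reduce': returns a Series if possible instead of expanding list-like results; this is the opposite of 'expand'.
'broadcast': results are broadcast to the original shape of the DataFrame, and the original index and columns are retained.
The default (None) depends on what the applied function returns: list-like results are returned as a Series of those lists. However, if func returns a Series, these are expanded into columns.
args: tuple, positional arguments passed to func in addition to the row/column.
kwds: additional keyword arguments passed to func.

Return value: a Series or a DataFrame, depending on what func returns.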
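A short sketch of raw and result_type, assuming a small numeric DataFrame named nums (not from the original post) so that sums are meaningful; pair is a hypothetical row function that returns a two-element list.

```python
import numpy as np
import pandas as pd

nums = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# raw=False (default): each column arrives as a pandas Series
as_series = nums.apply(lambda col: isinstance(col, pd.Series))
# raw=True: each column arrives as a plain NumPy ndarray
as_ndarray = nums.apply(lambda col: isinstance(col, np.ndarray), raw=True)
print(as_series.tolist(), as_ndarray.tolist())  # [True, True] [True, True]

def pair(row):
    # Hypothetical row function that returns a list-like result
    return [row["x"], row["x"] + row["y"]]

# result_type only matters with axis=1 (func is called once per row)
default_res = nums.apply(pair, axis=1)                     # a Series of lists
expanded = nums.apply(pair, axis=1, result_type="expand")  # lists become columns 0 and 1
broadcast = nums.apply(lambda row: row.sum(), axis=1,
                       result_type="broadcast")            # shape and labels of nums kept
print(default_res, expanded, broadcast, sep="\n\n")
```

With "broadcast", each row's scalar sum is repeated across that row, so the result keeps the columns x and y.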

Help on method apply in module pandas.core.frame:

apply(func, axis=0, raw=False, result_type=None, args=(), **kwds) method of pandas.core.frame.DataFrame instance
    Apply a function along an axis of the DataFrame.

    Objects passed to the function are Series objects whose index is
    either the DataFrame's index (``axis=0``) or the DataFrame's columns
    (``axis=1``). By default (``result_type=None``), the final return type
    is inferred from the return type of the applied function. Otherwise,
    it depends on the result_type argument.

    Parameters
    ----------
    func : function
        Function to apply to each column or row.

    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis along which the function is applied:

        * 0 or 'index': apply function to each column.

        * 1 or 'columns': apply function to each row.

    raw : bool, default False
        Determines if row or column is passed as a Series or ndarray object:

        * ``False`` : passes each row or column as a Series to the
          function.

        * ``True`` : the passed function will receive ndarray objects
          instead.

          If you are just applying a NumPy reduction function this will
          achieve much better performance.

    result_type : {'expand', 'reduce', 'broadcast', None}, default None
        These only act when ``axis=1`` (columns):

        * 'expand' : list-like results will be turned into columns.

        * 'reduce' : returns a Series if possible rather than expanding
          list-like results. This is the opposite of 'expand'.

        * 'broadcast' : results will be broadcast to the original shape
          of the DataFrame, the original index and columns will be
          retained.

        The default behaviour (None) depends on the return value of the
        applied function: list-like results will be returned as a Series
        of those. However if the apply function returns a Series these
        are expanded to columns.

    args : tuple
        Positional arguments to pass to func in addition to the
        array/series.

    **kwds
        Additional keyword arguments to pass as keywords arguments to
        func.

    Returns
    -------
    Series or DataFrame
        Result of applying ``func`` along the given axis of the
        DataFrame.

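6.2.2 .apply() examples

6.2.2.1 func

This time, first note the return value of .apply(): a Series or a DataFrame, depending on what func returns.

In fact, what func returns matters a great deal, especially when comparing .apply() with .pipe(). Two different kinds of func illustrate this below.

Case 1: func returns a Series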

def f(series):
    # Compare each element of the column with the string '1.0'; returns a boolean Series
    return series.eq('1.0')
print("*"*30+"df 数据"+"*"*30)
print(df)
print("*"*30+"df.apply(f) 数据"+"*"*30)
print(df.apply(f))
print("*"*30+"df 数据"+"*"*30)
print(df)

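The output is a DataFrame (all False here, since no element equals '1.0'), and the original data is not affected.

So far this looks no different from using .pipe(): df.pipe(f) calls f(df), which returns the same boolean DataFrame.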

Case 2: func returns a scalar (here, a string)

def f2(series):
    res="-"
    return(res.join(series))

print("*"*30+"df 数据"+"*"*30)
print(df)
print("*"*30+"df.apply(f2) 数据"+"*"*30)
print(df.apply(f2))
print("*"*30+"df 数据"+"*"*30)
print(df)

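This time the output is a Series indexed by the column labels, holding one joined string per column: a: "a-b-c-d-e-f", b: "d-e-f-g-h-i", c: "g-h-i-j-k-l".

Calling .pipe() here instead, i.e. df.pipe(f2), gives the single string "a-b-c": f2 receives the whole DataFrame, and iterating over a DataFrame yields its column labels.

At this point, think again about the following:

pipe(): a function that operates on the whole DataFrame
apply(): a function that operates on each row or column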
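The difference can be checked directly. A minimal sketch, reusing f2 from the example above and rebuilding the same df so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

def f2(series):
    return "-".join(series)

# apply: f2 is called once per column, giving one string per column
per_column = df.apply(f2)
print(per_column)

# pipe: f2 is called once with the whole DataFrame; iterating a DataFrame
# yields its column labels, so "-".join(df) joins "a", "b", "c"
whole = df.pipe(f2)
print(whole)  # a-b-c
```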

6.2.2.2 axis

axis: {0 or 'index', 1 or 'columns'}, default 0. With 0 or 'index', func is called once per column; with 1 or 'columns', it is called once per row.
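A minimal sketch of the two axis settings, assuming a small numeric DataFrame named nums with row labels r0 to r2 (not from the original post), so that the index of each result shows which axis func was applied along:

```python
import pandas as pd

nums = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["r0", "r1", "r2"])

# axis=0 (default): func receives each COLUMN, result is indexed by column labels
col_sums = nums.apply(lambda s: s.sum())
# axis=1: func receives each ROW, result is indexed by row labels
row_sums = nums.apply(lambda s: s.sum(), axis=1)

print(col_sums)  # x 6, y 60
print(row_sums)  # r0 11, r1 22, r2 33
```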

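6.2.2.3 args, kwds

These work much as in .pipe(), with one difference: in .apply(), extra positional arguments must be passed as a tuple through args=(...), while keyword arguments are passed directly as **kwds.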
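A minimal sketch contrasting how extra arguments reach func in the two methods; join_with is a hypothetical helper that takes a separator positionally and a prefix as a keyword:

```python
import pandas as pd

df = pd.DataFrame({"a": list("abcdef"), "b": list("defghi"), "c": list("ghijkl")})

def join_with(series, sep, prefix=""):
    # Hypothetical helper: join the values with sep and put prefix in front
    return prefix + sep.join(series)

# .apply(): extra positional arguments go in the args TUPLE (note the trailing comma),
# keyword arguments are passed directly
applied = df.apply(join_with, args=("+",), prefix=">")

# .pipe(): extra positional arguments are passed directly after func
piped = df.pipe(join_with, "+", prefix=">")

print(applied)  # one string per column, e.g. a: >a+b+c+d+e+f
print(piped)    # >a+b+c (the column labels, as in the f2 example)
```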

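6.3 Function on single elements: .applymap()

6.3.1 .applymap() syntax

Syntax: DataFrame.applymap(func, na_action: 'Optional[str]' = None)

Parameters:

func: a Python callable that takes a single value (an element of the DataFrame) and returns a single value (the transformed element).

na_action: optional, {None, 'ignore'}, default None. Controls how NaN values are handled: with None, NaN values are passed to func like any other element; with 'ignore', NaN values are propagated unchanged without being passed to func.

Return value: note that this, too, is a DataFrame.

Note: since pandas 2.1, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same func and na_action parameters.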
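A minimal sketch of na_action that runs on both older and newer pandas, assuming a tiny DataFrame df_nan with one missing value (not from the original post) and a hypothetical element function mark. It picks DataFrame.map when it exists (pandas 2.1 and later) and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"a": ["x", np.nan], "b": ["y", "z"]})

def mark(value):
    # Hypothetical element function: called once per element it receives
    return "NA" if pd.isna(value) else value * 2

# DataFrame.map replaces applymap from pandas 2.1 on; fall back on older versions
elementwise = df_nan.map if hasattr(pd.DataFrame, "map") else df_nan.applymap

with_none = elementwise(mark)                       # na_action=None: NaN is passed to mark
with_ignore = elementwise(mark, na_action="ignore")  # NaN is kept, mark is not called for it
print(with_none)
print(with_ignore)
```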

Help on method applymap in module pandas.core.frame:

applymap(func, na_action: 'Optional[str]' = None) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Apply a function to a Dataframe elementwise.

    This method applies a function that accepts and returns a scalar
    to every element of a DataFrame.

    Parameters
    ----------
    func : callable
        Python function, returns a single value from a single value.

    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to func.

        .. versionadded:: 1.2

    Returns
    -------
    DataFrame
        Transformed DataFrame.

6.3.2 .applymap() examples

This is a suitable func: it takes one element and returns one element, here the original string repeated twice.

def f3(value):
    return(value*2)
print("*"*30+"df.applymap(f3) 数据"+"*"*30)
print(df.applymap(f3))
print("*"*30+"df 数据"+"*"*30)
print(df)

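The output is a DataFrame in which every element is doubled (for example, "a" becomes "aa"), while df itself is unchanged.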

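To try out na_action: the df above contains no missing values, so the code below first builds a copy with one NaN and one None.

The code:

import numpy as np

df_nan=df.copy()
df_nan.iloc[0,0]=np.nan   # a NaN in column a
df_nan.iloc[1,1]=None     # a None in column b

def f4(value):
    # pd.isna() is True for both NaN and None (pd.isnull is just an alias of it),
    # so a single check covers both kinds of missing value
    if pd.isna(value):
        return('++NaN++')
    return(value*2)
print("*"*30+"df_nan.applymap(f4,na_action='ignore') data"+"*"*30)
print(df_nan.applymap(f4,na_action='ignore'))
print("*"*30+"df_nan.applymap(f4,na_action=None) data"+"*"*30)
print(df_nan.applymap(f4,na_action=None))  # i.e. the default behaviour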

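As described in the syntax section, with na_action='ignore' the missing values are left as they are and f4 is never called for them, whereas with na_action=None (the default) f4 receives them too and replaces them with '++NaN++'.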

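Original: https://blog.csdn.net/u010701274/article/details/121784065
Author: 江南野栀子
Title: Pandas 模块-操纵数据(6)-DataFrame 使用自定义函数 (Pandas Module - Manipulating Data (6) - Applying Custom Functions to a DataFrame)

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/755585/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author!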
