Help on method groupby in module pandas.core.frame:

groupby(by=None, axis=0, level=None, as_index: 'bool' = True, sort: 'bool' = True, group_keys: 'bool' = True, squeeze: 'bool' = <no_default>, observed: 'bool' = False, dropna: 'bool' = True) -> 'DataFrameGroupBy' method of pandas.core.frame.DataFrame instance
    Group DataFrame using a mapper or by a Series of columns.

    A groupby operation involves some combination of splitting the
    object, applying a function, and combining the results. This can be
    used to group large amounts of data and compute operations on these
    groups.

    Parameters
    ----------
    by : mapping, function, label, or list of labels
        Used to determine the groups for the groupby.

        If ``by`` is a function, it's called on each value of the object's
        index. If a dict or Series is passed, the Series or dict VALUES
        will be used to determine the groups (the Series' values are first
        aligned; see .align() method). If an ndarray is passed, the
        values are used as-is to determine the groups. A label or list of
        labels may be passed to group by the columns in ``self``. Notice
        that a tuple is interpreted as a (single) key.

    axis : {0 or 'index', 1 or 'columns'}, default 0
        Split along rows (0) or columns (1).

    level : int, level name, or sequence of such, default None
        If the axis is a MultiIndex (hierarchical), group by a particular
        level or levels.

    as_index : bool, default True
        For aggregated output, return object with group labels as the
        index. Only relevant for DataFrame input. as_index=False is
        effectively "SQL-style" grouped output.

    sort : bool, default True
        Sort group keys. Get better performance by turning this off.

        Note this does not influence the order of observations within each
        group. Groupby preserves the order of rows within each group.

    group_keys : bool, default True
        When calling apply, add group keys to index to identify pieces.

    squeeze : bool, default False
        Reduce the dimensionality of the return type if possible,
        otherwise return a consistent type.

        .. deprecated:: 1.1.0

    observed : bool, default False
        This only applies if any of the groupers are Categoricals.

        If True: only show observed values for categorical groupers.

        If False: show all values for categorical groupers.

    dropna : bool, default True
        If True, and if group keys contain NA values, NA values together
        with row/column will be dropped.

        If False, NA values will also be treated as the key in groups

        .. versionadded:: 1.1.0

    Returns
    -------
    DataFrameGroupBy
        Returns a groupby object that contains information about the groups.

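1.2 .groupby() parameters

by : a mapping, function, label, or list of labels that determines the groups.
If by is a function, it is called on each value of the object's index.
If by is a dict or Series, its values determine the groups (a Series is first aligned to the object's index; see the .align() method).
If by is an ndarray, its values are used as-is to determine the groups.
If by is a label or list of labels, the rows are grouped by the corresponding columns.
If by is a tuple, it is interpreted as a single key.
axis : {0 or 'index', 1 or 'columns'}, default 0. Whether to split along rows or along columns.
level : int, level name, or sequence of such, default None. If the axis is a MultiIndex, group by one particular level or by several levels.
as_index : bool, default True. Whether to use the group labels as the index of aggregated output; as_index=False behaves like calling reset_index on the result.
sort : bool, default True. Whether to sort the group keys. Turn it off for better performance. This does not affect the order of rows inside each group; groupby preserves that order.
group_keys : bool, default True. When calling apply, add the group keys to the index to identify the pieces.
dropna : bool, default True. If True, rows (or columns) whose group key is NA are dropped; if False, NA is treated as a group key of its own.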
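
The dropna parameter is easier to see on a tiny frame. The following is only a sketch: the frame demo and its columns key and val are made up for this illustration and are not part of the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the group keys is missing (NaN)
demo = pd.DataFrame({'key': ['a', 'b', np.nan, 'a'],
                     'val': [1, 2, 3, 4]})

# Default dropna=True: the row whose key is NaN belongs to no group
kept = demo.groupby('key')['val'].sum()

# dropna=False (pandas >= 1.1): NaN becomes a group key of its own
with_na = demo.groupby('key', dropna=False)['val'].sum()

print(kept)
print(with_na)
```

With the default, the row with the missing key simply disappears from the result, which is easy to overlook when totals are compared against the raw data.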
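.groupby() examples

Preparing the data

import numpy as np
import pandas as pd

# Sample data: two key columns and two columns of random numbers
df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'no': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df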

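2.1 Grouping key: by

2.1.1 Grouping by a single column, without setting an index

grouped = df.groupby('str')

This returns a DataFrameGroupBy object; printing it shows only its type and memory address.

Passing it to list() shows the contents: the original DataFrame has been split into groups according to the values in the str column (a and b).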
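
Besides list(), a few attributes of the groupby object help with inspection. A minimal sketch, rebuilding the same sample frame so that it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'no': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
grouped = df.groupby('str')

print(grouped)                 # only the type and a memory address
print(grouped.ngroups)         # number of groups
print(grouped.groups)          # mapping: group key -> row labels
print(grouped.get_group('a'))  # the sub-DataFrame for key 'a'
```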

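Getting this object is only the first step. Usually we want per-group information such as the sum, median or maximum, that is, the result of the apply step.

grouped.data1.sum()  # sum of data1 in each group

grouped.data1.max()  # maximum of data1 in each group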
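
Several statistics can also be computed in one call with agg. In this sketch the data1 values are fixed numbers chosen for illustration (the article uses random numbers), so the result is reproducible.

```python
import pandas as pd

# Same keys as the article's df; data1 is fixed here instead of random
df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 6.0]})

# One row per group, one column per statistic
stats = df.groupby('str')['data1'].agg(['sum', 'median', 'max'])
print(stats)
```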

We can iterate over the groups with a for loop.

for name,group in grouped:
    print(type(name))
    print(name)
    print(type(group))
    print(group)

In addition, DataFrameGroupBy objects support many common operations, such as:

aggregate: aggregate using one or more operations over the specified axis.

apply: apply a function group-wise and combine the results together.

transform: call a function on each group that produces a result with the same index as the original object.

Space is limited here, so these are left for a later article.
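
Until then, here is a rough sketch of how the three differ in the shape of their results. The data1 values are fixed numbers picked for illustration, not the article's random data.

```python
import pandas as pd

df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 6.0]})
g = df.groupby('str')['data1']

# aggregate: one row per group, one column per statistic
agg_result = g.aggregate(['min', 'max'])

# apply: an arbitrary function per group, results combined by pandas
apply_result = g.apply(lambda s: s.max() - s.min())

# transform: result has the same length and index as the original rows
transform_result = g.transform('mean')

print(agg_result)
print(apply_result)
print(transform_result)
```

transform is the one to reach for when the group statistic has to be written back next to each original row, for example to subtract the group mean.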

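2.1.2 Grouping by a transformed single column

For example, group by the length of the strings in no. Every value has length 3, so there is only one group.

# Group by the length of no; all lengths are 3, so there is one group
grouped = df.groupby(df['no'].str.len())
list(grouped)

Group by the first character of no.

# Group by the first character of no, giving an "o" group and a "t" group
grouped = df.groupby(df['no'].str[0])
list(grouped)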

2.1.4 Grouping by multiple columns

For example:

# Group by "str" and "no", giving 4 groups
grouped = df.groupby(["str", "no"])
list(grouped)
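
To check the four groups without printing every sub-frame, size() counts the rows per key combination. A small sketch using only the two key columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'no': ['one', 'two', 'one', 'two', 'one']})

# Rows per (str, no) combination; the result is indexed by a MultiIndex
sizes = df.groupby(['str', 'no']).size()
print(sizes)
```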

2.1.5 Grouping by a dict that maps index labels to groups

For example:

# Use a dict to classify the index labels
a_dict = {0: "start", 1: "middle", 2: "middle", 3: "middle", 4: "end"}
grouped = df.groupby(a_dict)
list(grouped)

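2.1.6 Grouping by a Series that maps index labels to groups

Classifying the index with a Series gives a result similar to grouping by a dict.

# Use a Series to classify the index labels
a_dict = {0: "start", 1: "middle", 2: "middle", 3: "middle", 4: "end"}
a_series = pd.Series(a_dict)
grouped = df.groupby(a_series)
list(grouped)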
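
The alignment mentioned in the parameter description matters when the Series does not cover every row. In this sketch the Series partial is a made-up example that labels only three of the five index values; the remaining rows get a missing key after alignment and are dropped under the default dropna=True.

```python
import pandas as pd

df = pd.DataFrame({'data1': [1.0, 2.0, 3.0, 4.0, 6.0]})

# Hypothetical mapping that covers only index labels 0, 1 and 2
partial = pd.Series({0: 'start', 1: 'middle', 2: 'middle'})

# The Series is aligned to df's index first; rows 3 and 4 have no key
counts = df.groupby(partial).size()
print(counts)
```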


2.1.7 Grouping by a function applied to the index

# Use a function to classify the index: True for odd labels, False for even ones
grouped = df.groupby(lambda x: x % 2 != 0)
list(grouped)
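
The function receives index labels, not rows, so it is most useful when the index carries meaning. A sketch with an invented frame people whose index holds names; the built-in len is used as the grouping function.

```python
import pandas as pd

# Hypothetical data: the index holds names, grouped by name length
people = pd.DataFrame({'score': [90, 80, 70, 60]},
                      index=['Ann', 'Bob', 'Carol', 'David'])

# len is called on each index label: 3, 3, 5, 5
by_len = people.groupby(len)['score'].sum()
print(by_len)
```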


2.2 Axis: axis

See section 2.3 on level.

2.3 Level: level

Preparing the test data

columns=pd.MultiIndex.from_arrays([['China','China','China','Japan','Japan'],
                                   ["1","2","3","1","2"]],names=['Country','no'])
hier_df=pd.DataFrame(abs(np.random.randn(4,5)*1000),columns=columns)

The default axis=0 groups by rows; axis=1 groups by columns. When would you group by columns? Typically when the columns form a multi-level index. In the example below the columns have two levels: the first is Country and the second is no.

Group the columns by level='Country' with axis=1:

list(hier_df.groupby(level='Country',axis=1))

Group the columns by level='no' with axis=1:

list(hier_df.groupby(level='no',axis=1))
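
In recent pandas releases (2.1 and later, as far as I know) passing axis=1 to groupby is deprecated, and the documented alternative is to transpose first. The following sketch shows that route on the same test data; summing each group is my own choice of aggregation for the illustration.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['China', 'China', 'China', 'Japan', 'Japan'],
                                     ['1', '2', '3', '1', '2']],
                                    names=['Country', 'no'])
hier_df = pd.DataFrame(np.abs(np.random.randn(4, 5) * 1000), columns=columns)

# Transpose so the column levels become row levels, group, then transpose back
by_country = hier_df.T.groupby(level='Country').sum().T
by_no = hier_df.T.groupby(level='no').sum().T

print(by_country)
print(by_no)
```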

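2.4 Group labels as index: as_index

as_index : bool, default True. Whether to use the group labels as the index of the output; as_index=False behaves like calling reset_index on the result.

1) Note that as_index=False is only valid with axis=0; otherwise an error is raised.

2) Note that as_index generally only shows its effect with aggregation functions; with list or a for loop you will not see any difference.

When the groups are inspected with list, there is no difference.

With an aggregation function the results differ.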
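
A short sketch of that difference, using the article's key column with fixed data1 values chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 6.0]})

# Default as_index=True: 'str' becomes the index of the result
indexed = df.groupby('str').sum()

# as_index=False: 'str' stays an ordinary column, rows are numbered 0..n-1
flat = df.groupby('str', as_index=False).sum()

print(indexed)
print(flat)
```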

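2.5 Sorting the groups: sort

sort : bool, default True. Whether to sort the group keys. Turn it off for better performance. Note that this does not affect the order of rows inside each group; groupby preserves that order.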
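
To make the effect visible, the keys have to appear out of order in the data. The frame below is invented for this sketch, with keys arriving as b, a, b, c, a.

```python
import pandas as pd

# Hypothetical data whose keys are not in sorted order
demo = pd.DataFrame({'key': ['b', 'a', 'b', 'c', 'a'],
                     'val': [1, 2, 3, 4, 5]})

# sort=True (default): groups come back in sorted key order
sorted_keys = demo.groupby('key')['val'].sum()

# sort=False: groups come back in order of first appearance
first_seen = demo.groupby('key', sort=False)['val'].sum()

# Either way, rows inside a group keep their original order
rows_b = demo.groupby('key', sort=False).get_group('b')['val'].tolist()

print(sorted_keys)
print(first_seen)
```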

2.6 Group keys: group_keys

group_keys : bool, default True. When calling apply, add the group keys to the index to identify the pieces. This will be covered in more detail in a later article on groupby.apply.
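
As a small preview, the sketch below applies a helper that keeps the first row of each group. The helper first_row and the fixed data1 values are assumptions made for this illustration; the exact behaviour of apply has shifted between pandas versions, so treat this as a sketch rather than a reference.

```python
import pandas as pd

df = pd.DataFrame({'str': ['a', 'a', 'b', 'b', 'a'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 6.0]})


def first_row(group):
    """Hypothetical helper: keep only the first row of each group."""
    return group.head(1)


# group_keys=True: the group key is added as an outer index level
with_keys = df.groupby('str', group_keys=True)[['data1']].apply(first_row)

# group_keys=False: only the original row labels remain
without_keys = df.groupby('str', group_keys=False)[['data1']].apply(first_row)

print(with_keys)
print(without_keys)
```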


Original: https://blog.csdn.net/u010701274/article/details/121868272
Author: 江南野栀子
Title: The Pandas Module - Manipulating Data (10) - Grouping Data with .groupby()

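Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/756652/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source.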


The Pandas Module - Manipulating Data (10) - Grouping Data with .groupby()

1.1 .groupby() syntax