Pandas usage example: grouping, aggregating, and plotting by a date column
Import the required libraries
from logging import warning
import os, sys
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
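# Use the SimHei font so Chinese text in plots renders correctly
plt.rcParams["font.sans-serif"] = ["SimHei"]
# Keep the minus sign rendering correctly while a CJK font is active
plt.rcParams["axes.unicode_minus"] = False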
df = pd.read_excel("./无标题1.xlsx")
df.head()
(The max_iv and record_id values ran together in the source and are shown unsplit.)
   m_id            max_iv+record_id  record_name              record_sub_name  pub_time    language  version  create_time          count_singer
0  000DmZbU1RUrem  49359118379       野摩托                    NaN              2021-04-15  1.0       NaN      2022-09-28 22:27:54  1
1  000SOicI1YXaDP  6035981658        不可以这样(反英雄)         NaN              2021-04-15  1.0       NaN      2022-09-29 05:29:02  1
2  000ibM5x4Jx2f4  9782425705        心存侥幸                  NaN              2021-04-15  1.0       NaN      2022-09-29 01:00:03  1
3  004OL7tM1gklPK  12212310361       如果在一起                NaN              2021-04-15  1.0       NaN      2022-09-29 00:03:17  1
4  004D5jPe0h8Q6C  12194413841       钗头凤.十年生死两茫茫      NaN              2021-04-15  1.0       NaN      2022-09-29 00:54:41  1
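The grouping steps that follow call `.year` and `.month` on each `pub_time` value, which only works if the column holds datetimes rather than strings. A minimal sketch of checking and converting the column, assuming a small hypothetical frame (`demo`, with one deliberately malformed value) in place of the spreadsheet:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: dates stored as strings,
# one of them malformed on purpose.
demo = pd.DataFrame({
    "m_id": ["a", "b", "c"],
    "pub_time": ["2021-04-15", "2021-05-01", "not a date"],
})

# errors="coerce" turns unparseable values into NaT instead of raising.
demo["pub_time"] = pd.to_datetime(demo["pub_time"], errors="coerce")

# Count the rows that could not be parsed before grouping on the column.
n_bad = int(demo["pub_time"].isna().sum())
print(demo["pub_time"].dtype, n_bad)
```

Rows with `NaT` in the key are dropped by `groupby` by default, so it can be worth inspecting `n_bad` before trusting the monthly counts.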
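Handling a date column that is not the index
Apply a transformation to the date column (a Series) to derive the grouping keys:
- extract the year and month keys from the pub_time column;
- pass those keys to the DataFrame's groupby;
- call the desired aggregation method on the grouped result.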
key_year = lambda x:x.year
key_month = lambda x:x.month
df1 = df.groupby([df['pub_time'].apply(key_year),df['pub_time'].apply(key_month)]).count()
df1['m_id']
pub_time  pub_time
2021      4            567
          5           1026
          6           1163
          7           1337
          8           1386
          9           1465
          10          1498
          11          1478
          12          1482
2022      1           1205
          2            761
          3           1224
          4           1410
          5           1324
          6           1347
          7            966
          8            965
          9            909
          10           639
          11           145
Name: m_id, dtype: int64
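The same year and month keys can also be derived without `apply`, using the `.dt` accessor on a datetime column. The sketch below is one possible variant, not the author's code; the frame `demo` and the key names `year` and `month` are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data standing in for the real spreadsheet.
demo = pd.DataFrame({
    "m_id": ["a", "b", "c", "d", "e"],
    "pub_time": pd.to_datetime(
        ["2021-04-15", "2021-04-20", "2021-05-01", "2022-04-02", "2022-04-03"]
    ),
})

# Vectorised key extraction; renaming the keys gives the result
# distinct index level names instead of two levels both called pub_time.
year = demo["pub_time"].dt.year.rename("year")
month = demo["pub_time"].dt.month.rename("month")

# Count the m_id values in each (year, month) group.
monthly_count = demo.groupby([year, month])["m_id"].count()
print(monthly_count)
```

Selecting `["m_id"]` before calling `count()` avoids counting every column when only one is needed.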
df1['m_id'].plot()
plt.title('Monthly count for the complete time-series index data')
plt.xlabel('year,month')
plt.ylabel('m_id count')
plt.grid()
plt.show()
[Figure output_6_0.png: monthly m_id count by (year, month)]
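Plotting a Series indexed by (year, month) pairs labels the x axis with tuples. One way to get more readable ticks, sketched here under the assumption that "YYYY-MM" strings are acceptable labels, is to flatten the two index levels before plotting. The three counts are taken from the output above; the names `counts`, `labels`, and `flat` are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three (year, month) counts copied from the grouped output above.
idx = pd.MultiIndex.from_tuples(
    [(2021, 11), (2021, 12), (2022, 1)], names=["year", "month"]
)
counts = pd.Series([1478, 1482, 1205], index=idx, name="m_id")

# Flatten each (year, month) pair into a "YYYY-MM" label.
labels = [f"{y}-{m:02d}" for y, m in counts.index]
flat = pd.Series(counts.to_numpy(), index=labels, name="m_id")

ax = flat.plot(marker="o")
ax.set_xlabel("year-month")
ax.set_ylabel("m_id count")
ax.grid(True)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and finish with `plt.show()` as in the rest of this post.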
Handling a date column used as the index
df.index = pd.to_datetime(df.pub_time)
df2 = df.groupby([df.index.year, df.index.month]).count()
df2['m_id']
ym_agg = df.groupby([df.index.year, df.index.month]).agg({'max_iv':['max','mean','min','count']})
ym_agg
(The max and mean values ran together in the source and are shown unsplit.)
                    max_iv
                    max+mean                  min  count
pub_time  pub_time
2021      4         421189066834.209877      6027    567
          5         541295677094.117934      6005   1026
          6         274486372879.793637      6000   1163
          7         595171778934.391922      6003   1337
          8         358016257704.013709      6008   1386
          9         455031959929.767235      6008   1465
          10        377105553772.278371      6008   1498
          11        624210472782.085250      6001   1478
          12        636264684906.529690      6008   1482
2022      1         202053461655.124481      6007   1205
          2         932501575054.328515      6016    761
          3         180821558802.951797      6000   1224
          4         1093250081737.102837     6009   1410
          5         469069074068.342900      6004   1324
          6         458175063106.034892      6003   1347
          7         15988460100434.186335    6017    966
          8         107315642211451.452850   6001    965
          9         412523690661.900990      6007    909
          10        145097680869.020344      6007    639
          11        2586974133693.703448     6056    145
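Grouping on `df.index.year` and `df.index.month` produces a two-level integer index. A possible alternative, sketched below on hypothetical data (`demo` and its `max_iv` values are made up), is to group on monthly periods via `DatetimeIndex.to_period("M")`, which yields a single-level index of months and flat column names:

```python
import pandas as pd

# Hypothetical frame with the date column already set as the index.
demo = pd.DataFrame(
    {"max_iv": [6000, 9000, 7000, 8000]},
    index=pd.to_datetime(["2021-04-15", "2021-04-20", "2021-05-01", "2022-04-02"]),
)
demo.index.name = "pub_time"

# One group per calendar month; only months that occur in the data appear.
by_period = (
    demo.groupby(demo.index.to_period("M"))["max_iv"]
    .agg(["max", "mean", "min", "count"])
)
print(by_period)
```

Because the aggregation is applied to a single column, the result has plain `max`, `mean`, `min`, and `count` columns rather than a two-level column header.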
ym_agg.plot()
plt.xlabel('year,month')
plt.ylabel('max_iv analysis')
plt.grid()
plt.show()
[Figure output_9_0.png: max, mean, min, and count of max_iv by (year, month)]
ym_agg.max_iv['mean'].plot()
plt.xlabel('year,month')
plt.ylabel('max_iv mean')
plt.grid()
plt.show()
[Figure output_10_0.png: mean of max_iv by (year, month)]
ym_agg.max_iv['max'].plot()
plt.xlabel('year,month')
plt.ylabel('max_iv max')
plt.grid()
plt.show()
[Figure output_11_0.png: max of max_iv by (year, month)]
ym_agg.max_iv['min'].plot()
plt.xlabel('year,month')
plt.ylabel('max_iv min')
plt.grid()
plt.show()
[Figure output_12_0.png: min of max_iv by (year, month)]
ym_agg.max_iv['count'].plot()
plt.xlabel('year,month')
plt.ylabel('max_iv count')
plt.grid()
plt.show()
[Figure output_13_0.png: count of max_iv by (year, month)]
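The four per-statistic plots differ only in the column they select, so they can also be drawn in a loop on one grid of subplots. This is a sketch on made-up numbers, not the author's figures; `ym_demo` is a hypothetical frame shaped like `ym_agg`, with a two-level column header under `max_iv`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aggregate table with the same layout as ym_agg.
idx = pd.MultiIndex.from_tuples(
    [(2021, 4), (2021, 5), (2021, 6)], names=["year", "month"]
)
cols = pd.MultiIndex.from_product([["max_iv"], ["max", "mean", "min", "count"]])
ym_demo = pd.DataFrame(
    [[9000, 7500.0, 6000, 2], [7000, 7000.0, 7000, 1], [8000, 7000.0, 6000, 3]],
    index=idx,
    columns=cols,
)

stats = ["max", "mean", "min", "count"]
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)

# One subplot per statistic, each with the same labels as the separate plots.
for ax, stat in zip(axes.ravel(), stats):
    ym_demo["max_iv"][stat].plot(ax=ax)
    ax.set_xlabel("year,month")
    ax.set_ylabel(f"max_iv {stat}")
    ax.grid(True)

fig.tight_layout()
```

Separate subplots keep each statistic on its own y scale, which matters when the maximum is orders of magnitude larger than the count.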
Original: https://blog.csdn.net/baby_hua/article/details/127789265
Author: baby_hua
Title: TOOLS_Pandas usage example: grouping, aggregating, and plotting by a date column