Python groupby: filling missing values (filling in missing rows per group with pandas groupby)

Solution

Input DataFrame:

                         LCLid  energy(kWh/hh)
day_time
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:00:00  MAC000007        0.170603
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 00:30:00  MAC000007        0.276678
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:00:00  MAC000007        0.027490
2014-01-01 03:30:00  MAC000006        0.688879
2014-01-01 03:30:00  MAC000007        0.868017

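What you need to do is reindex each LCLid group against the full range of 30-minute timestamps, filling every added row with the nearest available value. The code for this is built up step by step in the sections that follow.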

Result:

                         LCLid  energy(kWh/hh)
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:00:00  MAC000007        0.170603
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 00:30:00  MAC000007        0.276678
2014-01-01 01:00:00  MAC000006        0.716418
2014-01-01 01:00:00  MAC000007        0.276678
2014-01-01 01:30:00  MAC000006        0.716418
2014-01-01 01:30:00  MAC000007        0.276678
2014-01-01 02:00:00  MAC000006        0.819146
2014-01-01 02:00:00  MAC000007        0.027490
2014-01-01 02:30:00  MAC000006        0.819146
2014-01-01 02:30:00  MAC000007        0.027490
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:00:00  MAC000007        0.027490
2014-01-01 03:30:00  MAC000006        0.688879
2014-01-01 03:30:00  MAC000007        0.868017

First, I will build an example DataFrame that resembles yours.

import numpy as np
import pandas as pd

# Build an example DataFrame that looks like yours
df = pd.DataFrame({
    'day_time': [
        pd.Timestamp(2014, 1, 1, 0, 0),
        pd.Timestamp(2014, 1, 1, 0, 0),
        pd.Timestamp(2014, 1, 1, 0, 30),
        pd.Timestamp(2014, 1, 1, 0, 30),
        pd.Timestamp(2014, 1, 1, 3, 0),
        pd.Timestamp(2014, 1, 1, 3, 0),
        pd.Timestamp(2014, 1, 1, 3, 30),
        pd.Timestamp(2014, 1, 1, 3, 30),
    ],
    'LCLid': [
        'MAC000006',
        'MAC000007',
        'MAC000006',
        'MAC000007',
        'MAC000006',
        'MAC000007',
        'MAC000006',
        'MAC000007',
    ],
    # Random readings, so the exact numbers differ on every run
    'energy(kWh/hh)': np.random.rand(8),
}).set_index('day_time')

Result:

                         LCLid  energy(kWh/hh)
day_time
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:00:00  MAC000007        0.170603
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 00:30:00  MAC000007        0.276678
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:00:00  MAC000007        0.027490
2014-01-01 03:30:00  MAC000006        0.688879
2014-01-01 03:30:00  MAC000007        0.868017

Note that the following timestamps are missing:

2014-01-01 01:00:00
2014-01-01 01:30:00
2014-01-01 02:00:00
2014-01-01 02:30:00
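The gaps can also be listed programmatically rather than read off by eye. The following is a minimal sketch, assuming readings are expected every 30 minutes; `observed` is a hypothetical stand-in for the index of the example DataFrame (`df.index`).

```python
import pandas as pd

# Hypothetical stand-in for df.index: each timestamp appears once per meter.
observed = pd.to_datetime([
    "2014-01-01 00:00", "2014-01-01 00:00",
    "2014-01-01 00:30", "2014-01-01 00:30",
    "2014-01-01 03:00", "2014-01-01 03:00",
    "2014-01-01 03:30", "2014-01-01 03:30",
])

# Every 30-minute slot between the first and last observed timestamp.
expected = pd.date_range(observed.min(), observed.max(), freq="30min")

# Slots that are expected but never observed.
missing = expected.difference(observed.unique())
print(missing)
```

For this data, `missing` holds the four half-hour slots between 01:00 and 02:30.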

Using df.reindex()

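The first thing to know is that df.reindex() lets you fill in missing index values; rows for the newly added labels are filled with NaN by default. In your case, you need to supply the full timestamp-range index, including the values that do not appear in your starting DataFrame.

Here I use pd.date_range() to list every timestamp between the minimum and maximum index values of the starting DataFrame, in 30-minute steps. Warning: this means that if the missing timestamps are at the very beginning or end, they will not be added back! So you may want to specify start and end explicitly.

# '30T' means 30 minutes; recent pandas versions prefer the equivalent alias '30min'
full_idx = pd.date_range(start=df.index.min(), end=df.index.max(), freq='30T')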

Result:

DatetimeIndex(['2014-01-01 00:00:00', '2014-01-01 00:30:00',
               '2014-01-01 01:00:00', '2014-01-01 01:30:00',
               '2014-01-01 02:00:00', '2014-01-01 02:30:00',
               '2014-01-01 03:00:00', '2014-01-01 03:30:00'],
              dtype='datetime64[ns]', freq='30T')
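To illustrate the warning about gaps at the edges, here is a small sketch. The series `readings` and the explicit bounds are hypothetical values chosen for the illustration; the point is only that a range derived from the data cannot see slots missing before the first or after the last observation.

```python
import pandas as pd

# Hypothetical readings: the 00:00 and 01:30 slots are missing at the edges.
readings = pd.Series(
    [0.7, 0.3],
    index=pd.to_datetime(["2014-01-01 00:30", "2014-01-01 01:00"]),
)

# Range derived from the data itself: the edge gaps stay invisible.
from_data = pd.date_range(readings.index.min(), readings.index.max(), freq="30min")

# Range with explicit bounds (assumed to be known from elsewhere).
explicit = pd.date_range("2014-01-01 00:00", "2014-01-01 01:30", freq="30min")

# Reindexing against the explicit range exposes the edge slots as NaN.
filled = readings.reindex(explicit)
print(filled)
```

With the explicit bounds, the first and last slots appear in `filled` as NaN and can then be filled like any other gap.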

Now, if we use it to reindex the sub-DataFrame of a single group, we get:

grouped_df = df[df.LCLid == 'MAC000006']
grouped_df.reindex(full_idx)

Result:

                         LCLid  energy(kWh/hh)
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 01:00:00        NaN             NaN
2014-01-01 01:30:00        NaN             NaN
2014-01-01 02:00:00        NaN             NaN
2014-01-01 02:30:00        NaN             NaN
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:30:00  MAC000006        0.688879
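The reason for reindexing one group at a time is that the full DataFrame contains every timestamp once per meter, and reindex() refuses to work on an index with duplicate labels. The sketch below shows both cases on a rebuilt copy of the example data; the variable names `raised` and `one_meter` are introduced here purely for illustration.

```python
import pandas as pd

times = pd.to_datetime([
    "2014-01-01 00:00", "2014-01-01 00:30",
    "2014-01-01 03:00", "2014-01-01 03:30",
])
df = pd.DataFrame(
    {
        "LCLid": ["MAC000006", "MAC000007"] * 4,
        "energy(kWh/hh)": [0.270453, 0.170603, 0.716418, 0.276678,
                           0.819146, 0.027490, 0.688879, 0.868017],
    },
    index=times.repeat(2),  # each timestamp appears twice, once per meter
)
full_idx = pd.date_range(df.index.min(), df.index.max(), freq="30min")

# Reindexing the whole frame fails because its index has duplicate labels.
try:
    df.reindex(full_idx)
    raised = False
except ValueError as exc:
    raised = True
    print(f"reindex on the full frame failed: {exc}")

# A single meter has a unique index, so reindexing works and adds NaN rows.
one_meter = df[df["LCLid"] == "MAC000006"].reindex(full_idx)
print(one_meter)
```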

You said you want to fill the missing values using the closest available surrounding value. This can be done during reindexing, as follows:

grouped_df.reindex(full_idx, method='nearest')

Result:

                         LCLid  energy(kWh/hh)
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 01:00:00  MAC000006        0.716418
2014-01-01 01:30:00  MAC000006        0.716418
2014-01-01 02:00:00  MAC000006        0.819146
2014-01-01 02:30:00  MAC000006        0.819146
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:30:00  MAC000006        0.688879
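For comparison, reindex() accepts other fill methods besides 'nearest', and an optional tolerance that caps how far away a source value may be. The sketch below is a small illustration on a hypothetical two-point series `s` taken from the MAC000006 readings around the gap; it is not part of the original answer.

```python
import pandas as pd

# Two known readings on either side of the gap.
idx = pd.to_datetime(["2014-01-01 00:30", "2014-01-01 03:00"])
s = pd.Series([0.716418, 0.819146], index=idx)
full = pd.date_range(idx.min(), idx.max(), freq="30min")

comparison = pd.DataFrame({
    "nearest": s.reindex(full, method="nearest"),  # closest reading in time
    "ffill": s.reindex(full, method="ffill"),      # last reading before the slot
    "bfill": s.reindex(full, method="bfill"),      # next reading after the slot
})
print(comparison)

# Only fill slots that lie within 30 minutes of a real reading.
limited = s.reindex(full, method="nearest", tolerance=pd.Timedelta("30min"))
print(limited)
```

With the tolerance set, the 01:30 and 02:00 slots stay NaN because no reading lies within 30 minutes of them, which may be preferable when long gaps should not be papered over.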

Using it with df.groupby()

Now we want to apply this transformation to every group in the DataFrame, where a group is defined by its LCLid.

(
    df
    .groupby('LCLid', as_index=False)  # use LCLid as groupby key, but don't add it as a group index
    .apply(lambda group: group.reindex(full_idx, method='nearest'))  # do this for each group
    .reset_index(level=0, drop=True)  # get rid of the automatic index generated during groupby
    .sort_index()  # This is optional, just in case you want timestamps in chronological order
)

Result:

                         LCLid  energy(kWh/hh)
2014-01-01 00:00:00  MAC000006        0.270453
2014-01-01 00:00:00  MAC000007        0.170603
2014-01-01 00:30:00  MAC000006        0.716418
2014-01-01 00:30:00  MAC000007        0.276678
2014-01-01 01:00:00  MAC000006        0.716418
2014-01-01 01:00:00  MAC000007        0.276678
2014-01-01 01:30:00  MAC000006        0.716418
2014-01-01 01:30:00  MAC000007        0.276678
2014-01-01 02:00:00  MAC000006        0.819146
2014-01-01 02:00:00  MAC000007        0.027490
2014-01-01 02:30:00  MAC000006        0.819146
2014-01-01 02:30:00  MAC000007        0.027490
2014-01-01 03:00:00  MAC000006        0.819146
2014-01-01 03:00:00  MAC000007        0.027490
2014-01-01 03:30:00  MAC000006        0.688879
2014-01-01 03:30:00  MAC000007        0.868017
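The whole procedure can be sketched end to end with the fixed values from the tables above instead of random numbers. This variant is an alternative formulation, not the chain shown in the answer: it iterates over the groups and concatenates the pieces, on the assumption that this sidesteps differences in how groupby().apply() treats the grouping column across pandas releases.

```python
import pandas as pd

times = pd.to_datetime([
    "2014-01-01 00:00", "2014-01-01 00:30",
    "2014-01-01 03:00", "2014-01-01 03:30",
])
df = pd.DataFrame(
    {
        "LCLid": ["MAC000006", "MAC000007"] * 4,
        "energy(kWh/hh)": [0.270453, 0.170603, 0.716418, 0.276678,
                           0.819146, 0.027490, 0.688879, 0.868017],
    },
    index=times.repeat(2).rename("day_time"),
)

# Full 30-minute grid between the first and last observed timestamp.
full_idx = pd.date_range(df.index.min(), df.index.max(), freq="30min")

# Reindex each meter separately, filling new rows from the nearest reading.
parts = [
    group.reindex(full_idx, method="nearest")
    for _, group in df.groupby("LCLid")
]

# Stack the per-meter frames and order them chronologically (stable sort,
# so rows sharing a timestamp keep their meter order).
filled = pd.concat(parts).sort_index(kind="mergesort")
print(filled)
```

Each meter ends up with one row per 30-minute slot, 16 rows in total for the two meters in this example.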


Original: https://blog.csdn.net/weixin_31955925/article/details/113961715
Author: 蟲小山
Title: python groupby填充缺失值_然后Pandas groupby会填充缺失的行
