pandas算加权平均值_使用Pandas数据框计算加权平均值

I have te following pandas dataframe:

data_df = pd.DataFrame({‘ind’:[‘la’,’p’,’la’,’la’,’p’,’g’,’g’,’la’],

‘dist’:[10.,5.,7.,8.,7.,2.,5.,3.],

‘diff’:[0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5],

‘cas’:[1.,2.,3.,4.,5.,6.,7.,8.]})

that is

cas diff dist ind

0 1 0.54 10 la

1 2 3.20 5 p

2 3 8.60 7 la

3 4 7.20 8 la

4 5 2.10 7 p

5 6 1.00 2 g

6 7 3.50 5 g

7 8 4.50 3 la

I need to compute the weighted average of all the columns where the weights are in the ‘dist’ column and group the values by ‘ind’.

For example for ‘ind’=’la’ and the ‘diff’ column:

((100.54)+(8.607)+(7.208)+(4.503))/(10+7+8+3) = 4.882143

The result I want to obtain is the following

cas diff

ind

g 6.714286 2.785714

la 3.107143 4.882143

p 3.750000 2.558333

which is obtained by multiplying each value of each colums by the corrisponding value in the ‘dist’ column, sum the results with the same ‘ind’ and then dividing the result by the sum of all the ‘dist’ values corrisponding to the same ind.

I thought this would have been an easy task done by the dataframe ‘groupby’ method, but actually it’s kind of tricky.

Can someone please help me?

解决方案

You may obtain within groups normalized weights by using transform:

df[‘weight’] = df[‘dist’] / df.groupby(‘ind’)[‘dist’].transform(‘sum’)

df[‘weight’]

0 0.357143

1 0.416667

2 0.250000

3 0.285714

4 0.583333

5 0.285714

6 0.714286

7 0.107143

Name: weight, dtype: float64

Then, you just need to multiply these weight by the values, and take the sum:

df[‘wcas’], df[‘wdiff’] = (df[n] * df[‘weight’] for n in (‘cas’, ‘diff’))

df.groupby(‘ind’)[[‘wcas’, ‘wdiff’]].sum()

wcas wdiff

ind

g 6.714286 2.785714

la 3.107143 4.882143

p 3.750000 2.558333

Edit: with in-place mutation:

backup = df.copy() # make a backup copy to mutate in place

cols = df.columns[:2] # cas, diff

df[cols] = df[‘weight’].values[:, None] * df[cols]

df.groupby(‘ind’)[cols].sum()

cas diff

ind

g 6.714286 2.785714

la 3.107143 4.882143

p 3.750000 2.558333

Original: https://blog.csdn.net/weixin_30357183/article/details/112947098
Author: 小女巫诺诺
Title: pandas算加权平均值_使用Pandas数据框计算加权平均值

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/678315/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球