pandas: index multiple columns by column name (Python) - Pandas DataFrame: change values by comparing column and index values

I think you need numpy.select with broadcasting:

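import numpy as np
import pandas as pd

# Compare every index label (as a column vector) against every column label
m1 = df.index.values[:, None] > df.columns.values
m2 = df.index.values[:, None] == df.columns.values
# 'k' where index > column, 'U' where they are equal, 'Y' otherwise
df = pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'), columns=df.columns, index=df.index)
print(df)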

    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
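The answer above starts from an existing df, which the post does not show. As a minimal self-contained sketch, assuming a frame whose index is [10, 4, 2] and whose columns are [2, 4, 8] (labels read off the printed output; the "?" cell values are placeholders, since only the labels matter), the broadcasting step looks like this:

```python
import numpy as np
import pandas as pd

# Hypothetical input: labels taken from the printed output, cell values are placeholders.
df = pd.DataFrame("?", index=[10, 4, 2], columns=[2, 4, 8])

idx = df.index.to_numpy()[:, None]   # shape (3, 1): index labels as a column vector
cols = df.columns.to_numpy()         # shape (3,): column labels

m1 = idx > cols    # shape (3, 3): True where the index label exceeds the column label
m2 = idx == cols   # shape (3, 3): True where the two labels are equal

# np.select takes the first condition that is True for each cell, else the default.
out = pd.DataFrame(np.select([m1, m2], ["k", "U"], default="Y"),
                   index=df.index, columns=df.columns)
print(out)
```

Adding the extra axis with [:, None] is what turns the two one-dimensional label arrays into a full index-by-column comparison grid without an explicit loop.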

Performance:

np.random.seed(1000)
N = 1000
# Random integer labels for the columns (a) and the index (b)
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)
df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)
print(df)

def us(df):
    # Difference between each index label and each column label, stored as objects
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    # Build all masks before overwriting any values
    greater = values > 0
    less = values < 0
    same = values == 0
    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'
    return pd.DataFrame(values, columns=df.columns, index=df.index)

def jez(df):
    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'), columns=df.columns, index=df.index)

In [236]: %timeit us(df)
107 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [237]: %timeit jez(df)
64 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
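The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The snippet below restates the two functions, first checks that they agree, and then times them. The smaller N, the seed, and the repeat count are assumptions chosen to keep the run short, so the numbers it prints will not match the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd


def us(df):
    # Difference-based approach: label differences stored in an object array.
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype="object")
    greater, less, same = values > 0, values < 0, values == 0
    values[greater] = "k"
    values[less] = "Y"
    values[same] = "U"
    return pd.DataFrame(values, columns=df.columns, index=df.index)


def jez(df):
    # np.select approach: two broadcast boolean masks plus a default.
    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ["k", "U"], "Y"), columns=df.columns, index=df.index)


# Assumed test data: smaller than the N = 1000 used above, to keep the run quick.
rng = np.random.default_rng(0)
N = 200
df = pd.DataFrame(rng.choice(list("abcdefgh"), size=(N, N)),
                  columns=rng.integers(100, size=N),
                  index=rng.integers(100, size=N))

# Both approaches should label every cell identically.
same_result = np.array_equal(us(df).to_numpy().astype(str), jez(df).to_numpy().astype(str))
print("results agree:", same_result)

# Total seconds for 5 calls each; absolute values depend on the machine.
t_us = timeit.timeit(lambda: us(df), number=5)
t_jez = timeit.timeit(lambda: jez(df), number=5)
print(f"us:  {t_us:.4f} s")
print(f"jez: {t_jez:.4f} s")
```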

Original: https://blog.csdn.net/weixin_42318225/article/details/114354728
Author: bellebiself
Title: pandas: index multiple columns by column name (Python) - Pandas DataFrame: change values by comparing column and index values

