Selecting Rows and Columns in pandas

A DataFrame is indexed along two axes, rows and columns. Conceptually a selection has the form [row, column].

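Selection from a DataFrame works at three levels:

  • Row or column selection: df[] selects either rows or columns, one axis at a time
  • Region selection: df.loc[], df.iloc[], df.ix[] can filter rows and columns at the same time (df.ix[] was deprecated in pandas 0.20 and removed in pandas 1.0)
  • Cell selection: df.at[], df.iat[] address exactly one cell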
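As a quick orientation, the three levels can be sketched on a tiny deterministic frame. The name demo and its values are made up for illustration and are not the article's random data.

```python
import numpy as np
import pandas as pd

# Illustrative 4x3 frame holding the values 0..11 (hypothetical data)
demo = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=list("abcd"),
    columns=list("ABC"),
)

col = demo["A"]                          # df[]: one axis at a time (a column here)
block = demo.loc["a":"b", ["A", "C"]]    # df.loc[]: rows and columns together
cell = demo.at["c", "B"]                 # df.at[]: exactly one cell

print(col.tolist())   # [0, 3, 6, 9]
print(block.shape)    # (2, 2)
print(cell)           # 7
```

Each level narrows the result: a whole column, then a rectangular block, then a scalar.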

First, generate a random DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10,5), index=list('abcdefghij'), columns=list('ABCDE'))

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
c   -0.359802   -2.049489   -0.615742   -1.953994   0.685243
d   0.232557    1.768284    -0.447015   2.373358    1.220536
e   -0.997380   -0.447236   0.632368    -0.352590   -0.064736
f   -1.220178   -0.314304   1.202184    0.018326    1.072153
g   -1.508916   0.380466    0.359506    -0.742657   -0.373764
h   1.031420    -3.236676   0.444769    1.396802    -0.405590
i   0.166133    -0.051614   -0.146943   0.609431    -0.351814
j   1.857521    -0.159101   0.899745    1.108722    -0.615379

1.1 Selecting rows by index

  • Get the first 3 rows
df[:3]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
c   -0.359802   -2.049489   -0.615742   -1.953994   0.685243
  • Get rows 2 through 3 (a positional slice excludes its end; a label slice includes it)
df[1:3]
df['b':'c']

       A            B          C            D           E
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
c   -0.359802   -2.049489   -0.615742   -1.953994   0.685243
  • Get specific rows with a boolean list (one entry per row)

df[[True,False,True,False,False,False, True, True, False, True]]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
c   -0.359802   -2.049489   -0.615742   -1.953994   0.685243
g   -1.508916   0.380466    0.359506    -0.742657   -0.373764
h   1.031420    -3.236676   0.444769    1.396802    -0.405590
j   1.857521    -0.159101   0.899745    1.108722    -0.615379
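The difference between the two slice forms is easy to miss, so here is a small sketch on a deterministic frame (demo and its values are illustrative assumptions). It shows that a positional slice stops before its end, a label slice includes its end, and a boolean list needs one entry per row.

```python
import numpy as np
import pandas as pd

# Illustrative 5x4 frame with row labels a..e (hypothetical data)
demo = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=list("abcde"),
    columns=list("ABCD"),
)

by_position = demo[1:3]        # positions 1 and 2; position 3 is excluded
by_label = demo["b":"c"]       # labels b through c; c is included
mask = [True, False, True, False, False]
by_mask = demo[mask]           # keeps the rows whose mask entry is True

print(by_position.index.tolist())  # ['b', 'c']
print(by_label.index.tolist())     # ['b', 'c']
print(by_mask.index.tolist())      # ['a', 'c']
```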

1.2 Selecting rows by condition

  • Get the rows where column A is greater than 0
df[df.A > 0]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
d   0.232557    1.768284    -0.447015   2.373358    1.220536
h   1.031420    -3.236676   0.444769    1.396802    -0.405590
i   0.166133    -0.051614   -0.146943   0.609431    -0.351814
j   1.857521    -0.159101   0.899745    1.108722    -0.615379
  • Get the rows where columns A and B are both greater than 0
df[(df.A > 0) & (df.B > 0)]

       A            B          C            D           E
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
d   0.232557    1.768284    -0.447015   2.373358    1.220536
  • Get the rows where column A or column B is greater than 0
df[(df.A > 0) | (df.B > 0)]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
d   0.232557    1.768284    -0.447015   2.373358    1.220536
g   -1.508916   0.380466    0.359506    -0.742657   -0.373764
h   1.031420    -3.236676   0.444769    1.396802    -0.405590
i   0.166133    -0.051614   -0.146943   0.609431    -0.351814
j   1.857521    -0.159101   0.899745    1.108722    -0.615379
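The conditions above combine boolean Series with & and |. Each comparison needs its own parentheses because & and | bind more tightly than > in Python. A minimal sketch, using a made-up frame named demo so the result is predictable:

```python
import pandas as pd

# Illustrative data chosen so each row falls into a different case
demo = pd.DataFrame(
    {"A": [1, -1, 2, -2], "B": [1, 1, -1, -1]},
    index=list("abcd"),
)

both = demo[(demo["A"] > 0) & (demo["B"] > 0)]        # A and B positive
either = demo[(demo["A"] > 0) | (demo["B"] > 0)]      # A or B positive
neither = demo[~((demo["A"] > 0) | (demo["B"] > 0))]  # ~ negates a mask

print(both.index.tolist())     # ['a']
print(either.index.tolist())   # ['a', 'b', 'c']
print(neither.index.tolist())  # ['d']
```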

1.3 Selecting columns


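  • Get column A (single brackets return a Series, double brackets a one-column DataFrame)
df['A']
df[['A']]
  • Get columns A and B
df[['A', 'B']]
df[df.columns[0:2]]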
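The single-bracket and double-bracket forms return different types, which matters for any code that consumes the result. A short check, assuming a small hypothetical frame named demo:

```python
import pandas as pd

# Illustrative three-column frame (hypothetical data)
demo = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

as_series = demo["A"]                  # one label -> Series
as_frame = demo[["A"]]                 # list of labels -> DataFrame
first_two = demo[demo.columns[0:2]]    # slice of the column index -> DataFrame

print(type(as_series).__name__)      # Series
print(as_frame.shape)                # (2, 1)
print(first_two.columns.tolist())    # ['A', 'B']
```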
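
Region selection uses three indexers:

  • df.loc[] accepts labels only, not integer positions. A label slice includes both endpoints.
  • df.iloc[] accepts integer positions only, not labels. A positional slice includes the start and excludes the end.
  • df.ix[] accepted both labels and integer positions (removed in pandas 1.0).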
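The endpoint rules can be checked directly. The sketch below uses an illustrative frame named demo (an assumption, not the article's df) and shows that the label slice 'a':'c' covers the same rows as the positional slice 0:3, and that df.loc[] rejects an integer position when the index holds string labels.

```python
import numpy as np
import pandas as pd

# Illustrative 5x4 frame with string row labels (hypothetical data)
demo = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=list("abcde"),
    columns=list("ABCD"),
)

by_label = demo.loc["a":"c", "A":"C"]   # both endpoints included
by_position = demo.iloc[0:3, 0:3]       # end position excluded

print(by_label.shape)                   # (3, 3)
print(by_label.equals(by_position))     # True

# With string labels, an integer is not a valid key for .loc
try:
    demo.loc[0]
    loc_accepts_position = True
except KeyError:
    loc_accepts_position = False
print(loc_accepts_position)             # False
```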

2.1 df.loc[]

  • Get row a

df.loc['a']
df.loc['a', :]

A    0.299206
B   -0.383297
C   -0.931467
D   -0.591609
E   -1.131105
Name: a, dtype: float64

df.loc[['a']]
df.loc[['a'], :]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
  • Get rows a, b and d

df.loc[['a', 'b', 'd']]
df.loc[['a', 'b', 'd'], :]

df[[True, True, False, True, False, False, False, False, False, False]]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
d   0.232557    1.768284    -0.447015   2.373358    1.220536
  • Get rows a through d
df.loc['a':'d', :]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
c   -0.359802   -2.049489   -0.615742   -1.953994   0.685243
d   0.232557    1.768284    -0.447015   2.373358    1.220536
  • Get the rows where column A is greater than 0
df.loc[df.A > 0]
df.loc[df.A > 0, :]

       A            B          C            D           E
a   0.299206    -0.383297   -0.931467   -0.591609   -1.131105
b   0.074351    0.791849    1.637467    -1.408712   -1.376527
d   0.232557    1.768284    -0.447015   2.373358    1.220536
h   1.031420    -3.236676   0.444769    1.396802    -0.405590
i   0.166133    -0.051614   -0.146943   0.609431    -0.351814
j   1.857521    -0.159101   0.899745    1.108722    -0.615379

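  • Get column A
df.loc[:, 'A']
  • Get columns A and C
df.loc[:, ['A', 'C']]
  • Get columns A through C
df.loc[:, 'A':'C']
  • Get the value at row c, column B
df.loc['c', 'B']
  • Get columns C and D for the rows where A and B are both greater than 0
df.loc[((df.A > 0) & (df.B > 0)), ['C', 'D']]
  • Set every value in row a to 10
df.loc['a', :] = 10
  • Set every value in column B to 50
df.loc[:, 'B'] = 50
  • Set rows b and c, columns C through E, to 30
df.loc[['b', 'c'], 'C':'E'] = 30
  • Set every value to 0 in the rows where C is less than 0
df.loc[df.C < 0] = 0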
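Assignments through df.loc[] modify the frame in place, so it can help to try them on a copy first. The following is a sketch on made-up data (the names demo and work are illustrative); it applies three assignments of the kind listed above and leaves the original frame untouched.

```python
import numpy as np
import pandas as pd

# Illustrative 4x3 float frame with values -5.0 .. 6.0 (hypothetical data)
demo = pd.DataFrame(
    np.arange(12.0).reshape(4, 3) - 5,
    index=list("abcd"),
    columns=list("ABC"),
)

work = demo.copy()                   # keep demo unchanged
work.loc["d", :] = 10                # whole row d
work.loc[:, "B"] = 50                # whole column B
work.loc[work["A"] < 0, "C"] = 0     # column C, only where A is negative

print(work.loc["d"].tolist())   # [10.0, 50.0, 10.0]
print(work["C"].tolist())       # [0.0, 0.0, 3.0, 10.0]
print(demo.loc["a", "C"])       # -3.0 (original is unchanged)
```

Passing a column label next to the boolean mask limits the assignment to that column; with the mask alone, every column of the matching rows is overwritten.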

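Example: df.loc[] with a MultiIndex. Note that this rebinds df; the sections after the example refer to the original 10-row, 5-column df again.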

tuples = [
   ('cobra', 'mark i'), ('cobra', 'mark ii'),
   ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
   ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
        [1, 4], [7, 1], [16, 36]]

df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64
df.loc[('cobra', 'mark i'), 'shield']
2
df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1

2.2 df.iloc[ ]

  • Select the second row

df.iloc[1]
df.iloc[1, :]

df.iloc[[1]]
df.iloc[[1], :]
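  • Select the first three rows
df.iloc[:3, :]
df.iloc[:3]
  • Select the first, third and fifth rows
df.iloc[[0, 2, 4]]
df.iloc[[0, 2, 4], :]
  • Select the second column
df.iloc[:, 1]
  • Select the first three columns
df.iloc[:, 0:3]
df.iloc[:,:3]
  • Select the first, third and fourth columns
df.iloc[:, [0, 2, 3]]
  • Select the value in the first row, second column
df.iloc[0, 1]
  • Select the second through fourth columns of the second and third rows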
df.iloc[[1,2], 1:4]
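Because positions are zero-based, off-by-one mistakes are common with df.iloc[]. The sketch below runs several of the selections above on an illustrative frame (demo is a made-up name with made-up values) and prints what comes back, including the difference between a scalar position and a one-element list.

```python
import numpy as np
import pandas as pd

# Illustrative 5x4 frame holding the values 0..19 (hypothetical data)
demo = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=list("abcde"),
    columns=list("ABCD"),
)

second_row = demo.iloc[1]             # scalar position -> Series
second_row_frame = demo.iloc[[1]]     # list of positions -> DataFrame
odd_rows = demo.iloc[[0, 2, 4]]       # first, third and fifth rows
value = demo.iloc[0, 1]               # first row, second column
block = demo.iloc[[1, 2], 1:4]        # rows 2-3, columns 2-4

print(second_row.name)               # b
print(second_row_frame.shape)        # (1, 4)
print(odd_rows.index.tolist())       # ['a', 'c', 'e']
print(value)                         # 1
print(block.shape)                   # (2, 3)
```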

2.3 df.ix[ ]

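df.ix[] could mix label indexes and integer positions. It was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use df.loc[] or df.iloc[] instead. The pandas documentation for .ix also carried this caveat: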

However, when an axis is integer based, ONLY label based access and not positional access is supported. Thus, in such cases, it’s usually better to be explicit and use .iloc or .loc.
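For code that still mixes a row position with a column label, one possible replacement is to convert one side so that a single indexer handles both axes. This is a sketch on illustrative data (demo is a made-up name), not the only way to migrate.

```python
import numpy as np
import pandas as pd

# Illustrative 4x3 frame holding the values 0..11 (hypothetical data)
demo = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=list("abcd"),
    columns=list("ABC"),
)

# Old style (no longer available): demo.ix[0, "B"]

# Option 1: turn the row position into a label, then use .loc
via_loc = demo.loc[demo.index[0], "B"]

# Option 2: turn the column label into a position, then use .iloc
via_iloc = demo.iloc[0, demo.columns.get_loc("B")]

print(via_loc, via_iloc)   # 1 1
```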

  • df.at[] accepts label indexes only
  • df.iat[] accepts integer positions only

3.1 df.at[]

  • Get the value at row c, column C
df.at['c', 'C']
  • Set the value at row c, column C to 10
df.at['c', 'C'] = 10

3.2 df.iat[]

  • Get the value in the third row, third column
df.iat[2, 2]
  • Set the value in the third row, third column to 10
df.iat[2, 2] = 10
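Both accessors address the same cell when the label and the position coincide. A minimal sketch on illustrative data (demo is a made-up name), reading one cell both ways and then writing through each accessor:

```python
import numpy as np
import pandas as pd

# Illustrative 4x3 float frame holding the values 0.0..11.0 (hypothetical data)
demo = pd.DataFrame(
    np.arange(12.0).reshape(4, 3),
    index=list("abcd"),
    columns=list("ABC"),
)

by_label = demo.at["c", "C"]     # row label c, column label C
by_position = demo.iat[2, 2]     # third row, third column: the same cell
print(by_label, by_position)     # 8.0 8.0

demo.at["c", "C"] = 10           # write by label
demo.iat[0, 0] = -1              # write by position
print(demo.iat[2, 2])            # 10.0
print(demo.at["a", "A"])         # -1.0
```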

Original: https://blog.csdn.net/weixin_46599926/article/details/122795057
Author: 羊羊猪
Title: pandas行和列的获取
