DataFrame tips and tricks

Part 1. Basic operations

1. Creating a DataFrame
Define your own column names:

import pandas as pd
df = pd.DataFrame(columns=['col1','col2', 'col3', 'col4'])

Reuse the column names of an existing DataFrame:

df = pd.DataFrame(columns=list(article_df))

Assign the values of a list to a column:

df['B'] = listB
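
The three snippets above can be tied together in a minimal sketch. The contents of `article_df` and `listB` here are made-up examples, not data from the original post.

```python
import pandas as pd

# Hypothetical source frame whose column names we want to reuse
article_df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

# Empty frame with the same column names and no rows
df = pd.DataFrame(columns=list(article_df))

# Assigning a list to a column of an empty frame creates the rows;
# the other columns are filled with NaN
listB = ['p', 'q', 'r']
df['B'] = listB

print(df.columns.tolist())
print(len(df))
```

On a frame that already has rows, the list must have exactly one element per row, otherwise pandas raises a ValueError.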

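2. Statistics and filtering

Count how many times a value occurs in a column:

list(mydf['mycolumnname']).count("xxx")

List the distinct values of a column:

df['column_name'].unique()

Filter rows on several conditions (method 1):

newdf = df[(df.origin == "JFK") & (df.carrier == "B6")]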
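
As a small illustration of counting and boolean filtering, here is a sketch on an invented `flights` frame (the data is an assumption; only the `origin` and `carrier` column names come from the snippet above). It also shows `value_counts()`, which counts every distinct value at once without converting the column to a list.

```python
import pandas as pd

# Made-up sample data
flights = pd.DataFrame({
    'origin': ['JFK', 'JFK', 'LGA', 'JFK'],
    'carrier': ['B6', 'AA', 'B6', 'B6'],
})

# Count occurrences of one value: True counts as 1 when summed
n_jfk = (flights['origin'] == 'JFK').sum()

# Counts for every distinct value at once
counts = flights['origin'].value_counts()

# Distinct values, in order of first appearance
origins = flights['origin'].unique()

# Rows satisfying both conditions; each condition needs its own parentheses
jfk_b6 = flights[(flights.origin == 'JFK') & (flights.carrier == 'B6')]
```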

Method 2 (compare a column with a variable):

content='672844410'
x=df[df['column_name']==content]

Method 3 (query):

df.query('code=="000002.SZ"')

Multiple conditions:


df.query('code=="000002.SZ" | code=="000006.SZ"')

Multiple conditions, example 2, with a list as input:

import pandas as pd
df = pd.DataFrame({'a':[1, 2, 3, 4, 5, 6],
     'b':[1, 2, 3, 4, 5, 6],
     'c':[1, 2, 3, 4, 5, 6]})
query_list = [1, 2]
df_2 = df.query('c not in @query_list')[['a', 'b']]
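
For comparison, the sketch below runs the same filter three ways: chained `|` conditions, `in @list` inside `query`, and `isin` on the column. The `close` column and its values are invented for illustration.

```python
import pandas as pd

# Made-up sample data; only the 'code' column name comes from the text above
df = pd.DataFrame({'code': ['000001.SZ', '000002.SZ', '000006.SZ'],
                   'close': [10.0, 20.0, 30.0]})
wanted = ['000002.SZ', '000006.SZ']

# Explicit OR of two comparisons
by_or = df.query('code == "000002.SZ" | code == "000006.SZ"')

# @wanted refers to the Python variable of that name
by_in = df.query('code in @wanted')

# The same filter as a boolean mask
by_isin = df[df['code'].isin(wanted)]
```

The list form scales better than chaining `|` once there are more than a couple of values to match.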

Select by column name:

df['column_name']

Select two columns:

article_df[['column_name1','column_name2']]

Select by row/column position (iloc):

df.iloc[:5,]
df.iloc[1:5,]
df.iloc[5,0]
df.iloc[1:5,0]
df.iloc[1:5,:5]
df.iloc[2:7,1:3]
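
To make the slices above concrete, here is a short sketch on an invented 10x3 frame. Note that `iloc` slices are position based and the end position is exclusive.

```python
import pandas as pd

# Made-up frame: 10 rows, 3 columns
df = pd.DataFrame({'a': range(10), 'b': range(10, 20), 'c': range(20, 30)})

first_five = df.iloc[:5]       # rows 0..4, all columns
rows_1_to_4 = df.iloc[1:5]     # rows 1..4 (end is exclusive)
cell = df.iloc[5, 0]           # scalar at row 5, column 0
col_slice = df.iloc[1:5, 0]    # Series: rows 1..4 of column 0
block = df.iloc[2:7, 1:3]      # rows 2..6, columns 1..2
```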

Select several columns:
1. By position

df.iloc[:,[0,1]]

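2. By column name (pass a list; with loc a tuple is treated as a single MultiIndex key)

df.loc[:, ['column_name1', 'column_name2']]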
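
A minimal sketch showing that the positional and the name-based selections return the same sub-frame. The column names and data are invented.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [30, 41], 'city': ['Oslo', 'Rome']})

by_pos = df.iloc[:, [0, 1]]            # first two columns by position
by_name = df.loc[:, ['name', 'age']]   # same columns by label
shorthand = df[['name', 'age']]        # equivalent shorthand
```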

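Filter using a hash map (dict):

wrong_list = []
# Named `mapping` rather than `map` to avoid shadowing the built-in
mapping = {'a': 1, 'b': 3, 'c': 7, 'd': -1}
for index, row in article_df.iterrows():
    key = row['key']
    # Only check keys that appear in the mapping
    if key is not None and key in mapping:
        value = mapping[key]
        # Skip keys mapped to -1
        if value != -1:
            if row['key'] != value:
                wrong_list.append(row['key'])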
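
The loop above could also be written without `iterrows`. The sketch below is one possible vectorized version, not the original author's method, and it rests on an assumption: that the frame has a separate column (here hypothetically named `value`) that should match the dict entry for each key.

```python
import pandas as pd

expected = {'a': 1, 'b': 3, 'c': 7, 'd': -1}

# Made-up data; 'value' is an assumed column name
article_df = pd.DataFrame({'key': ['a', 'b', 'c', 'd', 'e', None],
                           'value': [1, 4, 7, 5, 9, 2]})

# Look up the expected value for each key; unknown or missing keys give NaN
looked_up = article_df['key'].map(expected)

# Keep rows whose key is known, is not mapped to -1, and whose value differs
mask = looked_up.notna() & (looked_up != -1) & (article_df['value'] != looked_up)
wrong_list = article_df.loc[mask, 'key'].tolist()
```

`Series.map` with a dict does the lookup for the whole column at once, which is usually much faster than iterating row by row on large frames.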

5. Getting column names

1. [column for column in df]
2. df.columns.values returns an array
3. list(df)
4. df.columns returns an Index, which can be converted to a list with tolist() or list()
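
The four options listed above, run side by side on an invented two-column frame:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'x': [1], 'y': [2]})

names_a = [column for column in df]   # iterating a DataFrame yields column names
names_b = df.columns.values           # NumPy array
names_c = list(df)                    # plain list
names_d = df.columns.tolist()         # Index converted to a plain list
```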

6. Converting between a DataFrame and common types
DataFrame -> list (and back):

rows = df.values.tolist()
df = pd.DataFrame(rows)

DataFrame -> array:

x = df.values

Merge two arrays side by side (requires import numpy as np):

np.hstack((mm.values, nn.values))

Convert an array to a DataFrame:

pd.DataFrame(np.hstack((mm.values,nn.values)))
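
A round-trip sketch of the conversions in this section, using invented frames `mm` and `nn`. It uses `to_numpy()`, which returns the same data as `.values`, and passes the column names back in, since a bare list or array carries none.

```python
import numpy as np
import pandas as pd

# Made-up sample data
mm = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
nn = pd.DataFrame({'c': [5, 6]})

# DataFrame -> list of rows -> DataFrame
rows = mm.values.tolist()            # [[1, 3], [2, 4]]
back = pd.DataFrame(rows, columns=mm.columns)

# DataFrame -> array, then stack two arrays side by side
arr = mm.to_numpy()
wide = np.hstack((mm.to_numpy(), nn.to_numpy()))

# Array -> DataFrame, restoring the column names explicitly
wide_df = pd.DataFrame(wide, columns=list(mm.columns) + list(nn.columns))
```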

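7. Setting column names

df.columns = ['col1']

Append the first row of df to df1 (DataFrame.append was removed in pandas 2.0, so use pd.concat):

df1 = pd.concat([df1, df.iloc[[0]]], ignore_index=True)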
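
A short sketch of renaming columns and appending a row with `pd.concat`. The frames and names are invented; `rename` is shown as an alternative when only some columns should change.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'old': [1, 2, 3]})

# Replace all column names; the list needs one name per column
df.columns = ['col1']

# Rename selected columns only
df = df.rename(columns={'col1': 'value'})

# Append the first row of df to another frame
df1 = pd.DataFrame({'value': [10]})
df1 = pd.concat([df1, df.iloc[[0]]], ignore_index=True)
```

`df.iloc[[0]]` (with a list) keeps the row as a one-row DataFrame, so `pd.concat` stacks it as a row.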

merge: joining columns from another DataFrame on a key

data = pd.merge(u_data, u_item, on="movie_id", how='left')
data = pd.merge(data, u_user, on="user_id", how='left')
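
To show what the two left joins produce, here is a sketch with tiny invented tables. Only the names `u_data`, `u_item`, `u_user`, `movie_id`, and `user_id` come from the snippet above; every other column and value is an assumption.

```python
import pandas as pd

# Made-up sample tables
u_data = pd.DataFrame({'user_id': [1, 2, 3],
                       'movie_id': [10, 20, 30],
                       'rating': [5, 3, 4]})
u_item = pd.DataFrame({'movie_id': [10, 20], 'title': ['A', 'B']})
u_user = pd.DataFrame({'user_id': [1, 2, 3], 'age': [24, 31, 45]})

# A left join keeps every row of the left table;
# rows with no match on the right get NaN in the new columns
data = pd.merge(u_data, u_item, on='movie_id', how='left')
data = pd.merge(data, u_user, on='user_id', how='left')
```

Here movie 30 has no entry in `u_item`, so its `title` is NaN after the first merge.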

9. DataFrame to CSV

df.to_csv("testfoo.csv", encoding="utf-8")
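
By default `to_csv` also writes the row index as the first column. The sketch below passes `index=False` to skip it and reads the data back. It writes to an in-memory buffer instead of a file so it can run anywhere; the data is invented.

```python
import io

import pandas as pd

# Made-up sample data
df = pd.DataFrame({'name': ['张三', 'Li'], 'score': [90, 85]})

# Write CSV text to an in-memory buffer without the index column
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back from the start of the buffer
buffer.seek(0)
restored = pd.read_csv(buffer)
```

With a real file, the call would look like `df.to_csv("testfoo.csv", encoding="utf-8", index=False)`.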

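10. Filtering with isin

import pandas as pd

df = pd.read_excel('分类标准-新.xlsx')

list0 = [7664, 7669, 7674, 7679, 7684, 7690, 7695, 7700, 7706, 7664, 7711, 7716, 7721, 7727, 7732, 7737, 7743, 7711, 7748, 7753, 7758, 7764, 7769, 7774, 7780, 7748, 5047, 5047]

# Index labels of all rows whose 元素编码 appears in list0 (in the frame's row order)
a = df[df['元素编码'].isin(list0)].index.tolist()

# For each code in list0, in order and keeping duplicates, take the first matching row
result = []
for x in list0:
    a = df[df['元素编码'] == x].index.tolist()
    print(a[0])
    result.append(a[0])

# df.ix was removed in pandas 1.0; use .loc with index labels instead
df.loc[result, '章节/元素名称'].tolist()
df.loc[result, '元素名'].tolist()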
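
The difference between the `isin` filter and the per-code loop is ordering: `isin` returns rows in the frame's own order and ignores duplicates in the list, while looking up each code in turn follows the list. A minimal sketch on invented data (the `code` and `name` columns are hypothetical stand-ins for the Excel columns above):

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'code': [5047, 7664, 7669], 'name': ['c', 'a', 'b']})
codes = [7669, 5047, 7669]

# isin keeps the frame's row order and ignores duplicates in codes
in_frame_order = df[df['code'].isin(codes)]['name'].tolist()

# Index by code (first match only), then look codes up in list order
first_match = df.drop_duplicates('code').set_index('code')['name']
in_list_order = first_match.loc[codes].tolist()
```

Unlike the loop, `first_match.loc[codes]` raises a KeyError if any code is missing from the frame, which may or may not be the behavior you want.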

Original: https://blog.csdn.net/yiweiwei516/article/details/121996069
Author: yiweiwei516
Title: dataframe小技巧

