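import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# df is assumed to have a 'genres' column whose values are lists of strings
mlb = MultiLabelBinarizer()
# One 0/1 indicator column per distinct genre, aligned to df's index
df1 = pd.DataFrame(mlb.fit_transform(df['genres']), columns=mlb.classes_, index=df.index)
df = df.join(df1)
print(df)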
                        genres  Action  Adventure  Comedy  Drama  Family  \
0                      [Drama]       0          0       0      1       0
1      [Music, Drama, Romance]       0          0       0      1       0
2  [Action, Adventure, Comedy]       1          1       1      0       0
3   [Thriller, Romance, Drama]       0          0       0      1       0
4          [Adventure, Family]       0          1       0      0       1

   Music  Romance  Thriller
0      0        0         0
1      1        1         0
2      0        0         0
3      0        1         1
4      0        0         0
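The snippet above assumes that `df` already exists. A minimal self-contained sketch follows, with the sample rows reconstructed from the printed output; the names `encoded` and `result` are illustrative and not part of the original post.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data reconstructed from the output shown above (an assumption:
# the original post does not show how df was built).
df = pd.DataFrame({
    "genres": [
        ["Drama"],
        ["Music", "Drama", "Romance"],
        ["Action", "Adventure", "Comedy"],
        ["Thriller", "Romance", "Drama"],
        ["Adventure", "Family"],
    ]
})

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 matrix with one column per distinct label;
# mlb.classes_ holds those labels in sorted order.
encoded = pd.DataFrame(
    mlb.fit_transform(df["genres"]),
    columns=mlb.classes_,
    index=df.index,
)

# Joining on the shared index keeps the original 'genres' column.
result = df.join(encoded)
print(result)
```

Passing `index=df.index` matters when `df` does not have a default 0..n-1 index: without it, the join would align rows on mismatched labels.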
If you only need the genres in a given list, add reindex:
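# Apply this to the original df (before the join above); joining the same
# columns a second time raises a ValueError about overlapping columns.
genres = ['Action', 'Adventure', 'Comedy', 'Drama']
df1 = pd.DataFrame(mlb.fit_transform(df['genres']), columns=mlb.classes_, index=df.index)
# Keep only the listed genres; a listed genre absent from the data becomes an all-zero column
df = df.join(df1.reindex(columns=genres, fill_value=0))
print(df)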
                        genres  Action  Adventure  Comedy  Drama
0                      [Drama]       0          0       0      1
1      [Music, Drama, Romance]       0          0       0      1
2  [Action, Adventure, Comedy]       1          1       1      0
3   [Thriller, Romance, Drama]       0          0       0      1
4          [Adventure, Family]       0          1       0      0
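The `fill_value=0` argument only has an effect when the list names a genre that never occurs in the data. A small sketch of that case follows; the label 'Horror', the shortened sample frame, and the names `wanted` and `subset` are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Shortened, hypothetical sample data.
df = pd.DataFrame({
    "genres": [["Drama"], ["Action", "Comedy"], ["Adventure", "Family"]]
})

mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(df["genres"]),
    columns=mlb.classes_,
    index=df.index,
)

# 'Horror' does not occur in the data; reindex creates the column
# and fill_value=0 fills it with zeros instead of NaN.
wanted = ["Action", "Drama", "Horror"]
subset = encoded.reindex(columns=wanted, fill_value=0)

print(df.join(subset))
```

Columns not named in `wanted` (here 'Adventure', 'Comedy' and 'Family') are dropped, so the result always has exactly the requested columns in the requested order.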
Original: https://blog.csdn.net/weixin_33054847/article/details/112922651
Author: 狐狸君raphael
Title: Vector representation in Python: generating columns that represent vectors from a DataFrame