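The Amazon Review Dataset records users' reviews of products sold on Amazon and is a classic dataset for recommender systems. The dataset has been updated several times, and by release date it comes in three versions.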
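The Amazon dataset can be split by product category into sub-datasets such as Books, Electronics, Movies and TV, and CDs and Vinyl. Each sub-dataset contains two kinds of information: product metadata and users' review (rating) records.
The 2014 version of the dataset is used as the example below.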
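The two kinds of records can be linked through the product ID asin, which appears in both the metadata and the review files. The following is a minimal pandas sketch that assumes both tables are already loaded as DataFrames. The rows and values are made up, and price stands in for any metadata column.

```python
import pandas as pd

# Made-up rows; real tables would come from the converted CSV files
reviews = pd.DataFrame({
    'reviewerID': ['U_a', 'U_b', 'U_a'],
    'asin': ['I_x', 'I_y', 'I_z'],
    'overall': [5.0, 3.0, 4.0],
})
meta = pd.DataFrame({
    'asin': ['I_x', 'I_y'],
    'price': [19.99, 5.49],
})

# A left join keeps every review; items with no metadata row get NaN
joined = reviews.merge(meta, on='asin', how='left')
print(joined)
```

A left join is used here so that reviews are never dropped just because an item is missing from the metadata file.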
Reading the Amazon dataset
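The downloaded data are JSON files with one record per line, which are awkward to work with directly. This section therefore shows how to convert them into CSV files, using the 2014 Amazon Electronics dataset as the example. Note that the 2014 metadata file is not strict JSON: each line is a Python-style dict literal, so each line is parsed as a Python literal rather than with json.loads.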
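The following sketch shows why the lines are evaluated as Python rather than parsed as JSON. A dict line with single-quoted strings is rejected by json.loads, but it is accepted by ast.literal_eval, which is a safer alternative to eval because it only accepts literals and never executes code. The sample line is made up to mimic the 2014 metadata style and is not an excerpt from the real file.

```python
import ast
import json

# Made-up line in the 2014 metadata style: single-quoted keys and strings
line = "{'asin': '0000000000', 'categories': [['Electronics']], 'price': 9.99}\n"

try:
    json.loads(line)
    strict_json_ok = True
except json.JSONDecodeError:
    strict_json_ok = False  # the JSON parser requires double-quoted strings

# literal_eval accepts only Python literals (dicts, lists, strings, numbers),
# so it cannot run arbitrary code the way eval() can
record = ast.literal_eval(line)
print(strict_json_ok)   # False
print(record['price'])  # 9.99
```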
Reading product metadata
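```python
import ast
import pandas as pd

file_path = 'meta_Electronics.json'
# Columns not needed here
useless_col = ['imUrl', 'salesRank', 'related', 'title', 'description']

records = []
with open(file_path, 'r') as fin:
    for line in fin:
        # Each line is one Python-style dict literal; ast.literal_eval parses it
        # without executing arbitrary code (unlike eval)
        d = ast.literal_eval(line)
        for s in useless_col:
            d.pop(s, None)  # drop the field if present
        records.append(d)

df = pd.DataFrame(records)
df.to_csv('meta_Electronics.csv', index=False)
```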
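The read, filter and write loop for the metadata can be wrapped in a single reusable function so the same steps also apply to the review file. This is a minimal sketch. The function name loose_json_to_csv is hypothetical and not a library API. The demo writes two made-up records to a temporary file.

```python
import ast
import os
import tempfile

import pandas as pd


def loose_json_to_csv(in_path, out_path, drop_cols):
    """Convert a file with one Python-dict literal per line into a CSV file.

    loose_json_to_csv is a hypothetical helper name, not a library API.
    """
    records = []
    with open(in_path, 'r') as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            d = ast.literal_eval(line)
            for col in drop_cols:
                d.pop(col, None)  # remove unwanted fields if present
            records.append(d)
    df = pd.DataFrame(records)
    df.to_csv(out_path, index=False)
    return df


# Small self-contained demo on a temporary file with made-up records
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'sample.json')
    dst = os.path.join(tmp, 'sample.csv')
    with open(src, 'w') as f:
        f.write("{'asin': 'A1', 'title': 'x', 'price': 1.5}\n")
        f.write("{'asin': 'A2', 'price': 2.0}\n")
    demo_df = loose_json_to_csv(src, dst, ['title'])
    demo_back = pd.read_csv(dst)

print(demo_df)
```

With this helper, the metadata conversion becomes one call: loose_json_to_csv('meta_Electronics.json', 'meta_Electronics.csv', ['imUrl', 'salesRank', 'related', 'title', 'description']). The review file can be converted the same way by passing its own column list.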
Reading user rating records
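```python
import ast
import pandas as pd

file_path = 'Electronics_10.json'
# Columns not needed here (reviewer name, review text, timestamp, summary)
useless_col = ['reviewerName', 'reviewText', 'unixReviewTime', 'summary']

records = []
with open(file_path, 'r') as fin:
    for line in fin:
        # Parse one review record per line as a Python literal
        d = ast.literal_eval(line)
        for s in useless_col:
            d.pop(s, None)  # drop the field if present
        records.append(d)

df = pd.DataFrame(records)
df.to_csv('Electronics_10.csv', index=False)
```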
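Most recommendation models expect users and items as consecutive integer indices rather than the string IDs reviewerID and asin kept in the review CSV. One common way to produce these indices in pandas is pd.factorize. The sketch below runs on made-up rows. In practice, the frame would come from pd.read_csv('Electronics_10.csv').

```python
import pandas as pd

# Made-up rows using the 2014 review column names
ratings = pd.DataFrame({
    'reviewerID': ['U_a', 'U_b', 'U_a', 'U_c'],
    'asin':       ['I_x', 'I_x', 'I_y', 'I_z'],
    'overall':    [5.0, 3.0, 4.0, 1.0],
})

# pd.factorize maps each distinct ID to 0..n-1 in order of first appearance
ratings['user_idx'], user_ids = pd.factorize(ratings['reviewerID'])
ratings['item_idx'], item_ids = pd.factorize(ratings['asin'])

n_users, n_items = len(user_ids), len(item_ids)
print(ratings[['user_idx', 'item_idx', 'overall']])
print(n_users, n_items)  # 3 3
```

user_ids and item_ids map the integer indices back to the original IDs, so model outputs can be translated back to real users and products.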
Original: https://blog.csdn.net/springtostring/article/details/113407712
Author: springtostring
Title: Amazon Review Dataset数据集介绍 (An Introduction to the Amazon Review Dataset)