1. Background
The situation in Ukraine has grown increasingly tense over the past few days. Any war ultimately hurts innocent civilians, so there is no real winner!
May the war end soon, and peace come to the world!
On YouTube, comments discussing the Ukraine situation keep pouring in, so this post uses Python text sentiment analysis to mine where netizens' opinions lean.
2. Overall Approach
Pick 5 recent "Ukraine"-related videos and analyze the Top 300 hot comments under each:
- Crawl the comments (requests)
- Score each comment's sentiment and label the result (positive/neutral/negative) (SnowNLP for Chinese, TextBlob for English)
- Count the Top 10 high-frequency words (jieba.analyse)
- Draw a word cloud (wordcloud)
3. Code Walkthrough
3.1 Crawling the Comments
The crawler reuses the program from the earlier post that scraped Li Ziqi's YouTube comments, so it is not repeated here.
Wrap the crawler so it can fetch the comments of multiple videos:
import os

video_id_list = ['pYLjb7xIbOk', 'HnFnyNEuCUk', 'F0lYqJmGf-M', 't51ebUWe0Ag', '0RiEMEpKqic']

def download_comments(video_id_list):
    """
    Download video comments.
    :param video_id_list: list of video ids
    :return: None
    """
    for cnt, video_id in enumerate(video_id_list, start=1):
        print('Crawling comments of video {}/{}'.format(cnt, len(video_id_list)))
        # sort by popularity (-s 0) and fetch the first 300 comments (-l 300)
        cmd = r"python downloader.py -y={} -o={}.json -s 0 -l 300".format(video_id, video_id)
        print('Start crawling: {}'.format(video_id))
        os.system(cmd)  # run the downloader as a shell command
        print('Finished crawling: {}'.format(video_id))
    print('All videos crawled')
This fetches the top 300 hot comments of each of the 5 representative videos as json files. Next, batch-convert the json files to Excel:
import os
import pandas as pd

for file in os.listdir('./'):
    if file.endswith('json'):
        print(file)
        f_head, f_tail = file.split(".")
        print(f_head, " || ", f_tail)
        try:
            df = pd.read_json(file, lines=True)
            # note: to_excel's encoding argument was removed in pandas 2.0
            df.to_excel('{}.xlsx'.format(f_head), index=False, engine='xlsxwriter')
        except Exception as e:
            print('Exception -> {}: {}'.format(file, str(e)))
3.2 Sentiment Classification
For each comment, compute a sentiment score and decide the sentiment label. The core logic:
from textblob import TextBlob
from snownlp import SnowNLP

if not is_chinese(comment):  # not Chinese, treat as an English comment
    judge = TextBlob(comment)
    sentiments_score = judge.sentiment.polarity  # polarity in [-1, 1]
    if sentiments_score < 0:
        tag = 'negative'
    elif sentiments_score == 0:
        tag = 'neutral'
    else:
        tag = 'positive'
else:  # Chinese comment
    sentiments_score = SnowNLP(comment).sentiments  # probability in [0, 1]
    if sentiments_score < 0.5:
        tag = 'negative'
    else:
        tag = 'positive'
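The snippet above relies on an `is_chinese` helper that the post does not show. A minimal sketch of such a helper, assuming the intended behavior is "contains at least one CJK character", could look like this:

```python
def is_chinese(text):
    # Hypothetical helper (not shown in the original post): treat the text
    # as Chinese if it contains at least one CJK Unified Ideograph.
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

print(is_chinese('世界和平'))           # True
print(is_chinese('No winner in war'))  # False
```

Mixed-language comments would be routed to the Chinese branch by this rule; a real implementation might instead compare the ratio of CJK to Latin characters.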
Of course, we can also compute the percentages of positive, neutral and negative comments and draw a pie chart, which makes the conclusion more convincing.
3.3 Top 10 High-Frequency Words
jieba's built-in extractor returns the high-frequency words and their weights directly, so there is no need to reinvent the wheel!
Use jieba to extract the top 10 keywords from the comment text:
import jieba.analyse

keywords_top10 = jieba.analyse.extract_tags(v_cmt_str, withWeight=True, topK=10)
Whatever number you pass as topK is how many top keywords you get back.
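To illustrate the shape of that "top-K" result without jieba: `extract_tags` ranks words by TF-IDF weight, but on already-tokenized text a plain frequency count with `Counter.most_common` produces the same kind of ranked list (this substitutes raw frequency for TF-IDF and uses a made-up word list):

```python
from collections import Counter

# Hypothetical pre-tokenized comment words; real input comes from jieba's tokenizer.
words = ['war', 'peace', 'war', 'Ukraine', 'peace', 'war', 'pray']

top2 = Counter(words).most_common(2)  # K = 2 here, analogous to topK=10 above
print(top2)  # [('war', 3), ('peace', 2)]
```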
3.4 Word Cloud
Use the wordcloud library to draw the word cloud; a word cloud is another way to visualize high-frequency words.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def make_wordcloud(v_str, v_stopwords, v_outfile):
    """
    Draw a word cloud.
    :param v_str: input string
    :param v_stopwords: stopwords
    :param v_outfile: output image file
    :return: None
    """
    print('Generating word cloud: {}'.format(v_outfile))
    try:
        stopwords = v_stopwords  # stopwords
        background_image = plt.imread('乌克兰地图.jpg')  # background mask image (a map of Ukraine)
        wc = WordCloud(
            scale=4,  # rendering resolution
            background_color="white",  # background color
            max_words=1500,  # maximum number of words
            width=1500,  # image width
            height=1200,  # image height
            font_path='/System/Library/Fonts/SimHei.ttf',  # font file path, adjust for your system (Mac)
            # font_path="C:\Windows\Fonts\simhei.ttf",  # font file path, adjust for your system (Windows)
            stopwords=stopwords,  # stopwords
            mask=background_image,  # mask image
        )
        wc.generate(v_str)  # generate the word cloud
        wc.to_file(v_outfile)  # save the image file
        print('Word cloud saved: {}'.format(v_outfile))
    except Exception as e:
        print('make_wordcloud except: {}'.format(str(e)))
The core WordCloud parameters are explained in the comments above; please refer to them.
4. Conclusions
Judging from the sentiment labels, the high-frequency word statistics and the word cloud, negative and neutral sentiment makes up a large share of the comments on this event.
A closer look at the positive comments shows that many of them are prayers and blessings for Ukraine and its people, so they are not positive evaluations of the war itself.
On the whole, therefore, negative comments dominate.
5. Video Demo
https://www.zhihu.com/zvideo/1480526739331387392
6. Full Source Code
Full source code: click here for the full source code
More source-code examples -> 马哥python说
Original: https://www.cnblogs.com/mashukui/p/16247849.html
Author: 马哥python说
Title: [Crawler + sentiment classification + Top 10 high-frequency words + word cloud] Python public-opinion analysis of hot "Ukraine" YouTube comments