不知道天气咋样？一起用Python爬取天气数据分析告诉你

2023年8月2日下午11:08 • Python • 阅读 63

前言

今天我们分享一个小案例，获取天气数据，进行可视化分析，带你直观了解天气情况！

一、核心功能设计

总体来说，我们需要先对中国天气网中的天气数据进行爬取，保存为csv文件，并将这些数据进行可视化分析展示。

拆解需求，大致可以整理出我们需要分为以下几步完成：

通过爬虫获取中国天气网7.20-7.21的降雨数据，包括城市，风力方向，风级，降水量，相对湿度，空气质量。
对获取的天气数据进行预处理，分析河南的风力等级和风向，绘制风向风级雷达图。
根据获取的温度和湿度绘制温湿度相关性分析图，进行温度、湿度对比分析。
根据获取的各城市的降雨量，可视化近24小时的每小时时段降水情况。
绘制各城市24小时的累计降雨量。

二、实现步骤

1. 爬取数据

首先我们需要获取各个城市的降雨数据，通过对中国天气网网址分析发现，城市的天气网址为：http://www.weather.com.cn/weather/101180101.shtml。

根据对数据分析，返回的json格式数据，不难发现：

101180101就是代表城市编号
7天的天气预报数据信息在 div标签中并且id=”7d”
日期、天气、温度、风级等信息都在ul和li标签

网页结构我们上面已经分析好了，那么我们就可以来动手爬取所需要的数据了。获取到所有的数据资源之后，可以把这些数据保存下来。

请求网站：

天气网的网址：http://www.weather.com.cn/weather/101180101.shtml。如果想爬取不同的地区只需修改最后的101180101地区编号，前面的weather代表是7天的网页。

def getHTMLtext(url):
    """请求获得网页内容"""
    try:
        r = requests.get(url, timeout = 30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print("Success")
        return r.text
    except:
        print("Fail")
        return" "

处理数据：

采用BeautifulSoup库对刚刚获取的字符串进行数据提取。获取我们需要的风力方向，风级，降水量，相对湿度，空气质量等。

def get_content(html,cityname):
    """处理得到有用信息保存数据文件"""
    final = []
    bs = BeautifulSoup(html, "html.parser")
    body = bs.body
    data = body.find('div', {'id': '7d'})

    data2 = body.find_all('div',{'class':'left-div'})
    text = data2[2].find('script').string
    text = text[text.index('=')+1 :-2]
    jd = json.loads(text)
    dayone = jd['od']['od2']
    final_day = []
    count = 0
    for i in dayone:
        temp = []
        if count 23:
            temp.append(i['od21'])
            temp.append(cityname+'市')
            temp.append(i['od22'])
            temp.append(i['od24'])
            temp.append(i['od25'])
            temp.append(i['od26'])
            temp.append(i['od27'])
            temp.append(i['od28'])

            final_day.append(temp)
            data_all.append(temp)
        count = count +1

    ul = data.find('ul')
    li = ul.find_all('li')
    i = 0
    for day in li:
        if i < 7 and i > 0:
            temp = []
            date = day.find('h1').string
            date = date[0:date.index('日')]
            temp.append(date)
            inf = day.find_all('p')
            temp.append(inf[0].string)

            tem_low = inf[1].find('i').string

            if inf[1].find('span') is None:
                tem_high = None
            else:
                tem_high = inf[1].find('span').string
            temp.append(tem_low[:-1])
            if tem_high[-1] == '℃':
                temp.append(tem_high[:-1])
            else:
                temp.append(tem_high)

            wind = inf[2].find_all('span')
            for j in wind:
                temp.append(j['title'])

            wind_scale = inf[2].find('i').string
            index1 = wind_scale.index('级')
            temp.append(int(wind_scale[index1-1:index1]))
            final.append(temp)
        i = i + 1
    return final_day,final

城市的天气数据拿到了，同理我们可以根据不同的地区编号获取河南省各个地级市的天气数据。

Citycode = { "郑州": "101180101",
             "新乡": "101180301",
             "许昌": "101180401",
             "平顶山": "101180501",
             "信阳": "101180601",
             "南阳": "101180701",
             "开封": "101180801",
             "洛阳": "101180901",
             "商丘": "101181001",
             "焦作": "101181101",
             "鹤壁": "101181201",
             "濮阳": "101181301",
             "周口": "101181401",
             "漯河": "101181501",
             "驻马店": "101181601",
             "三门峡": "101181701",
             "济源": "101181801",
             "安阳": "101180201"}
citycode_lists = list(Citycode.items())
for city_code in citycode_lists:
    city_code = list(city_code)
    print(city_code)
    citycode = city_code[1]
    cityname = city_code[0]
    url1 = 'http://www.weather.com.cn/weather/'+citycode+ '.shtml'
    html1 = getHTMLtext(url1)
    data1, data1_7 = get_content(html1,cityname)

存储数据：

def write_to_csv(file_name, data, day=14):
    """保存为csv文件"""
    with open(file_name, 'a', errors='ignore', newline='') as f:
        if day == 14:
            header = ['日期','城市','天气','最低气温','最高气温','风向1','风向2','风级']
        else:
            header = ['小时','城市','温度','风力方向','风级','降水量','相对湿度','空气质量']
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        f_csv.writerows(data)
write_to_csv('河南天气.csv',data_all,1)

这样我们就可以把全省的各个地级市天气数据保存下来了。

2. 风向风级雷达图

统计全省的风力和风向，因为风力风向使用 极坐标的方式展现比较清晰，所以我们采用极坐标的方式展现一天的风力风向图，将圆分为8份， 每一份代表一个风向，半径代表平均风力，并且随着风级增高，蓝色加深。

def wind_radar(data):
    """风向雷达图"""
    wind = list(data['风力方向'])
    wind_speed = list(data['风级'])
    for i in range(0,24):
        if wind[i] == "北风":
            wind[i] = 90
        elif wind[i] == "南风":
            wind[i] = 270
        elif wind[i] == "西风":
            wind[i] = 180
        elif wind[i] == "东风":
            wind[i] = 360
        elif wind[i] == "东北风":
            wind[i] = 45
        elif wind[i] == "西北风":
            wind[i] = 135
        elif wind[i] == "西南风":
            wind[i] = 225
        elif wind[i] == "东南风":
            wind[i] = 315
    degs = np.arange(45,361,45)
    temp = []
    for deg in degs:
        speed = []

        for i in range(0,24):
            if wind[i] == deg:
                speed.append(wind_speed[i])
        if len(speed) == 0:
            temp.append(0)
        else:
            temp.append(sum(speed)/len(speed))
    print(temp)
    N = 8
    theta = np.arange(0.+np.pi/8,2*np.pi+np.pi/8,2*np.pi/8)

    radii = np.array(temp)

    plt.axes(polar=True)

    colors = [(1-x/max(temp), 1-x/max(temp),0.6) for x in radii]
    plt.bar(theta,radii,width=(2*np.pi/N),bottom=0.0,color=colors)
    plt.title('河南风级图--Dragon少年',x=0.2,fontsize=16)
    plt.show()

结果如下：

观察可以发现，当天的东北风最多，平均风级达到了1.75级。

3. 温湿度相关性分析

我们可以分析温度和湿度之间是否存在关系，为了更加清楚直观地验证，可以使用 离散点plt.scatter()方法将温度为横坐标、湿度为纵坐标，每个时刻的点在图中点出来，并且计算相关系数。

def calc_corr(a, b):
    """计算相关系数"""
    a_avg = sum(a)/len(a)
    b_avg = sum(b)/len(b)
    cov_ab = sum([(x - a_avg)*(y - b_avg) for x,y in zip(a, b)])
    sq = math.sqrt(sum([(x - a_avg)**2 for x in a])*sum([(x - b_avg)**2 for x in b]))
    corr_factor = cov_ab/sq
    return corr_factor

def corr_tem_hum(data):
    """温湿度相关性分析"""
    tem = data['温度']
    hum = data['相对湿度']
    plt.scatter(tem,hum,color='blue')
    plt.title("温湿度相关性分析图--Dragon少年")
    plt.xlabel("温度/℃")
    plt.ylabel("相对湿度/%")

    plt.show()
    print("相关系数为："+str(calc_corr(tem,hum)))

结果如下：

观察可以发现，一天的温度和湿度具有强烈的相关性，呈负相关。当温度较低时，空气中水分含量较多，湿度自然较高，而温度较高时，水分蒸发，空气就比较干燥，湿度较低。

4. 24小时内每小时时段降水

from pyecharts import options as opts
from pyecharts.charts import Map,Timeline

def timeline_map(data):
    tl = Timeline().add_schema(play_interval =300,height=40,is_rewind_play=False,orient = "horizontal",is_loop_play = True,is_auto_play=False)
    for h in time_line_final:
        x =data[data["小时"]==h]['城市'].values.tolist()
        y=data[data["小时"]==h]['降水量'].values.tolist()
        map_shape = (
            Map()
            .add("{}h时降水量（mm）".format(h),[list(z) for z in zip(x, y)],"河南")
            .set_series_opts(label_opts=opts.LabelOpts("{b}"))
            .set_global_opts(
                title_opts=opts.TitleOpts(title="河南省降雨分布--Dragon少年"),
                visualmap_opts=opts.VisualMapOpts(max_=300,
                                                  is_piecewise=True,
                                                  pos_top = "60%",
                                                  pieces=[
                                                        {"min": 101, "label": '>100ml', "color": "#FF0000"},
                                                        {"min": 11, "max": 50, "label": '11-50ml', "color": "#FF3333"},
                                                        {"min": 6, "max": 10, "label": '6-10ml', "color": "#FF9999"},
                                                        {"min": 0.1, "max": 5, "label": '0.1-5ml', "color": "#FFCCCC"}])
        ))
        tl.add(map_shape, "{}h".format(h))
    return tl
timeline_map(data).render("rainfall.html")

效果如下：

观察可以发现，7.20当天大部分地区均有雨，7月20日17时-18时一小时的降水量达到了201.9mm。

4. 24小时累计降雨量

from pyecharts import options as opts
from pyecharts.charts import Map,Timeline

time_line_final = list(data1['小时'].iloc[0:24])
def timeline_map(data1):
    tl = Timeline().add_schema(play_interval =200,height=40,is_rewind_play=False,orient = "horizontal",is_loop_play = True,is_auto_play=True)
    for h in time_line_final:
        x =data1[data1["小时"]==h]['城市'].values.tolist()
        y=data1[data1["小时"]==h]['降水量'].values.tolist()
        map_shape1 = (
            Map()
            .add("{}h时累计降水量（mm）".format(h),[list(z) for z in zip(x, y)],"河南")
            .set_series_opts(label_opts=opts.LabelOpts("{b}"))
            .set_global_opts(
                title_opts=opts.TitleOpts(title="河南省累计降雨分布--Dragon少年"),
                visualmap_opts=opts.VisualMapOpts(max_=300,
                                                  is_piecewise=True,
                                                  pos_top = "60%",
                                                  pieces=[
                                                        {"min": 251, "label": '特大暴雨', "color": "#800000"},
                                                        {"min": 101, "max": 250, "label": '暴雨', "color": "#FF4500"},
                                                        {"min": 51, "max": 100, "label": '暴雨', "color": "#FF7F50"},
                                                        {"min": 25, "max": 50, "label": '大雨', "color": "#FFFF00"},
                                                        {"min": 10, "max": 25, "label": '中雨', "color": "#1E90FF"},
                                                        {"min": 0.1, "max": 9.9, "label": '小雨', "color": "#87CEFA"}])
        ))
        tl.add(map_shape1, "{}h".format(h))
    return tl
timeline_map(data1).render("rainfalltoall_1.html")

效果如下：

至此，天气数据分析可视化就完成啦~

今天我们就到这里，明天继续努力！

若本篇内容对您有所帮助，请三连点赞，关注，收藏支持下。

创作不易，白嫖不好，各位的支持和认可，就是我创作的最大动力，我们下篇文章见！

Dragon少年 | 文

如果本篇博客有任何错误，请批评指教，不胜感激！

Original: https://blog.csdn.net/hhladminhhl/article/details/119154761
Author: Dragon少年
Title: 不知道天气咋样？一起用Python爬取天气数据分析告诉你

原创文章受到原创版权保护。转载请注明出处：https://www.johngo689.com/731732/

转载文章受原作者版权保护。转载请注明原作者出处！

python

【自取】最近整理的，有需要可以领取学习：

Linux核心资料大放送~

全栈面试题汇总（持续更新&可下载）

一个提高学习100%效率的工具！

【超详细】深度学习面试题目！

LeetCode Python刷题答案下载！

LeetCode Java版刷题答案下载！

LeetCode C++ 版本，抓紧保存！

LeetCode GO语言刷题答案下载！

安装numpy问题总结

一、WARNING，YOU are using pip version 22.0.4；however，version 22.1 is available 原因分析：提醒您正在使用p…

Python 2023年8月23日
0050
DataGear 制作基于Vue2、Element UI弹窗效果的数据可视化看板

DataGear 在4.4.0版本新增了 dg-chart-manual-render特性，用于手动控制看板内图表的渲染，而非在页面加载时自动渲染。利用这一特性，可以很方便制作具有…

Python 2023年11月6日
0057
conda的源配置

修改conda源可以直接修改.condarc文件，也可以通过代码conda config –add添加 conda的源和大多数编程软件一样，都有清华镜像，中科大镜像等等…

Python 2023年9月9日
0059
分布式存储系统之Ceph集群访问接口启用

前文我们使用ceph-deploy工具简单拉起了ceph底层存储集群RADOS，回顾请参考https://www.cnblogs.com/qiuhom-1874/p/1672447…

Python 2023年10月20日
0067
Numpy简介和特点（一）

一、NumPy是什么？ NumPy(Numerical Python) 是 Python 语言的一个扩展程序库，支持大量的维度数组与矩阵运算，此外也针对数组运算提供大量的数学函数库…

Python 2023年8月25日
0053
【Python 小白到精通 | 课程笔记】第三章：数据处理就像侦探游戏（函数和包）

文章目录 🚩 写在前面划分学习内容学到的一些操作（简单的罗列）保留问题 🌵 课后作业 * 1、写出1960年GDP最高的国家：有一行是World不是国家 – 问题…

Python 2023年8月24日
0083
大数据关键技术：常规机器学习方法

机器学习方法简介机器学习研究和构建的是一种特殊算法（而非某一个特定的算法），能够让计算机自己在数据中学习从而进行预测。 Arthur Samuel给出的定义指出，机器学习是这样的…

Python 2023年10月15日
0056
面试-Django实现注册短信验证码发送

注册的逻辑注册需要的参数用户名，密码等，主要是图片验证码等输入输入图片验证码之后，点击获得验证码，这时候要验证图片验证码的正确性图片验证码正确才会发生短信，用户收到短信之后，…

Python 2023年8月3日
0043
Chapter1数据分析

一.数据分析的介绍二.jupyter和conda的使用三.matplotlib的基础绘图四.matplotlib的基础绘图和调整x轴的刻度五.matplotlib设置显示中…

Python 2023年9月4日
0053
计算机网络原理谢希仁（第8版）第四章习题答案

4-01 网络层向上提供的服务有哪两种？试比较其优缺点。面向连接的和无连接。面向连接优点：通过虚电路发送分组，分组只用填写虚电路编号，分组开销较小；分组按序达到终点。面向连…

Python 2023年9月30日
0066
【4天快速入门Python数据挖掘之第1天】Matplotlib的使用

🔥一个人走得远了，就会忘记自己为了什么而出发，希望你可以不忘初心，不要随波逐流，一直走下去🎶🦋 欢迎关注🖱点赞👍收藏🌟留言🐾🦄 本文由程序喵正在路上原创，CSDN首发！💖 系列…

Python 2023年8月30日
0054
flask 定时任务 flask-apscheduler

目录基本使用 interval启动方式 cron启动方式使用装饰器定时启动任务 flask-apscheduler将 apscheduler移植到了 flask应用中，使得在 …

Python 2023年8月11日
0073
python对王者荣耀英雄皮肤进行图片采集~

Original: https://www.cnblogs.com/Qqun261823976/p/16522249.htmlAuthor: python倩Title: pytho…

Python 2023年5月23日
0086
【数据处理】Pandas读取CSV文件示例及常用方法（入门）

文章目录 * – 1. 导入常用包 – 2. 文件读取 – 3. 查看有哪些列 – 4. 查看前几行数据 – 5. 查看…

Python 2023年7月31日
0067
python测试框架之pytest （一）

概述 pytest官方文档介绍： pytest: helps you write better programs pytest is a framework that makes …

Python 2023年9月11日
0030
Pandas数据分析

什么是Pandas？一、读取数据 * 读取csv文件读取txt文件，自己指定分隔符、列名读取EXCEl文件读取MySQL数据库二、Pandas数据结构 * 仅有数据列表即…

Python 2023年8月7日
0053

2024 年 5 月
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31