Notes on cleaning historical COVID-19 data with Python

July 16, 2023, 7:23 PM • Artificial Intelligence

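In 2020, for a practical-training course project in my third year of university, I built a COVID-19 data collection and visualization system. The pipeline was roughly as follows: find the data, load it into Hive, clean it with Hive, export the cleaned data to MySQL with HQL, build the backend data API with SSM, and visualize the data on the front end with ECharts and tables. Details are at https://qkongtao.cn/?p=514. The assignment mainly required Hive for data processing, but the data itself came from an API provided by another developer, so running it through Hive before loading it into the database was overkill. That data-processing step was the weak point; the rest of the data handling and the visualization turned out well.
This time a friend who also wants to build a COVID-19 data collection and visualization system wanted to borrow from my earlier project and asked me for pointers. That raised a problem. Back then the data volume was small and could be fetched directly from free public APIs. Now, more than two years into the pandemic, the historical daily data for every province and city is fairly large, and an existing API that provides it in the required shape is almost impossible to find. So I did the following to obtain and process the data.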

1. Data acquisition

The data comes from a GitHub project I had bookmarked a long time ago: https://lab.isaaclin.cn/nCoV/
Data repository: https://github.com/BlankerL/DXY-COVID-19-Data/releases

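The project also maintains a separate data repository: a program runs every day at midnight and pushes the data to the repository's Releases.
So we can download the data already collected by that crawler straight from the repository. Downloading DXYArea.csv (the CSV version) is enough, and it is convenient to process.
Once downloaded, this 92 MB file turns out to contain nearly 1,000,000 rows, so reading it naively would be somewhat slow.
So I decided to try reading it in chunks with Python's pandas, which is very convenient for data processing and reads CSV files quickly.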
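Note that the read_csv_feature function in the next section concatenates every chunk, so the whole file still ends up in memory (and with chunkSize = 1000000 and just under a million rows, it is effectively one or two chunks anyway). If memory is the concern, one option is to filter each chunk before keeping it. The sketch below is my own variant, not the original script: read_china_rows and USECOLS are hypothetical names, and the inline sample rows are made up in the DXYArea.csv layout.

```python
import io

import pandas as pd

# Columns used by the cleaning step (countryEnglishName is not needed there)
USECOLS = ["countryName", "provinceName", "province_confirmedCount",
           "province_curedCount", "province_deadCount", "updateTime",
           "cityName", "city_confirmedCount", "city_curedCount", "city_deadCount"]


def read_china_rows(source, chunksize=100_000):
    """Read a DXYArea.csv-style file chunk by chunk and keep only rows for China.

    Each chunk is filtered before it is kept, so apart from the current chunk
    only the needed rows and columns are held in memory.
    """
    kept = []
    for chunk in pd.read_csv(source, usecols=USECOLS, chunksize=chunksize):
        china = chunk[chunk["countryName"] == "中国"]
        if not china.empty:
            kept.append(china)
    if not kept:
        # no matching rows at all: return an empty frame with the expected columns
        return pd.DataFrame(columns=USECOLS)
    return pd.concat(kept, ignore_index=True)


# Tiny inline sample in the DXYArea.csv layout; the numbers are made up
SAMPLE = (
    "countryName,countryEnglishName,provinceName,province_confirmedCount,"
    "province_curedCount,province_deadCount,updateTime,cityName,"
    "city_confirmedCount,city_curedCount,city_deadCount\n"
    "中国,China,湖北省,100,50,5,2022-05-03 10:00:00,武汉,80,40,4\n"
    "美国,United States of America,美国,10,1,0,2022-05-03 09:00:00,,,,\n"
    "中国,China,中国,200,150,8,2022-05-03 10:00:00,,,,\n"
)

china = read_china_rows(io.StringIO(SAMPLE), chunksize=1)
print(china[["provinceName", "updateTime"]])
```

On the real file this would be called as read_china_rows("DXYArea.csv"); the column list assumes the header names used in this article.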

2. Reading the CSV with Python

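I read the CSV with the pandas module; reading it with plain Python would be much slower.
Note: put the .py script and the CSV file in the same directory.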

import pandas as pd
import numpy as np

# The .py script and DXYArea.csv sit in the same directory
filePath = "DXYArea.csv"

def read_csv_feature(filePath, chunkSize=1000000):
    """Read the CSV chunk by chunk and concatenate the chunks into one DataFrame."""
    chunks = []
    with open(filePath, encoding='utf-8') as f:
        reader = pd.read_csv(f, sep=',', iterator=True)
        while True:
            try:
                chunks.append(reader.get_chunk(chunkSize))
            except StopIteration:
                # get_chunk raises StopIteration once the file is exhausted
                break
    return pd.concat(chunks, axis=0, ignore_index=True)

data = read_csv_feature(filePath)
print('Data loaded ---------------')

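Once the CSV has been read, all of it is stored in data, which is a pandas DataFrame.
NumPy can then be used to pull the needed columns out of it as arrays, which makes the data easier to work with: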
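As a sketch of what filtering with NumPy can look like, boolean masks select the country-level rows and the province/city rows in one vectorized step instead of testing each row with if. The data frame here is a hypothetical three-row stand-in for data with made-up numbers; the column names match the ones extracted below.

```python
import numpy as np
import pandas as pd

# A few made-up rows in the DXYArea.csv layout, just for illustration
data = pd.DataFrame({
    "countryName": ["中国", "美国", "中国"],
    "provinceName": ["湖北省", "美国", "中国"],
    "city_confirmedCount": [80.0, np.nan, np.nan],
})

# Pull the columns out as NumPy arrays
countryName = data["countryName"].to_numpy()
provinceName = data["provinceName"].to_numpy()
city_confirmedCount = data["city_confirmedCount"].to_numpy()

# Boolean masks select whole groups of rows at once
is_china = countryName == "中国"
is_national = is_china & (provinceName == "中国")    # country-level rows
is_provincial = is_china & (provinceName != "中国")  # province/city rows

# Missing city counts (NaN) become 0 before converting to int
city_confirmed = np.nan_to_num(city_confirmedCount).astype(int)

print(provinceName[is_provincial], city_confirmed[is_provincial])
```

The cleaning loop in the next section works row by row instead; masks like these pay off when a whole group of rows needs the same treatment.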

countryName = np.array(data["countryName"])
countryEnglishName = np.array(data["countryEnglishName"])
provinceName = np.array(data["provinceName"])
province_confirmedCount = np.array(data["province_confirmedCount"])
province_curedCount = np.array(data["province_curedCount"])
province_deadCount = np.array(data["province_deadCount"])
updateTime = np.array(data["updateTime"])
cityName = np.array(data["cityName"])
city_confirmedCount = np.array(data["city_confirmedCount"])
city_curedCount = np.array(data["city_curedCount"])
city_deadCount = np.array(data["city_deadCount"])

This extracts all the columns we need.

3. Cleaning the data with Python

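For the cleaning I took a clumsy but direct approach and simply pushed the data into four lists: historyed (national figures, one record per day), totaled (the latest national totals), areaed (the latest figures for each city) and provinceed (per-province totals):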


historyed = list()    # national figures, one row per day: [date, confirmed, cured, dead]
totaled = list()      # latest national totals: [updateTime, confirmed, cured, dead]
provinceed = list()   # per-province totals: [province, confirmed, cured, dead]
areaed = list()       # latest figures per city: [province, city, confirmed, cured, dead]

def to_int(value):
    """City counts can be missing (NaN); treat those as 0."""
    return 0 if pd.isna(value) else int(value)

seen_cities = set()   # (province, city) pairs already stored in areaed

for i in range(len(data)):
    if countryName[i] != "中国":
        continue
    # keep only the date part of "YYYY-MM-DD HH:MM:SS"
    updatetime = str(updateTime[i]).split(' ')[0]

    if provinceName[i] == "中国":
        # Country-level row. The file is assumed to be sorted newest first,
        # so the first such row holds the latest national totals
        if len(totaled) == 0:
            totaled = [str(updateTime[i]), int(province_confirmedCount[i]),
                       int(province_curedCount[i]), int(province_deadCount[i])]
        # one record per day: the first (latest) one seen for each date
        if len(historyed) == 0 or updatetime != historyed[-1][0]:
            historyed.append([updatetime, int(province_confirmedCount[i]),
                              int(province_curedCount[i]), int(province_deadCount[i])])
        continue

    # shorten province names: 内蒙古 and 黑龙江 keep three characters, the others two
    if provinceName[i] in ("内蒙古自治区", "黑龙江省"):
        province = provinceName[i][0:3]
    else:
        province = provinceName[i][0:2]
    city = str(cityName[i])

    # keep only the first (latest) record of each city; the key includes the province
    # because city names such as 境外输入 can appear under several provinces
    if (province, city) not in seen_cities:
        seen_cities.add((province, city))
        areaed.append([province, city, to_int(city_confirmedCount[i]),
                       to_int(city_curedCount[i]), to_int(city_deadCount[i])])

# province totals: sum the latest figures of each province's cities
province_index = dict()
for province, city, confirmed, cured, dead in areaed:
    if province not in province_index:
        province_index[province] = len(provinceed)
        provinceed.append([province, 0, 0, 0])
    row = provinceed[province_index[province]]
    row[1] += confirmed
    row[2] += cured
    row[3] += dead
print('Data cleaned ---------------')
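For comparison, the same cleaning can be sketched with pandas operations instead of an explicit loop. This is an alternative I am suggesting, not the author's code: clean_with_pandas and COUNT_COLS are hypothetical names, it sorts by updateTime explicitly rather than relying on the file order, and the sample rows are made up. Like the loop, it treats a province's total as the sum of its cities' latest figures.

```python
import pandas as pd

COUNT_COLS = ["city_confirmedCount", "city_curedCount", "city_deadCount"]


def clean_with_pandas(df):
    """Hypothetical vectorized variant of the cleaning loop.

    Returns (history, areas, provinces): the latest national figures per day,
    the latest figures per (province, city), and per-province sums of those cities.
    """
    china = df[df["countryName"] == "中国"].copy()
    china["date"] = china["updateTime"].str.split(" ").str[0]
    # Newest first, so keep="first" below keeps the latest update of each group
    china = china.sort_values("updateTime", ascending=False, kind="mergesort")

    national = china[china["provinceName"] == "中国"]
    history = national.drop_duplicates(subset="date", keep="first")[
        ["date", "province_confirmedCount", "province_curedCount", "province_deadCount"]]

    # Deduplicate on the (province, city) pair, not on the city name alone
    cities = china[china["provinceName"] != "中国"]
    areas = cities.drop_duplicates(subset=["provinceName", "cityName"], keep="first")
    areas = areas[["provinceName", "cityName"] + COUNT_COLS].fillna({c: 0 for c in COUNT_COLS})

    provinces = areas.groupby("provinceName", sort=False)[COUNT_COLS].sum().reset_index()
    return history.reset_index(drop=True), areas.reset_index(drop=True), provinces


# Made-up sample rows in the DXYArea.csv layout
sample = pd.DataFrame({
    "countryName": ["中国"] * 5,
    "provinceName": ["中国", "中国", "湖北省", "湖北省", "湖北省"],
    "updateTime": ["2022-05-03 18:00:00", "2022-05-02 08:00:00", "2022-05-03 18:00:00",
                   "2022-05-03 18:00:00", "2022-05-02 08:00:00"],
    "province_confirmedCount": [200, 190, 100, 100, 90],
    "province_curedCount": [150, 140, 50, 50, 45],
    "province_deadCount": [8, 8, 5, 5, 5],
    "cityName": [None, None, "武汉", "境外输入", "武汉"],
    "city_confirmedCount": [None, None, 80, 20, 70],
    "city_curedCount": [None, None, 40, 10, 35],
    "city_deadCount": [None, None, 4, 1, 4],
})

history, areas, provinces = clean_with_pandas(sample)
print(history)
print(provinces)
```

If the province-level columns (province_confirmedCount and so on) are preferred for province totals, the provinces frame could instead be built from the first row of each province, the same way history is built from the national rows.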

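There isn't much to say here: it is pure manual work on the columns extracted above. The one thing to watch is the format of the data as it comes out of the CSV. Some fields are not very standard and need to be handled by hand, for example city counts that are empty (NaN) and full province names such as 内蒙古自治区 that have to be shortened.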
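The two irregularities handled in the loop can be pulled out into small helpers so they are easy to test on their own. short_province_name, count_or_zero and THREE_CHAR_PROVINCES are hypothetical names; the shortening rule is the one the cleaning code uses (three characters for 内蒙古自治区 and 黑龙江省, two for everything else) and only holds for the province names in this data set.

```python
import math

# The two provinces whose short names keep three characters, as in the cleaning loop
THREE_CHAR_PROVINCES = {"内蒙古自治区", "黑龙江省"}


def short_province_name(full_name):
    """Shorten a full province name the way the cleaning code does (湖北省 -> 湖北)."""
    if full_name in THREE_CHAR_PROVINCES:
        return full_name[:3]
    return full_name[:2]


def count_or_zero(value):
    """Convert a count that may be missing (NaN or None) to int, treating gaps as 0."""
    # numpy.float64 is a subclass of float, so NaN values read by pandas are caught too
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0
    return int(value)


print(short_province_name("湖北省"), short_province_name("内蒙古自治区"))
print(count_or_zero(float("nan")), count_or_zero(12.0))
```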

4. Importing the cleaned data into MySQL automatically

The import into MySQL is also done in Python, using the pymysql module.

import pymysql

"""
Load the cleaned data into the database
"""

# Connect without selecting a database, so that yq can be created if it does not exist yet
db = pymysql.connect(host="localhost", user="root", password="123456")

cursor = db.cursor()

cursor.execute('CREATE DATABASE IF NOT EXISTS yq DEFAULT CHARSET utf8 COLLATE utf8_general_ci;')
db.select_db("yq")
print('Database yq ready')

# Each table is dropped and recreated, so re-running the script replaces the old data
cursor.execute('drop table if exists areaed')
sql="""
CREATE TABLE IF NOT EXISTS areaed  (
  provinceName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  cityName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  confirmedCount int(11) NULL DEFAULT NULL,
  deadCount int(11) NULL DEFAULT NULL,
  curedCount int(11) NULL DEFAULT NULL,
  currentCount int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists provinceed')
sql="""
CREATE TABLE provinceed  (
  provinceName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  confirmedNum varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  deathsNum int(11) NULL DEFAULT NULL,
  curesNum int(11) NULL DEFAULT NULL,
  currentNum int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists totaled')
sql="""
CREATE TABLE totaled  (
  date varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  diagnosed int(11) NULL DEFAULT NULL,
  death int(11) NULL DEFAULT NULL,
  cured int(11) NULL DEFAULT NULL,
  current int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
cursor.execute('drop table if exists historyed')
sql="""
CREATE TABLE historyed  (
  date varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  confirmedNum int(11) NULL DEFAULT NULL,
  deathsNum int(11) NULL DEFAULT NULL,
  curesNum int(11) NULL DEFAULT NULL,
  currentNum int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
print('Tables created')

def insert_rows(table, sql, rows):
    """Insert all rows of one table with a single executemany call and one commit."""
    try:
        cursor.executemany(sql, rows)
        db.commit()
        print("Imported %s -------------" % table)
    except Exception as ex:
        print("error while importing %s: %s" % (table, ex))
        db.rollback()

# Placeholders are plain %s: pymysql quotes string values itself, and wrapping the
# placeholders in quotes would add literal quote characters to the stored strings.
# currentNum is derived as confirmed - cured - dead.
# historyed columns: date, confirmedNum, deathsNum, curesNum, currentNum
insert_rows("historyed", 'INSERT INTO historyed VALUES(%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[3], item[2], item[1] - item[2] - item[3]) for item in historyed])
# areaed columns: provinceName, cityName, confirmedCount, deadCount, curedCount, currentCount
insert_rows("areaed", 'INSERT INTO areaed VALUES(%s,%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[2], item[4], item[3], item[2] - item[3] - item[4]) for item in areaed])
# provinceed columns: provinceName, confirmedNum, deathsNum, curesNum, currentNum
insert_rows("provinceed", 'INSERT INTO provinceed VALUES(%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[3], item[2], item[1] - item[2] - item[3]) for item in provinceed])
# totaled columns: date, diagnosed, death, cured, current
insert_rows("totaled", 'INSERT INTO totaled VALUES(%s,%s,%s,%s,%s)',
            [(totaled[0], totaled[1], totaled[3], totaled[2], totaled[1] - totaled[2] - totaled[3])])

cursor.close()
db.close()

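To make the script convenient to use, it first creates the database, then creates the tables, and finally imports the cleaned data into MySQL. Because every table is dropped and recreated, re-running the script simply replaces the previous data.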
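To try the batched, parameterized insert pattern without a MySQL server, Python's built-in sqlite3 can stand in for the database. This is a sketch under that assumption: sqlite3 uses ? placeholders where pymysql uses %s, the numbers are made up, and only the historyed table is shown. In both drivers the placeholders stay unquoted so that the driver quotes each value itself.

```python
import sqlite3

# Cleaned national history rows in the order [date, confirmed, cured, dead]; numbers are made up
historyed = [["2022-05-03", 200, 150, 8], ["2022-05-02", 190, 140, 8]]

db = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database yq
cursor = db.cursor()
cursor.execute("CREATE TABLE historyed (date TEXT, confirmedNum INTEGER, "
               "deathsNum INTEGER, curesNum INTEGER, currentNum INTEGER)")

# Reorder to the table's column order and derive current = confirmed - cured - dead
rows = [(date, confirmed, dead, cured, confirmed - cured - dead)
        for date, confirmed, cured, dead in historyed]

try:
    # One executemany call and one commit for the whole table instead of one per row
    cursor.executemany("INSERT INTO historyed VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
except sqlite3.Error as ex:
    db.rollback()
    print("Import failed: %s" % ex)

print(cursor.execute("SELECT * FROM historyed ORDER BY date").fetchall())
```

Committing once per table keeps the whole table in one transaction, so a failed import can be rolled back as a unit on transactional engines.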

5. Complete code

import pandas as pd
import numpy as np
import pymysql

"""
@ProjectName: cleanData
@FileName: cleanData.py
@Author: tao
@Date: 2022/05/03
"""

# The script and DXYArea.csv are expected to be in the same directory
filePath = "DXYArea.csv"

historyed = list()    # national figures, one row per day: [date, confirmed, cured, dead]
totaled = list()      # latest national totals: [updateTime, confirmed, cured, dead]
provinceed = list()   # per-province totals: [province, confirmed, cured, dead]
areaed = list()       # latest figures per city: [province, city, confirmed, cured, dead]

def read_csv_feature(filePath, chunkSize=1000000):
    """Read the CSV chunk by chunk and concatenate the chunks into one DataFrame."""
    chunks = []
    with open(filePath, encoding='utf-8') as f:
        reader = pd.read_csv(f, sep=',', iterator=True)
        while True:
            try:
                chunks.append(reader.get_chunk(chunkSize))
            except StopIteration:
                break
    return pd.concat(chunks, axis=0, ignore_index=True)

def to_int(value):
    """City counts can be missing (NaN); treat those as 0."""
    return 0 if pd.isna(value) else int(value)

data = read_csv_feature(filePath)
print('Data loaded ---------------')

countryName = np.array(data["countryName"])
provinceName = np.array(data["provinceName"])
province_confirmedCount = np.array(data["province_confirmedCount"])
province_curedCount = np.array(data["province_curedCount"])
province_deadCount = np.array(data["province_deadCount"])
updateTime = np.array(data["updateTime"])
cityName = np.array(data["cityName"])
city_confirmedCount = np.array(data["city_confirmedCount"])
city_curedCount = np.array(data["city_curedCount"])
city_deadCount = np.array(data["city_deadCount"])

seen_cities = set()   # (province, city) pairs already stored in areaed

for i in range(len(data)):
    if countryName[i] != "中国":
        continue
    # keep only the date part of "YYYY-MM-DD HH:MM:SS"
    updatetime = str(updateTime[i]).split(' ')[0]

    if provinceName[i] == "中国":
        # Country-level row. The file is assumed to be sorted newest first,
        # so the first such row holds the latest national totals
        if len(totaled) == 0:
            totaled = [str(updateTime[i]), int(province_confirmedCount[i]),
                       int(province_curedCount[i]), int(province_deadCount[i])]
        # one record per day: the first (latest) one seen for each date
        if len(historyed) == 0 or updatetime != historyed[-1][0]:
            historyed.append([updatetime, int(province_confirmedCount[i]),
                              int(province_curedCount[i]), int(province_deadCount[i])])
        continue

    # shorten province names: 内蒙古 and 黑龙江 keep three characters, the others two
    if provinceName[i] in ("内蒙古自治区", "黑龙江省"):
        province = provinceName[i][0:3]
    else:
        province = provinceName[i][0:2]
    city = str(cityName[i])

    # keep only the first (latest) record of each city; the key includes the province
    # because city names such as 境外输入 can appear under several provinces
    if (province, city) not in seen_cities:
        seen_cities.add((province, city))
        areaed.append([province, city, to_int(city_confirmedCount[i]),
                       to_int(city_curedCount[i]), to_int(city_deadCount[i])])

# province totals: sum the latest figures of each province's cities
province_index = dict()
for province, city, confirmed, cured, dead in areaed:
    if province not in province_index:
        province_index[province] = len(provinceed)
        provinceed.append([province, 0, 0, 0])
    row = provinceed[province_index[province]]
    row[1] += confirmed
    row[2] += cured
    row[3] += dead
print('Data cleaned ---------------')

print(totaled)

"""
Load the cleaned data into the database
"""

# Connect without selecting a database, so that yq can be created if it does not exist yet
db = pymysql.connect(host="localhost", user="root", password="123456")

cursor = db.cursor()

cursor.execute('CREATE DATABASE IF NOT EXISTS yq DEFAULT CHARSET utf8 COLLATE utf8_general_ci;')
db.select_db("yq")
print('Database yq ready')

# Each table is dropped and recreated, so re-running the script replaces the old data
cursor.execute('drop table if exists areaed')
sql="""
CREATE TABLE IF NOT EXISTS areaed  (
  provinceName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  cityName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  confirmedCount int(11) NULL DEFAULT NULL,
  deadCount int(11) NULL DEFAULT NULL,
  curedCount int(11) NULL DEFAULT NULL,
  currentCount int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists provinceed')
sql="""
CREATE TABLE provinceed  (
  provinceName varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  confirmedNum varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  deathsNum int(11) NULL DEFAULT NULL,
  curesNum int(11) NULL DEFAULT NULL,
  currentNum int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists totaled')
sql="""
CREATE TABLE totaled  (
  date varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  diagnosed int(11) NULL DEFAULT NULL,
  death int(11) NULL DEFAULT NULL,
  cured int(11) NULL DEFAULT NULL,
  current int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
cursor.execute('drop table if exists historyed')
sql="""
CREATE TABLE historyed  (
  date varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  confirmedNum int(11) NULL DEFAULT NULL,
  deathsNum int(11) NULL DEFAULT NULL,
  curesNum int(11) NULL DEFAULT NULL,
  currentNum int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
print('Tables created')

def insert_rows(table, sql, rows):
    """Insert all rows of one table with a single executemany call and one commit."""
    try:
        cursor.executemany(sql, rows)
        db.commit()
        print("Imported %s -------------" % table)
    except Exception as ex:
        print("error while importing %s: %s" % (table, ex))
        db.rollback()

# Plain %s placeholders: pymysql quotes the values itself. currentNum = confirmed - cured - dead
insert_rows("historyed", 'INSERT INTO historyed VALUES(%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[3], item[2], item[1] - item[2] - item[3]) for item in historyed])
insert_rows("areaed", 'INSERT INTO areaed VALUES(%s,%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[2], item[4], item[3], item[2] - item[3] - item[4]) for item in areaed])
insert_rows("provinceed", 'INSERT INTO provinceed VALUES(%s,%s,%s,%s,%s)',
            [(item[0], item[1], item[3], item[2], item[1] - item[2] - item[3]) for item in provinceed])
insert_rows("totaled", 'INSERT INTO totaled VALUES(%s,%s,%s,%s,%s)',
            [(totaled[0], totaled[1], totaled[3], totaled[2], totaled[1] - totaled[2] - totaled[3])])

cursor.close()
db.close()

6. Script results

After the script runs, the four tables and their data show up in the database.

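With that, the data is in place. Its format still follows the interface of my earlier COVID-19 data collection and visualization system; for the concrete backend and visualization implementation, see https://qkongtao.cn/?p=514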

Original: https://blog.csdn.net/qq_42038623/article/details/124642785
Author: 不愿意做鱼的小鲸鱼
Title: Notes on cleaning historical COVID-19 data with Python (记一次python清洗疫情历史数据)

