2020 CCF Road Condition Prediction – Data Loading: Converting the Raw Data into a DataFrame

Start from the most basic part and work through a complete task. The code in this post is adapted from the open-source solution of the 2020 CCF competition champion. Simple data-format conversion used to be a stumbling block on my data-mining journey; seeing how they handle it was a real eye-opener.

  1. Reading the txt file with a plain open(file) gives back a pile of raw strings, which is awkward to work with. The code below instead loads the txt file with pandas read_csv, using the sep parameter to split each record into coarse chunks; these chunks are then refined and combined step by step (see the code for details).
  2. The processed DataFrame is not stored as CSV but in pkl format, because reading a pickle file is roughly twice as fast as reading the equivalent CSV (for a csv-vs-pkl comparison see the blog post linked in the original article, and the timing sketch after this list).
    The original txt file is 248 MB; the converted pkl file is 420 MB.
  3. When several related datasets are involved, find the field they share and merge the datasets on that field.
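A minimal timing sketch of the csv-vs-pkl point (not from the original post; the file names are hypothetical and assume one processed daily frame has already been saved):

import time
import pandas as pd

# Hypothetical files: one processed daily DataFrame saved in both formats.
df = pd.read_pickle('train_1.pkl')
df.to_csv('train_1.csv', index=False)

t0 = time.time()
pd.read_csv('train_1.csv')
print('csv read: {:.2f}s'.format(time.time() - t0))

t0 = time.time()
pd.read_pickle('train_1.pkl')
print('pkl read: {:.2f}s'.format(time.time() - t0))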

2020 CCF Road Condition Prediction – data loading: converting the data into a DataFrame

import pandas as pd
import numpy as np

import torch
import scipy
import sklearn
import logging
import lightgbm as lgb  # needed for the lgb.__version__ check below
from tqdm import tqdm

def get_data(path):
    # Column names: 4 id/label fields, 5 recent-feature slots and
    # 4 history cycles x 5 gaps = 20 history-feature slots (29 names in total).
    columns = ['link', 'label', 'current_slice_id', 'future_slice_id', 'recent_feature_1', 'recent_feature_2',
               'recent_feature_3', 'recent_feature_4', 'recent_feature_5']
    for i in range(1, 5):
        for j in range(1, 6):
            columns.append('history_feature_cycle{}_gap{}'.format(i, j))

    # Coarse split: each record has 6 ';'-separated parts, so read_csv with
    # sep=';' yields 6 raw string columns (0..5).
    data = pd.read_csv(path, header=None, sep=';')
    print(data.shape)

    # Part 0 holds "link label current_slice_id future_slice_id" (space separated).
    data['temp'] = data[0].apply(lambda x: x.split())
    for i in range(4):
        data[columns[i]] = data['temp'].apply(lambda x: x[i])

    # Parts 1..5 each hold 5 space-separated time-slice tokens:
    # part 1 -> recent_feature_1..5, parts 2..5 -> history cycles 1..4.
    for i in range(1, 6):
        data['temp'] = data[i].apply(lambda x: x.split())
        for j in range(5 * i - 1, 5 * i + 4):
            data[columns[j]] = data['temp'].apply(lambda x: x[j - 5 * i + 1])

    data = data[columns]
    data_columns = data.columns
    save_columns = []
    # Each feature token looks like "slice_id:speed,eta,status,num_car";
    # expand it into four numeric columns per feature slot.
    for col in tqdm(data_columns):
        if 'feature' in col:
            data['temp'] = data[col].apply(lambda x: x.split(','))
            data[col + '_speed'] = data['temp'].apply(lambda x: float(x[0].split(':')[1]))
            data[col + '_eta'] = data['temp'].apply(lambda x: float(x[1]))
            data[col + '_status'] = data['temp'].apply(lambda x: float(x[2]))
            data[col + '_num_car'] = data['temp'].apply(lambda x: float(x[3]))
            save_columns.extend([col + '_speed', col + '_eta', col + '_status', col + '_num_car'])
    # Keep the 4 id/label fields plus the 100 expanded feature columns.
    data = data[columns[:4] + save_columns]
    return data

logging.warning(u"get or data success!!!")
logging.warning(u"assert env:")
logging.warning("------------------------------")
logging.warning(u"pd.__version__:{}".format(pd.__version__))
logging.warning(u"sklearn.__version__{}".format(sklearn.__version__))
logging.warning(u'numpy version{}'.format(np.__version__))
logging.warning(u'lgb version{}'.format(lgb.__version__))
logging.warning(u'torch version{}'.format(torch.__version__))
logging.warning(u'torch.cuda.is_available{}'.format(torch.cuda.is_available()))
logging.warning(u'scipy.__version__{}'.format(scipy.__version__))
logging.warning("------------------------------")

if __name__ == '__main__':

    # {} in or_data_path is filled with the date (201907XX) of each raw daily file.
    or_data_path = r'D:\DATA\DIDI_CCF_RoadConditionForecast\traffic\{}.txt'
    save_data_path = r'D:\DATA\DIDI_CCF_RoadConditionForecast\train_{}.pkl'
    test_data_path = r'D:\DATA\DIDI_CCF_RoadConditionForecast\20190801_testdata.txt'
    test_save_data_path = r'D:\DATA\DIDI_CCF_RoadConditionForecast\test_new.pkl'

    # 30 daily training files: 20190701 .. 20190730
    for i in range(1, 31):

        print(i)
        num = 20190700 + i
        train_df = get_data(or_data_path.format(num))
        train_df['day'] = i          # day index within July
        print(train_df.shape)
        train_df.to_pickle(save_data_path.format(i))

    # test set: 2019-08-01, stored as day 32
    test = get_data(test_data_path)
    test['day'] = 32
    test.to_pickle(test_save_data_path)
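
After the 30 daily pkl files are written, they can be read back and stacked along axis 0 into one training DataFrame (a minimal sketch, reusing save_data_path from the script above):

# Stack the 30 daily frames into a single training set (~15M rows x 105 columns)
train = pd.concat([pd.read_pickle(save_data_path.format(i)) for i in range(1, 31)],
                  axis=0, ignore_index=True)
print(train.shape)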

Dataset 1: historical and real-time road conditions, traffic/201907XX.txt

Each day corresponds to one txt file. Overall record format: link_id, label, current_slice_id, future_slice_id, recent_feature, history_feature1, history_feature2, history_feature3, history_feature4.

| Field | Meaning | Notes |
| --- | --- | --- |
| link | id of the road segment (link) | |
| label | road-condition status of the link at the target time | one of {1, 2, 3, 4} |
| current_slice_id | id of the current time slice | computed as (slice_id + 720) % 720 |
| future_slice_id | id of the time slice to be predicted | 0 <= future_slice_id - current_slice_id <= 30 |
| recent_feature | road-condition features of the most recent n = 5 time slices; slices are separated by spaces, fields within a slice by commas | per-slice format: slice_id:speed,eta_speed,status,num_vehicles; if all features are 0, no vehicle passed the link in that slice |
| history_feature | road-condition features of the same n = 5 time slices in previous weeks; the 4 weekly groups (-28, -21, -14, -7 days) are separated by semicolons, and each group uses the same format as recent_feature | the 4 groups are the same weekday across 4 weeks: if recent_feature falls on a Monday, the history groups cover the same time slices on the 4 preceding Mondays |

The daily files contain 503556, 505953, 507552, 516549, 517413, 492267, 462417, 519387, 518511, 516024, 521820, 522081, 491574, 469377, 519537, 524658, 523518, 528582, 525174, 495456, 465870, 517290, 516813, 512130, 509850, 521751, 478371, 454839, 513840 and 510414 samples respectively. Each sample is structured as follows:

353495 1 236 245;

232:29.80,32.40,1,4 233:31.60,32.20,1,2 234:20.00,21.90,2,2 235:22.20,25.90,2,5 236:21.30,26.30,2,4;

245:30.00,32.70,0,9 246:30.00,36.10,0,10 247:27.40,35.20,1,12 248:26.90,35.70,1,10 249:28.90,37.00,1,9;
245:36.10,37.30,1,7 246:29.30,38.50,1,7 247:27.70,39.70,1,6 248:28.60,40.20,1,3 249:29.60,38.70,1,4;
245:30.40,40.10,1,6 246:32.30,40.10,1,6 247:30.60,41.10,1,5 248:29.60,39.20,1,4 249:28.00,37.90,1,4;
245:28.30,38.40,1,7 246:28.20,39.40,1,6 247:28.80,35.10,1,3 248:30.00,35.60,1,4 249:29.40,37.20,1,5

As described in the table above, each record is split by semicolons into six parts (a small parsing sketch follows this list):

  • The first part is the basic information of the road segment: the link id, its label, the current time-slice id and the future time-slice id: link label current_slice_id future_slice_id;
  • The second part is the real-time information of the most recent n = 5 time slices, the recent_feature. Each time slice has the format: slice_id:speed,eta_speed,status,num_vehicles
  • Parts three to six are the historical information, the history_feature: 4 groups of n = 5 time slices each, with the same per-slice format as above. They describe the road condition at the same time slices 7, 14, 21 and 28 days before the recent period.
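
To make the record format concrete, here is a small parsing sketch (not part of the original post) applied to a shortened version of the example record above:

# Sketch: split one raw record into its ';'-separated parts and parse the recent features
record = ('353495 1 236 245;'
          '232:29.80,32.40,1,4 233:31.60,32.20,1,2 234:20.00,21.90,2,2 '
          '235:22.20,25.90,2,5 236:21.30,26.30,2,4;'
          '245:30.00,32.70,0,9 246:30.00,36.10,0,10 247:27.40,35.20,1,12 '
          '248:26.90,35.70,1,10 249:28.90,37.00,1,9')  # history cycles 2-4 omitted

parts = record.split(';')
link, label, current_slice_id, future_slice_id = parts[0].split()
for token in parts[1].split():                      # the 5 recent time slices
    slice_id, values = token.split(':')
    speed, eta_speed, status, num_car = (float(v) for v in values.split(','))
    print(slice_id, speed, eta_speed, status, num_car)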

After processing (and adding the day column in the main script), the data is a flat two-dimensional table with 105 fields:

['link',
 'label',
 'current_slice_id',
 'future_slice_id',
 'day',
 'recent_feature_1_speed',
 'recent_feature_1_eta',
 'recent_feature_1_status',
 'recent_feature_1_num_car',
 'recent_feature_2_speed',
 'recent_feature_2_eta',
 'recent_feature_2_status',
 'recent_feature_2_num_car',
 'recent_feature_3_speed',
 'recent_feature_3_eta',
 'recent_feature_3_status',
 'recent_feature_3_num_car',
 'recent_feature_4_speed',
 'recent_feature_4_eta',
 'recent_feature_4_status',
 'recent_feature_4_num_car',
 'recent_feature_5_speed',
 'recent_feature_5_eta',
 'recent_feature_5_status',
 'recent_feature_5_num_car',
 'history_feature_cycle1_gap1_speed',
 'history_feature_cycle1_gap1_eta',
 'history_feature_cycle1_gap1_status',
 'history_feature_cycle1_gap1_num_car',
 'history_feature_cycle1_gap2_speed',
 'history_feature_cycle1_gap2_eta',
 'history_feature_cycle1_gap2_status',
 'history_feature_cycle1_gap2_num_car',
 'history_feature_cycle1_gap3_speed',
 'history_feature_cycle1_gap3_eta',
 'history_feature_cycle1_gap3_status',
 'history_feature_cycle1_gap3_num_car',
 'history_feature_cycle1_gap4_speed',
 'history_feature_cycle1_gap4_eta',
 'history_feature_cycle1_gap4_status',
 'history_feature_cycle1_gap4_num_car',
 'history_feature_cycle1_gap5_speed',
 'history_feature_cycle1_gap5_eta',
 'history_feature_cycle1_gap5_status',
 'history_feature_cycle1_gap5_num_car',
 'history_feature_cycle2_gap1_speed',
 'history_feature_cycle2_gap1_eta',
 'history_feature_cycle2_gap1_status',
 'history_feature_cycle2_gap1_num_car',
 'history_feature_cycle2_gap2_speed',
 'history_feature_cycle2_gap2_eta',
 'history_feature_cycle2_gap2_status',
 'history_feature_cycle2_gap2_num_car',
 'history_feature_cycle2_gap3_speed',
 'history_feature_cycle2_gap3_eta',
 'history_feature_cycle2_gap3_status',
 'history_feature_cycle2_gap3_num_car',
 'history_feature_cycle2_gap4_speed',
 'history_feature_cycle2_gap4_eta',
 'history_feature_cycle2_gap4_status',
 'history_feature_cycle2_gap4_num_car',
 'history_feature_cycle2_gap5_speed',
 'history_feature_cycle2_gap5_eta',
 'history_feature_cycle2_gap5_status',
 'history_feature_cycle2_gap5_num_car',
 'history_feature_cycle3_gap1_speed',
 'history_feature_cycle3_gap1_eta',
 'history_feature_cycle3_gap1_status',
 'history_feature_cycle3_gap1_num_car',
 'history_feature_cycle3_gap2_speed',
 'history_feature_cycle3_gap2_eta',
 'history_feature_cycle3_gap2_status',
 'history_feature_cycle3_gap2_num_car',
 'history_feature_cycle3_gap3_speed',
 'history_feature_cycle3_gap3_eta',
 'history_feature_cycle3_gap3_status',
 'history_feature_cycle3_gap3_num_car',
 'history_feature_cycle3_gap4_speed',
 'history_feature_cycle3_gap4_eta',
 'history_feature_cycle3_gap4_status',
 'history_feature_cycle3_gap4_num_car',
 'history_feature_cycle3_gap5_speed',
 'history_feature_cycle3_gap5_eta',
 'history_feature_cycle3_gap5_status',
 'history_feature_cycle3_gap5_num_car',
 'history_feature_cycle4_gap1_speed',
 'history_feature_cycle4_gap1_eta',
 'history_feature_cycle4_gap1_status',
 'history_feature_cycle4_gap1_num_car',
 'history_feature_cycle4_gap2_speed',
 'history_feature_cycle4_gap2_eta',
 'history_feature_cycle4_gap2_status',
 'history_feature_cycle4_gap2_num_car',
 'history_feature_cycle4_gap3_speed',
 'history_feature_cycle4_gap3_eta',
 'history_feature_cycle4_gap3_status',
 'history_feature_cycle4_gap3_num_car',
 'history_feature_cycle4_gap4_speed',
 'history_feature_cycle4_gap4_eta',
 'history_feature_cycle4_gap4_status',
 'history_feature_cycle4_gap4_num_car',
 'history_feature_cycle4_gap5_speed',
 'history_feature_cycle4_gap5_eta',
 'history_feature_cycle4_gap5_status',
 'history_feature_cycle4_gap5_num_car']

Dataset 2: road attributes, attr.txt

| Field | Type | Meaning |
| --- | --- | --- |
| link_id | categorical | id of the link |
| length | numeric | length of the link, in m |
| direction | categorical | travel direction of the link |
| path_class | categorical | functional class of the link |
| speed_class | categorical | speed-limit class of the link |
| LaneNum | categorical | number of lanes of the link |
| speed_limit | numeric | speed limit of the link, in m/s |
| level | categorical | level of the link |
| width | numeric | width of the link, in m |

The file is read and processed in the same way:

  • Since the category dtype uses far less memory than int64, the categorical columns are converted to category after loading.
  • The numeric columns are standardized (zero mean, unit variance).
road_attr = pd.read_csv(road_attr_path, header=None, sep='\t')
road_attr.columns=['link','length','direction','path_class','speed_class','LaneNum','speed_limit','level','width']

category_columns = ['direction','path_class','speed_class','LaneNum','level']
for cate in category_columns:
    road_attr[cate] = road_attr[cate].astype('category')
road_attr.memory_usage(deep=True)

def scale_numeric(data, columns):
    # In-place z-score standardization: (x - mean) / std
    for col in columns:
        data[col] = (data[col].values - np.mean(data[col].values)) / np.std(data[col].values)

numeric_columns = ['length','speed_limit','width']
scale_numeric(road_attr, numeric_columns)

The processed link attributes are then left-joined to the training and test sets on link:

train = train.merge(road_attr,on='link',how='left')
test = test.merge(road_attr,on='link',how='left')

Dataset 3: road-network topology, topo.txt

| Key | Value |
| --- | --- |
| link_id | downstream link_id_1, downstream link_id_2, downstream link_id_3, … |

road_topo = pd.read_csv(road_topo_path,header=None, sep='\t')
road_topo.columns = ['link','next_link']
road_topo['next_link'] = road_topo['next_link'].apply(lambda x : [int(i) for i in x.split(',')])

This is again left-joined to the training and test sets on link:

train=train.merge(road_topo,on='link',how='left')
test=test.merge(road_topo,on='link',how='left')

Final result

Time features are also added, with the weekday derived from the date as a new field (a sketch of this step follows the field list below).
The final training set has shape roughly (15M, 117): the 30 days of processed traffic data are concatenated along axis 0 into a (15M, 105) table, the 8 link-attribute fields are merged in, the 3 time fields are appended, and finally the next_link column is added. Its fields are:

['current_slice_id',
 'day',
 'future_slice_id',
 'history_feature_cycle1_gap1_eta',
 'history_feature_cycle1_gap1_num_car',
 'history_feature_cycle1_gap1_speed',
 'history_feature_cycle1_gap1_status',
 'history_feature_cycle1_gap2_eta',
 'history_feature_cycle1_gap2_num_car',
 'history_feature_cycle1_gap2_speed',
 'history_feature_cycle1_gap2_status',
 'history_feature_cycle1_gap3_eta',
 'history_feature_cycle1_gap3_num_car',
 'history_feature_cycle1_gap3_speed',
 'history_feature_cycle1_gap3_status',
 'history_feature_cycle1_gap4_eta',
 'history_feature_cycle1_gap4_num_car',
 'history_feature_cycle1_gap4_speed',
 'history_feature_cycle1_gap4_status',
 'history_feature_cycle1_gap5_eta',
 'history_feature_cycle1_gap5_num_car',
 'history_feature_cycle1_gap5_speed',
 'history_feature_cycle1_gap5_status',
 'history_feature_cycle2_gap1_eta',
 'history_feature_cycle2_gap1_num_car',
 'history_feature_cycle2_gap1_speed',
 'history_feature_cycle2_gap1_status',
 'history_feature_cycle2_gap2_eta',
 'history_feature_cycle2_gap2_num_car',
 'history_feature_cycle2_gap2_speed',
 'history_feature_cycle2_gap2_status',
 'history_feature_cycle2_gap3_eta',
 'history_feature_cycle2_gap3_num_car',
 'history_feature_cycle2_gap3_speed',
 'history_feature_cycle2_gap3_status',
 'history_feature_cycle2_gap4_eta',
 'history_feature_cycle2_gap4_num_car',
 'history_feature_cycle2_gap4_speed',
 'history_feature_cycle2_gap4_status',
 'history_feature_cycle2_gap5_eta',
 'history_feature_cycle2_gap5_num_car',
 'history_feature_cycle2_gap5_speed',
 'history_feature_cycle2_gap5_status',
 'history_feature_cycle3_gap1_eta',
 'history_feature_cycle3_gap1_num_car',
 'history_feature_cycle3_gap1_speed',
 'history_feature_cycle3_gap1_status',
 'history_feature_cycle3_gap2_eta',
 'history_feature_cycle3_gap2_num_car',
 'history_feature_cycle3_gap2_speed',
 'history_feature_cycle3_gap2_status',
 'history_feature_cycle3_gap3_eta',
 'history_feature_cycle3_gap3_num_car',
 'history_feature_cycle3_gap3_speed',
 'history_feature_cycle3_gap3_status',
 'history_feature_cycle3_gap4_eta',
 'history_feature_cycle3_gap4_num_car',
 'history_feature_cycle3_gap4_speed',
 'history_feature_cycle3_gap4_status',
 'history_feature_cycle3_gap5_eta',
 'history_feature_cycle3_gap5_num_car',
 'history_feature_cycle3_gap5_speed',
 'history_feature_cycle3_gap5_status',
 'history_feature_cycle4_gap1_eta',
 'history_feature_cycle4_gap1_num_car',
 'history_feature_cycle4_gap1_speed',
 'history_feature_cycle4_gap1_status',
 'history_feature_cycle4_gap2_eta',
 'history_feature_cycle4_gap2_num_car',
 'history_feature_cycle4_gap2_speed',
 'history_feature_cycle4_gap2_status',
 'history_feature_cycle4_gap3_eta',
 'history_feature_cycle4_gap3_num_car',
 'history_feature_cycle4_gap3_speed',
 'history_feature_cycle4_gap3_status',
 'history_feature_cycle4_gap4_eta',
 'history_feature_cycle4_gap4_num_car',
 'history_feature_cycle4_gap4_speed',
 'history_feature_cycle4_gap4_status',
 'history_feature_cycle4_gap5_eta',
 'history_feature_cycle4_gap5_num_car',
 'history_feature_cycle4_gap5_speed',
 'history_feature_cycle4_gap5_status',
 'label',
 'link',
 'recent_feature_1_eta',
 'recent_feature_1_num_car',
 'recent_feature_1_speed',
 'recent_feature_1_status',
 'recent_feature_2_eta',
 'recent_feature_2_num_car',
 'recent_feature_2_speed',
 'recent_feature_2_status',
 'recent_feature_3_eta',
 'recent_feature_3_num_car',
 'recent_feature_3_speed',
 'recent_feature_3_status',
 'recent_feature_4_eta',
 'recent_feature_4_num_car',
 'recent_feature_4_speed',
 'recent_feature_4_status',
 'recent_feature_5_eta',
 'recent_feature_5_num_car',
 'recent_feature_5_speed',
 'recent_feature_5_status',
 'length',
 'direction',
 'path_class',
 'speed_class',
 'LaneNum',
 'speed_limit',
 'level',
 'width',
 'week_day',
 'hour',
 'time_gap',
 'next_link']
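
The original post does not show the code that builds the three time fields; a minimal sketch of one plausible derivation, assuming day 1 corresponds to 2019-07-01 and that the 720 time slices per day are 2 minutes long (30 slices per hour), is:

# Sketch (assumptions, not the original code): derive week_day, hour and time_gap.
dates = pd.Timestamp('2019-06-30') + pd.to_timedelta(train['day'], unit='D')
train['week_day'] = dates.dt.dayofweek                       # 0 = Monday ... 6 = Sunday
train['hour'] = train['future_slice_id'].astype(int) // 30   # 30 two-minute slices per hour
train['time_gap'] = train['future_slice_id'].astype(int) - train['current_slice_id'].astype(int)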

Reducing pandas memory usage

import datetime

def pandas_reduce_mem_usage(df, ignore_columns=[]):
    start_mem = df.memory_usage().sum() / 1024 ** 2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    start_time = datetime.datetime.now()
    for col in df.columns:
        if col in ignore_columns:
            continue
        col_type = df[col].dtype
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            print('{} column dtype is {} and begin convert to others'.format(col, col_type))
            if str(col_type)[:3] == 'int':
                if c_min < 0:
                    # signed integers: pick the smallest type that can hold the range
                    if c_min >= np.iinfo(np.int8).min and c_max <= np.iinfo(np.int8).max:
                        df[col] = df[col].astype(np.int8)
                    elif c_min >= np.iinfo(np.int16).min and c_max <= np.iinfo(np.int16).max:
                        df[col] = df[col].astype(np.int16)
                    elif c_min >= np.iinfo(np.int32).min and c_max <= np.iinfo(np.int32).max:
                        df[col] = df[col].astype(np.int32)
                    else:
                        df[col] = df[col].astype(np.int64)
                else:
                    # non-negative integers: use unsigned types
                    if c_min >= np.iinfo(np.uint8).min and c_max <= np.iinfo(np.uint8).max:
                        df[col] = df[col].astype(np.uint8)
                    elif c_min >= np.iinfo(np.uint16).min and c_max <= np.iinfo(np.uint16).max:
                        df[col] = df[col].astype(np.uint16)
                    elif c_min >= np.iinfo(np.uint32).min and c_max <= np.iinfo(np.uint32).max:
                        df[col] = df[col].astype(np.uint32)
                    else:
                        df[col] = df[col].astype(np.uint64)
            else:
                # floats: downcast when the value range fits the smaller type
                if c_min >= np.finfo(np.float16).min and c_max <= np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
            print('\t\tcolumn dtype is {}'.format(df[col].dtype))
        else:
            print('\t\tcolumn dtype is object and will convert to category')
            df[col] = df[col].astype('category')
    end_mem = df.memory_usage().sum() / 1024 ** 2
    end_time = datetime.datetime.now()
    print('consume times: {:.4f}'.format((end_time - start_time).seconds))
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    return df
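
A typical call on the merged training frame (a usage sketch; next_link is skipped because a column of Python lists cannot be converted to category):

train = pandas_reduce_mem_usage(train, ignore_columns=['next_link'])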

File size before the reduction: (screenshot from the original post, not reproduced here)

Original: https://blog.csdn.net/qq_33866063/article/details/117235093
Author: 小卜妞~
Title: 2020CCF 路况预测–数据读取部分:将数据格式转化成dataFrame
