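When extracting timestamped data with pandas, the time values that are read in may be of type int or str, so the first step is to parse them into datetimes.
Create a test DataFrame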
import pandas as pd
import numpy as np
df = pd.DataFrame({
'time1': [20210101000000, 20210101000001, '20210101000002', '20210101000003',
'2021-01-01 00:00:04', '2021-01-01 00:00:05', '2021/01/01 00:00:06',
'2021/1/1 00:00:07', '2021-01-01 00:00:08', '2021-01-01 00:00:09'],
'time2': ['2021-01-01 00:00:00:167', '2021-01-01 00:00:01:167', '2021-01-01 00:00:02:167',
'2021-01-01 00:00:03:167', '2021-01-01 00:00:04:167', '2021-01-01 00:00:05:167',
'2021-01-01 00:00:06:167', '2021-01-01 00:00:07:167', '2021-01-01 00:00:08:167',
'2021-01-01 00:00:09:167'],
'name': ['赵一', '钱二', '张三', '李四', '王五', '麻六', '孙七', '周八', '吴九', '郑十'],
'age': [18, 28, 38, 48, 16, 26, 39, 25, 19, 24],
'sex': [True, True, False, True, False, False, True, False, True, False],
'score': [85.5, 59, 78, 88, 69, 96, 90, 85, 69, 79],
})
print(df['time1'][0], type(df['time1'][0]))
print(df['time1'][2], type(df['time1'][2])) # check the data type
print('*'*50)
print(df['time2'][0], type(df['time2'][0]))
print('*'*50)
df
_______________________________________________________________________________________________
Output:
20210101000000 <class 'int'>
20210101000002 <class 'str'>
**************************************************
2021-01-01 00:00:00:167 <class 'str'>
**************************************************
time1 time2 name age sex score
0 20210101000000 2021-01-01 00:00:00:167 赵一 18 True 85.5
1 20210101000001 2021-01-01 00:00:01:167 钱二 28 True 59.0
2 20210101000002 2021-01-01 00:00:02:167 张三 38 False 78.0
3 20210101000003 2021-01-01 00:00:03:167 李四 48 True 88.0
4 2021-01-01 00:00:04 2021-01-01 00:00:04:167 王五 16 False 69.0
5 2021-01-01 00:00:05 2021-01-01 00:00:05:167 麻六 26 False 96.0
6 2021/01/01 00:00:06 2021-01-01 00:00:06:167 孙七 39 True 90.0
7 2021/1/1 00:00:07 2021-01-01 00:00:07:167 周八 25 False 85.0
8 2021-01-01 00:00:08 2021-01-01 00:00:08:167 吴九 19 True 69.0
9 2021-01-01 00:00:09 2021-01-01 00:00:09:167 郑十 24 False 79.0
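pandas parses ISO-style strings of the form "%Y-%m-%d %H:%M:%S" without any extra hints. As the code above shows, the values to parse here come in two data types, int and str, and at two resolutions: the standard second-level format and a millisecond-level format.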
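To make the two data types concrete, here is a minimal sketch that normalizes a mixed int/str column by trying a short list of explicit formats in turn. The helper name parse_mixed and the FORMATS tuple are assumptions made for illustration and are not part of the original post; values that match none of the formats become NaT.

```python
from datetime import datetime

import pandas as pd

# Hypothetical list of layouts seen in the time1 column of the test data.
FORMATS = ('%Y%m%d%H%M%S', '%Y-%m-%d %H:%M:%S', '%Y/%m/%d %H:%M:%S')


def parse_mixed(value):
    """Try each known format in order; return NaT if none matches."""
    text = str(value)  # ints such as 20210101000000 become digit strings
    for fmt in FORMATS:
        try:
            return pd.Timestamp(datetime.strptime(text, fmt))
        except ValueError:
            continue
    return pd.NaT


raw = pd.Series(
    [20210101000000, '20210101000002', '2021-01-01 00:00:04', '2021/1/1 00:00:07'],
    dtype=object,
)
parsed = pd.to_datetime(raw.map(parse_mixed))
print(parsed)
```

Trying formats one by one is slower than a vectorized call, but it keeps the accepted layouts explicit and does not depend on how a given pandas version infers formats.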
The parse_dates=[...] parameter of pandas read_csv()
# Save the data, then read it back, to check what pd.read_csv(parse_dates=[...]) can parse
df.to_csv('test.csv', index=False, encoding='utf-8')
df = pd.read_csv('test.csv', parse_dates=['time1','time2'])
print(df['time1'][0], type(df['time1'][0]))
print(df['time1'][2], type(df['time1'][2])) # parsed successfully
print('*'*50)
print(df['time2'][0], type(df['time2'][0])) # parsing failed
print('*'*50)
df
_______________________________________________________________________________________________
Output:
2021-01-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2021-01-01 00:00:02 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
**************************************************
2021-01-01 00:00:00:167 <class 'str'>
**************************************************
time1 time2 name age sex score
0 2021-01-01 00:00:00 2021-01-01 00:00:00:167 赵一 18 True 85.5
1 2021-01-01 00:00:01 2021-01-01 00:00:01:167 钱二 28 True 59.0
2 2021-01-01 00:00:02 2021-01-01 00:00:02:167 张三 38 False 78.0
3 2021-01-01 00:00:03 2021-01-01 00:00:03:167 李四 48 True 88.0
4 2021-01-01 00:00:04 2021-01-01 00:00:04:167 王五 16 False 69.0
5 2021-01-01 00:00:05 2021-01-01 00:00:05:167 麻六 26 False 96.0
6 2021-01-01 00:00:06 2021-01-01 00:00:06:167 孙七 39 True 90.0
7 2021-01-01 00:00:07 2021-01-01 00:00:07:167 周八 25 False 85.0
8 2021-01-01 00:00:08 2021-01-01 00:00:08:167 吴九 19 True 69.0
9 2021-01-01 00:00:09 2021-01-01 00:00:09:167 郑十 24 False 79.0
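The output above shows the limitation of parse_dates: the millisecond-level strings in time2, which put a colon before the milliseconds, are not parsed and remain str. This output comes from the pandas version the author ran; pandas 2.0 and later infer a single format per column and may treat the mixed formats in time1 differently.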
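Because read_csv leaves such a column as plain strings rather than raising an error, it can help to check the dtypes after loading. The sketch below assumes a small helper, unparsed_columns (a hypothetical name), and uses a hand-built frame as a stand-in for the result of read_csv.

```python
import pandas as pd


def unparsed_columns(frame, expected):
    """Return the names in `expected` whose columns are not datetime64 dtype."""
    return [col for col in expected
            if not pd.api.types.is_datetime64_any_dtype(frame[col])]


# Stand-in for a frame returned by read_csv(parse_dates=['time1', 'time2']):
# time1 was parsed, time2 was left as strings.
frame = pd.DataFrame({
    'time1': pd.to_datetime(['2021-01-01 00:00:00', '2021-01-01 00:00:01']),
    'time2': ['2021-01-01 00:00:00:167', '2021-01-01 00:00:01:167'],
})

leftover = unparsed_columns(frame, ['time1', 'time2'])
print(leftover)  # ['time2']
```

Any column reported here still needs an explicit conversion, for example with to_datetime() and a format string.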
pandas' built-in to_datetime() method
df = pd.read_csv('test.csv')
print(df['time1'][0], type(df['time1'][0])) # before parsing
print('*'*50)
print(pd.to_datetime(df['time1'])) # mixed layouts; on pandas 2.0+ this may need format='mixed'
# print(pd.to_datetime(df['time2'])) # raises ParserError: Unknown string format: 2021-01-01 00:00:00:167
print(pd.to_datetime(df['time2'], format='%Y-%m-%d %H:%M:%S:%f')) # parse millisecond-level data
______________________________________________________________________________________________
Output:
20210101000000 <class 'str'>
**************************************************
0 2021-01-01 00:00:00
1 2021-01-01 00:00:01
2 2021-01-01 00:00:02
3 2021-01-01 00:00:03
4 2021-01-01 00:00:04
5 2021-01-01 00:00:05
6 2021-01-01 00:00:06
7 2021-01-01 00:00:07
8 2021-01-01 00:00:08
9 2021-01-01 00:00:09
Name: time1, dtype: datetime64[ns]
0 2021-01-01 00:00:00.167
1 2021-01-01 00:00:01.167
2 2021-01-01 00:00:02.167
3 2021-01-01 00:00:03.167
4 2021-01-01 00:00:04.167
5 2021-01-01 00:00:05.167
6 2021-01-01 00:00:06.167
7 2021-01-01 00:00:07.167
8 2021-01-01 00:00:08.167
9 2021-01-01 00:00:09.167
Name: time2, dtype: datetime64[ns]
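As shown, millisecond-level values in this layout are only parsed when the format is given explicitly, here format='%Y-%m-%d %H:%M:%S:%f', which to_datetime() accepts. Other approaches remain to be found and added.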
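One further option, offered as a sketch rather than a recommendation, is the converters parameter of read_csv, which applies a function to each cell of a column while the file is being read. The names parse_ms, MS_FORMAT, and the inline CSV text below are assumptions for illustration.

```python
import io

import pandas as pd

MS_FORMAT = '%Y-%m-%d %H:%M:%S:%f'  # colon before the millisecond part


def parse_ms(text):
    """Convert one cell such as '2021-01-01 00:00:00:167' to a Timestamp."""
    return pd.to_datetime(text, format=MS_FORMAT)


# Small in-memory stand-in for test.csv.
csv_text = (
    'time2,score\n'
    '2021-01-01 00:00:00:167,85.5\n'
    '2021-01-01 00:00:01:167,59.0\n'
)

ms_df = pd.read_csv(io.StringIO(csv_text), converters={'time2': parse_ms})
# The converted column may come back as object dtype, so normalize it.
ms_df['time2'] = pd.to_datetime(ms_df['time2'])
print(ms_df['time2'])
```

A converter runs once per cell, so on large files a single vectorized to_datetime() call after loading is likely to be faster.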
A supplementary note
df_1 = pd.DataFrame({'Time Date':['1/15/2021 11:00','1/15/2021 11:10']})
print (df_1)
df_1['Time Date'] = pd.to_datetime(df_1['Time Date'], format='%m/%d/%Y %H:%M')
print (df_1)
Time Date
0 1/15/2021 11:00
1 1/15/2021 11:10
_________________________
Time Date
0 2021-01-15 11:00:00
1 2021-01-15 11:10:00
df1 = pd.DataFrame({'Join Date':['1/4/21 14:44','2/3/21 14:44']})
print (df1)
# Note the %y directive (two-digit year) and the floor() method
df1['Join Date'] = pd.to_datetime(df1['Join Date'], format='%d/%m/%y %H:%M').dt.floor('D')
print (df1)
Join Date
0 1/4/21 14:44
1 2/3/21 14:44
_________________________
Join Date
0 2021-04-01
1 2021-03-02
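The format string decides how an ambiguous value such as 1/4/21 is read, and floor('D') then drops the time of day. The short sketch below parses the same strings day-first and month-first to show the difference; the variable names are chosen for illustration only.

```python
import pandas as pd

raw = pd.Series(['1/4/21 14:44', '2/3/21 14:44'])

# %y reads a two-digit year (21 -> 2021); the order of %d and %m matters.
day_first = pd.to_datetime(raw, format='%d/%m/%y %H:%M')
month_first = pd.to_datetime(raw, format='%m/%d/%y %H:%M')

# floor('D') rounds each timestamp down to midnight of the same day.
print(day_first.dt.floor('D'))    # 2021-04-01, 2021-03-02
print(month_first.dt.floor('D'))  # 2021-01-04, 2021-02-03
```

Without an explicit format, pandas has to guess the order of day and month, so stating it is the safer choice for data like this.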
A supplementary note
df = pd.read_csv('test.csv', chunksize=3)
for i in df:
    print(i) # the data is read in chunks; a final chunk with fewer than 3 rows is still returned, so nothing is lost
____________________________________________________________________________________________
Output:
time1 time2 name age sex score
0 2021-01-01 00:00:00 2021-01-01 00:00:00:167 赵一 18 True 85.5
1 2021-01-01 00:00:01 2021-01-01 00:00:01:167 钱二 28 True 59.0
2 2021-01-01 00:00:02 2021-01-01 00:00:02:167 张三 38 False 78.0
time1 time2 name age sex score
3 2021-01-01 00:00:03 2021-01-01 00:00:03:167 李四 48 True 88.0
4 2021-01-01 00:00:04 2021-01-01 00:00:04:167 王五 16 False 69.0
5 2021-01-01 00:00:05 2021-01-01 00:00:05:167 麻六 26 False 96.0
time1 time2 name age sex score
6 2021-01-01 00:00:06 2021-01-01 00:00:06:167 孙七 39 True 90.0
7 2021-01-01 00:00:07 2021-01-01 00:00:07:167 周八 25 False 85.0
8 2021-01-01 00:00:08 2021-01-01 00:00:08:167 吴九 19 True 69.0
time1 time2 name age sex score
9 2021-01-01 00:00:09 2021-01-01 00:00:09:167 郑十 24 False 79.0
This confirms the behavior of the chunksize= parameter: a final chunk smaller than the configured size is still read.
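Chunked reading combines naturally with the explicit format from earlier: each chunk is an ordinary DataFrame, so its time column can be converted before the chunks are joined. The following is a minimal sketch that builds a ten-row CSV in memory as a stand-in for test.csv; the column set and variable names are assumptions.

```python
import io

import pandas as pd

# Ten rows with the colon-separated millisecond layout used above.
rows = ['time2,age'] + [f'2021-01-01 00:00:0{i}:167,{20 + i}' for i in range(10)]
csv_text = '\n'.join(rows)

chunk_sizes = []
parsed_chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    # Convert the time column of this chunk with the explicit format.
    chunk['time2'] = pd.to_datetime(chunk['time2'], format='%Y-%m-%d %H:%M:%S:%f')
    chunk_sizes.append(len(chunk))
    parsed_chunks.append(chunk)

result = pd.concat(parsed_chunks)
print(chunk_sizes)  # [3, 3, 3, 1]
print(result.dtypes)
```

The last chunk holds the single remaining row, matching the output shown above.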
Original: https://blog.csdn.net/xnd_31726/article/details/112297723
Author: 一定波兮
Title: Pandas millisecond-level time parsing