Time Series Library sktime: Introduction to Inputs and Model Building

July 15, 2023, 10:50 AM • Artificial Intelligence • 103 reads

Sequence analysis with sktime

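sktime is a relatively new library for working with sequence data. It handles several kinds of tasks (classification, regression, clustering, forecasting, and annotation) and is well worth a look.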

Building inputs

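sktime uses a data layout called the nested format. To convert other kinds of data into it, sktime provides several input-building utilities, each suited to a different source layout:
1. from_2d_array_to_nested: converts 2D data, also called tabular data, which can hold only a single variable. Axis 0 indexes the instances (for example, patients), and axis 1 holds that variable's measurements at successive time points (for example, a drug concentration measured at several times).
Import: from sktime.datatypes._panel._convert import (from_2d_array_to_nested, from_nested_to_2d_array, is_nested_dataframe,)
2. from_long_to_nested: converts long-format data. Long-format data stacks the repeated measurements along axis 0, while axis 1 lists a few descriptive columns: case_id (the instance number, e.g. the patient), dim_id (the dimension number, identifying which variable was measured, e.g. heart rate or blood pressure), and reading_id (the position of the reading within the series, running from 0 to series_len - 1; thinking of it in terms of the series length, e.g. 24 hourly readings, may be easier). The measurement itself is stored in a value column.
3. from_multi_index_to_nested: converts a multi-indexed pandas DataFrame. Specifically, the index has two levels, case_id and reading_id, and the dimensions are laid out along axis 1.
4. from_3d_numpy_to_nested: converts a 3D NumPy array. A 3D NumPy array is similar to the multi-indexed layout, except that it uses default positional indices.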
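To make the nested format concrete, here is a minimal sketch that builds such a table by hand with pandas only. It is an illustration of the layout, not sktime's implementation; the column name var_0 and the toy values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy tabular data: 3 instances (rows) x 4 time points (columns), one variable.
X_2d = np.arange(12, dtype=float).reshape(3, 4)

# Nested layout: one row per instance, one column per variable, and each
# cell holds the whole series of that instance as a pd.Series.
# "var_0" is a hypothetical column name chosen for this sketch.
cells = [pd.Series(row) for row in X_2d]
X_nested = pd.Series(cells, name="var_0").to_frame()

print(X_nested.shape)
print(type(X_nested.iloc[0, 0]))
print(X_nested.iloc[1, 0].tolist())
```

The table has one row per instance and one column for the single variable, while the time axis lives inside each cell.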

These should be enough for most purposes, though sktime offers many other ways of reading data.

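Note: time series are usually stamped as year-month-day hour:minute:second. If a measurement is taken once per hour and each day is treated as one case, reading_id runs from 0 to 23 and case_id indexes the days. If the same data are instead organized with one 30-day month per case, each case has 24*30 = 720 readings, so reading_id runs from 0 to 719 and case_id indexes the months.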
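The choice of case length in the note can be sketched with a plain NumPy reshape. The 60 days of hourly values are made up for the illustration, and a month is assumed to be exactly 30 days.

```python
import numpy as np

# Hypothetical hourly measurements over 60 days (60 * 24 = 1440 values).
hourly = np.arange(60 * 24, dtype=float)

# One case per day: case_id indexes the 60 days, reading_id runs over 0..23.
daily_cases = hourly.reshape(60, 24)

# One case per 30-day month: case_id indexes the 2 months,
# reading_id runs over 0..719 (24 * 30 readings per case).
monthly_cases = hourly.reshape(2, 24 * 30)

print(daily_cases.shape)
print(monthly_cases.shape)
```

Either array is tabular in the sense described above (instances on axis 0, time points on axis 1); only the definition of an instance changes.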

The following demonstrates these with code provided by sktime:


import sktime
import pandas as pd
from numpy.random import default_rng
from sktime.datatypes._panel._convert import (
    from_2d_array_to_nested,
    from_nested_to_2d_array,
    is_nested_dataframe,
)
rng = default_rng()
X_2d = rng.standard_normal((50, 20))
print(f"The tabular data has the shape {X_2d.shape}")
print(pd.DataFrame(X_2d).head(3))
print(pd.DataFrame(X_2d).tail(3))

output:
The tabular data has the shape (50, 20)
0 1 2 3 4 5 6
0 0.951345 -1.315426 0.214762 -0.136433 0.247084 -1.932268 0.785887
1 -0.010882 1.698942 0.781392 0.007080 -0.734776 0.233316 -1.657900
2 1.018911 -0.964553 -0.682563 0.702798 -0.032204 -1.105005 0.095655

     7         8         9         10        11        12        13  \

0 -0.042764 0.569522 1.017830 0.186783 0.249294 0.244153 -0.985478
1 -0.279308 0.831627 0.765397 1.470033 -0.484943 -0.084951 -1.571112
2 -0.861139 0.122383 -0.389639 1.494302 0.343313 0.297859 0.142106

     14        15        16        17        18        19

0 1.862123 -0.193330 -0.433783 0.750041 -0.500827 -2.268721
1 0.047955 0.664481 -1.374092 -0.187755 1.350386 -0.360479
2 -0.572279 -0.419691 -0.341976 0.008623 0.901157 0.709582
0 1 2 3 4 5 6
47 -2.027696 0.868153 0.083681 0.045453 -0.199436 -0.960754 0.611705
48 1.278521 0.740739 -0.581048 -0.274030 0.559486 -1.960081 -0.527335
49 0.256974 -1.053948 -0.180352 0.495492 -0.229110 3.771682 0.350383

      7         8         9         10        11        12        13  \

47 -0.909587 0.887509 0.960593 -0.712712 0.668460 0.539432 -1.276410
48 -1.160588 0.541308 -1.091403 2.419113 -0.700643 1.003165 -1.082824
49 0.555191 -1.780015 -0.549731 1.503096 -1.293898 0.111052 0.422830

      14        15        16        17        18        19

47 2.669976 0.632130 -0.482352 -0.309763 1.390071 -0.773413
48 -0.796634 0.635805 0.935867 0.563172 -1.042389 -0.277242
49 1.761040 -1.392075 0.873080 -1.138395 -0.788783 1.283540

X_nested = from_2d_array_to_nested(X_2d)
print(f"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}")
print(f"The cell contains a {type(X_nested.iloc[0,0])}.")
print(f"The nested DataFrame has shape {X_nested.shape}")
print(X_nested.head(1))
print(X_nested.tail(1))

output:
X_nested is a nested DataFrame: True
The cell contains a <class 'pandas.core.series.Series'>.

from sktime.datasets import generate_example_long_table
X = generate_example_long_table(num_cases=50, series_len=20, num_dims=5)
print(X.head())
print(X.tail())

Long-format data

case_id dim_id reading_id value
0 0 0 0 0.562555
1 0 0 1 0.280241
2 0 0 2 0.738769
3 0 0 3 0.258843
4 0 0 4 0.354176
case_id dim_id reading_id value
4995 49 4 15 0.734340
4996 49 4 16 0.991652
4997 49 4 17 0.665473
4998 49 4 18 0.272355
4999 49 4 19 0.695632

from sktime.datatypes._panel._convert import from_long_to_nested, from_nested_to_long
X_nested = from_long_to_nested(X)
X_nested.head()
print(f"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}")
print(f"The cell contains a {type(X_nested.iloc[0,0])}.")
print(f"The nested DataFrame has shape {X_nested.shape}")
X_nested.iloc[0, 0]

X_nested is a nested DataFrame: True
The cell contains a <class 'pandas.core.series.Series'>.
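What the long-to-nested conversion does can be mimicked with plain pandas. The sketch below is not sktime's code; it only assumes the four column names shown in the long-format output (case_id, dim_id, reading_id, value), and the dim_0/dim_1 column names and toy values are hypothetical.

```python
import pandas as pd

# A tiny long-format table: 2 cases, 2 dimensions, 3 readings each.
rows = [
    (case, dim, reading, float(100 * case + 10 * dim + reading))
    for case in range(2)
    for dim in range(2)
    for reading in range(3)
]
X_long = pd.DataFrame(rows, columns=["case_id", "dim_id", "reading_id", "value"])

# Gather the readings of each (case, dimension) pair into one series.
columns = {}
for (case, dim), group in X_long.groupby(["case_id", "dim_id"]):
    series = group.sort_values("reading_id")["value"].reset_index(drop=True)
    columns.setdefault(f"dim_{dim}", {})[case] = series

# One row per case, one column per dimension, one pd.Series per cell.
X_nested = pd.DataFrame(
    {
        name: pd.Series(list(cells.values()), index=list(cells.keys()))
        for name, cells in columns.items()
    }
)
X_nested.index.name = "case_id"

print(X_nested.shape)
print(X_nested.loc[1, "dim_1"].tolist())
```

The 12 long rows collapse into a 2 x 2 table, which is why a long table with 50 cases and 5 dimensions ends up with 50 rows and 5 columns once nested.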


from sktime.datasets import make_multi_index_dataframe
from sktime.datatypes._panel._convert import (
    from_multi_index_to_nested,
    from_nested_to_multi_index,
)
X_mi = make_multi_index_dataframe(n_instances=50, n_columns=5, n_timepoints=20)
print(f"The multi-indexed DataFrame has shape {X_mi.shape}")
print(f"The multi-index names are {X_mi.index.names}")
X_mi.head()

output:
The multi-indexed DataFrame has shape (1000, 5)
The multi-index names are ['case_id', 'reading_id']
var_0 var_1 var_2 var_3 var_4
case_id reading_id
0 0 0.995135 0.038909 0.181639 0.104997 0.122326
1 0.355448 0.991423 0.807475 0.062603 0.175412
2 0.935102 0.403729 0.031880 0.664581 0.438267
3 0.724594 0.291621 0.833333 0.780480 0.201459
4 0.019985 0.743191 0.485572 0.444321 0.146891

X_nested = from_multi_index_to_nested(X_mi, instance_index="case_id")
print(f"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}")
print(f"The cell contains a {type(X_nested.iloc[0,0])}.")
print(f"The nested DataFrame has shape {X_nested.shape}")
X_nested.head()

output:
X_nested is a nested DataFrame: True
The cell contains a <class 'pandas.core.series.Series'>.
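The relation between the multi-indexed layout and a 3D NumPy array can be sketched with pandas and NumPy alone. The axis order (instance, variable, time point) used here is an assumption of this sketch, as are the toy sizes and values.

```python
import numpy as np
import pandas as pd

n_instances, n_timepoints, n_columns = 3, 4, 2

# Multi-indexed table: index levels (case_id, reading_id), variables on axis 1.
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)], names=["case_id", "reading_id"]
)
values = np.arange(n_instances * n_timepoints * n_columns, dtype=float)
X_mi = pd.DataFrame(
    values.reshape(n_instances * n_timepoints, n_columns),
    index=index,
    columns=["var_0", "var_1"],
)

# Rows are ordered by (case_id, reading_id), so a reshape splits them into
# (instance, time point, variable); the transpose then moves the variable
# axis ahead of the time axis.
X_3d = X_mi.sort_index().to_numpy().reshape(n_instances, n_timepoints, n_columns)
X_3d = X_3d.transpose(0, 2, 1)

print(X_3d.shape)
print(X_3d[1, 0, :])
```

The array carries the same numbers as the table; the case_id and reading_id labels are simply replaced by positions along the axes.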

Building models

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

import matplotlib.pyplot  as plt
y = load_airline()
print(y)

y_train, y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
print(fh)
forecaster = ThetaForecaster(sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
mean_absolute_percentage_error(y_test, y_pred)
y_train.plot()
y_test.plot()
y_pred.plot()
plt.show()

output:
Period
1949-01 112.0
1949-02 118.0
1949-03 132.0
1949-04 129.0
1949-05 121.0
…

1960-08 606.0
1960-09 508.0
1960-10 461.0
1960-11 390.0
1960-12 432.0
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64
ForecastingHorizon(['1958-01', '1958-02', '1958-03', '1958-04', '1958-05', '1958-06',
'1958-07', '1958-08', '1958-09', '1958-10', '1958-11', '1958-12',
'1959-01', '1959-02', '1959-03', '1959-04', '1959-05', '1959-06',
'1959-07', '1959-08', '1959-09', '1959-10', '1959-11', '1959-12',
'1960-01', '1960-02', '1960-03', '1960-04', '1960-05', '1960-06',
'1960-07', '1960-08', '1960-09', '1960-10', '1960-11', '1960-12'],
dtype='period[M]', name='Period', is_relative=False)
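The forecasting snippet calls mean_absolute_percentage_error but never prints the result, so when it is run as a script the score is discarded. As a rough guide to what that number means, the sketch below computes the plain and the symmetric percentage error by hand with NumPy. The helper names mape and smape and the toy values are hypothetical; which variant a given sktime version returns by default is not stated in the source, so it is worth checking the symmetric parameter of the sktime function.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error: mean(|y - y_hat| / |y|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


def smape(y_true, y_pred):
    """Symmetric variant: mean(2 * |y - y_hat| / (|y| + |y_hat|))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denominator = np.abs(y_true) + np.abs(y_pred)
    return float(np.mean(2 * np.abs(y_true - y_pred) / denominator))


# Made-up values, only to exercise the two formulas.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mape(y_true, y_pred))
print(smape(y_true, y_pred))
```

Both scores are fractions rather than percentages, so a value of 0.07 corresponds to an average error of about 7%.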

from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_arrow_head()
print(X.shape)
print(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)  # 0.8679245283018868 in the original run; varies with the random split
X.iloc[0,0]

output:
(211, 1)
['0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2'
'0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2' '0' '1' '2'
'0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0'
'0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0'
'0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0'
'0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '1' '1' '1'
'1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1'
'1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1'
'1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '2' '2' '2' '2'
'2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2'
'2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2'
'2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2' '2']
0 -1.963009
1 -1.957825
2 -1.956145
3 -1.938289
4 -1.896657
…

246 -1.841345
247 -1.884289
248 -1.905393
249 -1.923905
250 -1.909153
Length: 251, dtype: float64
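The reported accuracy of 0.8679245283018868 can be read back as a count of correct predictions. The short check below assumes that train_test_split was called with scikit-learn's default split (a quarter of the samples held out, rounded up), which is what the call in the snippet implies since no size is given.

```python
import math

n_samples = 211        # from X.shape in the output above
test_fraction = 0.25   # assumed: scikit-learn's default when no size is passed

# The held-out set size is the test fraction of the samples, rounded up.
n_test = math.ceil(n_samples * test_fraction)
n_train = n_samples - n_test

# Accuracy is n_correct / n_test, so the count can be recovered by rounding.
reported_accuracy = 0.8679245283018868
n_correct = round(reported_accuracy * n_test)

print(n_train, n_test, n_correct)
```

Under that assumption the score corresponds to 46 correct labels out of 53 test series. Because the split is random and no random_state is fixed, a rerun will generally give a different value.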

from sktime.annotation.adapters import PyODAnnotator
from pyod.models.iforest import IForest
from sktime.datasets import load_airline
y = load_airline()
print(y.shape)
pyod_model = IForest()
pyod_sktime_annotator = PyODAnnotator(pyod_model)
pyod_sktime_annotator.fit(y)
annotated_series = pyod_sktime_annotator.predict(y)
annotated_series

output:
(144,)
1949-01 1
1949-02 0
1949-03 0
1949-04 0
1949-05 0
…

1960-08 1
1960-09 1
1960-10 0
1960-11 0
1960-12 1
Freq: M, Length: 144, dtype: int32

Original: https://blog.csdn.net/skyskytotop/article/details/123895815
Author: 预测模型的开发与应用研究
Title: Time Series Library sktime: Introduction to Inputs and Model Building

