[Finance][Random Forest] Using a Random Forest to Classify Futures Price Direction (Up/Down)

June 18, 2023, 2:56 PM • Artificial Intelligence • 97 views

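RF-RF_train3year3month
Load the data
Split into training and test sets (3 years + 3 months, rolling forward)
Select the relevant columns
Exponential smoothing
Feature Extraction – Technical Indicators
Prepare the data with a prediction horizon of 10 days
Train on each split in turn
Review the metrics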

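References: the repository jmartinezheras/reproduce-stock-market-direction-random-forests, and the post "Regression with random forests (data preprocessing, MAPE evaluation, visualization, feature importance, plot of predicted vs. actual values)".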

RF-RF_train3year3month

Load the data

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7,4.5)

import numpy as np
import random

np.random.seed(42)
random.seed(42)

import pandas_techinal_indicators as ta

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import f1_score, precision_score, confusion_matrix, recall_score, accuracy_score
from sklearn.model_selection import train_test_split

df_org = pd.read_csv(r'F:/data/market_index_data/hengSheng_0404.csv')
df_org.head()

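Split into training and test sets (3 years + 3 months, rolling forward)

Each split trains on roughly 3 years of data and tests on the 3 months that follow; the next split starts 3 months later, and so on. Why split the data this way? See:
M'Ng J, Mehralizadeh M. Forecasting East Asian Indices Futures via a Novel Hybrid of Wavelet-PCA Denoising and Artificial Neural Network Models[J]. PLOS ONE, 2016, 11.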

train_ptr = []
test_ptr = []
end_ptr = []

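# Month counters, where num = (year - 2006) * 12 + month:
#   date_flag[0] = 2   -> first training window starts in Feb 2006
#   date_flag[1] = 38  -> first test window starts 3 years later (Feb 2009)
#   date_flag[2] = 41  -> first test window ends 3 months after that (May 2009)
# Each counter advances by 3 months once its month is first seen in the data.
date_flag = [2, 3*12+2, 3*12+5]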

for i in range(0, len(df_org)):
    num = (df_org.iloc[i]['year'] - 2006) * 12 + df_org.iloc[i]['month']
    if num == date_flag[0]:
        train_ptr.append(i)
        date_flag[0] += 3
    if num == date_flag[1]:
        test_ptr.append(i)
        date_flag[1] += 3
    if num == date_flag[2]:
        end_ptr.append(i)
        date_flag[2] += 3

print(len(end_ptr))
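
The pointer loop above can also be written as a small reusable function. The following is a minimal sketch, not the author's code: the helper name `quarter_start_rows` and the toy business-day calendar are assumptions made for illustration, and the rows are assumed to be in chronological order. Unlike the original loop, this version skips a target month that is missing from the data instead of stalling on it.

```python
import pandas as pd

def quarter_start_rows(df, first_num, base_year=2006, step=3):
    """Return the row position of the first row of every `step`-th month,
    starting at month counter `first_num` (hypothetical helper)."""
    num = ((df['year'] - base_year) * 12 + df['month']).to_numpy()
    first_row = {}
    for pos, m in enumerate(num):
        # Remember only the first row position seen for each month counter.
        first_row.setdefault(int(m), pos)
    targets = range(first_num, int(num.max()) + 1, step)
    return [first_row[t] for t in targets if t in first_row]

# Toy calendar: business days from 2006 through 2010 (assumed data).
days = pd.bdate_range('2006-01-02', '2010-12-31')
toy = pd.DataFrame({'year': days.year, 'month': days.month})

train_demo = quarter_start_rows(toy, 2)           # Feb 2006, May 2006, ...
test_demo = quarter_start_rows(toy, 3 * 12 + 2)   # Feb 2009, May 2009, ...
end_demo = quarter_start_rows(toy, 3 * 12 + 5)    # May 2009, Aug 2009, ...
print(len(train_demo), len(test_demo), len(end_demo))
```

Because both the test and end counters advance in 3-month steps, the end of one test window is the start of the next, so consecutive test windows are contiguous.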

Select the relevant columns

aapl = df_org[['Open', 'High', 'Low', 'Close', 'Volume']]
aapl.head()

Exponential smoothing

def get_exp_preprocessing(df, alpha=0.9):
    edata = df.ewm(alpha=alpha).mean()
    return edata

saapl = get_exp_preprocessing(aapl)
saapl.head()
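
To make the smoothing step concrete, here is a minimal sketch of what `ewm(alpha=...).mean()` computes with pandas' default `adjust=True`: a weighted average of all past values with weights `(1 - alpha) ** i`. The helper `ewm_adjusted_manual` and the toy prices are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def ewm_adjusted_manual(values, alpha):
    """Weighted average with weights (1 - alpha)**i, newest value weighted most."""
    out = []
    for t in range(len(values)):
        # The value at position j gets exponent t - j, so the newest gets 0.
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[:t + 1]) / weights.sum())
    return np.array(out)

prices = pd.Series([10.0, 11.0, 9.0, 12.0])
smoothed = prices.ewm(alpha=0.9).mean()
manual = ewm_adjusted_manual(prices.to_numpy(), 0.9)
print(np.allclose(smoothed.to_numpy(), manual))
```

With `alpha=0.9` the newest observation dominates, so the smoothed series stays very close to the raw prices and only mild noise is removed.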

Feature Extraction – Technical Indicators

def feature_extraction(data):
    for x in [5, 14, 26, 44, 66]:

        data = ta.relative_strength_index(data, n=x)
        data = ta.stochastic_oscillator_d(data, n=x)
        data = ta.accumulation_distribution(data, n=x)
        data = ta.average_true_range(data, n=x)
        data = ta.momentum(data, n=x)
        data = ta.money_flow_index(data, n=x)
        data = ta.rate_of_change(data, n=x)
        data = ta.on_balance_volume(data, n=x)
        data = ta.commodity_channel_index(data, n=x)
        data = ta.ease_of_movement(data, n=x)
        data = ta.trix(data, n=x)
        data = ta.vortex_indicator(data, n=x)

    data['ema50'] = data['Close'] / data['Close'].ewm(50).mean()
    data['ema21'] = data['Close'] / data['Close'].ewm(21).mean()
    data['ema14'] = data['Close'] / data['Close'].ewm(14).mean()
    data['ema5'] = data['Close'] / data['Close'].ewm(5).mean()

    data = ta.macd(data, n_fast=12, n_slow=26)

    del(data['Open'])
    del(data['High'])
    del(data['Low'])
    del(data['Volume'])

    return data

def compute_prediction_int(df, n):
    pred = (df.shift(-n)['Close'] >= df['Close'])
    pred = pred.iloc[:-n]
    return pred.astype(int)

def prepare_data(df, horizon):
    data = feature_extraction(df).dropna().iloc[:-horizon]
    data['pred'] = compute_prediction_int(data, n=horizon)
    del(data['Close'])
    return data.dropna()
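
One detail of the `ema` features is easy to miss: the first positional argument of `ewm` is `com` (center of mass), not `span`, so `ewm(5)` corresponds to `alpha = 1 / (1 + 5)`. The short check below is a sketch on made-up prices; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 104.0])

# ewm(5) is ewm(com=5), i.e. alpha = 1 / (1 + com).
by_position = close.ewm(5).mean()
by_alpha = close.ewm(alpha=1 / (1 + 5)).mean()
# span=5 means alpha = 2 / (span + 1), which is a different smoother.
by_span = close.ewm(span=5).mean()

# Same shape of feature as data['ema5'] above: price relative to its EMA.
ratio = close / by_position
print(np.allclose(by_position, by_alpha))
print(np.allclose(by_position, by_span))
```

A ratio above 1 means the close is above its exponential moving average, and a ratio below 1 means it is below.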

Prepare the data with a prediction horizon of 10 days


data = prepare_data(saapl, 10)

y = data['pred']

features = [x for x in data.columns if x not in ['gain', 'pred']]
X = data[features]

print(list(data.columns))
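
To see what the target column contains, the sketch below runs `compute_prediction_int` (copied from the code above) on a made-up price series with a horizon of 2 rows. The label is 1 when the close `n` rows ahead is greater than or equal to the current close, and the last `n` rows are dropped because their future price is unknown.

```python
import pandas as pd

def compute_prediction_int(df, n):
    # 1 if the close n rows ahead is >= the current close, else 0.
    pred = (df.shift(-n)['Close'] >= df['Close'])
    pred = pred.iloc[:-n]  # the last n rows have no future price
    return pred.astype(int)

toy = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 2.0, 1.0, 2.0]})
labels = compute_prediction_int(toy, n=2)
print(labels.tolist())  # [1, 1, 0, 1]
```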

Train on each split in turn

rf = RandomForestClassifier(n_jobs=-1, n_estimators=200, random_state=42)

accuracy_his = []
recall_his = []
f1_his = []
precision_his = []

all_p = np.array([])
all_prob = np.array([])

for k in range(len(end_ptr)):
    if end_ptr[k] >= len(X):
        break

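    # Positional slices. Training always starts at train_ptr[0], so the
    # training window grows with k; use train_ptr[k] instead for a fixed
    # 3-year sliding window.
    # Note: the pointers were computed on df_org. If dropna() in prepare_data
    # removed leading rows, these positions are shifted relative to the calendar.
    X_train = X.iloc[train_ptr[0]:test_ptr[k]]
    y_train = y.iloc[train_ptr[0]:test_ptr[k]]
    X_test = X.iloc[test_ptr[k]:end_ptr[k]]
    y_test = y.iloc[test_ptr[k]:end_ptr[k]]
    print('\nDataSet No.{} data row {}-{}-{}'.format(k, train_ptr[0], test_ptr[k], end_ptr[k]))

    rf.fit(X_train, y_train.values.ravel())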

    pred = rf.predict(X_test)

    prob = rf.predict_proba(X_test)
    if k == 0:
        all_prob = prob
    else :
        all_prob = np.concatenate((all_prob, prob), axis=0)

    all_p = np.concatenate((all_p,pred))

    precision = precision_score(y_pred=pred, y_true=y_test)
    recall = recall_score(y_pred=pred, y_true=y_test)
    f1 = f1_score(y_pred=pred, y_true=y_test)
    accuracy = accuracy_score(y_pred=pred, y_true=y_test)
    print('precision: {0:1.2f}, recall: {1:1.2f}, f1: {2:1.2f}, accuracy: {3:1.2f}'.format(precision, recall, f1, accuracy))
    accuracy_his.append(accuracy)
    recall_his.append(recall)
    f1_his.append(f1)
    precision_his.append(precision)

    confusion = confusion_matrix(y_pred=pred, y_true=y_test)
    print('Confusion Matrix')
    print(confusion)
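
The loop above is a form of walk-forward evaluation. As a minimal sketch of the idea on synthetic data (not the author's dataset), the hypothetical helper `walk_forward_scores` below supports both an expanding training window, which is what the loop above does, and a fixed-length sliding window. The block boundaries, the data, and `n_estimators=50` are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def walk_forward_scores(X, y, boundaries, n_train_blocks, expanding=True):
    """Block j covers rows boundaries[j]:boundaries[j+1]. Train on the blocks
    before boundary k, test on the block right after it."""
    scores = []
    for k in range(n_train_blocks, len(boundaries) - 1):
        train_start = boundaries[0] if expanding else boundaries[k - n_train_blocks]
        train_end = boundaries[k]
        test_end = boundaries[k + 1]
        model = RandomForestClassifier(n_estimators=50, random_state=42)
        model.fit(X[train_start:train_end], y[train_start:train_end])
        pred = model.predict(X[train_end:test_end])
        scores.append(accuracy_score(y[train_end:test_end], pred))
    return scores

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
# Synthetic label that depends on the first feature plus noise.
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
boundaries_demo = list(range(0, 401, 50))  # 8 blocks of 50 rows

scores_demo = walk_forward_scores(X_demo, y_demo, boundaries_demo,
                                  n_train_blocks=4, expanding=False)
print(scores_demo)
```

A fresh model is created for every split here; refitting a single `RandomForestClassifier` instance, as the loop above does, has the same effect because `fit` discards the previous trees by default.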

Review the metrics

print(np.mean(accuracy_his))
print(np.mean(recall_his))
print(np.mean(precision_his))

# Pooled metrics over every test window that was actually evaluated.
# scikit-learn expects (y_true, y_pred); passing them the other way round
# swaps precision and recall.
last = len(accuracy_his) - 1
y_all = y[test_ptr[0]:end_ptr[last]]
Acc = accuracy_score(y_true=y_all, y_pred=all_p)
print('------------- DataSet:, Accuracy:{:.5f} -------------'.format(Acc))
Pc = precision_score(y_true=y_all, y_pred=all_p)
print('------------- DataSet:, Precision:{:.5f} -------------'.format(Pc))
Recall = recall_score(y_true=y_all, y_pred=all_p)
print('------------- DataSet:, Recall:{:.5f} -------------'.format(Recall))
f1 = f1_score(y_true=y_all, y_pred=all_p)
print('------------- DataSet:, F1:{:.5f} -------------'.format(f1))

year_accuracy_his = []
year_precision_his = []
year_recall_his = []
year_f1_his = []

# Group every 4 consecutive 3-month test windows into one year.
for i in range(int(len(accuracy_his) / 4)):
    left = test_ptr[i*4]
    right = end_ptr[i*4+3]
    item_y = y[left:right]
    # all_p starts at test_ptr[0], so shift the positions for the predictions.
    item_p = all_p[left - test_ptr[0]:right - test_ptr[0]]

    Acc = accuracy_score(y_true=item_y, y_pred=item_p)
    year_accuracy_his.append(Acc)
    print('------------- DataSet:{}, Accuracy:{:.5f} -------------'.format(i, Acc))
    Pc = precision_score(y_true=item_y, y_pred=item_p)
    year_precision_his.append(Pc)
    print('------------- DataSet:{}, Precision:{:.5f} -------------'.format(i, Pc))
    Recall = recall_score(y_true=item_y, y_pred=item_p)
    year_recall_his.append(Recall)
    print('------------- DataSet:{}, Recall:{:.5f} -------------'.format(i, Recall))
    f1 = f1_score(y_true=item_y, y_pred=item_p)
    year_f1_his.append(f1)
    print('------------- DataSet:{}, F1:{:.5f} -------------'.format(i, f1))

# Leftover windows that do not fill a whole year.
if len(accuracy_his) % 4 != 0:
    left = test_ptr[len(accuracy_his) - (len(accuracy_his) % 4)]
    right = end_ptr[len(accuracy_his) - 1]
    item_y = y[left:right]
    # all_p starts at test_ptr[0], so shift the positions for the predictions.
    item_p = all_p[left - test_ptr[0]:right - test_ptr[0]]

    Acc = accuracy_score(y_true=item_y, y_pred=item_p)
    year_accuracy_his.append(Acc)
    print('------------- DataSet:final, Accuracy:{:.5f} -------------'.format(Acc))
    Pc = precision_score(y_true=item_y, y_pred=item_p)
    year_precision_his.append(Pc)
    print('------------- DataSet:final, Precision:{:.5f} -------------'.format(Pc))
    Recall = recall_score(y_true=item_y, y_pred=item_p)
    year_recall_his.append(Recall)
    print('------------- DataSet:final, Recall:{:.5f} -------------'.format(Recall))
    f1 = f1_score(y_true=item_y, y_pred=item_p)
    year_f1_his.append(f1)
    print('------------- DataSet:final, F1:{:.5f} -------------'.format(f1))

plt.figure(figsize=(20,7))
plt.plot(np.arange(len(year_accuracy_his)), year_accuracy_his, label='year_accuracy')
plt.plot(np.arange(len(year_recall_his)), year_recall_his, label='year_recall')
plt.plot(np.arange(len(year_f1_his)), year_f1_his, label='year_f1_his')
plt.plot(np.arange(len(year_precision_his)), year_precision_his, label='year_precision')

plt.axhline(y=0.5, c='r', ls='--', lw=2)
plt.legend();
plt.show()

plt.figure(figsize=(20,7))
plt.plot(np.arange(len(accuracy_his)), accuracy_his, label='accuracy')
plt.plot(np.arange(len(recall_his)), recall_his, label='recall')
plt.plot(np.arange(len(f1_his)), f1_his, label='f1')
plt.plot(np.arange(len(precision_his)), precision_his, label='precision')
plt.grid(True, ls=':', c='r')
plt.axhline(y=0.5, c='r', ls='--', lw=2)
plt.legend();
plt.show()
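
When reading these metrics, keep in mind that the scikit-learn metric functions take the true labels first and the predictions second. Accuracy and binary F1 do not change if the two are swapped, but precision and recall trade places. The tiny example below is a sketch with made-up labels that shows the effect.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true_demo = [1, 1, 1, 0]
y_pred_demo = [1, 0, 0, 0]   # one true positive, two false negatives

# Correct order: (y_true, y_pred).
p_ok = precision_score(y_true_demo, y_pred_demo)
r_ok = recall_score(y_true_demo, y_pred_demo)

# Swapped order: precision and recall exchange their values.
p_swapped = precision_score(y_pred_demo, y_true_demo)
r_swapped = recall_score(y_pred_demo, y_true_demo)

# F1 is symmetric in the two arguments for the binary case.
f1_ok = f1_score(y_true_demo, y_pred_demo)
f1_swapped = f1_score(y_pred_demo, y_true_demo)

print(p_ok, r_ok)
print(p_swapped, r_swapped)
```

Passing the arguments by keyword (`y_true=...`, `y_pred=...`) avoids this class of mistake.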

Original: https://blog.csdn.net/qq_40690815/article/details/119520473
Author: StevenGerrad
Title: [Finance][Random Forest] Using a Random Forest to Classify Futures Price Direction (Up/Down)
