Getting Started with LSTMs

The content below is mainly drawn from datamonday's deep learning column. It is my own organization and digestion of that material, kept here to make later reference easier.


import warnings
warnings.filterwarnings('ignore')

LSTM Principles, Applications, and Training Methods

Different types of sequence prediction problems:

Sequence Prediction

  • Weather forecasting: given a sequence of observations of the weather over time, predict the weather tomorrow.
  • Stock market prediction: given the movements of a security over time, predict the next movement of the security.
  • Product recommendation: given a customer's past sequence of purchases, predict the customer's next purchase.

Sequence Classification.

  • DNA sequence classification: given a DNA sequence of A, C, G, and T values, predict whether the sequence is a coding or non-coding region.
  • Anomaly detection: given a sequence of observations, predict whether the sequence is anomalous.
  • Sentiment analysis: given a sequence of text such as a review or a tweet, predict whether the sentiment of the text is positive or negative.

Sequence Generation.

  • Text generation: given a corpus of text, such as the works of Shakespeare, generate new sentences or paragraphs that could plausibly have been drawn from that corpus.
  • Handwriting prediction: given a corpus of handwriting examples, generate handwriting for new phrases with the properties of the handwriting in the corpus.
  • Music generation: given a corpus of music examples, generate new musical pieces with the properties of the corpus.
  • Sequence generation can also refer to generating a sequence given a single observation as input; an example is the automatic textual description of an image.
  • Image caption generation: given an image as input, generate a sequence of words that describes the image.

Sequence-to-Sequence Prediction.

  • Multi-step time series forecasting: given a time series of observations, predict a sequence of observations for a number of future time steps.
  • Text summarization: given a document of text, produce a short summary that captures the most important parts of the source document.
  • Program execution: given a textual description of a program or a mathematical equation, predict the sequence of characters that describes the correct output.

Preparing Data for LSTMs


from sklearn.preprocessing import MinMaxScaler

# `values` is assumed to be a 2D array of shape (n_samples, n_features)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(values)                      # learn the min and max from the data
normalized = scaler.transform(values)            # scale to the range [0, 1]
inversed = scaler.inverse_transform(normalized)  # recover the original scale

from sklearn.preprocessing import StandardScaler

# standardize to zero mean and unit variance, then invert
scaler = StandardScaler()
scaler = scaler.fit(values)
standardized = scaler.transform(values)
inversed = scaler.inverse_transform(standardized)

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

# LabelEncoder maps categories to integers; OneHotEncoder maps integers to binary vectors
# (sparse=False returns a dense array; newer scikit-learn versions use sparse_output=False)
onehot_encoder = OneHotEncoder(sparse=False)

from keras.preprocessing.sequence import pad_sequences
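
The pad_sequences() function pads variable-length sequences to a common length, which LSTM input requires. A minimal sketch with made-up sequences (the data and maxlen below are purely illustrative):

from keras.preprocessing.sequence import pad_sequences

# illustrative variable-length sequences (not from the original article)
sequences = [[1, 2, 3, 4], [1, 2], [7]]

# pre-pad with zeros to a fixed length of 4 (padding='pre' is the default)
padded = pad_sequences(sequences, maxlen=4)
print(padded)
# [[1 2 3 4]
#  [0 0 1 2]
#  [0 0 0 7]]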

Developing LSTMs with Keras

Model definition

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(5, input_shape=(2, 1)))  # 5 memory cells; input of 2 time steps with 1 feature
model.add(Dense(1))
model.add(Activation('sigmoid'))

Compilation

Compilation requires a number of parameters to be specified, especially those tailored to training the network. Specifically, you must choose the optimization algorithm used to train the network and the loss function that the optimization algorithm minimizes.


model.compile(optimizer='sgd', loss='mse')

from keras.optimizers import SGD
algorithm = SGD(lr=0.1, momentum=0.3)
model.compile(optimizer=algorithm, loss='mse')

Loss functions

  • Regression: Mean Squared Error, 'mse' for short.

  • Binary classification (2 classes): Logarithmic Loss, also called cross-entropy, 'binary_crossentropy'.

  • Multiclass classification (>2 classes): Multiclass Logarithmic Loss, 'categorical_crossentropy'.

Optimization algorithms

  • Stochastic Gradient Descent, or sgd.

  • Adam, or adam.

  • RMSprop, or rmsprop

fit, evaluate and predict

history = model.fit(X, y, batch_size=10, epochs=100, verbose=0)

loss, accuracy = model.evaluate(X, y, verbose=0)

predictions = model.predict_classes(X)

predictions = model.predict(X, verbose=0)
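
Note that predict_classes() was removed in recent versions of Keras/TensorFlow. If the call above fails on your install, an equivalent for a softmax output layer is roughly the following sketch (argmax over the predicted probabilities):

import numpy as np

# class predictions from a softmax output layer
probabilities = model.predict(X, verbose=0)
predictions = np.argmax(probabilities, axis=-1)

# for a single sigmoid output, threshold the probabilities instead:
# predictions = (model.predict(X, verbose=0) > 0.5).astype('int32')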

LSTM State Management

To be studied further…

Examples of Preparing Data

This section lists some final tips to help you when preparing your input data for LSTMs.

  • The LSTM input layer must be 3D.

  • The meaning of the 3 input dimensions are: samples, time steps and features.

  • The LSTM input layer is defined by the input shape argument on the first hidden layer.

  • The input shape argument takes a tuple of two values that define the number of time steps and features.

  • The number of samples is assumed to be 1 or more.

  • The reshape() function on NumPy arrays can be used to reshape your 1D or 2D data to be 3D.

  • The reshape() function takes a tuple as an argument that defines the new shape.

For example:

  • one sequence of multiple time steps and one feature: input_shape = (10,1)
  • multiple parallel series: input_shape = (10,2)
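
To make the reshape step concrete, here is a minimal sketch (with made-up values) that turns a 1D series of 10 numbers into the 3D [samples, time steps, features] layout that Keras expects:

from numpy import array

# one univariate sequence of 10 time steps (illustrative data)
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# reshape to [samples, time steps, features] = [1, 10, 1]
data = data.reshape((1, 10, 1))
print(data.shape)   # (1, 10, 1)

# the matching LSTM input layer would then be defined with input_shape=(10, 1)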

Four Sequence Prediction Models in Keras

  • Univariate time series forecasting: there is a single series with multiple input time steps, and you want to predict one time step beyond the input sequence. This can be implemented as a many-to-one model.
  • Multivariate time series forecasting: there are multiple series with multiple input time steps, and you want to predict one time step beyond one or more of the input sequences. This can be implemented as a many-to-one model; each series is simply another input feature.
  • Multi-step time series forecasting: there are one or more series with multiple input time steps, and you want to predict multiple time steps beyond one or more of the input sequences. This can be implemented as a many-to-many model.
  • Time series classification: there are one or more series with multiple input time steps as input, and you want to output a classification label. This can be implemented as a many-to-one model.

Natural Language Processing

  • Image captioning: given an image, generate a textual description. This can be implemented as a one-to-many model.
  • Video description: given a sequence of images from a video, generate a textual description. This can be implemented as a many-to-many model.
  • Sentiment analysis: given a sequence of text as input, generate a classification label. This can be implemented as a many-to-one model.
  • Speech recognition: given a sequence of audio data as input, generate a textual description of what was said. This can be implemented as a many-to-many model.
  • Text translation: given a sequence of words in one language as input, generate a sequence of words in another language. This can be implemented as a many-to-many model.
  • Text summarization: given a document of text as input, create a short textual summary of the document as output. This can be implemented as a many-to-many model.

One-to-One Model

time_step = 1
feature_dim = 5
hidenfeatrue = 3

model = Sequential()
model.add(LSTM(hidenfeatrue, input_shape=(time_step, feature_dim)))
model.add(Dense(1))

One-to-Many Model

model = Sequential()
model.add(Conv2D(...))
...
model.add(LSTM(...))
model.add(TimeDistributed(Dense(1)))

TimeDistributed: to be studied further…

Many-to-One Model


time_step=1
feature_dim=5
hidenfeatrue=3

model=Sequential()
model.add(LSTM(hidenfeatrue,input_shape=(time_step,feature_dim)))
model.add(Dense(1))

Many-to-Many Model


model = Sequential()
model.add(LSTM(..., input_shape=(steps, ...), return_sequences=True))
model.add(TimeDistributed(Dense(1)))

model = Sequential()
model.add(LSTM(..., input_shape=(in_steps, ...)))
model.add(RepeatVector(out_steps))
model.add(LSTM(..., return_sequences=True))
model.add(TimeDistributed(Dense(1)))
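
To make the second skeleton concrete, here is a minimal hedged sketch with arbitrary sizes (3 input steps, 2 output steps, 1 feature, 16 units), chosen only to show how RepeatVector and TimeDistributed fit together:

from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

in_steps, out_steps, n_features = 3, 2, 1   # arbitrary sizes for illustration

model = Sequential()
model.add(LSTM(16, input_shape=(in_steps, n_features)))  # encoder: reads the whole input sequence
model.add(RepeatVector(out_steps))                       # repeat the encoding once per output time step
model.add(LSTM(16, return_sequences=True))               # decoder: one hidden state per output time step
model.add(TimeDistributed(Dense(1)))                     # one output value per output time step
model.compile(optimizer='adam', loss='mse')
print(model.summary())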

Implementing a Vanilla LSTM and a Stacked LSTM in Keras

Vanilla LSTM

from random import randint
from numpy import array
from numpy import argmax
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

def generate_sequence(length, n_features):
    return [randint(0, n_features-1) for _ in range(length)]

def one_hot_encode(sequence, n_features):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_features)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

def generate_example(length, n_features, out_index):
    # generate a random sequence and one hot encode it
    sequence = generate_sequence(length, n_features)
    encoded = one_hot_encode(sequence, n_features)
    # reshape the input to [samples, time steps, features]
    X = encoded.reshape((1, length, n_features))
    # the target is the one hot encoded value at the chosen time step
    y = encoded[out_index].reshape(1, n_features)
    return X, y

length = 5
n_features = 10
out_index = 2
model = Sequential()
model.add(LSTM(25, input_shape=(length, n_features)))
model.add(Dense(n_features, activation= 'softmax' ))
model.compile(loss= 'categorical_crossentropy' , optimizer= 'adam' , metrics=[ 'acc' ])
print(model.summary())

for i in range(1000):
    X, y = generate_example(length, n_features, out_index)
    model.fit(X, y, epochs=1, verbose=0)

correct = 0
for i in range(100):
    X, y = generate_example(length, n_features, out_index)
    yhat = model.predict(X)
    if one_hot_decode(yhat) == one_hot_decode(y):
        correct += 1
print( 'Accuracy: %f' % ((correct/100)*100.0))

X, y = generate_example(length, n_features, out_index)
yhat = model.predict(X)
print( 'Sequence: %s' % [one_hot_decode(x) for x in X])
print( 'Expected: %s' % one_hot_decode(y))
print( 'Predicted: %s' % one_hot_decode(yhat))

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 lstm_14 (LSTM)              (None, 25)                3600

 dense_10 (Dense)            (None, 10)                260

=================================================================
Total params: 3,860
Trainable params: 3,860
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:5 out of the last 105 calls to <function model.make_predict_function.<locals>.predict_function at 0x000001D169DAF670> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

Accuracy: 29.000000
Sequence: [[8, 3, 3, 4, 7]]
Expected: [3]
Predicted: [8]

Stacked LSTM

from math import sin
from math import pi
from math import exp
from random import random
from random import randint
from random import uniform
from numpy import array
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

def generate_sequence(length, period, decay):
    # a damped sine wave of the given length, period, and decay
    return [0.5 + 0.5 * sin(2 * pi * i / period) * exp(-decay * i) for i in range(length)]

def generate_examples(length, n_patterns, output):
    # each pattern is a damped sine wave with a randomly chosen period and decay
    X, y = list(), list()
    for _ in range(n_patterns):
        p = randint(10, 20)
        d = uniform(0.01, 0.1)
        sequence = generate_sequence(length + output, p, d)
        X.append(sequence[:-output])   # input: all but the last `output` values
        y.append(sequence[-output:])   # target: the last `output` values
    X = array(X).reshape(n_patterns, length, 1)
    y = array(y).reshape(n_patterns, output)
    return X, y

length = 50
output = 5

model = Sequential()
model.add(LSTM(20, return_sequences=True, input_shape=(length, 1)))
model.add(LSTM(20))
model.add(Dense(output))
model.compile(loss= 'mae' , optimizer= 'adam' )
print(model.summary())

X, y = generate_examples(length, 10000, output)
history = model.fit(X, y, batch_size=10, epochs=1)

X, y = generate_examples(length, 1000, output)
loss = model.evaluate(X, y, verbose=0)
print( 'MAE: %f' % loss)

X, y = generate_examples(length, 1, output)
yhat = model.predict(X, verbose=0)
pyplot.plot(y[0], label= 'y' )
pyplot.plot(yhat[0], label= 'yhat' )
pyplot.legend()
pyplot.show()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 lstm_12 (LSTM)              (None, 50, 20)            1760

 lstm_13 (LSTM)              (None, 20)                3280

 dense_9 (Dense)             (None, 5)                 105

=================================================================
Total params: 5,145
Trainable params: 5,145
Non-trainable params: 0
_________________________________________________________________
None
1000/1000 [==============================] - 14s 12ms/step - loss: 0.0511
MAE: 0.026396

[Figure: predicted sequence (yhat) vs. actual sequence (y) for one example]

Developing CNN-LSTMs with Keras

The CNN-LSTM architecture uses convolutional neural network (CNN) layers to extract features from the input data, combined with an LSTM to support sequence prediction. CNN-LSTMs were developed for visual time series prediction problems and for applications that generate textual descriptions from sequences of images (such as video). Specifically, they are suited to problems such as:

  • Activity Recognition: generating a textual description of an activity shown in a sequence of images.
  • Image Description: generating a textual description of a single image.
  • Video Description: generating a textual description of a sequence of images.

[CNN LSTMs] are a class of models that is both spatially and temporally deep, and has the flexibility to be applied to a variety of vision tasks involving sequential inputs and outputs. — Long-term Recurrent Convolutional Networks for Visual Recognition and Description, 2015.

This architecture has also been used for speech recognition and natural language processing problems, where CNNs act as feature extractors for the LSTMs on audio and textual input data. The architecture is appropriate for problems that:

  • have spatial structure in their input, such as the 2D structure of pixels in an image, or the 1D structure of words in a sentence, paragraph, or document.
  • have temporal structure in their input, such as the order of images in a video or of words in text, or that require the generation of output with temporal structure, such as the words in a textual description.

CNN LSTM Model

We can define a CNN-LSTM model in Keras by first defining the CNN layer or layers, wrapping them in a TimeDistributed layer, and then defining the LSTM and output layers. There are two equivalent ways to define the model, which differ only in taste. You can define the CNN model first, then add it to the LSTM model by wrapping the entire sequence of CNN layers in a single TimeDistributed layer, as follows:


cnn = Sequential()
cnn.add(Conv2D(...))
cnn.add(MaxPooling2D(...))
cnn.add(Flatten())

model = Sequential()
model.add(TimeDistributed(cnn,...))
model.add(LSTM(..))
model.add(Dense(...))

The alternative, which may be easier to read, is to wrap each layer of the CNN model in its own TimeDistributed layer as it is added to the main model:

model = Sequential()
model.add(TimeDistributed(Conv2D(...)))
model.add(TimeDistributed(MaxPooling2D(...)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(...))
model.add(Dense(...))
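
As a concrete (and purely illustrative) version of the second skeleton, the sketch below assumes a toy input of 5 frames of 64x64 single-channel images; the filter counts and unit sizes are arbitrary choices:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, LSTM, Dense, TimeDistributed

# assumed toy input: 5 frames per sample, each a 64x64 grayscale image
frames, rows, cols, channels = 5, 64, 64, 1

model = Sequential()
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu'),
                          input_shape=(frames, rows, cols, channels)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(32))                        # interpret the per-frame feature vectors as a sequence
model.add(Dense(1, activation='sigmoid'))  # e.g. one binary label for the whole clip
model.compile(optimizer='adam', loss='binary_crossentropy')
print(model.summary())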

Developing Encoder-Decoder LSTMs with Keras

Sequence prediction often involves forecasting the next value in a real-valued sequence or outputting a class label for an input sequence. This is usually framed as a prediction problem of one input time step to one output time step (one-to-one), or of multiple input time steps to one output time step (many-to-one).

There is a more challenging type of sequence prediction problem that takes a sequence as input and requires a sequence prediction as output. These are called sequence-to-sequence prediction problems, or seq2seq for short. One modeling issue that makes these problems challenging is that the input and output sequences may differ in length. Because there are multiple input time steps and multiple output time steps, this form of problem is referred to as many-to-many sequence prediction.

Typical applications:

  • Machine translation, e.g. the English-to-French translation of phrases.
  • Learning to execute, e.g. computing the outcome of small programs.
  • Image captioning, e.g. generating a textual description of an image.
  • Conversational modeling, e.g. generating answers to textual questions.
  • Movement classification, e.g. generating a sequence of commands from a sequence of gestures.

Addition Prediction Problem

The problem is defined as calculating the sum of two or more input numbers. It is challenging because each number and each mathematical symbol is provided as a character, and the expected output is also a sequence of characters. For example, the input '10+6' should produce the output '16'.


from random import seed
from random import randint
from numpy import array
from math import ceil
from math import log10
from math import sqrt
from numpy import argmax
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import TimeDistributed
from keras.layers import RepeatVector

def random_sum_pairs(n_examples, n_numbers, largest):
    X, y = list(), list()
    for i in range(n_examples):
        in_pattern = [randint(1,largest) for _ in range(n_numbers)]
        out_pattern = sum(in_pattern)
        X.append(in_pattern)
        y.append(out_pattern)
    return X, y

def to_string(X, y, n_numbers, largest):
    max_length = n_numbers * ceil(log10(largest+1)) + n_numbers - 1
    Xstr = list()
    for pattern in X:
        strp = '+'.join([str(n) for n in pattern])
        strp = ''.join([' ' for _ in range(max_length-len(strp))]) + strp
        Xstr.append(strp)
    max_length = ceil(log10(n_numbers * (largest+1)))
    ystr = list()
    for pattern in y:
        strp = str(pattern)
        strp = ''.join([' ' for _ in range(max_length-len(strp))]) + strp
        ystr.append(strp)
    return Xstr, ystr

def integer_encode(X, y, alphabet):
    char_to_int = dict((c, i) for i, c in enumerate(alphabet))
    Xenc = list()
    for pattern in X:
        integer_encoded = [char_to_int[char] for char in pattern]
        Xenc.append(integer_encoded)
    yenc = list()
    for pattern in y:
        integer_encoded = [char_to_int[char] for char in pattern]
        yenc.append(integer_encoded)
    return Xenc, yenc

def one_hot_encode(X, y, max_int):
    Xenc = list()
    for seq in X:
        pattern = list()
        for index in seq:
            vector = [0 for _ in range(max_int)]
            vector[index] = 1
            pattern.append(vector)
        Xenc.append(pattern)
    yenc = list()
    for seq in y:
        pattern = list()
        for index in seq:
            vector = [0 for _ in range(max_int)]
            vector[index] = 1
            pattern.append(vector)
        yenc.append(pattern)
    return Xenc, yenc

def generate_data(n_samples, n_numbers, largest, alphabet):
    # generate random addition problems as lists of integers
    X, y = random_sum_pairs(n_samples, n_numbers, largest)
    # convert the problems to padded strings, e.g. ' 3+10+2' -> '15'
    X, y = to_string(X, y, n_numbers, largest)
    # integer encode each character using the alphabet
    X, y = integer_encode(X, y, alphabet)
    # one hot encode the integer-encoded characters
    X, y = one_hot_encode(X, y, len(alphabet))
    # return as numpy arrays
    X, y = array(X), array(y)
    return X, y

def invert(seq, alphabet):
    int_to_char = dict((i, c) for i, c in enumerate(alphabet))
    strings = list()
    for pattern in seq:
        string = int_to_char[argmax(pattern)]
        strings.append(string)
    return ''.join(strings)

# number of terms to add together
n_terms = 3
# largest value for any single term
largest = 10
# the character alphabet: digits, '+', and the padding space
alphabet = [str(x) for x in range(10)] + ['+', ' ']
# size of the alphabet (length of each one hot vector)
n_chars = len(alphabet)
# maximum length of the input string, e.g. '10+10+10'
n_in_seq_length = n_terms * ceil(log10(largest+1)) + n_terms - 1
# maximum length of the output string, e.g. '30'
n_out_seq_length = ceil(log10(n_terms * (largest+1)))

model = Sequential()
model.add(LSTM(75, input_shape=(n_in_seq_length, n_chars)))
model.add(RepeatVector(n_out_seq_length))
model.add(LSTM(50, return_sequences=True))
model.add(TimeDistributed(Dense(n_chars, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

from keras.utils.vis_utils import plot_model
plot_model(model,to_file='model-seq2seq.png',show_shapes=True,dpi=300)

X, y = generate_data(75000, n_terms, largest, alphabet)
model.fit(X, y, epochs=1, batch_size=32)

X, y = generate_data(100, n_terms, largest, alphabet)
loss, acc = model.evaluate(X, y, verbose=0)
print(' Loss: %f, Accuracy: %f' % (loss, acc*100))

for _ in range(10):

    X, y = generate_data(1, n_terms, largest, alphabet)

    yhat = model.predict(X, verbose=0)

    in_seq = invert(X[0], alphabet)
    out_seq = invert(y[0], alphabet)
    predicted = invert(yhat[0], alphabet)
    print(' %s = %s (expect %s)' % (in_seq, predicted, out_seq))

Model: "sequential_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 lstm_23 (LSTM)              (None, 75)                26400

 repeat_vector_4 (RepeatVect  (None, 2, 75)            0
 or)

 lstm_24 (LSTM)              (None, 2, 50)             25200

 time_distributed_4 (TimeDis  (None, 2, 12)            612
 tributed)

=================================================================
Total params: 52,212
Trainable params: 52,212
Non-trainable params: 0
_________________________________________________________________
None
('You must install pydot (pip install pydot) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')
2344/2344 [==============================] - 16s 6ms/step - loss: 0.6746 - accuracy: 0.8008
 Loss: 0.132323, Accuracy: 99.000001
    9+8+5 = 22 (expect 22)
    4+2+1 =  8 (expect  7)
   10+6+5 = 21 (expect 21)
   10+5+5 = 20 (expect 20)
   10+4+5 = 19 (expect 19)
    5+3+6 = 14 (expect 14)
   10+6+8 = 24 (expect 24)
    7+3+5 = 15 (expect 15)
   3+10+2 = 15 (expect 15)
   6+10+4 = 20 (expect 20)

A Detailed Guide to Tuning LSTMs

Tips for Evaluating Stochastic Models

Stochastic models, such as deep neural networks, include random operations such as random weight initialization and stochastic gradient descent. This extra randomness makes the model more flexible when learning, but also makes it unstable: training the same model on the same data produces different results. This is distinct from model variance, which gives different results when the same model is trained on different data.

To obtain a reliable (robust) estimate of model performance, this extra source of variance must be taken into account and controlled. A reliable approach is to evaluate the stochastic model repeatedly. You can refer to the following pseudocode (a concrete sketch follows after it):

scores = list()
for i in repeats:
    train, test = random_split(data)
    model = fit(train.X, train.y)
    predictions = model.predict(test.X)
    skill = compare(test.y, predictions)
    scores.append(skill)
final_skill = mean(scores)
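
A concrete version of this loop might look like the sketch below; build_model(), X, y, and the number of epochs are placeholders you would replace for your own problem:

import numpy as np
from sklearn.model_selection import train_test_split

def evaluate_repeatedly(build_model, X, y, n_repeats=10):
    # fit and score a freshly built model n_repeats times to average out
    # the randomness from weight initialization and shuffling
    scores = []
    for _ in range(n_repeats):
        trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.3)
        model = build_model()                 # new random initialization on every repeat
        model.fit(trainX, trainy, epochs=50, verbose=0)
        loss = model.evaluate(testX, testy, verbose=0)  # assumes evaluate() returns a single loss value
        scores.append(loss)
    return np.mean(scores), np.std(scores)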

Diagnosing Underfitting and Overfitting

This is judged by comparing the loss (bias and variance) on the training set and the validation set.

...

history = model.fit(trainX, trainy,
                      epochs=epochs,
                      batch_size=batch_size,
                      verbose=verbose,
                      validation_data=(testX, testy),
                      callbacks=[summary])
print("history.history:{}".format(history.history))

history.history:{'loss': [0.6198216109203176, 0.22001621517898987, 0.14948655201184996, 0.12273854326955383, 0.12327274605550756],
        'accuracy': [0.74428725, 0.91920567, 0.9428727, 0.953346, 0.95048964],
        'val_loss': [0.5575803409279667, 0.4091062663836594, 0.39247380317769337, 0.3639399050404692, 0.3881000212997623],
        'val_accuracy': [0.8187988, 0.8649474, 0.89650494, 0.8975229, 0.8982016]}

This can also be done by setting the validation_split argument of fit(), which uses a portion of the training data as the validation dataset:

...

history = model.fit(X, Y, epochs=100, validation_split=0.33)

import matplotlib.pyplot as plt
...

history = model.fit(X, Y, epochs=100, validation_data=(valX, valY))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','validation'], loc='upper right')
plt.show()

Finding a Suitable Number of Hidden Neurons and Epochs

  • Underfitting: this can be diagnosed from a plot in which the training loss is lower than the validation loss, and the validation loss shows a trend suggesting further improvement is possible.
  • Overfitting: a model whose performance on the training set is good and keeps improving, while performance on the validation set improves to a point and then begins to degrade.

The example below fits a small LSTM on a toy sequence and plots the training and validation loss:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import matplotlib.pyplot as plt
from numpy import array

def get_train():
    seq = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]
    seq = array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((5, 1, 1))
    return X, y

def get_val():
    seq = [[0.5, 0.6], [0.6, 0.7], [0.7, 0.8], [0.8, 0.9], [0.9, 1.0]]
    seq = array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((len(X), 1, 1))
    return X, y

model = Sequential()
model.add(LSTM(10, input_shape=(1, 1)))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')

X, y = get_train()
valX, valY = get_val()
history = model.fit(X, y, epochs=800, verbose=0, validation_data=(valX, valY), shuffle=False)

plt.figure(figsize=(6, 5), dpi=100)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

[Figure: training vs. validation loss over 800 epochs]

Repeated Runs to Assess the Effect of Random Initialization

LSTMs are stochastic, which means you will get a different diagnostic plot on each run. It can be useful to repeat the diagnostic run several times (e.g. 5, 10, or 30). The training and validation traces from each run can then be plotted together to give a more robust picture of how the model behaves over time. The example below runs the same experiment several times before plotting the training and validation loss traces of each run.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import matplotlib.pyplot as plt
from numpy import array

def get_train():
    seq = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]
    seq = array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((5, 1, 1))
    return X, y

def get_val():
    seq = [[0.5, 0.6], [0.6, 0.7], [0.7, 0.8], [0.8, 0.9], [0.9, 1.0]]
    seq = array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((len(X), 1, 1))
    return X, y

X,y = get_train()
valX, valY = get_val()
plt.figure(figsize=(6,5),dpi=100)
for i in range(5):

    model = Sequential()
    model.add(LSTM(10, input_shape=(1,1)))
    model.add(Dense(1, activation='linear'))

    model.compile(loss='mse', optimizer='adam')
    history = model.fit(X, y, epochs=300, verbose=0, validation_data=(valX, valY), shuffle=False)

    plt.plot(history.history['loss'],color='blue')
    plt.plot(history.history['val_loss'],color='orange')
    plt.title('model train vs validation loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train','validation'], loc='upper right')
plt.show()

[Figure: training vs. validation loss traces over 5 repeated runs]

Testing Different Numbers of Hidden Neurons

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def get_train():
    seq = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]
    seq = np.array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((5, 1, 1))
    return X, y

def get_val():
    seq = [[0.5, 0.6], [0.6, 0.7], [0.7, 0.8], [0.8, 0.9], [0.9, 1.0]]
    seq = np.array(seq)
    X, y = seq[:, 0], seq[:, 1]
    X = X.reshape((len(X), 1, 1))
    return X, y

def fit_model(n_cells):

    model = Sequential()
    model.add(LSTM(n_cells, input_shape=(1,1)))
    model.add(Dense(1, activation='linear'))

    model.compile(loss='mse', optimizer='adam')

    X,y = get_train()
    history = model.fit(X, y, epochs=500, shuffle=False, verbose=0)

    valX, valY = get_val()
    loss = model.evaluate(valX, valY, verbose=0)
    return loss

params = [1, 5, 10]
n_repeats = 5

scores = pd.DataFrame()
for value in params:

    loss_values = list()
    for i in range(n_repeats):
        loss = fit_model(value)
        loss_values.append(loss)
        print('>%d/%d param=%f, loss=%f'% (i+1, n_repeats, value, loss))

    scores[str(value)] = loss_values

print(scores.describe())

fig = plt.figure(figsize=(6,5),dpi=100)
scores.boxplot(ax = plt.gca())
plt.show()

>1/5 param=1.000000, loss=0.182866
>2/5 param=1.000000, loss=0.180858
>3/5 param=1.000000, loss=0.154367
>4/5 param=1.000000, loss=0.120814
>5/5 param=1.000000, loss=0.237209
>1/5 param=5.000000, loss=0.088532
>2/5 param=5.000000, loss=0.050130
>3/5 param=5.000000, loss=0.032009
>4/5 param=5.000000, loss=0.153709
>5/5 param=5.000000, loss=0.147693
>1/5 param=10.000000, loss=0.045294
>2/5 param=10.000000, loss=0.030856
>3/5 param=10.000000, loss=0.057810
>4/5 param=10.000000, loss=0.028818
>5/5 param=10.000000, loss=0.057722
              1         5        10
count  5.000000  5.000000  5.000000
mean   0.175223  0.094415  0.044100
std    0.042801  0.055328  0.013999
min    0.120814  0.032009  0.028818
25%    0.154367  0.050130  0.030856
50%    0.180858  0.088532  0.045294
75%    0.182866  0.147693  0.057722
max    0.237209  0.153709  0.057810

[Figure: box plot of validation loss for 1, 5, and 10 memory cells]

Tuning Methods

Value Scaling

  • Normalize values.
  • Standardize values.

Encoding

  • Real-value encoding.

  • Integer encoding.

  • One hot encoding

Stationarity

When working with real-valued sequences, such as time series, consider making the series stationary.

  • Remove Trends: if the series has a non-constant mean (i.e. a trend), you can use differencing.
  • Remove Seasonality: if the series contains periodic cycles (i.e. seasonality), you can use seasonal adjustment.
  • Remove Variance: if the series contains increasing or decreasing variance, you can use a log or Box-Cox transform.

In other words, it pays to engineer the input features in this way, since it lowers the complexity the model has to learn. A small differencing sketch follows below.
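
As an illustration of the first point, differencing a trending series (made-up data) and inverting the difference afterwards looks roughly like this:

from numpy import array

# illustrative series with a clear upward trend
series = array([10.0, 12.0, 15.0, 19.0, 24.0])

# first-order differencing removes the trend: diff[i] = series[i+1] - series[i]
diff = series[1:] - series[:-1]
print(diff)              # [2. 3. 4. 5.]

# to invert a differenced prediction, add it back onto the last observed value
predicted_diff = 6.0     # hypothetical model output
print(series[-1] + predicted_diff)   # 30.0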

Input Sequence Length

The choice of input sequence length is driven by the problem being solved; evaluate the effect of different input sequence lengths on model skill. The input sequence length also affects the estimate of the error gradient obtained through backpropagation through time when the weights are updated, so it can affect both how quickly and how well the model learns.

Sequence Model Type

  • One-to-one
  • One-to-many
  • Many-to-one
  • Many-to-many
    Keras supports all of the above sequence model types. Frame the problem with each model type and evaluate model skill, to help choose a framing for the problem you need to solve.

Memory Cells (Hidden Units)

  • Try grid searching the number of memory cells by 100s, 10s, or finer.

  • Try using numbers of cells quoted in research papers.

  • Try randomly searching the number of cells between 1 and 1000.

Hidden Layers (Stacking Depth)

As with the number of memory cells, for a given sequence prediction problem or LSTM architecture there is no way to know in advance the best number of LSTM hidden layers. Deeper is often better when there is a lot of data.

  • Try grid searching the number of layers and memory cells.
  • Try the patterns of stacked LSTM layers reported in research papers.
  • Try randomly searching the number of layers and memory cells.

Weight Initialization

  • random uniform
  • random normal
  • glorot uniform
  • glorot normal
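
In Keras these schemes are passed to the LSTM layer via the kernel_initializer, recurrent_initializer, and bias_initializer arguments. A minimal sketch with an arbitrary layer size (the values shown for the LSTM layer are in fact the Keras defaults):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10,
               input_shape=(5, 1),                   # arbitrary: 5 time steps, 1 feature
               kernel_initializer='glorot_uniform',  # input weights
               recurrent_initializer='orthogonal',   # recurrent weights
               bias_initializer='zeros'))            # biases
model.add(Dense(1, kernel_initializer='random_normal'))
model.compile(optimizer='adam', loss='mse')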

Activation Functions

Whether the sequence prediction problem is framed as classification or regression determines the type of activation function to use in the output layer:

  • sigmoid
  • tanh
  • relu

Optimization Algorithm

  • Adam
  • RMSprop
  • Adagrad

Learning Rate

  • Grid search learning rate values (e.g. 0.1, 0.001, 0.0001).
  • Try decaying the learning rate over epochs, e.g. via a callback (see the sketch below).
  • Try updating a fitted model with further rounds of training at lower and lower learning rates.
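
One hedged way to implement the decay idea in Keras is a LearningRateScheduler callback; the schedule below (halving the rate every 10 epochs) is only an illustrative choice:

from keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr):
    # halve the learning rate every 10 epochs (arbitrary schedule for illustration)
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr

lr_schedule = LearningRateScheduler(step_decay, verbose=1)
# model.fit(X, y, epochs=100, callbacks=[lr_schedule])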

Batch Size

  • Try a batch size of 1, i.e. stochastic gradient descent (SGD).
  • Try a batch size of n, where n is the number of samples, i.e. batch gradient descent.
  • Try grid searching batch sizes from 2 up to 256.
  • A commonly used batch size is 32.

Regularization

  • dropout: dropout applied on input connections.

  • recurrent_dropout: dropout applied to recurrent connections.

model.add(LSTM(..., dropout=0.4))

LSTMs also support other forms of regularization, such as weight regularization, which penalizes large network weights. These can likewise be set as arguments on the LSTM layer:

  • bias_regularizer: regularization on the bias weights.

  • kernel_regularizer: regularization on the input weights.

  • recurrent_regularizer: regularization on the recurrent weights.

model.add(LSTM(..., kernel_regularizer=L1L2(0.01, 0.01)))

EarlyStopping

Tuning the number of training epochs can be very time-consuming. An alternative is to configure a large number of training epochs, set up a callback that monitors the model's performance on the training and validation datasets, and stop training once the model appears to start overfitting. Early stopping is therefore a regularization method for curbing overfitting.

keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0,
                              mode='auto', baseline=None, restore_best_weights=False)

Parameters:

  • monitor: the quantity to be monitored.
  • min_delta: the minimum change in the monitored quantity that qualifies as an improvement; an absolute change smaller than min_delta counts as no improvement.
  • patience: the number of epochs with no improvement after which training will be stopped. If the validation frequency (model.fit(validation_freq=5)) is greater than 1, the monitored quantity may not be produced for every epoch.
  • verbose: verbosity mode.
  • mode: one of {auto, min, max}. In min mode, training stops when the monitored quantity stops decreasing; in max mode, it stops when the monitored quantity stops increasing; in auto mode, the direction is inferred automatically from the name of the monitored quantity.
  • baseline: the baseline value that the monitored quantity should reach. Training stops if the model shows no improvement over the baseline.
  • restore_best_weights: whether to restore the model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.

from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', min_delta=100)
model.fit(..., callbacks=[es])

Original: https://blog.csdn.net/weixin_42432468/article/details/121659970
Author: Hayden112
Title: LSTM入门

This article is protected by the original author's copyright. When reposting, please credit the source: https://www.johngo689.com/511874/
