# Hyperparameter Optimization for Deep Neural Networks Using Bayesian Optimization

In this article, we take an in-depth look at hyperparameter optimization.

Here is a brief overview of how the dataset is prepared. Since the focus of this article is hyperparameter optimization, this part is kept short. In general, the process is as follows:

• Load the data.
• Split it into training, validation, and test sets.
• Normalize pixel values from the 0–255 range to the 0–1 range.
• One-hot encode the target variable.
from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical

# load data
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# split into train, validation and test sets
train_x, val_x, train_y, val_y = train_test_split(train_images, train_labels, stratify=train_labels, random_state=48, test_size=0.05)
(test_x, test_y) = (test_images, test_labels)

# normalize pixels to range 0-1
train_x = train_x / 255.0
val_x = val_x / 255.0
test_x = test_x / 255.0

# one-hot encode target variable
train_y = to_categorical(train_y)
val_y = to_categorical(val_y)
test_y = to_categorical(test_y)

The shapes of the training, validation, and test sets are:

print(train_x.shape)  #(57000, 28, 28)
print(train_y.shape)  #(57000, 10)
print(val_x.shape)    #(3000, 28, 28)
print(val_y.shape)    #(3000, 10)
print(test_x.shape)   #(10000, 28, 28)
print(test_y.shape)   #(10000, 10)
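
To make the last two preparation steps concrete, here is a minimal pure-Python sketch of what scaling and one-hot encoding do to a single sample. The helper names `scale_pixels` and `one_hot` are illustrative only and do not come from the article, which performs these steps with array division and `to_categorical`.

```python
def scale_pixels(row):
    """Map 8-bit pixel intensities (0-255) to floats in [0, 1]."""
    return [p / 255.0 for p in row]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


pixels = [0, 51, 255]
print(scale_pixels(pixels))  # [0.0, 0.2, 1.0]
print(one_hot(3))            # a 10-element vector with a single 1.0 at index 3
```

This is also why the label arrays above gain a second dimension of size 10 after encoding.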


pip install keras-tuner


Keras Tuner requires Python 3.6+ and TensorFlow 2.0+.

Hyperparameter tuning is a fundamental part of any machine learning project. There are two types of hyperparameters:

• Structural hyperparameters: define the overall architecture of the model (for example, the number of hidden units or the number of layers).
• Optimizer hyperparameters: affect the speed and quality of training (for example, the learning rate, optimizer type, batch size, and number of epochs).

Why do we need a hyperparameter tuning library? Can't we simply try every possible combination and see which one performs best on the validation set?

That is not practical, because deep neural networks take a long time to train, sometimes several days. If you train large models on cloud servers, every experiment also costs money.

A strategy is therefore needed to prune the hyperparameter search space.

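keras-tuner provides a Bayesian optimizer. Rather than searching every possible combination, it chooses the first few at random. Then, based on how those hyperparameters perform, it selects the next most promising values, so each choice depends on the previous trials. It keeps picking the next set of hyperparameters from the history and evaluating it until the best combination is found or the maximum number of trials is reached. That limit is configured with the "max_trials" argument.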
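
The loop described above can be sketched in plain Python. This is a deliberately simplified illustration and not keras-tuner's implementation: the surrogate here is a distance-weighted average of past losses with an exploration bonus, standing in for the probabilistic model a real Bayesian optimizer would fit, and `validation_loss` is a hypothetical stand-in for an expensive training run.

```python
import math
import random


def validation_loss(x):
    """Hypothetical stand-in for a training run; its minimum is at x = 0.3."""
    return (x - 0.3) ** 2


def surrogate(x, history, width=0.1):
    """Predict the loss at x as a distance-weighted average of past trials."""
    weights = [math.exp(-((x - xi) / width) ** 2) for xi, _ in history]
    total = sum(weights)
    if total == 0.0:
        # Far from every observation: fall back to the plain mean.
        return sum(loss for _, loss in history) / len(history)
    return sum(w * loss for w, (_, loss) in zip(weights, history)) / total


def acquisition(x, history, kappa=0.5):
    """Lower is better: predicted loss minus a bonus for unexplored regions."""
    nearest = min(abs(x - xi) for xi, _ in history)
    return surrogate(x, history) - kappa * nearest


def sequential_search(candidates, num_initial=3, max_trials=12, seed=0):
    rng = random.Random(seed)
    remaining = list(candidates)
    rng.shuffle(remaining)
    history = []
    # Phase 1: a few random trials to seed the history.
    for _ in range(num_initial):
        x = remaining.pop()
        history.append((x, validation_loss(x)))
    # Phase 2: every further trial is chosen from what the history suggests.
    while len(history) < max_trials and remaining:
        x = min(remaining, key=lambda c: acquisition(c, history))
        remaining.remove(x)
        history.append((x, validation_loss(x)))
    return history


candidates = [i / 100 for i in range(101)]
history = sequential_search(candidates)
best_x, best_loss = min(history, key=lambda t: t[1])
print(len(history), best_x, best_loss)
```

The `max_trials` argument plays the same role as in the tuner: it caps how many configurations are ever evaluated, however large the candidate set is.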

# baseline MLP; its layer definitions are elided in the source
model_mlp = Sequential()
model_mlp.summary()


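The tuning process relies on two main methods:

hp.Int(): defines the range of a hyperparameter whose values are integers, for example the number of hidden units in a dense layer:

model.add(Dense(units=hp.Int('dense-bot', min_value=50, max_value=350, step=50)))


hp.Choice(): defines a set of candidate values for a hyperparameter, for example the choice between Adam and SGD as the optimizer:

hp_optimizer = hp.Choice('Optimizer', values=['Adam', 'SGD'])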


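For the MLP, we tune the following hyperparameters:

• Number of hidden layers: 1–3
• Size of the first dense layer: 50–350
• Size of the second and third dense layers: 50–350
• Dropout: 0, 0.1, 0.2
• Learning rate: 0.1, 0.01, 0.001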
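
To see why exhaustive search is unattractive even for this small space, the list above can be counted as a grid. The sketch below assumes a step of 50 for every layer width (as in the `hp.Int` example) and a single shared dropout rate; both are assumptions made for illustration.

```python
from itertools import product

widths = list(range(50, 351, 50))   # 7 widths, assuming a step of 50 for every layer
dropouts = [0.0, 0.1, 0.2]
learning_rates = [0.1, 0.01, 0.001]


def grid_size(max_hidden_layers=3):
    """Count every configuration an exhaustive grid search would have to train."""
    total = 0
    for depth in range(1, max_hidden_layers + 1):
        architectures = product(widths, repeat=depth)
        total += sum(1 for _ in product(architectures, dropouts, learning_rates))
    return total


total = grid_size()
print(total)                  # 3591
print(round(30 / total, 4))   # share of the grid that 30 trials would cover
```

A budget of 30 trials, the value used for the MLP search in this article, covers under one percent of that grid, which is why the order in which configurations are tried matters.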

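import keras_tuner as kt
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD, Adam

def build_mlp(hp):
    model = Sequential()

    # first dense layer; expects flattened 28x28 images (784 features)
    model.add(Dense(units=hp.Int('dense-bot', min_value=50, max_value=350, step=50), input_shape=(784,), activation='relu'))

    # one or two further dense layers, each with its own tunable width
    for i in range(hp.Int('num_dense_layers', 1, 2)):
        model.add(Dense(units=hp.Int('dense_' + str(i), min_value=50, max_value=100, step=25), activation='relu'))

    # output layer for the 10 classes (assumed; the source omits it, along with any Dropout layers)
    model.add(Dense(10, activation='softmax'))

    # build the optimizer object so that the tuned learning rate is actually applied
    hp_optimizer = hp.Choice('Optimizer', values=['Adam', 'SGD'])
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-1, 1e-2, 1e-3])
    if hp_optimizer == 'Adam':
        optimizer = Adam(learning_rate=hp_learning_rate)
    elif hp_optimizer == 'SGD':
        optimizer = SGD(learning_rate=hp_learning_rate, nesterov=True, momentum=0.9)

    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# assumed to be defined earlier: random_seed (an int), callback (a list of Keras callbacks),
# and train_x / val_x reshaped to (n, 784)
tuner_mlp = kt.tuners.BayesianOptimization(
    build_mlp,
    seed=random_seed,
    objective='val_loss',
    max_trials=30,
    directory='.',
    project_name='tuning-mlp')
tuner_mlp.search(train_x, train_y, epochs=50, batch_size=32, validation_data=(val_x, val_y), callbacks=callback)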


The search used up all of its trials and took about an hour to complete. We can print the best hyperparameters found with the following command:

best_mlp_hyperparameters = tuner_mlp.get_best_hyperparameters(1)[0]
print("Best Hyper-parameters")
print(best_mlp_hyperparameters.values)


Now we can retrain our model with the best hyperparameters:

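model_mlp = Sequential()

model_mlp.add(Dense(best_mlp_hyperparameters['dense-bot'], input_shape=(784,), activation='relu'))

for i in range(best_mlp_hyperparameters['num_dense_layers']):
    model_mlp.add(Dense(units=best_mlp_hyperparameters['dense_' + str(i)], activation='relu'))

# output layer for the 10 classes (assumed; not shown in the source)
model_mlp.add(Dense(10, activation='softmax'))

# note: passing the optimizer by name uses that optimizer's default learning rate
model_mlp.compile(optimizer=best_mlp_hyperparameters['Optimizer'], loss='categorical_crossentropy', metrics=['accuracy'])
history_mlp = model_mlp.fit(train_x, train_y, epochs=100, batch_size=32, validation_data=(val_x, val_y), callbacks=callback)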


Alternatively, we can rebuild the model directly from the best hyperparameters:

model_mlp = tuner_mlp.hypermodel.build(best_mlp_hyperparameters)

history_mlp = model_mlp.fit(train_x, train_y, epochs=100, batch_size=32,
                            validation_data=(val_x, val_y), callbacks=callback)


mlp_test_loss, mlp_test_acc = model_mlp.evaluate(test_x,  test_y, verbose=2)
print('\nTest accuracy:', mlp_test_acc)

Test accuracy: 0.8823


Compare this with the test accuracy of the baseline model.

First, this is our baseline CNN model:

model_cnn = Sequential()
model_cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))


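1. Number of "blocks" of convolution, MaxPooling, and Dropout layers
2. Number of filters in the Conv layer of each block: 32, 64
3. Valid or same padding on the convolutional layers
4. Size of the final additional hidden layer: 25–150, in steps of 25
6. Learning rate: 0.01, 0.001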
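import keras_tuner as kt
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD, Adam

def build_cnn(hp):
    model = Sequential()
    model.add(Input(shape=(28, 28, 1)))

    # one or two blocks, each with its own tunable number of filters
    # (kernel size assumed from the baseline; padding, MaxPooling and Dropout are elided in the source)
    for i in range(hp.Int('num_blocks', 1, 2)):
        hp_filters = hp.Choice('filters_' + str(i), values=[32, 64])
        model.add(Conv2D(hp_filters, (3, 3), activation='relu'))

    # final hidden layer and 10-class output (layer calls assumed; only hp_units survives in the source)
    hp_units = hp.Int('units', min_value=25, max_value=150, step=25)
    model.add(Flatten())
    model.add(Dense(hp_units, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    # build the optimizer object so that the tuned learning rate is actually applied
    hp_optimizer = hp.Choice('Optimizer', values=['Adam', 'SGD'])
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3])
    if hp_optimizer == 'Adam':
        optimizer = Adam(learning_rate=hp_learning_rate)
    elif hp_optimizer == 'SGD':
        optimizer = SGD(learning_rate=hp_learning_rate, nesterov=True, momentum=0.9)

    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

As before, we let the network determine its depth. The maximum number of trials is set to 100:

tuner_cnn = kt.tuners.BayesianOptimization(
    build_cnn,
    objective='val_loss',
    max_trials=100,
    directory='.',
    project_name='tuning-cnn')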


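from tensorflow.keras.layers import Input

best_cnn_hyperparameters = tuner_cnn.get_best_hyperparameters(1)[0]

model_cnn = Sequential()
model_cnn.add(Input(shape=(28, 28, 1)))

for i in range(best_cnn_hyperparameters['num_blocks']):
    hp_filters = best_cnn_hyperparameters['filters_' + str(i)]
    # kernel size assumed from the baseline; the remaining block layers are elided in the source
    model_cnn.add(Conv2D(hp_filters, (3, 3), activation='relu'))

# final hidden layer and output layer (assumed; not shown in the source)
model_cnn.add(Flatten())
model_cnn.add(Dense(best_cnn_hyperparameters['units'], activation='relu'))
model_cnn.add(Dense(10, activation='softmax'))

model_cnn.compile(optimizer=best_cnn_hyperparameters['Optimizer'],
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
model_cnn.summary()

history_cnn = model_cnn.fit(train_x, train_y, epochs=50, batch_size=32, validation_data=(val_x, val_y), callbacks=callback)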


cnn_test_loss, cnn_test_acc = model_cnn.evaluate(test_x,  test_y, verbose=2)
print('\nTest accuracy:', cnn_test_acc)

Test accuracy: 0.92


1. Baseline CNN model: 90.8%
2. Best CNN model: 92%

The tuned model clearly outperforms the baseline!
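
One way to put the two accuracies in perspective is to look at the error rate instead. The short calculation below uses only the two test accuracies reported above; the variable names are illustrative.

```python
baseline_acc = 0.908   # baseline CNN test accuracy reported above
tuned_acc = 0.92       # tuned CNN test accuracy reported above

baseline_err = 1.0 - baseline_acc
tuned_err = 1.0 - tuned_acc

# Fraction of the baseline's test errors that the tuned model no longer makes.
relative_reduction = (baseline_err - tuned_err) / baseline_err
print(f"{relative_reduction:.1%}")   # 13.0%
```

A gain of 1.2 percentage points in accuracy therefore corresponds to roughly 13% fewer misclassified test images.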

Beyond accuracy, there are further signs that the tuning worked well.

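Besides the Bayesian optimizer, Keras Tuner also provides the following tuners:

RandomSearch: randomly samples some combinations to avoid exploring the entire hyperparameter search space. However, it does not guarantee that the best hyperparameters will be found.

Hyperband: picks some random combinations of hyperparameters and trains the model with each of them for only a few epochs. It then keeps training the most promising candidates until all epochs are used up and selects the best among them.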
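
The elimination idea behind Hyperband can be sketched with successive halving, the routine it builds on. This is not the full Hyperband algorithm, which runs several such rounds with different starting budgets, and `simulated_val_loss` is a hypothetical stand-in for real training.

```python
import random


def simulated_val_loss(config, epochs):
    """Hypothetical stand-in for training: the loss shrinks with more epochs
    and is lower the closer the learning rate is to 0.01."""
    return abs(config["lr"] - 0.01) * 10 + 1.0 / epochs


def successive_halving(configs, min_epochs=2, max_epochs=18, factor=3):
    """Train all configs briefly, keep the best 1/factor, repeat with more epochs."""
    survivors = list(configs)
    epochs = min_epochs
    while True:
        scored = sorted(survivors, key=lambda c: simulated_val_loss(c, epochs))
        if len(scored) == 1 or epochs >= max_epochs:
            return scored[0], epochs
        survivors = scored[: max(1, len(scored) // factor)]
        epochs = min(max_epochs, epochs * factor)


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(9)]
best, epochs_used = successive_halving(configs)
print(best, epochs_used)
```

With nine candidates and a factor of 3, only three configurations are trained past the first round and only one receives the full budget, which is where the savings over training every candidate to completion come from.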

https://avoid.overfit.cn/post/c3f904fab4f84914b8a1935f8670582f

Original: https://blog.csdn.net/m0_46510245/article/details/125236556
Author: deephub
Title: Hyperparameter Optimization for Deep Neural Networks Using Bayesian Optimization


