Keras, Part 1: Input Data for model.fit()

Life is short; I use Keras!

Everyone knows that Keras greatly simplifies building neural networks, but do you know the different ways to feed it data? When the dataset is large, passing a NumPy array directly can fill up memory. Have you tried a generator? Or tf.data? Want to know how these options differ and what their trade-offs are? Read on!

1. Introduction

Let's start with how the official Keras source documents the data argument of model.fit():

x: Input data. It could be:
 - A Numpy array (or array-like), or a list of arrays
  (in case the model has multiple inputs).

 - A TensorFlow tensor, or a list of tensors
  (in case the model has multiple inputs).

 - A dict mapping input names to the corresponding array/tensors,
  if the model has named inputs.

 - A tf.data dataset. Should return a tuple
  of either (inputs, targets) or
  (inputs, targets, sample_weights).

 - A generator or keras.utils.Sequence returning (inputs, targets)
  or (inputs, targets, sample_weights).

A more detailed description of unpacking behavior for iterator types
(Dataset, generator, Sequence) is given below.

From this description, the possible input formats are:

  • NumPy array
  • list of NumPy arrays
  • tensors
  • dict
  • tf.data dataset
  • generator
  • keras.utils.Sequence

2. NumPy Array

As the heading suggests, model.fit() can take NumPy arrays directly.

This works well when the dataset is small enough to load entirely into memory.

import keras
from keras.layers import *
import numpy as np

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print("x_train.shape:", x_train.shape)
print("x_test.shape:", x_test.shape)
print("y_train.shape:", y_train.shape)

x_train = x_train / 255.0
x_test = x_test / 255.0

model = keras.Sequential(
    [
        Flatten(input_shape=(28, 28)),
        Dense(units=256, activation='relu'),
        Dense(units=128, activation='relu'),
        Dense(units=10, activation='softmax')
    ]
)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=1)
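To get a feel for why the all-in-memory approach only scales so far, you can estimate an array's footprint with NumPy's nbytes (a standalone sketch; the array below is a tenth of the Fashion-MNIST training set and the numbers are illustrative):

```python
import numpy as np

# A tenth of the Fashion-MNIST training set: 6000 grayscale 28x28 images.
x = np.zeros((6000, 28, 28), dtype=np.uint8)
print(x.nbytes / 1024 ** 2)          # ~4.5 MiB as raw uint8

# Dividing by 255.0 silently promotes uint8 to float64: an 8x blow-up.
x_scaled = x / 255.0
print(x_scaled.dtype, x_scaled.nbytes / 1024 ** 2)   # float64, ~35.9 MiB

# Casting to float32 before scaling halves that cost.
x_small = x.astype(np.float32) / 255.0
print(x_small.nbytes / 1024 ** 2)    # ~17.9 MiB
```

Scaling this up to a real image dataset, the float64 copy alone can easily exceed available RAM, which is where the generator and tf.data approaches below come in.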

3. Tensors

Use the tf.convert_to_tensor() function to turn a NumPy array into a tensor; converting between NumPy arrays and tensors is straightforward in TensorFlow.

import keras
from keras.layers import *
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train / 255.0
x_test = x_test / 255.0

x_train = tf.convert_to_tensor(x_train)
x_test = tf.convert_to_tensor(x_test)

model = keras.Sequential(
    [
        Flatten(input_shape=(28, 28)),
        Dense(units=256, activation='relu'),
        Dense(units=128, activation='relu'),
        Dense(units=10, activation='softmax')
    ]
)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit(x_train, y_train, batch_size=32, epochs=5, verbose=1)

4. tf.data

Build the dataset with tf.data.Dataset.from_tensor_slices((x, y)).

Commonly chained methods:

  • shuffle(buffer_size): size of the buffer used to shuffle the data
  • batch(batch_size): number of samples fed per step
  • repeat(count): how many times to repeat the dataset

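As a rough mental model of what these methods do (a plain-Python sketch, not the real tf.data implementation), batch groups consecutive elements and repeat replays the whole stream, so the order in which you chain them matters:

```python
def batch(items, batch_size):
    """Group consecutive elements into lists of at most batch_size."""
    out = []
    for item in items:
        out.append(item)
        if len(out) == batch_size:
            yield out
            out = []
    if out:                      # final partial batch is kept, as in tf.data's default
        yield out

def repeat(items, count):
    """Replay the sequence `count` times."""
    for _ in range(count):
        yield from items

data = list(range(5))
print(list(batch(data, 2)))             # [[0, 1], [2, 3], [4]]
print(list(batch(repeat(data, 2), 2)))  # [[0, 1], [2, 3], [4, 0], [1, 2], [3, 4]]
```

Note how repeat-then-batch produces a batch that crosses the epoch boundary, while batch-then-repeat would simply replay the same batches.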
import keras
from keras.layers import *
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train / 255.0
x_test = x_test / 255.0

datasets_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
datasets_train = datasets_train.shuffle(x_train.shape[0]).batch(32).repeat(2)

model = keras.Sequential(
    [
        Flatten(input_shape=(28, 28)),
        Dense(units=256, activation='relu'),
        Dense(units=128, activation='relu'),
        Dense(units=10, activation='softmax')
    ]
)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit(datasets_train, epochs=5, verbose=1)  # do not pass batch_size here: the dataset is already batched

5. Dict

To feed data as a dict, you cannot build the model with Sequential. Instead, use the Keras functional API: create inputs with keras.layers.Input(), and give the input and output layers name arguments; those names become the keys of the dicts passed to model.fit().

import keras
from keras.layers import *
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train / 255.0
x_test = x_test / 255.0

inputs = Input(shape=(28,28), name='inputs')
temp = Flatten()(inputs)
temp = Dense(256, activation='relu')(temp)
temp = Dense(128, activation='relu')(temp)

outputs = Dense(10, activation='softmax', name='outputs')(temp)
model = keras.Model(inputs, outputs)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit({"inputs":x_train}, {"outputs":y_train}, batch_size=32, epochs=5, verbose=1)

6. List of NumPy Arrays

For a model with multiple inputs, the model itself is built from lists of input and output layers, while model.fit() is fed dicts keyed by layer name.

import keras
from keras import layers
import numpy as np

num_tags = 12
num_words = 10000
num_departments = 4

title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tags_data = np.random.randint(2, size=(1280, num_tags)).astype('float32')

priority_target = np.random.random(size=(1280, 1))
department_target = np.random.randint(num_departments, size=(1280, 1))  # integer class ids, as sparse_categorical_crossentropy expects

title_input = keras.Input(shape=(None,), name='title')
body_input = keras.Input(shape=(None,), name='body')
tags_input = keras.Input(shape=(num_tags,), name='tags')

title_features = layers.Embedding(num_words, 64)(title_input)
body_features = layers.Embedding(num_words, 64)(body_input)

title_features = layers.LSTM(128)(title_features)
body_features = layers.LSTM(128)(body_features)

temp = layers.concatenate([title_features, body_features, tags_input])

priority_pred = layers.Dense(1, activation='sigmoid', name='priority')(temp)
department_pred = layers.Dense(num_departments, activation='softmax', name='num_departments')(temp)

# the model is built from lists; fit() below is fed dicts keyed by layer name
model = keras.Model(inputs=[title_input, body_input, tags_input], outputs=[priority_pred, department_pred])
model.summary()

'''
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
title (InputLayer)              (None, None)         0
__________________________________________________________________________________________________
body (InputLayer)               (None, None)         0
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 64)     640000      title[0][0]
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, None, 64)     640000      body[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 128)          98816       embedding_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 128)          98816       embedding_2[0][0]
__________________________________________________________________________________________________
tags (InputLayer)               (None, 12)           0
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 268)          0           lstm_1[0][0]
                                                                 lstm_2[0][0]
                                                                 tags[0][0]
__________________________________________________________________________________________________
priority (Dense)                (None, 1)            269         concatenate_1[0][0]
__________________________________________________________________________________________________
num_departments (Dense)         (None, 4)            1076        concatenate_1[0][0]
==================================================================================================
Total params: 1,478,977
Trainable params: 1,478,977
Non-trainable params: 0
__________________________________________________________________________________________________
'''

model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss={
        'priority':keras.losses.binary_crossentropy,
        'num_departments':keras.losses.sparse_categorical_crossentropy,
    },
    metrics=['acc'],
    loss_weights={'priority': 1.0, 'num_departments': 0.2}
)

model.fit(
    {'title':title_data, 'body':body_data, 'tags':tags_data},
    {'priority':priority_target, 'num_departments':department_target},
    batch_size=32,
    verbose=1,
    epochs=2,
)

7. Generator

import keras
from keras.layers import *
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train / 255.0
x_test = x_test / 255.0

class MyGenerator:
    def __init__(self, x_train, y_train, batch_size):
        self.x_train = x_train
        self.y_train = y_train
        self.batch_size = batch_size
        self.steps = len(self.x_train) // self.batch_size
        if len(self.x_train) % self.batch_size != 0:
            self.steps += 1

    def __len__(self):
        return self.steps

    def __iter__(self):
        while True:
            count = 0
            x_list, y_list = [], []
            idxs = np.arange(self.x_train.shape[0])
            np.random.shuffle(idxs)
            for i in idxs:
                x_list.append(self.x_train[i])
                y_list.append(self.y_train[i])
                count += 1
                if count == self.batch_size:
                    yield np.array(x_list), np.array(y_list)
                    x_list, y_list = [], []
                    count = 0  # reset, otherwise only the first batch is ever yielded

my_generator = MyGenerator(x_train, y_train, 32)

inputs = Input(shape=(28,28), name='inputs')
temp = Flatten()(inputs)
temp = Dense(256, activation='relu')(temp)
temp = Dense(128, activation='relu')(temp)
outputs = Dense(10, activation='softmax', name='outputs')(temp)
model = keras.Model(inputs, outputs)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit(iter(my_generator), epochs=5, verbose=1, steps_per_epoch=len(my_generator))  # the generator is endless, so steps_per_epoch is required; the batch size is set by the generator itself
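The shuffle-and-batch logic inside __iter__ can be checked in isolation with plain NumPy (a standalone sketch, independent of Keras; the function name is mine):

```python
import numpy as np

def shuffled_batches(x, y, batch_size, seed=0):
    """Yield (x_batch, y_batch) pairs over one shuffled pass, dropping the remainder."""
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(len(x))
    for start in range(0, len(x) - batch_size + 1, batch_size):
        batch_idx = idxs[start:start + batch_size]
        yield x[batch_idx], y[batch_idx]

x = np.arange(100).reshape(100, 1)
y = np.arange(100)
batches = list(shuffled_batches(x, y, 32))
print(len(batches))          # 3 full batches; the 4 leftover samples are dropped
print(batches[0][0].shape)   # (32, 1)
```

Like the generator above, this drops the leftover samples within each pass; because every pass reshuffles, all samples are still seen over many epochs.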

8. keras.utils.Sequence

Feed data through a subclass of keras.utils.Sequence.

import keras
from keras.layers import *
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train / 255.0
x_test = x_test / 255.0

class Fashion_Sequece(keras.utils.Sequence):
    def __init__(self, x_train, y_train, batch_size):
        self.x_train = x_train
        self.y_train = y_train
        self.batch_size = batch_size
        self.steps = (x_train.shape[0]) // self.batch_size
        if (x_train.shape[0]) % self.batch_size != 0:
            self.steps += 1

    def __len__(self):
        return self.steps

    def __getitem__(self, idx):
        batch_x = self.x_train[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y_train[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y

My_Sequece = Fashion_Sequece(x_train, y_train, batch_size=32)

inputs = Input(shape=(28,28), name='inputs')
temp = Flatten()(inputs)
temp = Dense(256, activation='relu')(temp)
temp = Dense(128, activation='relu')(temp)
outputs = Dense(10, activation='softmax', name='outputs')(temp)
model = keras.Model(inputs, outputs)

model.summary()

optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=keras.losses.sparse_categorical_crossentropy)

model.fit(My_Sequece, epochs=5, verbose=1)  # a Sequence defines its own length and batches, so batch_size and steps_per_epoch are unnecessary
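One detail worth knowing: the last batch from a Sequence may be smaller than batch_size, because NumPy slicing past the end of an array simply truncates. A standalone sketch of the indexing used in __getitem__ and the ceil division used in __init__:

```python
import numpy as np

n, batch_size = 100, 32
x = np.arange(n)
steps = n // batch_size + (1 if n % batch_size else 0)   # ceil division, as in __init__
print(steps)                                             # 4

# Slicing past the array end just truncates: this is the partial final batch.
last = x[(steps - 1) * batch_size : steps * batch_size]
print(last.shape)                                        # (4,)

# All batches together cover every sample exactly once.
total = sum(len(x[i * batch_size:(i + 1) * batch_size]) for i in range(steps))
print(total)                                             # 100
```

So unlike the hand-rolled generator, a Sequence covers every sample in every epoch, which is one of its main advantages.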

If you enjoyed this article, give it a like!

Keep at it, everyone!

Original: https://blog.csdn.net/qq_41744697/article/details/114490441
Author: 我真是啥也不会
Title: keras篇(1)–model.fit()的输入数据
