A detailed introduction to models in the Keras functional API (with comments)

The Keras functional API builds models with Model, a general model class defined by its inputs and outputs. The tf.keras.Sequential model is a simple stack of layers and cannot represent arbitrary models. The Keras functional API allows you to build complex model topologies, for example:

  • Multi-input models,

  • Multi-output models,

  • Models with shared layers (the same layer is called multiple times),

  • Models with non-sequential data flow (for example, residual connections).

Fully Connected Neural Network

A fully connected network is better suited to the Sequential model, but it is used here as an example (for comparison) because simple networks are easier to understand.

import tensorflow as tf
from keras.layers import Input, Dense
from keras.models import Model

# Returns a tensor
inputs = Input(shape=(784,))

# A layer instance is callable: it takes a tensor as an argument and returns a tensor
x = Dense(64, activation='relu')(inputs)            # First hidden layer
x = Dense(64, activation='relu')(x)                 # Second hidden layer
outputs = Dense(10, activation='softmax')(x)        # Output layer
# Create a model that includes the input layer and the three fully connected layers
model = Model(inputs=inputs, outputs=outputs)

# Compile the model; the optimizer and loss here are example values (full signature below)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Start training; x_train and y_train are placeholders for your own data and one-hot labels
history = model.fit(x_train, y_train, batch_size=32, epochs=1)
  • A layer instance is callable: it takes a tensor as an argument and returns a tensor.
  • Inputs and outputs are both tensors, and together they define a Model.
  • Such a model can be trained just like a Keras Sequential model.
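
The points above can be tried end to end with a small sketch. It assumes tf.keras from TensorFlow 2 and uses random arrays (x_train, y_train) as hypothetical stand-ins for a real dataset such as MNIST; the optimizer and loss are example choices, not part of the original listing.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same architecture as above: 784 -> 64 -> 64 -> 10
inputs = keras.Input(shape=(784,))
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Weights plus biases per layer: 784*64+64, 64*64+64, 64*10+10
n_params = model.count_params()

# Hypothetical random data standing in for a real dataset
x_train = np.random.rand(256, 784).astype('float32')
y_train = keras.utils.to_categorical(np.random.randint(0, 10, size=256), 10)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
```

Because the data is random, the reported accuracy is meaningless; the sketch only shows that a Model built from tensors compiles and trains like a Sequential model.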

model.compile(optimizer, loss, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

  • optimizer: the optimizer, given either as the name of a predefined optimizer or as an optimizer object; see the optimizers reference.

  • loss: the loss function, given either as the name of a predefined loss function or as an objective function; see the loss functions reference.

  • metrics: a list of metrics used to evaluate the model during training and testing. Typically, metrics=['accuracy']. To specify different metrics for different outputs of a multi-output model, pass a dictionary, for example metrics={'output_a': 'accuracy'}.

  • sample_weight_mode: set this to "temporal" if you need to weight samples per time step (2D weight matrix). The default, None, means weighting per sample (1D weights). If the model has multiple outputs, you can pass a dictionary or list of modes, one per output. See also the description of sample_weight in the fit function below.

  • weighted_metrics: a list of metrics that are evaluated and weighted by sample_weight or class_weight during training and testing.

  • target_tensors: by default, Keras creates placeholders for the model's targets, which are fed with the target data during training. If you want to use your own target tensors instead (in which case Keras will not expect external numpy data for these targets during training), you can specify them through this parameter. It can be a single tensor (for a single-output model), a list of tensors, or a dictionary mapping output names to tensors.

  • kwargs: Ignore this parameter when using TensorFlow as backend. If Theano/CNTK is used as the backend, the value of kwargs is passed to K.function. If TensorFlow is used as the backend, the value here is passed to

Note: An exception is raised when an illegal value is passed for a parameter. If you only load a model and call its predict method, you do not need to compile it. In Keras, compile mainly configures the loss function and the optimizer, which serve training. predict builds its symbolic function internally (by calling _make_predict_function).
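
As a small illustration of the note above, the sketch below calls predict on a model that was never compiled. It assumes tf.keras from TensorFlow 2; the layer sizes and the random input batch are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
x = layers.Dense(64, activation='relu')(inputs)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# No compile() call: inference needs neither an optimizer nor a loss
samples = np.random.rand(4, 784).astype('float32')  # hypothetical input batch
probs = model.predict(samples, verbose=0)

# Each row is a softmax distribution over the 10 classes
row_sums = probs.sum(axis=1)
```

The weights here are untrained, so the probabilities carry no meaning; in practice the model would be loaded from disk first.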

model.fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

  • x: input data. If the model has a single input, x is a numpy array. If the model has multiple inputs, x should be a list of numpy arrays, one per input. If the inputs of the model are named, you can pass a dictionary that maps input names to their data.

  • y: labels, as a numpy array. If the model has multiple outputs, you can pass a list of numpy arrays. If the outputs of the model are named, you can pass a dictionary that maps output names to their labels.

  • batch_size: an integer specifying the number of samples per gradient update. During training, one batch of samples is used to compute one gradient descent step on the objective function.

  • epochs: an integer, the epoch index at which training ends. When initial_epoch is not set, this is the total number of training epochs; otherwise the number of epochs actually trained is epochs - initial_epoch.

  • verbose: logging mode. 0 prints nothing to standard output, 1 shows a progress bar, and 2 prints one line per epoch.

  • callbacks: a list whose elements are keras.callbacks.Callback objects. The callbacks in this list are called at the appropriate points during training; see the callbacks reference.

  • validation_split: a float between 0 and 1 that specifies the fraction of the training data to set aside as the validation set. The validation set does not take part in training; the model's metrics on it, such as loss and accuracy, are evaluated after each epoch. Note that the validation split is taken before shuffling, so if your data is ordered you need to shuffle it manually before specifying validation_split, otherwise the validation set may be unrepresentative.

  • validation_data: a tuple of the form (X, y) or (X, y, sample_weights) to use as the validation set. This parameter overrides validation_split.

  • shuffle: a Boolean indicating whether to randomly shuffle the input samples before each epoch during training.

  • class_weight: a dictionary that maps classes to weights, used to scale the loss function during training (training only). With imbalanced training data (some classes have few training samples), this parameter makes the loss function pay more attention to the under-represented classes.

  • sample_weight: a numpy array of weights used to scale the loss function during training (training only). You can pass a 1D vector with the same length as the number of samples to weight the samples one to one or, for time series data, a matrix of shape (samples, sequence_length) to assign a different weight to each time step of each sample. In the latter case, make sure that sample_weight_mode='temporal' is set when compiling the model.

  • initial_epoch: the epoch at which to start training; useful when resuming a previous training run.

  • steps_per_epoch: the number of steps in one epoch (each step feeds one batch of data). When training with input tensors such as TensorFlow data tensors, the default None means the value is derived automatically as the number of samples in the dataset divided by the batch size.

  • validation_steps: only relevant when steps_per_epoch is specified; the total number of steps to run on the validation set.
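
Why ordered data needs a manual shuffle before using validation_split can be seen in a NumPy-only sketch. The helper split_like_validation_split is hypothetical and only mimics the documented behaviour of holding out the last fraction of the samples without shuffling; it is not Keras code.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Hold out the last fraction of the samples, without shuffling."""
    split_at = int(round(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Hypothetical labels sorted by class: 80 zeros followed by 20 ones
x = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

# Without shuffling, the validation set contains only class 1
(_, y_tr), (_, y_val) = split_like_validation_split(x, y, 0.2)
val_classes_sorted = set(y_val.tolist())

# Shuffle x and y with the same permutation before splitting
rng = np.random.default_rng(0)
perm = rng.permutation(len(x))
(_, y_tr_s), (_, y_val_s) = split_like_validation_split(x[perm], y[perm], 0.2)
```

With the sorted data the model would train only on class 0 and validate only on class 1; after applying one permutation to both x and y, the training part contains both classes.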

Note: The fit function returns a History object whose History.history property records how the values of the loss function and other indicators change with epoch and, if there is a validation set, also includes the changes of these indicators of the validation set.
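
A minimal sketch of reading the History object, assuming tf.keras from TensorFlow 2. The toy data, layer sizes, and the choice of validation_split=0.25 are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical toy data: 200 samples, 20 features, binary labels
rng = np.random.default_rng(0)
x = rng.random((200, 20)).astype('float32')
y = rng.integers(0, 2, size=(200, 1)).astype('float32')

inputs = keras.Input(shape=(20,))
h = layers.Dense(16, activation='relu')(inputs)
outputs = layers.Dense(1, activation='sigmoid')(h)
model = keras.Model(inputs, outputs)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

history = model.fit(x, y, epochs=3, batch_size=32,
                    validation_split=0.25, verbose=0)

# history.history maps each tracked quantity to one value per epoch
for name, values in history.history.items():
    print(name, [round(float(v), 4) for v in values])
```

Because a validation set is present, the dictionary contains a val_loss entry next to loss, each with three values for the three epochs.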

Multiple Input and Multiple Output Model

For example, suppose we try to predict how many retweets and likes a news headline will receive. The main input of the model is the headline itself (a sequence of words), along with auxiliary inputs such as the time the headline was published. The model is also supervised by two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.
The model structure is shown in the following figure:

The code is as follows:

import keras  # needed for keras.layers.concatenate below
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Title input: Receives a sequence of 100 integers, each between 1 and 10,000.
# Name any layer by passing a "name" parameter
main_input = Input(shape=(100,), dtype='int64', name='main_input')

# The Embedding layer encodes the input sequence as a sequence of dense vectors
# Each vector dimension is 512
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# LSTM layer converts a sequence of vectors into a single vector
# It contains context information for the entire sequence
lstm_out = LSTM(32)(x)

# The insertion of an auxiliary loss allows the LSTM and Embedding layers to be smoothly trained even when the main loss of the model is high.
output2 = Dense(1, activation='sigmoid', name='output2')(lstm_out) # output2
input2 = Input(shape=(5,), name='input2')                          # input2

# Linking secondary input data to the output of the LSTM layer
x = keras.layers.concatenate([lstm_out, input2])

# Stack multiple fully connected network layers
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# Add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
# Define a model with two inputs and two outputs
model = Model(inputs=[main_input, input2], outputs=[main_output, output2])

# Compile the model and assign a weight of 0.2 to the auxiliary loss.
# To specify different loss_weights or loss for different outputs, use lists or dictionaries.
# Here a single loss function is passed to the loss parameter, so it is used for all outputs.
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])

# Train the model by passing lists of input arrays and target arrays
model.fit([headline_data, additional_data], [labels, labels],
          epochs=50, batch_size=32)

# Or, because the inputs and outputs are named, you can compile in the following way
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'output2': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'output2': 0.2})

# Then train in the following way:
model.fit({'main_input': headline_data, 'input2': additional_data},
          {'main_output': labels, 'output2': labels},
          epochs=50, batch_size=32)
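
The listing above leaves headline_data, additional_data, and labels undefined. The sketch below fills them with random arrays so that the two-input, two-output model can be run; it assumes tf.keras from TensorFlow 2, and the smaller layer sizes and sample count are hypothetical choices to keep it fast.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n = 64  # number of hypothetical samples

main_input = keras.Input(shape=(100,), dtype='int32', name='main_input')
x = layers.Embedding(input_dim=10000, output_dim=16)(main_input)
lstm_out = layers.LSTM(8)(x)
output2 = layers.Dense(1, activation='sigmoid', name='output2')(lstm_out)

input2 = keras.Input(shape=(5,), name='input2')
x = layers.concatenate([lstm_out, input2])
x = layers.Dense(16, activation='relu')(x)
main_output = layers.Dense(1, activation='sigmoid', name='main_output')(x)

model = keras.Model(inputs=[main_input, input2], outputs=[main_output, output2])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1.0, 0.2])

# Random stand-ins for real headlines, auxiliary features, and labels
headline_data = np.random.randint(1, 10000, size=(n, 100)).astype('int32')
additional_data = np.random.rand(n, 5).astype('float32')
labels = np.random.randint(0, 2, size=(n, 1)).astype('float32')

model.fit([headline_data, additional_data], [labels, labels],
          epochs=1, batch_size=32, verbose=0)
main_pred, aux_pred = model.predict([headline_data, additional_data], verbose=0)
```

predict returns one array per output, in the order given to Model(outputs=...), so main_pred corresponds to main_output and aux_pred to output2.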

Shared Network Layers

Another good use of the functional API is building models with shared layers. Consider a dataset of tweets. We want to build a model that tells whether two tweets come from the same person (for example, to compare users by the similarity of their tweets).
One way to achieve this is to build a model that encodes the two tweets into two vectors, concatenates the vectors, and then adds a logistic regression layer on top. This outputs the probability that the two tweets come from the same author. The model is trained on positive and negative pairs of tweets.
Since the problem is symmetric, the mechanism that encodes the first tweet should be reused in full (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
First, we turn each tweet into a matrix of shape (280, 256): a sequence of 280 characters, each a 256-dimensional one-hot vector (over the 256 most common characters).

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
# To share the same layer on different inputs, you only need to instantiate it once
# This layer can enter a matrix and return a 64-dimensional vector
shared_lstm = LSTM(64)

# When we reuse the same layer instance multiple times, the layer's weight is also reused (it's actually the same layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# Then connect the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# Add another logistic regression layer above
predictions = Dense(1, activation='sigmoid')(merged_vector)

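# Define a trainable model that maps the tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

# Compile and train; the optimizer and loss are typical choices for a sigmoid output
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=100)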
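
That a reused layer instance really shares its weights can be checked with a short sketch. It assumes tf.keras from TensorFlow 2 and uses smaller, hypothetical shapes than the (280, 256) tweets in the text.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Smaller hypothetical shapes than in the text, to keep the example fast
tweet_a = keras.Input(shape=(10, 16))
tweet_b = keras.Input(shape=(10, 16))

shared_lstm = layers.LSTM(8)  # instantiated once
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

encoder = keras.Model(inputs=[tweet_a, tweet_b], outputs=[encoded_a, encoded_b])

# The LSTM contributes a single set of weights (kernel, recurrent kernel, bias),
# even though it is called on two inputs
n_weight_tensors = len(encoder.trainable_weights)

# Feeding the same batch to both inputs yields identical encodings
batch = np.random.rand(4, 10, 16).astype('float32')
vec_a, vec_b = encoder.predict([batch, batch], verbose=0)
same_encoding = np.allclose(vec_a, vec_b, atol=1e-6)
```

Two separate LSTM(8) instances would instead give six weight tensors and, with their independent random initializations, different encodings for the same batch.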

Another example:

from tensorflow import keras          # provides keras.Input and keras.Model used below
from tensorflow.keras import layers

# Encoder Network
encode_input = keras.Input(shape=(28,28,1), name='img')
h1 = layers.Conv2D(16, 3, activation='relu')(encode_input)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.MaxPool2D(3)(h1)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.Conv2D(16, 3, activation='relu')(h1)
encode_output = layers.GlobalMaxPool2D()(h1)
encode_model = keras.Model(inputs=encode_input, outputs=encode_output, name='encoder')

# Decoder Network
decode_input = keras.Input(shape=(16,), name='encoded_img')
h2 = layers.Reshape((4, 4, 1))(decode_input)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
h2 = layers.Conv2DTranspose(32, 3, activation='relu')(h2)
h2 = layers.UpSampling2D(3)(h2)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
decode_output = layers.Conv2DTranspose(1, 3, activation='relu')(h2)
decode_model = keras.Model(inputs=decode_input, outputs=decode_output, name='decoder')

You can call each of these two models as if it were a single layer:

autoencoder_input = keras.Input(shape=(28,28,1), name='img') # input
h3 = encode_model(autoencoder_input)                         # Call Encoder Network
autoencoder_output = decode_model(h3)                        # Call Decoder Network
autoencoder = keras.Model(inputs=autoencoder_input, outputs=autoencoder_output)
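
The shape bookkeeping of this encoder/decoder pair can be verified with a self-contained sketch, assuming tf.keras from TensorFlow 2. The variable names are hypothetical; the trailing comments give the spatial size after each layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: 28x28x1 image -> 16-dimensional code
enc_in = keras.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, 3, activation='relu')(enc_in)  # 26x26
h = layers.Conv2D(32, 3, activation='relu')(h)       # 24x24
h = layers.MaxPool2D(3)(h)                           # 8x8
h = layers.Conv2D(32, 3, activation='relu')(h)       # 6x6
h = layers.Conv2D(16, 3, activation='relu')(h)       # 4x4, 16 channels
enc_out = layers.GlobalMaxPool2D()(h)                # 16 values
encoder = keras.Model(enc_in, enc_out, name='encoder')

# Decoder: 16-dimensional code -> 28x28x1 image
dec_in = keras.Input(shape=(16,))
h = layers.Reshape((4, 4, 1))(dec_in)
h = layers.Conv2DTranspose(16, 3, activation='relu')(h)  # 6x6
h = layers.Conv2DTranspose(32, 3, activation='relu')(h)  # 8x8
h = layers.UpSampling2D(3)(h)                            # 24x24
h = layers.Conv2DTranspose(16, 3, activation='relu')(h)  # 26x26
dec_out = layers.Conv2DTranspose(1, 3, activation='relu')(h)  # 28x28
decoder = keras.Model(dec_in, dec_out, name='decoder')

# Chain the two models by calling them like layers
ae_in = keras.Input(shape=(28, 28, 1))
autoencoder = keras.Model(ae_in, decoder(encoder(ae_in)), name='autoencoder')
layer_names = [layer.name for layer in autoencoder.layers]
```

The nested models appear as single layers named encoder and decoder inside the autoencoder, and the autoencoder owns exactly the parameters of the two sub-models.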

Residual Network

For more information on Residual Networks, see Deep Residual Learning for Image Recognition

import keras  # needed for keras.layers.add below
from keras.layers import Conv2D, Input

# Input tensor is 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 Convolution Kernel for 3 Output Channels (Same as Input Channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# Return x + y
z = keras.layers.add([x, y])
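
A quick way to see the skip connection at work is to zero out the convolution branch, so that x + y reduces to x. The sketch below assumes tf.keras from TensorFlow 2; the 8x8 input size and the zero initializer are hypothetical choices made only for this check.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(8, 8, 3))
# Zero-initialised kernel (the bias is zero by default), so this branch outputs 0
y = layers.Conv2D(3, (3, 3), padding='same', kernel_initializer='zeros')(x)
# Element-wise sum: requires x and y to have the same shape
z = layers.add([x, y])
model = keras.Model(x, z)

image = np.random.rand(1, 8, 8, 3).astype('float32')
out = model.predict(image, verbose=0)
```

padding='same' and a channel count equal to the input's are what make the two shapes match; with any other setting layers.add would reject the inputs.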

You can also refer to the TensorFlow tutorial (small residual network):

from tensorflow import keras          # provides keras.Input, keras.Model, keras.utils, keras.datasets
from tensorflow.keras import layers
inputs = keras.Input(shape=(32,32,3), name='img')
h1 = layers.Conv2D(32, 3, activation='relu')(inputs)
h1 = layers.Conv2D(64, 3, activation='relu')(h1)
block1_out = layers.MaxPooling2D(3)(h1)

h2 = layers.Conv2D(64, 3, activation='relu', padding='same')(block1_out)
h2 = layers.Conv2D(64, 3, activation='relu', padding='same')(h2)
block2_out = layers.add([h2, block1_out])

h3 = layers.Conv2D(64, 3, activation='relu', padding='same')(block2_out)
h3 = layers.Conv2D(64, 3, activation='relu', padding='same')(h3)
block3_out = layers.add([h3, block2_out])

h4 = layers.Conv2D(64, 3, activation='relu')(block3_out)
h4 = layers.GlobalMaxPool2D()(h4)
h4 = layers.Dense(256, activation='relu')(h4)
h4 = layers.Dropout(0.5)(h4)
outputs = layers.Dense(10, activation='softmax')(h4)

model = keras.Model(inputs, outputs, name='small_resnet')
keras.utils.plot_model(model, 'small_resnet_model.png', show_shapes=True)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

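# Compile and train; the optimizer and loss are typical choices for one-hot labels
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.fit(x_train, y_train)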
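
The two repeated blocks in the network above can be factored into a function. The sketch below assumes tf.keras from TensorFlow 2; residual_block is a hypothetical helper and not part of Keras, and the model is only built, not trained.

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    """Hypothetical helper: two same-padded 3x3 convolutions plus a skip connection."""
    h = layers.Conv2D(filters, 3, activation='relu', padding='same')(x)
    h = layers.Conv2D(filters, 3, activation='relu', padding='same')(h)
    # x must already have `filters` channels for the addition to be valid
    return layers.add([h, x])

inputs = keras.Input(shape=(32, 32, 3))
h = layers.Conv2D(32, 3, activation='relu')(inputs)
h = layers.Conv2D(64, 3, activation='relu')(h)
h = layers.MaxPooling2D(3)(h)
h = residual_block(h, 64)
h = residual_block(h, 64)
h = layers.Conv2D(64, 3, activation='relu')(h)
h = layers.GlobalMaxPool2D()(h)
outputs = layers.Dense(10, activation='softmax')(h)
resnet = keras.Model(inputs, outputs, name='small_resnet_sketch')

# Count the skip connections in the resulting graph
n_add_layers = sum(isinstance(layer, layers.Add) for layer in resnet.layers)
```

Since the functional API works on plain tensors, ordinary Python functions are enough to reuse a sub-graph pattern; each call creates new layers with their own weights, unlike the shared-layer case.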

Tags: Python TensorFlow api

Posted by cockney on Sat, 22 May 2021 03:34:54 +0930