The distribution of each layer's inputs can change during the training of a deep neural network as the parameters of the preceding layers are updated. This phenomenon is known as internal covariate shift, and it can disrupt the optimization process.
Batch normalization addresses internal covariate shift by normalizing the layer activations, which typically speeds up convergence.
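To make the idea concrete, the following is a rough NumPy sketch of what batch normalization computes at training time: per-feature mean and variance over the batch, a normalization step, and a learnable scale and shift. The gamma, beta, and epsilon values here are illustrative constants, and the running statistics used at inference time are ignored.
import numpy as np
# Toy batch of activations: 4 samples, 3 features
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
# Per-feature statistics computed over the batch dimension
mean = x.mean(axis=0)
var = x.var(axis=0)
# Normalize, then scale and shift; gamma and beta are trainable
# parameters in a real BatchNormalization layer, fixed here for illustration
gamma, beta, epsilon = 1.0, 0.0, 1e-3
x_hat = (x - mean) / np.sqrt(var + epsilon)
y = gamma * x_hat + beta
print(y)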
Where exactly to place batch normalization in Keras
The BatchNormalization layer is usually placed after a convolutional or dense layer and before the activation function. This ordering matters for the technique to work as intended.
Following is an example:
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense
model = keras.Sequential()
# Add a Conv2D layer
model.add(Conv2D(64, (3, 3), padding='same', input_shape=(32, 32, 3)))
# Add BatchNormalization after Conv2D
model.add(BatchNormalization())
# Add Activation function
model.add(Activation('relu'))
# Add MaxPooling2D layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Define the rest of your model architecture
In the above example, we create a sequential model in Keras and add a Conv2D layer, followed by a BatchNormalization layer and then the ReLU activation. Placing the normalization before the activation function stabilizes the training process more effectively.
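The same Conv2D, BatchNormalization, Activation ordering can also be expressed with the Keras functional API. The following is a minimal sketch of that alternative, not part of the original example:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D
inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding='same')(inputs)   # no activation here
x = BatchNormalization()(x)                      # normalize the pre-activations
x = Activation('relu')(x)                        # activation comes after normalization
x = MaxPooling2D(pool_size=(2, 2))(x)
# ... rest of the architecture ...
model = Model(inputs, x)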
Using batch normalization with dense layers
The same ordering applies when using batch normalization with dense layers, i.e., we add a Dense layer to the model, followed by a BatchNormalization layer, and then apply the activation function.
Following is an example:
model = keras.Sequential()
# Add a Dense layer
model.add(Dense(256, input_shape=(784,)))
# Add BatchNormalization after Dense layer
model.add(BatchNormalization())
# Add Activation function
model.add(Activation('relu'))
# Continue defining the rest of your model architecture
...
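To run the dense example end to end, the model still needs an output layer and a compile step. The 10-class softmax head, Adam optimizer, and sparse categorical cross-entropy loss below are illustrative assumptions, not requirements:
# Illustrative completion of the model above (assumes a 10-class classification task)
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()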