During training of a deep neural network, the distribution of the inputs to each layer may change as the parameters of the preceding layers are updated. This phenomenon is known as internal covariate shift, and it can disrupt the optimization process.

Using BatchNormalization, we address internal covariate shift by normalizing the layer activations, which helps speed up convergence.
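
To make the effect concrete, here is a minimal sketch (the array values are made up purely for illustration) showing that a BatchNormalization layer shifts each feature of a batch toward zero mean and unit variance when it uses the batch statistics:

import numpy as np
from tensorflow.keras.layers import BatchNormalization

# A small made-up batch of 4 samples with 3 features, far from zero mean / unit variance
x = np.array([[10.0, 200.0, -5.0],
              [12.0, 220.0, -7.0],
              [ 8.0, 180.0, -3.0],
              [11.0, 210.0, -6.0]], dtype=np.float32)

bn = BatchNormalization()
# training=True makes the layer normalize with the statistics of this batch
y = bn(x, training=True)

print(np.mean(y.numpy(), axis=0))  # per-feature means close to 0
print(np.std(y.numpy(), axis=0))   # per-feature standard deviations close to 1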

Where exactly to place batch normalization in Keras

The BatchNormalization layer is usually placed after a convolutional or dense layer, but before the activation function. This ordering is important for the layer to work as intended.

Following is an example:

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense

model = keras.Sequential()

# Add a Conv2D layer
model.add(Conv2D(64, (3, 3), padding='same', input_shape=(32, 32, 3)))
# Add BatchNormalization after Conv2D
model.add(BatchNormalization())
# Add Activation function
model.add(Activation('relu'))
# Add MaxPooling2D layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Define the rest of your model architecture

In the example above, we create a sequential model with Keras and add a Conv2D layer. After the convolutional layer, we add a BatchNormalization layer.

By inserting the normalization before the activation function, we stabilize the inputs to the non-linearity, which makes training more stable and faster.
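
The same Conv2D → BatchNormalization → Activation ordering can also be written with the Keras functional API; the sketch below simply mirrors the sequential example above and truncates the model so the layer ordering can be inspected:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D

inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding='same')(inputs)  # linear convolution, no activation yet
x = BatchNormalization()(x)                     # normalize the pre-activations
x = Activation('relu')(x)                       # apply the non-linearity after normalization
x = MaxPooling2D(pool_size=(2, 2))(x)

# Truncated model, just to inspect the layer ordering; extend it as needed
model = Model(inputs, x)
model.summary()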

Using batch normalization with the dense layer

The same pattern applies when using batch normalization with dense layers: we add a Dense layer to the model, follow it with a BatchNormalization layer, and then apply the activation function.

Following is an example:

model = keras.Sequential()

# Add a Dense layer
model.add(Dense(256, input_shape=(784,)))
# Add BatchNormalization after Dense layer
model.add(BatchNormalization())
# Add Activation function
model.add(Activation('relu'))
# Continue defining the rest of your model architecture
...
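
As a quick check that this ordering builds and trains, the sketch below completes the dense example with a hypothetical 10-class output layer and fits it on random dummy data (the output size and the data are placeholders, not part of the original example):

import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = keras.Sequential([
    Dense(256, input_shape=(784,)),
    BatchNormalization(),
    Activation('relu'),
    Dense(10, activation='softmax'),   # hypothetical 10-class output layer
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Random dummy data just to confirm the model trains end to end
x_dummy = np.random.rand(64, 784).astype('float32')
y_dummy = np.random.randint(0, 10, size=(64,))
model.fit(x_dummy, y_dummy, epochs=1, batch_size=32, verbose=0)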
