LeNet to recognize handwritten digits

1. Load MNIST data

The MNIST dataset comes pre-loaded with TensorFlow; all you have to do is run the following commands:

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", reshape=False)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels

assert(len(X_train) == len(y_train))
assert(len(X_validation) == len(y_validation))
assert(len(X_test) == len(y_test))

print("\n Image Shape: {} \n".format(X_train[].shape))
These dataset has been previously split into sets of 55000 samples with image shape: (28, 28, 1). 
NOTE: Is good practice to use assert to confirm that all the data has been loaded correctly.
To find out how large the training set you can run:print("Training Set: {} samples".format(len(X_train)))
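
For example, a quick sanity check printing all three splits (a minimal sketch; the expected counts assume read_data_sets' default validation_size):

print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))
# Expected: 55000 / 5000 / 10000 with the default validation_size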

2. Prepare Data Set

In many cases, this step requires data augmentation and more. Because MNIST images have a 28x28x1 shape and the architecture we are using for this example only accepts 32x32x? images (? = number of color channels), we are simply going to zero-pad the images to 32x32x1.

import numpy as np
img_size = 32

if X_train[0].shape[1] != img_size:
    X_train      = np.pad(X_train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    X_validation = np.pad(X_validation, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    X_test       = np.pad(X_test, ((0,0),(2,2),(2,2),(0,0)), 'constant')
    
print("Updated Image Shape: {}".format(X_train[].shape))
 
Updated Image Shape: (32, 32, 1)
How the padding works: np.pad adds two rows of zeros at the top and bottom and two columns of zeros at the left and right of every image, which is how 28x28 becomes 32x32; the (0,0) entries leave the batch and channel dimensions untouched.
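
A minimal sketch of the same call on a toy array, to make the pad_width tuples concrete:

import numpy as np

a = np.ones((1, 2, 2, 1))                        # one 2x2 single-channel "image"
b = np.pad(a, ((0,0),(2,2),(2,2),(0,0)), 'constant')
print(b.shape)                                   # (1, 6, 6, 1): 2 zeros added on each side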
 

3. Take a look at the data set: what does it look like?

import random
import matplotlib.pyplot as plt
%matplotlib inline

index = random.randint(0, len(X_train) - 1)  # randint is inclusive on both ends
image = X_train[index].squeeze()

plt.figure(figsize=(1,1))
plt.imshow(image, cmap="gray")
print(y_train[index])
 
 

4. Architecture

For this example, I am using the LeNet architecture: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

import tensorflow as tf
from tensorflow.contrib.layers import flatten

tf.reset_default_graph()

# Using tf.truncated_normal to initialize the weight and bias variables
# with a truncated normal distribution
mu = 0
sigma = 0.1

conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean = mu, stddev = sigma))
conv1_b = tf.Variable(tf.zeros(6))

conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
conv2_b = tf.Variable(tf.zeros(16))

fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean = mu, stddev = sigma))
fc1_b = tf.Variable(tf.zeros(120))

fc2_W  = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
fc2_b  = tf.Variable(tf.zeros(84))

fc3_W  = tf.Variable(tf.truncated_normal(shape=(84, 10), mean = mu, stddev = sigma))
fc3_b  = tf.Variable(tf.zeros(10))

def LeNet(x):     
    
    print(x.shape)
    #Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
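    # With 'VALID' padding: output = (input - filter)/stride + 1 = (32 - 5)/1 + 1 = 28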
    conv1   = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
    conv1 = tf.nn.relu(conv1)
    #Pooling. Input = 28x28x6. Output = 14x14x6.
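    # With a 2x2 window and stride 2: (28 - 2)/2 + 1 = 14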
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    #Layer 2: Convolutional. Output = 10x10x16.
    conv2   = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    conv2 = tf.nn.relu(conv2)
    #Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    #Flatten. Input = 5x5x16. Output = 400.
    fc0   = flatten(conv2)
    
    #Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1   = tf.matmul(fc0, fc1_W) + fc1_b
    fc1    = tf.nn.relu(fc1)

    #Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2    = tf.matmul(fc1, fc2_W) + fc2_b
    fc2    = tf.nn.relu(fc2)

    #Layer 5: Fully Connected. Input = 84. Output = 10.
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    
    return logits


#Placeholder variables
x = tf.placeholder(tf.float32, (None, img_size, img_size, 1))
y = tf.placeholder(tf.int32, (None))

save_path = 'model/model.ckpt'

A placeholder is a variable we use to feed values into the graph when we run our TensorFlow session. It reserves a spot for data that will be supplied later. In this case, x stands for our images and y for our labels.
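
As a toy illustration, separate from the model above, of how feed_dict supplies a placeholder's value at run time:

a = tf.placeholder(tf.float32, shape=())   # scalar placeholder, unrelated to the model
double_a = a * 2.0
with tf.Session() as sess:
    print(sess.run(double_a, feed_dict={a: 3.0}))  # 6.0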

<strong>Logits</strong> is a matrix with an estimate number of how likely the input image is to be of the a class. In order for this number to look like a provability we have to normilize them (zero to one) using softmax. 

In the following section, you will notice that we use BATCH_SIZE = 128. This means that instead of feeding training images to the network one by one, we feed 128 at a time and let the GPU/CPU process them in parallel: compute the loss, calculate the gradients, update the weights, and proceed with the next batch. With 55,000 training images, each epoch therefore runs ceil(55000/128) = 430 weight updates. The choice of batch size is up to you and depends on your problem.

Note: very large batch sizes can adversely affect the rate at which the optimizer converges, as well as the quality of the final solution, so in practice batch sizes are typically kept in the tens to hundreds rather than the thousands.

5. Deep learning pipeline

EPOCHS = 10
BATCH_SIZE = 128

rate = 0.001
one_hot_y = tf.one_hot(y, 10)

# Get logits
logits = LeNet(x)
# Computes softmax cross entropy between logits and labels
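# i.e. per-example loss = -sum(one_hot_y * log(softmax(logits)))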
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
#Calculate loss
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
 

6. Set up a validation system

# A prediction is correct when the highest logit matches the one-hot label
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
 

7. Start the TensorFlow training session

import time
beginTime = time.time()
with tf.Session() as sess:
    sess.run(init_op)
    dataLen = len(X_train)

    print("Training...")
    print()
    for i in range(EPOCHS):
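        # Note: for simplicity the training data is not re-shuffled between epochs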
        for offset in range(0, dataLen, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})

        validation_accuracy = evaluate(X_validation, y_validation)

        endTime = time.time()     
        print("{:5.2f}s EPOCH {} Accuracy = {:.3f} \n".format(\
        endTime - beginTime, i+1, validation_accuracy))

    save_path = saver.save(sess, 'model/model.ckpt')
    print("Model saved")
 
Training...

 4.94s EPOCH 1 Accuracy = 0.958 

 7.64s EPOCH 2 Accuracy = 0.970 

10.23s EPOCH 3 Accuracy = 0.975 

12.85s EPOCH 4 Accuracy = 0.978 

15.48s EPOCH 5 Accuracy = 0.982 

18.06s EPOCH 6 Accuracy = 0.982 

20.69s EPOCH 7 Accuracy = 0.984 

23.31s EPOCH 8 Accuracy = 0.983 

25.92s EPOCH 9 Accuracy = 0.987 

28.55s EPOCH 10 Accuracy = 0.987 

Model saved
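
The run above only reports validation accuracy. To measure accuracy on the held-out test set, restore the saved checkpoint and reuse evaluate (a minimal sketch, not part of the original run):

with tf.Session() as sess:
    saver.restore(sess, save_path)  # reload the weights saved at model/model.ckpt
    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))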
Manuel Cuevas

Hello, I'm Manuel Cuevas, a software engineer with a background in machine learning and artificial intelligence.