Using Keras CNN to clone driving behavior

This is a convolutional neural network with data augmentation that successfully clones driving behavior so that a car can drive around a track. With this model, the car in the simulation can be driven autonomously around the track by executing a Python script. The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track.



The data set consists of images and their steering angles. It is used to train a neural network, and the trained model then drives the car autonomously around the track. Training data was chosen to keep the vehicle driving on the road: I used a combination of center lane driving and recoveries from the left and right sides of the road. The final training set combined a couple of recorded successful driving laps with other datasets that emphasized handling the sharp curves. The data was split 80% for training and 20% for validation.

  • To exclude features that are irrelevant to the driving task, the images were cropped to limit the view to the road.
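The cropping and the 80/20 split can be sketched as follows. This is a minimal sketch, not the original code: the 160x320 raw frame size, the crop offsets `TOP` and `BOTTOM`, and the helper names `crop_to_road` and `split_indices` are assumptions, chosen so that the cropped frame matches the 75 x 320 x 3 network input used in this post.

```python
import numpy as np

# Assumed values: the raw frame size and crop offsets are not given in the
# post; they are picked so that the result is 75 rows tall.
TOP, BOTTOM = 60, 135

def crop_to_road(image):
    """Keep only the rows between TOP and BOTTOM (hypothetical road region)."""
    return image[TOP:BOTTOM, :, :]

def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

frame = np.zeros((160, 320, 3), dtype=np.uint8)  # hypothetical raw frame
cropped = crop_to_road(frame)
train_idx, val_idx = split_indices(1000)
print(cropped.shape, len(train_idx), len(val_idx))  # (75, 320, 3) 800 200
```

Shuffling before splitting keeps the validation set from being a single contiguous stretch of one recorded lap.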

Data augmentation

Data augmentation is fundamentally important for improving the performance of the network. It allows the network to learn the important features that are invariant for the object classes, rather than artifacts of the training images. I explored two different ways of augmenting the data to artificially increase the size of the dataset.

  • Random brightness factor. Because roads can change brightness dramatically with the time of day or the weather, brightness jitter was used to randomize the brightness of each image.
  • Horizontal flip. The image is mirrored left to right (a 180° flip about its vertical axis), with the steering label inverted accordingly.
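The two augmentations listed above might look like the sketch below. It is an illustration under assumptions, not the original implementation: the brightness range (0.4 to 1.2), the function names, and the random test image are hypothetical, and the second augmentation is implemented as a left-to-right mirror with the steering angle negated.

```python
import numpy as np

def random_brightness(image, rng, low=0.4, high=1.2):
    """Scale pixel intensities by a random factor (the range is an assumption)."""
    factor = rng.uniform(low, high)
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def horizontal_flip(image, steering_angle):
    """Mirror the image left to right and negate the steering angle."""
    return image[:, ::-1, :], -steering_angle

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(75, 320, 3), dtype=np.uint8)  # stand-in frame
bright = random_brightness(image, rng)
flipped, angle = horizontal_flip(image, 0.25)
```

Flipping twice returns the original image and label, which makes a convenient sanity check for the augmentation pipeline.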


Adam optimization algorithm: I used the Adam optimization algorithm, which is an extension of stochastic gradient descent. Classical stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates, and that learning rate does not change during training. Adam instead adapts a separate learning rate for each parameter as training proceeds.

Adam embraces the benefits of both AdaGrad and RMSProp:

  • Straightforward to implement.
  • Computationally efficient.
  • Low memory requirements.
  • Invariant to diagonal rescale of the gradients.
  • Well suited for problems that are large in terms of data and/or parameters.
  • Appropriate for non-stationary objectives.
  • Appropriate for problems with very noisy and/or sparse gradients.
  • Hyper-parameters have intuitive interpretation and typically require little tuning.
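To make the update rule concrete, here is a minimal sketch of Adam for a single scalar parameter. The default values of `alpha`, `beta1`, `beta2`, and `eps` follow the Adam paper; the function name `adam_minimize` and the toy objective are assumptions made for illustration, and this is not how Keras implements the optimizer internally.

```python
import math

def adam_minimize(grad_fn, theta, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Run Adam on one scalar parameter and return its final value."""
    m = 0.0  # running mean of the gradients (first moment)
    v = 0.0  # running mean of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
result = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0, alpha=0.1)
print(abs(result - 3.0) < 0.05)
```

Dividing by the square root of the second moment is what gives each parameter its own effective step size.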


The overall strategy for developing the model architecture was to start from a known architecture and modify it as required. One of the models I tried was NVIDIA's model from “End to End Learning for Self-Driving Cars”, with 5 convolution layers and filter sizes of 5x5, 3x3, and 2x4, but the results were not very good. I then used the GoogLeNet stem (the first part of the architecture, with some twists) and got very good results: with this architecture, the car can complete multiple laps around the track with no issues. The model has one normalization layer, an ELU (exponential linear unit) activation after every layer to introduce non-linearity, and zero padding to control the spatial size of the output volumes and to smooth training where neurons don’t fit neatly and symmetrically across the input. Training takes about 94 s per epoch, and the validation loss is 0.0117.

33240/33240 [==============================] - 94s - loss: 0.0104 - val_loss: 0.0117

Layer            Size   Kernel / dropout   Output shape
Input            1      -                  75, 320, 3
Convolution      64     7x7                -
MaxPooling       64     2x2                19, 80, 64
Convolution      64     1x1                -
Convolution      193    3x3                -
MaxPooling       193    2x2                4, 10, 193
AveragePooling   193    2x2                2, 5, 193
Flatten          -      -                  1930
Dense            1500   Dropout 0.2        1500
Dense            500    Dropout 0.2        500
Dense            1      -                  1
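Some of the shapes in the layer list can be checked with simple arithmetic. The sketch below assumes that the 7x7 convolution is zero padded and uses a stride of 2, as in the GoogLeNet stem; the post does not state the strides, so this is an assumption that happens to reproduce the listed 19, 80, 64 output. The helper names are hypothetical.

```python
import math

def same_conv(size, stride):
    """Output length of a zero-padded ('same') convolution."""
    return math.ceil(size / stride)

def pool(size, window=2, stride=2):
    """Output length of an unpadded pooling layer."""
    return (size - window) // stride + 1

height, width = 75, 320
# Assumed stride 2 for the 7x7 convolution (not stated in the post).
height, width = same_conv(height, 2), same_conv(width, 2)  # 38, 160
height, width = pool(height), pool(width)                  # 19, 80
stem_output = (height, width, 64)

# Flatten: the last pooling output is 2 x 5 x 193.
flat = 2 * 5 * 193

# Weights plus biases of the three dense layers.
dense_params = (flat * 1500 + 1500) + (1500 * 500 + 500) + (500 * 1 + 1)
print(stem_output, flat, dense_params)  # (19, 80, 64) 1930 3647501
```

Most of the parameters sit in the first dense layer, which is one reason dropout is applied there.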

The model used the Adam optimizer, so the learning rate was not tuned manually.


To reduce overfitting, the model contains dropout layers after the first two fully connected layers, and max pooling layers inserted periodically between successive convolution layers. Each pooling layer uses 2x2 filters applied with a stride of 2, which downsamples every depth slice and discards 75% of the activations. Without dropout, the network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge.
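The 75% figure follows directly from keeping one value out of every 2x2 window. A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single depth slice shows this; the function name `max_pool_2x2` is hypothetical, and the sketch assumes even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    # Group the pixels into non-overlapping 2x2 blocks, then take each block's max.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
kept_fraction = pooled.size / feature_map.size
print(pooled.tolist(), kept_fraction)  # [[5.0, 7.0], [13.0, 15.0]] 0.25
```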

The dropout technique consists of setting the output of each hidden neuron to zero with probability 0.2. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in back-propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. (6)
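As a rough illustration of the mechanism, the sketch below applies dropout with rate 0.2 to a vector of activations. It uses the "inverted" variant, which rescales the surviving activations by 1 / (1 - rate) during training so that nothing needs to change at inference time; this is also what the Keras Dropout layer does. The function name and the test vector are assumptions.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the rest."""
    if not training:
        return activations  # dropout is disabled at inference time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(100_000)
dropped = dropout(activations, rate=0.2, rng=rng)
dropped_fraction = float(np.mean(dropped == 0.0))  # close to 0.2
```

Because a new mask is drawn on every call, each training step effectively sees a different thinned network that shares weights with all the others.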


There were two spots where the vehicle fell off the track: a curve with dirt on the side and a sharp curve. To improve the driving behavior in the dirt curve, I recorded more data from that curve and, in combination with the data already in use, the car was able to pass that part. To resolve the second issue, I increased the offset for the side images to 2.8 and incorporated them into my training data as a set of horizontally flipped side images with their corresponding inverted labels. I had noticed that a small offset for the side images caused the car to get too close to the sides on sharp curves and run off the road.
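The labeling of the side images and their flipped copies can be sketched as below. The sign convention (a positive angle steers right, so the left camera gets the offset added and the right camera gets it subtracted), the function name, and the example steering value are assumptions; only the offset of 2.8 and the inverted labels for flipped images come from the text above.

```python
def side_camera_samples(steering, offset):
    """Return (camera, flipped, label) tuples for the side images and their mirrors."""
    left = steering + offset   # assumed convention: positive steers right
    right = steering - offset
    return [
        ("left", False, left),
        ("right", False, right),
        ("left", True, -left),    # mirrored image, inverted label
        ("right", True, -right),
    ]

samples = side_camera_samples(steering=0.5, offset=2.8)
for camera, flipped, label in samples:
    print(camera, flipped, round(label, 2))
```

A larger offset tells the network to steer back toward the center more aggressively whenever the view resembles a side-camera image.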


  • (1) “CS231n: Convolutional Neural Networks for Visual Recognition”, link
  • (2) Karen Simonyan, Andrew Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, link
  • (3) NVIDIA Corporation, “End to End Learning for Self-Driving Cars”, link
  • (4) Ren Wu, Shengen Yan, Yi Shan, Qingqing Dang, Gang Sun, “Deep Image: Scaling up Image Recognition”, link
  • (5) Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)”, link
  • (6) Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, link
  • (7) Diederik P. Kingma, Jimmy Ba, “Adam: A Method for Stochastic Optimization”, link
Written on Jan 20, 2017
Manuel Cuevas

Hello, I'm Manuel Cuevas, a Software Engineer with a background in machine learning and artificial intelligence.