Transpose a CNN (TensorFlow)
Overview
This is a quick example of how to transpose a conv layer, and of how to use a dense layer (tf.layers.dense) as a convolutional layer (tf.layers.conv2d).
Transposed layers
Transposed convolutions work as backward strided convolutions that upsample the previous layer
to a higher resolution or dimension.
Upsampling is a classic signal processing technique which is often accompanied by interpolation.
The term transpose means to transfer to a different place or context. We can use a transposed
convolution to transfer patches of data onto a sparse matrix, then we can fill the sparse area of the
matrix based on the transferred information. Helpful animations of convolutional operations, including
transposed convolutions, can be found here.
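The "transfer patches onto a sparse matrix" idea can be sketched in plain NumPy. This is only an illustration, not how TensorFlow implements the operation: the function name conv2d_transpose_valid, the all-ones kernel, and the 2x2 input are assumptions made up for the example. Each input pixel stamps a scaled copy of the kernel onto a larger output grid, and overlapping stamps are summed.

```python
import numpy as np


def conv2d_transpose_valid(x, kernel, stride):
    """Transposed convolution of a 2-D input by scatter-and-add (hypothetical helper)."""
    in_h, in_w = x.shape
    k_h, k_w = kernel.shape
    # Full extent covered by the stamped kernels, with no cropping.
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(in_h):
        for j in range(in_w):
            top, left = i * stride, j * stride
            # Each input pixel places a copy of the kernel scaled by its value.
            out[top:top + k_h, left:left + k_w] += x[i, j] * kernel
    return out


x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((3, 3))  # illustrative kernel, not learned weights
y = conv2d_transpose_valid(x, kernel, stride=2)
print(y.shape)  # (5, 5)
print(y)
```

With a stride of 2 and a 3x3 kernel the stamps overlap by one row and one column, so the centre value of y is the sum of all four inputs. A framework would then crop or pad this full result according to the padding mode.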
As an example, suppose you have a 3x3 input and you wish to upsample it to
the desired dimension of 6x6. The process involves multiplying each pixel of your input
by a kernel or filter. If this filter is of size 5x5, the result for each pixel is a weighted
kernel of size 5x5. These weighted kernels then define your output layer. However, the upsampling part of
the process is defined by the strides and the padding. In TensorFlow, using tf.layers.conv2d_transpose with
a stride of 2 and “SAME” padding results in an output of dimensions 6x6. Let’s look at a simple
representation of this. If we have a 2x2 input and a 3x3 kernel, then with “SAME” padding and a stride of 2
we can expect an output of dimension 4x4.
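The size arithmetic in the two examples above can be written down as a small helper. This is a sketch under the assumption that the output length follows the rule I understand TensorFlow to use for transposed convolutions without explicit output padding: input * stride for "SAME", and input * stride + max(kernel - stride, 0) for "VALID". The function name conv_transpose_output_size is made up for illustration.

```python
def conv_transpose_output_size(in_size, kernel_size, stride, padding):
    """Output length of a transposed convolution along one spatial axis (hypothetical helper)."""
    padding = padding.upper()
    if padding == "SAME":
        # The kernel size does not affect the output size with 'SAME' padding.
        return in_size * stride
    if padding == "VALID":
        # A kernel wider than the stride spills past the last stamp.
        return in_size * stride + max(kernel_size - stride, 0)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_transpose_output_size(3, 5, 2, "SAME"))   # 6
print(conv_transpose_output_size(2, 3, 2, "SAME"))   # 4
print(conv_transpose_output_size(4, 3, 3, "VALID"))  # 12
```

The first two calls reproduce the 3x3 to 6x6 and 2x2 to 4x4 cases from the text.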
The TensorFlow code example below gives an idea of the process.
Results

np.random.rand(1, 4, 4, 1)   Transpose of a dense layer   Transpose of a conv layer

[[[[4.17022005e-01]          [[[[1.9132932e+00]           [[[[1.9132932e+00]
   [7.20324493e-01]             [3.3048427e+00]              [3.3048427e+00]
   [1.14374817e-04]             [5.2475062e-04]              [5.2475062e-04]
   [3.02332573e-01]]            [1.3870993e+00]]             [1.3870993e+00]]
  [[1.46755891e-01]            [[6.7331469e-01]             [[6.7331469e-01]
   [9.23385948e-02]             [4.2364866e-01]              [4.2364866e-01]
   [1.86260211e-01]             [8.5456026e-01]              [8.5456026e-01]
   [3.45560727e-01]]            [1.5854297e+00]]             [1.5854297e+00]]
  [[3.96767474e-01]            [[1.8203658e+00]             [[1.8203658e+00]
   [5.38816734e-01]             [2.4720864e+00]              [2.4720864e+00]
   [4.19194514e-01]             [1.9232609e+00]              [1.9232609e+00]
   [6.85219500e-01]]            [3.1437812e+00]]             [3.1437812e+00]]
  [[2.04452250e-01]            [[9.3802512e-01]             [[9.3802512e-01]
   [8.78117436e-01]             [4.0287952e+00]              [4.0287952e+00]
   [2.73875932e-02]             [1.2565404e-01]              [1.2565404e-01]
   [6.70467510e-01]]]]          [3.0760992e+00]]]]           [3.0760992e+00]]]]

import tensorflow as tf
import numpy as np

# First example: upsample 7x7 -> 14x14 with a transposed convolution,
# then downsample 14x14 -> 7x7 with a regular convolution.
x1 = tf.ones(shape=[64, 7, 7, 256])
y1 = tf.layers.conv2d_transpose(x1, 128, 3, strides=2, padding='SAME')

# Filter layout for tf.nn.conv2d_transpose: [height, width, output_channels, input_channels]
w = tf.ones([3, 3, 128, 256])
y2 = tf.nn.conv2d_transpose(x1, w, output_shape=[64, 14, 14, 128], strides=[1, 2, 2, 1], padding='SAME')
# tf.nn.conv2d reads the same filter as [height, width, input_channels, output_channels]
x2 = tf.nn.conv2d(y2, w, strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value, y2_value, x2_value = sess.run([y1, y2, x2])

print('downsample example')
print(y1_value.shape)
print(y2_value.shape)
print(x2_value.shape)

tf.reset_default_graph()

# Second example: downsample 14x14 -> 4x4 with a stride-3 'VALID' convolution,
# then upsample again with the two transposed convolution APIs.
image = tf.ones(shape=[64, 14, 14, 128])
w = tf.ones([3, 3, 128, 256])
x = tf.nn.conv2d(image, w, strides=[1, 3, 3, 1], padding='VALID')
y1 = tf.layers.conv2d_transpose(x, 128, kernel_size=3, strides=3, padding='VALID')
y2 = tf.nn.conv2d_transpose(x, w, output_shape=[64, 14, 14, 128], strides=[1, 3, 3, 1], padding='VALID')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_value, y1_value, y2_value = sess.run([x, y1, y2])

print('upsample example')
print(x_value.shape)
print(y1_value.shape)
print(y2_value.shape)

Output:

downsample example
(64, 14, 14, 128)
(64, 14, 14, 128)
(64, 7, 7, 256)
upsample example
(64, 4, 4, 256)
(64, 12, 12, 128)
(64, 14, 14, 128)
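
In the upsample example the two transposed layers return different sizes, 12 and 14, from the same 4x4 input. One way to see why both are plausible is to look at the forward convolution: several input sizes shrink to the same output size, so the transpose has no single right answer. The sketch below assumes the usual TensorFlow size rules for a regular convolution, ceil(input / stride) for "SAME" and ceil((input - kernel + 1) / stride) for "VALID"; the helper name conv_output_size is made up for illustration. As far as I understand, tf.nn.conv2d_transpose accepts an output_shape when the forward convolution of that shape gives back the input shape, while tf.layers.conv2d_transpose picks one size itself.

```python
import math


def conv_output_size(in_size, kernel_size, stride, padding):
    """Output length of a regular convolution along one spatial axis (hypothetical helper)."""
    padding = padding.upper()
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# Shapes from the examples above.
print(conv_output_size(14, 3, 2, "SAME"))   # 7
print(conv_output_size(14, 3, 3, "VALID"))  # 4

# Every input size that a 3x3, stride-3 'VALID' convolution reduces to 4.
candidates = [n for n in range(3, 30) if conv_output_size(n, 3, 3, "VALID") == 4]
print(candidates)  # [12, 13, 14]
```

Both 12 (chosen by tf.layers.conv2d_transpose) and 14 (requested through output_shape) are in that list.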
As the results table shows, the underlying math is the same for the dense layer and the conv layer, but the conv layer preserves the spatial information, allowing seamless use of later convolutional layers.
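
The equivalence between a dense layer and a 1x1 convolution can be checked with NumPy alone. This is a minimal sketch and not the code that produced the results table: the weights, the seed, and the choice of 2 output units are arbitrary assumptions. The dense version flattens the 4x4 grid into 16 rows, while the 1x1 convolution applies the same weights at each pixel and keeps the grid.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((1, 4, 4, 1))   # NHWC input, same shape as np.random.rand(1, 4, 4, 1)
weights = rng.random((1, 2))   # hypothetical weights: 1 input channel -> 2 units

# Dense layer on flattened pixels: the (4, 4) grid collapses into 16 rows.
dense_out = x.reshape(-1, 1) @ weights          # shape (16, 2)

# 1x1 convolution: the same weights viewed as a (1, 1, in, out) kernel,
# applied at every pixel while the spatial grid is kept.
kernel = weights.reshape(1, 1, 1, 2)
conv_out = np.zeros((1, 4, 4, 2))
for i in range(4):
    for j in range(4):
        conv_out[0, i, j, :] = x[0, i, j, :] @ kernel[0, 0]

print(dense_out.shape)  # (16, 2)
print(conv_out.shape)   # (1, 4, 4, 2)
print(np.allclose(dense_out, conv_out.reshape(-1, 2)))  # True
```

The numbers agree; only the shape differs, and the 4-D shape is what lets a later convolutional layer consume the output directly.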
tf.layers.conv2d_transpose(conv_out, 1, (1, 1), (1, 1), kernel_initializer=custom_init)

The second argument, 1, is the number of kernels/output channels.

The third argument is the kernel size, (1, 1). With strides of (2, 2), a kernel size of (2, 2) or (1, 1) would give the same output shape. However, if it were changed to (3, 3), the output shape for a (4, 4) input would be (9, 9), at least with ‘VALID’ padding.

The fourth argument, the strides, controls the upsampling. With strides of (1, 1), as here, a (4, 4) input keeps its height and width; strides of (2, 2) are how we would get from a height and width of (4, 4) to (8, 8). If this were a regular convolution with strides of (2, 2), the output height and width would be (2, 2).
Would you like to see a simple example of how to use transpose layers for segmentation? Take a look at my road segmentation project.
Reference
Concepts in this blog come from the Udacity Self-Driving Car Nanodegree (CarND) program.