Introduction to TensorFlow, Part 3

If you are new to TensorFlow, we recommend starting this series from part 1.

This is part 3 of the TensorFlow series. The goal of every article is to help you understand one of the most popular deep learning libraries out there. The series is aimed entirely at beginners who are either starting to learn deep learning and want to build their own neural nets, or who want to build state-of-the-art neural networks in TensorFlow. It specifically covers the basics of TensorFlow for Python; there is another version, TensorFlow.js, for JavaScript developers, but that will be covered in the next series. In this part of the series we will build an MLP (multilayer perceptron) for the MNIST dataset using TensorFlow.

After reading this article you will be able to:

  • Design an MLP
  • Implement it using TensorFlow

First check if TensorFlow is working

This can be done simply by importing TensorFlow:


import tensorflow as tf

If everything is set up correctly, this should execute without any error, although IPython may display warnings for some library-specific functions. Note that the code in this article uses the TensorFlow 1.x API (for example tf.placeholder and tf.Session).

Check for computing devices such as a CPU or GPU

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

If you have only a CPU, it will display “/device:CPU:0”; with a GPU it will also display an entry such as “/device:GPU:0” (“/gpu:0” in older versions) followed by the GPU name.
Note: TensorFlow currently works only with NVIDIA GPUs, as it relies on the CUDA library.

The MLP architecture that we will be building contains:

  1. Input layer of size 784, as each image is 28×28 pixels and we use a flattened version of it
  2. Hidden layer 1: size 10, i.e. the first hidden layer of the MLP contains 10 neurons
  3. Hidden layer 2: size 5, i.e. the second hidden layer of the MLP contains 5 neurons
  4. Output layer: size 10, i.e. one neuron for each of the digits 0 to 9

The sizes of the layers are hyperparameters of the network. They have been set here based on experience, so don’t worry too much about hyperparameter tuning for now.
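
To get a feel for how small this network is, we can count its trainable parameters directly from the layer sizes. The following is a minimal sketch in plain Python; the output size of 10 is taken from the hyperparameters set later in the article, and the helper name count_parameters is only illustrative.

```python
# Layer sizes: input, hidden layer 1, hidden layer 2, output.
# The output size of 10 (one unit per digit) comes from the hyperparameters used later.
layer_sizes = [784, 10, 5, 10]

def count_parameters(sizes):
    """Count weights and biases of a fully connected network (illustrative helper)."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # weight matrix of shape [fan_in, fan_out]
        total += fan_out           # one bias per neuron in the next layer
    return total

print(count_parameters(layer_sizes))  # 7965
```

Almost all of these parameters (7,850 of 7,965) sit between the input and the first hidden layer, simply because the input is so wide.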

Let’s start building an MLP

Import data


from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

The MNIST data is already available among TensorFlow’s sample datasets. The code above downloads the MNIST dataset and places it inside the “MNIST_data” folder.
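
The one_hot=True argument means that each label is stored as a vector of length 10 with a single 1 at the index of the digit, rather than as the digit itself. Here is a minimal plain-Python sketch of that encoding; the function name one_hot is our own and is not part of TensorFlow’s loader.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

print(one_hot(3))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```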

Setting up hyperparameters

input_layer = 784
hidden_layer_1 = 10
hidden_layer_2 = 5
output_layer = 10

These are plain Python variables that store the layer configuration of our network.

Initialize placeholders for input layer and output layer

x = tf.placeholder(tf.float32, [None, input_layer])
y = tf.placeholder(tf.float32, [None, output_layer])

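You can view these placeholders as describing the shape of the network’s input and output. The None in the first dimension leaves the number of examples per batch open, so any batch size can be fed in.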

Initialize weights of the model

weights = {
'h1': tf.Variable(tf.random_normal([input_layer, hidden_layer_1], stddev=0.062, mean=0)),
'h2': tf.Variable(tf.random_normal([hidden_layer_1, hidden_layer_2], stddev=0.125, mean=0)),
'out': tf.Variable(tf.random_normal([hidden_layer_2, output_layer], stddev=0.120, mean=0))
}

We use a Python dictionary to store all our weights layer by layer. I find this much more elegant than using an individual variable for each layer.
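
One thing worth checking when the weights are laid out like this is that the shapes chain together: each matrix must have as many rows as the previous one has columns, otherwise the matrix multiplications in the network will fail. The sketch below does that check in plain Python, without TensorFlow; the helper shapes_chain and the layer_sizes dictionary are only for illustration.

```python
# Shapes of the three weight matrices, written as (rows, columns).
layer_sizes = {"input": 784, "hidden_1": 10, "hidden_2": 5, "output": 10}
weight_shapes = {
    "h1": (layer_sizes["input"], layer_sizes["hidden_1"]),
    "h2": (layer_sizes["hidden_1"], layer_sizes["hidden_2"]),
    "out": (layer_sizes["hidden_2"], layer_sizes["output"]),
}

def shapes_chain(shapes, order=("h1", "h2", "out")):
    """True if each matrix has as many rows as the previous one has columns."""
    return all(
        shapes[prev][1] == shapes[nxt][0]
        for prev, nxt in zip(order[:-1], order[1:])
    )

print(weight_shapes["h2"])          # (10, 5)
print(shapes_chain(weight_shapes))  # True
```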

Initializing biases

biases = {
'b1': tf.Variable(tf.random_normal([hidden_layer_1])), 
'b2': tf.Variable(tf.random_normal([hidden_layer_2])), 
'out': tf.Variable(tf.random_normal([output_layer])) 
}

We follow the same approach for the biases.

Model parameters

epochs = 15 #total number of passes over the whole training dataset
learning_rate = 0.001 #learning rate (step size) of the algorithm
batch_size = 100 #number of training examples in each mini-batch

These parameters stay fixed during the whole computation.
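
Together, these numbers determine how many weight updates the network receives. Here is a quick back-of-the-envelope sketch, assuming the training split holds 55,000 images (the default size returned by the loader used above; treat that figure as an assumption if your copy of the data differs).

```python
epochs = 15
batch_size = 100
# Assumption: 55,000 training images, the size of the default training split.
num_train_examples = 55000

batches_per_epoch = num_train_examples // batch_size
total_updates = epochs * batches_per_epoch

print(batches_per_epoch)  # 550
print(total_updates)      # 8250
```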

Defining MLP

This is the core step of the whole article. If you are unfamiliar with activation functions, click here before reading further.


def MLP(x, weights, biases):

    #first hidden layer with relu activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    #second layer with relu activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']) 
    layer_2 = tf.nn.relu(layer_2)

    #output layer: raw scores (logits); the softmax is applied inside the loss function
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']


    return out_layer

If you walk through the code carefully, you will find that the procedure is quite simple, as all we are doing is

“Weight times input, add a bias, activate”
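
To make that recipe concrete, here is a minimal plain-Python sketch of a single layer applied to a single example. The helper names dense and relu and all the numbers are made up for illustration; they are not part of TensorFlow or of the model above.

```python
def dense(inputs, weights, biases):
    """Compute inputs x weights + biases for a single example.

    inputs: list of length n_in; weights: n_in rows of n_out columns; biases: length n_out.
    """
    n_out = len(biases)
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(n_out)
    ]

def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]

# Toy layer with 3 inputs and 2 neurons (made-up numbers, not MNIST).
x = [1.0, 2.0, 3.0]
w = [[0.5, -1.0],
     [0.0, 1.0],
     [1.0, -1.0]]
b = [0.5, 0.0]

pre_activation = dense(x, w, b)
print(pre_activation)        # [4.0, -2.0]
print(relu(pre_activation))  # [4.0, 0.0]
```

The MLP function does the same thing with tf.matmul, tf.add and tf.nn.relu, only for a whole batch of examples at once.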

Defining our predicted output

y_predicted  = MLP(x, weights, biases)

y_predicted is the output produced by MLP.

Defining our loss function

We will be using one of the best-known loss functions, the log loss or cross-entropy. As our task is multi-class classification (each image is one of ten digits), we will use the softmax version of it.


loss_function = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = y_predicted, labels = y))

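The idea is simple: compute the softmax cross-entropy for each example in the batch, then take the mean over all of them. Note that tf.nn.softmax_cross_entropy_with_logits applies the softmax itself, so it expects raw scores (logits) as input.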
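
If you want to see what this loss computes, here is a minimal plain-Python sketch of softmax cross-entropy averaged over a tiny batch. The function names and the toy numbers are our own; the TensorFlow op does the equivalent computation on tensors.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, one_hot_label):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))

# Toy batch of two examples with three classes (made-up numbers).
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [cross_entropy(softmax(l), t) for l, t in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

In the second example all three scores are equal, so every class gets probability 1/3 and the loss is ln 3 (about 1.0986), which is what a network that knows nothing would score on a three-class problem.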

Defining our optimization algorithm

Among the most popular optimizers are Adadelta and Adam, but for simplicity we will be using stochastic gradient descent.


optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss_function)
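
Under the hood, every optimization step moves each weight a small amount against its gradient. Here is a minimal sketch of that update rule on a toy one-dimensional problem; the function gradient_descent_step is hypothetical, and the toy uses a learning rate of 0.1 rather than our 0.001 only so that it converges in a few steps.

```python
def gradient_descent_step(weight, gradient, learning_rate=0.001):
    """One update: move the weight a small step against the gradient."""
    return weight - learning_rate * gradient

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for step in range(1000):
    grad = 2.0 * (w - 3.0)
    w = gradient_descent_step(w, grad, learning_rate=0.1)

print(round(w, 4))  # 3.0
```

TensorFlow computes the gradients of the loss with respect to all weights and biases for us, so we never have to write the gradient by hand as in this toy.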

Let’s run our model

This step puts together everything defined above.


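with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(epochs):
        train_avg_cost = 0.
        test_avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)

        for i in range(total_batch):
            x_train, y_train = mnist.train.next_batch(batch_size)

            #run one optimization step on this batch and fetch its training cost
            _, c = sess.run([optimizer, loss_function], feed_dict={x: x_train, y: y_train})
            train_avg_cost += c / total_batch

            #evaluate the cost on the test set with the current weights
            c = sess.run(loss_function, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            test_avg_cost += c / total_batch

        print("Epoch:",epoch+1, "train cost",train_avg_cost, "test cost ",test_avg_cost)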

In pseudocode, the TensorFlow code above does the following:

  1. Begin the TensorFlow session
  2. Initialize all the variables defined above
  3. For each epoch
    1. initialize the average training cost for the epoch
    2. initialize the average test cost for the epoch
    3. compute the total number of batches by dividing the size of the training data by the batch size
    4. for each batch in the epoch
      1. train the network on the batch and compute the training cost and the test cost
    5. print the epoch number, training cost and test cost
  4. End
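
The same loop structure can be tried without TensorFlow. Below is a minimal plain-Python sketch of the epoch and batch bookkeeping only; the data and the "cost" are toy stand-ins (the cost is just the mean of the batch), and the helper iterate_batches is our own invention.

```python
def iterate_batches(examples, batch_size):
    """Yield consecutive slices of `examples`, each with `batch_size` items."""
    total_batch = len(examples) // batch_size
    for i in range(total_batch):
        yield examples[i * batch_size:(i + 1) * batch_size]

# Toy stand-ins: ten "examples" and a fake cost that is just the batch mean.
examples = list(range(10))
epochs, batch_size = 2, 5

for epoch in range(epochs):
    total_batch = len(examples) // batch_size
    train_avg_cost = 0.0
    for batch in iterate_batches(examples, batch_size):
        cost = sum(batch) / len(batch)        # stand-in for the real training cost
        train_avg_cost += cost / total_batch  # running average over the epoch
    print("Epoch:", epoch + 1, "train cost", train_avg_cost)
# Epoch: 1 train cost 4.5
# Epoch: 2 train cost 4.5
```

Dividing each batch cost by total_batch before adding it up is what turns the sum into an average over the epoch, exactly as in the TensorFlow loop.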

Congratulations, you have just implemented your first TensorFlow MLP model, which will be able to recognize handwritten digits. In the next article we will try to cover a simple implementation of a convnet for the MNIST dataset.

For part 4 click here.
