Softmax Regression using TensorFlow
This article discusses the basics of Softmax Regression and its implementation in Python using the TensorFlow library.
Softmax regression
Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes in the target column. In binary logistic regression, the labels were binary, that is, for the $i^{th}$ observation,
$$y_i \in \{0, 1\}$$
But consider a scenario where we need to classify an observation into one of three or more classes. For example, in digit classification, the possible labels are:
$$y_i \in \{0, 1, 2, \dots, 9\}$$
In such cases, we can use Softmax Regression.
Softmax layer
It is harder to train the model on raw score values, since they are unbounded and do not form a probability distribution, which makes defining a differentiable cost function for the Gradient Descent algorithm awkward. So, we need a function that normalizes the logit scores and is also easily differentiable. To convert the score matrix $Z$ to probabilities, we use the Softmax function. For a vector $y$, the softmax function $S(y)$ is defined as:
$$S(y)_j = \frac{e^{y_j}}{\sum_{k} e^{y_k}}$$
So, the softmax function helps us to achieve two functionalities:
1. Convert all scores to probabilities.
2. Ensure that the probabilities sum to 1.
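As a quick sanity check, here is a minimal NumPy sketch of the softmax function (this snippet is illustrative, not part of the article's original code; subtracting the maximum is a standard numerical-stability trick, not part of the mathematical definition):
Python3
import numpy as np

def softmax(y):
    # Shift by the max for numerical stability; softmax is
    # invariant to adding a constant to every score.
    exp_y = np.exp(y - np.max(y))
    return exp_y / exp_y.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- the probabilities sum to one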
Recall that in binary Logistic Regression we used the sigmoid function for the same task; the softmax function is simply a generalization of the sigmoid to more than two classes. Now, the softmax function computes the probability that the $i^{th}$ training sample belongs to class $j$, given the logits vector $Z_i$, as:
$$P(y = j \mid Z_i) = \frac{e^{Z_{ij}}}{\sum_{k} e^{Z_{ik}}}$$
In vector form, we can simply write:
$$S(Z_i) = \mathrm{softmax}(Z_i)$$
For simplicity, let $S_i$ denote the softmax probability vector for the $i^{th}$ observation.
Cost function
Now, we need to define a cost function, for which we compare the softmax probabilities with the one-hot encoded target vector for similarity. We use the concept of Cross-Entropy for this. Cross-entropy is a distance-like function: it takes the probabilities computed by the softmax function and the one-hot encoded target matrix and measures how far apart they are. The distance will be small when the model assigns high probability to the right target class, and large when it assigns high probability to wrong classes. We define the cross-entropy $D(S_i, T_i)$ for the $i^{th}$ observation, with softmax probability vector $S_i$ and one-hot target vector $T_i$, as:
$$D(S_i, T_i) = -\sum_{j} T_{ij} \log(S_{ij})$$
And now the cost function $J$ can be defined as the average cross-entropy over all $m$ training observations:
$$J = \frac{1}{m} \sum_{i=1}^{m} D(S_i, T_i)$$
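Both formulas are easy to express directly in NumPy. The sketch below is illustrative (the arrays softmax_probs and one_hot_targets are made-up examples, not from this article's code):
Python3
import numpy as np

def cross_entropy(S, T):
    # D(S_i, T_i) = -sum_j T_ij * log(S_ij), computed row-wise;
    # the small epsilon guards against log(0).
    return -np.sum(T * np.log(S + 1e-12), axis=1)

# Two observations, three classes.
softmax_probs = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1]])
one_hot_targets = np.array([[1, 0, 0],
                            [0, 0, 1]])

D = cross_entropy(softmax_probs, one_hot_targets)
print(D)         # small for a confident correct prediction, large otherwise
print(D.mean())  # the cost J: the average cross-entropy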
Let us now implement Softmax Regression on the MNIST handwritten digit dataset using the TensorFlow library. For a gentle introduction to TensorFlow, follow this tutorial.
Importing Libraries and Dataset
First of all, we import the dependencies.
Python3
import tensorflow as tf
import tensorflow.compat.v1 as tf1
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
TensorFlow allows you to download and read the MNIST data automatically. Consider the code given below: it downloads the dataset and assigns the training and validation splits to the desired variables.
Python3
(X_train, Y_train),\
(X_val, Y_val) = tf.keras.datasets.mnist.load_data()

print("Shape of feature matrix:", X_train.shape)
print("Shape of target matrix:", Y_train.shape)
Output:
Shape of feature matrix: (60000, 28, 28)
Shape of target matrix: (60000,)
Now, we try to understand the structure of the dataset. The MNIST data is split into two parts: 60,000 data points of training data, and 10,000 points of validation data. Each image is 28 pixels by 28 pixels. The number of class labels is 10.
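These properties are easy to verify directly from the arrays we loaded (a quick check, not part of the original article):
Python3
print(X_train.shape, X_val.shape)  # (60000, 28, 28) (10000, 28, 28)
print(np.unique(Y_train))          # [0 1 2 3 4 5 6 7 8 9] -- 10 class labels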
Python3
fig, ax = plt.subplots(10, 10)
for i in range(10):
    for j in range(10):
        # Pick a random training image and display it
        k = np.random.randint(0, X_train.shape[0])
        ax[i][j].imshow(X_train[k].reshape(28, 28),
                        aspect='auto')
plt.show()
Output:
Sample images from the MNIST data
Now let's define some hyperparameters in one place so that we can control them for the whole notebook from here. We also need to flatten the images into vectors and one-hot encode the labels to get the desired input format.
Python3
num_features = 784
num_labels = 10
learning_rate = 0.05
batch_size = 128
num_steps = 5001

# Flatten each 28x28 image into a 784-dimensional vector
train_dataset = X_train.reshape(-1, 784)
valid_dataset = X_val.reshape(-1, 784)

# One-hot encode the labels
train_labels = pd.get_dummies(Y_train).values
valid_labels = pd.get_dummies(Y_val).values
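pd.get_dummies is only one way to build the one-hot matrix; an equivalent NumPy-only sketch (our illustration, not what the article uses) looks like this:
Python3
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing by the labels produces the full one-hot matrix.
    return np.eye(num_classes)[labels]

# one_hot(np.array([3, 0]), 10) -> rows with a 1 at positions 3 and 0
Note also that the pixel values are left in their original 0-255 range here; dividing by 255 to scale them into [0, 1] is a common preprocessing step that usually makes training more stable, although this article proceeds without it.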
Computation Graph
Now, we create a computation graph. In a computation graph, we describe the operations symbolically first and execute them later inside a session; this is the TensorFlow 1.x style of working (as opposed to TensorFlow 2's default eager execution), which is why we use the tf.compat.v1 APIs imported as tf1.
Python3
graph = tf.Graph()
with graph.as_default():
    # Placeholders for a minibatch of training inputs and labels
    tf_train_dataset = tf1.placeholder(tf.float32,
                                       shape=(batch_size, num_features))
    tf_train_labels = tf1.placeholder(tf.float32,
                                      shape=(batch_size, num_labels))
    # The validation set is small enough to keep as a constant
    tf_valid_dataset = tf.constant(valid_dataset, dtype=tf.float32)

    # Model parameters: a 784x10 weight matrix and a bias vector
    weights = tf.Variable(
        tf.random.truncated_normal([num_features, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Scores (logits): Z = XW + b
    logits = tf.matmul(tf_train_dataset, weights) + biases

    # Average cross-entropy over the minibatch
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=tf_train_labels, logits=logits))

    # Plain gradient descent on the cost function
    optimizer = tf1.train.GradientDescentOptimizer(
        learning_rate).minimize(loss)

    # Softmax probabilities for the training minibatch and validation set
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(
        tf.matmul(tf_valid_dataset, weights) + biases)
Running the Computation Graph
Since we have already built the computation graph, now it’s time to run it through a session.
Python3
def accuracy(predictions, labels):
    # Fraction of rows where the predicted class (argmax of the
    # softmax probabilities) matches the true class.
    correctly_predicted = np.sum(
        np.argmax(predictions, 1) == np.argmax(labels, 1))
    acc = (100.0 * correctly_predicted) / predictions.shape[0]
    return acc
We will use the above utility function to calculate the accuracy of the model as the training goes on.
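A quick illustrative call (the two small arrays below are made up for demonstration):
Python3
preds = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
truths = np.array([[1, 0],
                   [1, 0]])
print(accuracy(preds, truths))  # 50.0 -- one of the two rows is correct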
Python3
with tf1.Session(graph=graph) as session:
    tf1.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick a random minibatch of training data
        offset = np.random.randint(0, train_labels.shape[0] - batch_size - 1)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Feed the minibatch into the placeholders and run one
        # optimization step
        feed_dict = {tf_train_dataset: batch_data,
                     tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if step % 500 == 0:
            print("Minibatch loss at step {0}: {1}".format(step, l))
            print("Minibatch accuracy: {:.1f}%".format(
                accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}%".format(
                accuracy(valid_prediction.eval(), valid_labels)))
Output:
Initialized
Minibatch loss at step 0: 3185.3974609375
Minibatch accuracy: 7.0%
Validation accuracy: 21.1%
Minibatch loss at step 500: 619.6030883789062
Minibatch accuracy: 86.7%
Validation accuracy: 89.0%
Minibatch loss at step 1000: 247.22283935546875
Minibatch accuracy: 93.8%
Validation accuracy: 85.7%
Minibatch loss at step 1500: 2945.78662109375
Minibatch accuracy: 78.9%
Validation accuracy: 83.6%
Minibatch loss at step 2000: 337.13922119140625
Minibatch accuracy: 94.5%
Validation accuracy: 89.0%
Minibatch loss at step 2500: 409.4652404785156
Minibatch accuracy: 89.8%
Validation accuracy: 90.6%
Minibatch loss at step 3000: 1077.618408203125
Minibatch accuracy: 84.4%
Validation accuracy: 90.3%
Minibatch loss at step 3500: 986.0247802734375
Minibatch accuracy: 80.5%
Validation accuracy: 85.9%
Minibatch loss at step 4000: 467.134521484375
Minibatch accuracy: 89.8%
Validation accuracy: 85.1%
Minibatch loss at step 4500: 1007.259033203125
Minibatch accuracy: 87.5%
Validation accuracy: 87.5%
Minibatch loss at step 5000: 342.13690185546875
Minibatch accuracy: 94.5%
Validation accuracy: 89.6%
Some important points to note:
- In every iteration, a minibatch is selected by choosing a random offset value using the np.random.randint method (a deterministic alternative is sketched after this list).
- To feed the placeholders tf_train_dataset and tf_train_labels, we create a feed_dict like this:
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
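As an aside, a common deterministic alternative to the random offset (our sketch, not what this article does) is to sweep through the training set in order, wrapping around at the end:
Python3
# Cycle through the data sequentially instead of sampling randomly.
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)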
Many of the functionalities that we implemented from scratch here are provided automatically if one uses TensorFlow's higher-level APIs. They have been implemented from scratch to build a better intuition for the mathematical formulas used in the Softmax Regression Classifier.
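For comparison, here is a rough sketch of the same model written with the high-level tf.keras API (the hyperparameters mirror the ones used above; this snippet is our illustration, not part of the original article):
Python3
import tensorflow as tf

# A single Dense layer with a softmax activation is exactly
# softmax regression: logits = XW + b, followed by softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax',
                          input_shape=(784,)),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# model.fit(train_dataset, train_labels, batch_size=128,
#           validation_data=(valid_dataset, valid_labels))
Keras handles weight initialization, the cross-entropy loss, the training loop, and minibatching for us, which is exactly the machinery we wrote by hand above.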