The term “neural network” is a buzzword in the tech industry, but in reality neural networks are much simpler than people imagine. An Artificial Neural Network (ANN) is a computational framework loosely modeled on the structure of biological neural networks and the way the human brain processes information. It enables computers to learn from observational data such as images, audio, text, labels, strings, or numbers. A neural network tries to model some unknown function that maps this data to numbers or classes by recognizing patterns in it. Before diving deeper, let us understand how a neuron works.
A Neuron or Activation Unit
A neuron, also known as an activation unit, is the basic computational unit of a neural network. It receives input from other nodes or from external sources and applies a function to it. Each input has an associated weight that reflects its importance relative to the other inputs. The activation unit applies a function to the weighted sum of these inputs. Commonly used activation functions include Sigmoid, tanh, and ReLU.
In the above image, X denotes the inputs, W denotes the weights associated with them, and b is the bias term. Weights are the numbers we multiply the inputs by before passing them to the activation function. The activation function then produces the output (Y). Every activation function takes a single input and performs a fixed mathematical operation on it. Some of the commonly used activation functions are:
The Sigmoid function is an S-shaped curve. A common application of this unit is in models that predict output probabilities: since a probability always lies between 0 and 1, Sigmoid is a natural choice.
σ(x) = 1 / (1 + exp(−x))
Another activation function is tanh. It takes a real-valued input and squashes it into the range [-1, 1]. The function is monotonic, while its derivative is not.
tanh(x) = 2σ(2x) − 1
ReLU stands for Rectified Linear Unit and is the most commonly used activation function today. It can mitigate the vanishing gradient problem that occurs when using sigmoid and tanh units.
f(x) = max(0, x)
A Leaky ReLU is the same as a normal ReLU, except that instead of being 0 for x < 0, it has a small negative slope in that region.
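The four activation functions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation; the 0.01 leak slope in Leaky ReLU is an assumed common default.

```python
import math

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # squashes input into (-1, 1); satisfies tanh(x) = 2*sigmoid(2x) - 1
    return math.tanh(x)

def relu(x):
    # passes positive values through, zeroes out negatives
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # like ReLU, but with a small slope alpha for x < 0
    return x if x > 0 else alpha * x

print(sigmoid(0.0))           # 0.5
print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(leaky_relu(-2.0))       # -0.02
```

Note that the tanh identity quoted earlier, tanh(x) = 2σ(2x) − 1, can be checked numerically against these definitions.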
Feed Forward Neural Network
Deep feed-forward networks, also known as Multi-Layer Perceptrons, form the foundation of most deep learning models. Networks such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are simply special cases of, or advancements on, feed-forward networks. Basically, there are three different layers in a neural network:
The input layer is the bottom-most layer and is directly visible. It receives the input and passes it to the next, hidden layer.
The hidden layer is a collection of neurons with activation functions applied to them, and is an intermediate layer found between the input and output layers. Its job is to process the inputs obtained from the previous layer.
Output nodes are collectively referred to as the “output layer” and are responsible for computing and transferring information from the network to the outside world.
The above figure shows the structure of a simple multi-layer perceptron. It consists of one input layer, three hidden layers, and one output layer. Here fij represents an activation function, where i is the layer number and j is the neuron's position from the top. Similarly, Wij-n represents the weight from the ith neuron in the nth layer to the jth neuron in the (n+1)th layer, and Oij represents the output of activation function fij. Here we have four features, and in the ith iteration Xi1, Xi2, Xi3, Xi4 represent the input.
How Does a Multi-Layer Perceptron Work?
There are basically three steps in training a model:
- Forward propagation
- Loss calculation
- Backward propagation
In this step of training, we pass the input from the input layer; at each layer it gets multiplied by the corresponding weights and added to the bias. In our example, input Xi1 gets multiplied by W11-1 while passing to the first neuron of layer 1, by W12-1 while passing to the second neuron of layer 1, and so on.
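The forward pass described above can be sketched as follows. The 4-3-1 layer sizes, sigmoid activations everywhere, and random weight initialization are illustrative assumptions, not taken from the figure:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    # propagate activations through each (weights, biases) layer;
    # each neuron computes sigmoid(w . a + b)
    a = x
    for weights, biases in layers:
        a = [sigmoid(sum(w_i * a_i for w_i, a_i in zip(w, a)) + b)
             for w, b in zip(weights, biases)]
    return a

random.seed(0)
# toy shapes for illustration: 4 inputs -> 3 hidden neurons -> 1 output
sizes = [4, 3, 1]
layers = [([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]

y = forward([0.5, -0.2, 0.1, 0.9], layers)
print(y)  # a single value in (0, 1)
```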
After the input has been multiplied by the weights and propagated forward through the activation functions, the loss is estimated. Usually, we use squared loss for regression problems and logistic loss for classification problems.
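Both losses can be written in a couple of lines; the sample values below are made up for illustration:

```python
import math

def squared_loss(y_true, y_pred):
    # typical regression loss: (y - y_hat)^2
    return (y_true - y_pred) ** 2

def logistic_loss(y_true, y_pred):
    # binary classification (log) loss; y_true in {0, 1}, y_pred in (0, 1)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

print(squared_loss(3.0, 2.5))  # 0.25
print(logistic_loss(1, 0.9))   # ~0.105
print(logistic_loss(1, 0.1))   # ~2.303: a confident wrong prediction costs far more
```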
This is the important step in which the network updates its weights. Before forward propagation, the network initializes the weights randomly. After computing the loss, it updates the weights, starting from the last layer, using the following formula:
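In its generic form, each weight moves against its gradient, scaled by a learning rate. A one-step numeric sketch (the weight, gradient, and learning-rate values here are made up for illustration):

```python
# gradient-descent update for a single weight:
# w_new = w_old - learning_rate * dL/dw
w_old = 0.8
learning_rate = 0.1
grad = 0.5  # assume dL/dw = 0.5 at the current point
w_new = w_old - learning_rate * grad
print(w_new)  # 0.75
```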
The gradients (partial derivatives) can be calculated using the chain rule. We will explain how to calculate some of the gradients from the above figure. In layer three, the gradients are as follows:
Using these gradients, we update the weights of layer 3. For layer 2, we then calculate the gradients using the chain rule as follows:
As with layer 3, we also update the weights in layer 2. This process continues until the weights in layer 1 are updated. Together, these steps are known as back-propagation. Back-propagation works only for activation functions that are differentiable, and the more cheaply a function can be differentiated, the faster back-propagation runs. If we pass one point at a time through the network, training takes a lot of time, so the usual solution is to pass a batch of points at each step (known as mini-batch Stochastic Gradient Descent).
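Putting the three steps together, here is a minimal sketch of gradient-descent training for a single sigmoid neuron, a deliberate simplification of the multi-layer network in the figure. The toy dataset, learning rate, and epoch count are all assumptions for illustration:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(1)
# toy separable data: label is 1 when x1 + x2 > 1, else 0
points = [[random.random(), random.random()] for _ in range(200)]
data = [(x, 1.0 if x[0] + x[1] > 1 else 0.0) for x in points]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for epoch in range(50):
    random.shuffle(data)  # visit points in a random order each epoch
    for x, y in data:
        # forward propagation
        y_hat = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # chain rule: for logistic loss + sigmoid, dL/dz collapses to (y_hat - y)
        grad = y_hat - y
        # backward propagation: move each weight against its gradient
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad

correct = sum((sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5) == (y == 1.0)
              for x, y in data)
print(correct / len(data))  # should be close to 1.0 on this separable toy set
```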
The Multi-Layer Perceptron is the simplest type of neural network. Its more advanced relatives, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are yet to be covered. We will also be covering some of the main challenges of artificial neural networks, such as the vanishing gradient and exploding gradient problems, in upcoming discussions. Until then, Happy Machine Learning!