Types of Activation Functions in Deep Learning explained with Keras

Tripathi Aditya Prakash · Published in Dev Genius · Sep 29, 2022 · 4 min read


Activation: does it mean activating your car with a click (if it has that, of course)? Well, it's the same concept, but for neurons. Neurons as in the human brain? Again, close enough: neurons in an Artificial Neural Network (ANN).

The activation function decides whether a neuron should be activated or not.

A biological neuron in the human brain

If you have seen an ANN (which I sincerely hope you have), you know its layers are linear in nature. To introduce non-linearity into them, we use activation functions on the output generated from the input values fed into the network.

A sample ANN network

Activation functions can be divided into three types:

  1. Linear Activation Function
  2. Binary Step Activation Function
  3. Non-linear Activation Functions

Linear Activation Function

Its output is proportional to its input: the function simply passes the weighted sum it receives straight through to the output. Its range is (-∞, ∞).

Graph showing linear activation function

Mathematically, the same can be written as

Equation of Linear Activation
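Written out, it is simply the identity:

f(x) = x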

Implementation of the same in Keras is shown below,

Linear activation function in Keras
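The code screenshot isn't reproduced here, so below is a minimal sketch of how this usually looks in tf.keras; the layer width and input shape are illustrative choices of mine.

```python
import tensorflow as tf
from tensorflow import keras

# "linear" is the identity activation, f(x) = x,
# i.e. the weighted sum is passed through unchanged
model = keras.Sequential([
    keras.layers.Dense(8, activation="linear", input_shape=(4,)),
])

# The activation can also be called on its own
x = tf.constant([-2.0, 0.0, 3.0])
print(keras.activations.linear(x))  # [-2.  0.  3.]
```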

Binary Step Activation Function

It uses a specific threshold value to decide whether a neuron should be activated or not: inputs above the threshold fire the neuron (output 1), and inputs below it keep the neuron off (output 0).

Binary Step Activation Function Graph

Mathematically, this is the equation of the function

Equation of Binary Step Activation Function
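With the threshold taken at 0, it reads:

f(x) = 1 if x ≥ 0, and f(x) = 0 if x < 0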

Keras does not ship a binary step activation, so a custom function can be written using TensorFlow, as follows.

Custom Binary Step Activation Function in TensorFlow
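Since the original snippet is only referenced, here is one plausible version of such a custom function; I have assumed the threshold sits at 0.

```python
import tensorflow as tf

def binary_step(x):
    # tf.greater_equal yields booleans (x >= 0),
    # which are cast back to floats: 1.0 or 0.0
    return tf.cast(tf.greater_equal(x, 0.0), tf.float32)

x = tf.constant([-1.5, 0.0, 2.0])
print(binary_step(x))  # [0. 1. 1.]
```

Note that the step function has a zero gradient everywhere it is differentiable, so it is more of a teaching example than something to train with.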

Non-Linear Activation Functions

They allow the ANN to adapt to a variety of data and to differentiate between outputs. They are also what makes stacking multiple layers worthwhile: without a non-linearity, a stack of linear layers collapses into a single linear transformation, whereas with one, the output becomes a genuinely non-linear function of the input passed through the layers of the network.

Various non-linear activation functions are discussed below.

Sigmoid Activation Function

This function accepts a number as input and returns a number between 0 and 1. It is mainly used in binary classification precisely because of that (0, 1) output range. For example, if you train a dog-and-cat classifier, then regardless of how furry a dog is, it should be classified as a dog, not a cat; there is no in-between, and sigmoid is perfect for that.

Graph of Sigmoid function

Mathematically, the equation looks like this

Sigmoid Function Equation
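That is:

sigmoid(x) = 1 / (1 + e^(-x))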

Implementation of the same in Keras is shown below,

Sigmoid Activation Function in Keras
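Again, a minimal sketch rather than the original notebook code; the single sigmoid unit is the classic binary-classifier head.

```python
import tensorflow as tf
from tensorflow import keras

# One sigmoid unit squashes the output into (0, 1),
# read as the probability of the positive class (e.g. "dog")
model = keras.Sequential([
    keras.layers.Dense(1, activation="sigmoid", input_shape=(4,)),
])

x = tf.constant([-4.0, 0.0, 4.0])
print(keras.activations.sigmoid(x))  # ~[0.018 0.5   0.982]
```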

TanH Activation Function

This activation function maps its input into the range [-1, 1]. The output is zero-centred, which helps: strongly negative inputs map to strongly negative outputs, and inputs near zero map to outputs near zero.

Comparison of tanh with sigmoid

Mathematically, the equation looks like this

Equation of tanh
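Namely:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))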

Implementation of the same in Keras is shown below

tanh Activation Function in Keras
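A sketch of the tanh version, with the same illustrative shapes as before:

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, activation="tanh", input_shape=(4,)),
])

# Outputs are zero-centred in [-1, 1]
x = tf.constant([-4.0, 0.0, 4.0])
print(keras.activations.tanh(x))  # ~[-0.999  0.     0.999]
```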

ReLU ( Rectified Linear Unit)

It is one of the most commonly used activation functions. It eases the vanishing-gradient problem because its gradient is a constant 1 for every positive input, so it does not saturate the way sigmoid and tanh do. The range of ReLU is [0, ∞).

Graph comparing Sigmoid and ReLU

Mathematically, the equation looks like this

Equation of ReLU
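In full:

f(x) = max(0, x)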

Implementation of the same in Keras is shown below,

ReLU implementation in Keras
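A sketch of the ReLU version:

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
])

# Negative inputs are clipped to 0, positive inputs pass through
x = tf.constant([-2.0, 0.0, 3.0])
print(keras.activations.relu(x))  # [0. 0. 3.]
```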

Leaky ReLU

Leaky ReLU is an upgraded version of ReLU (upgraded like Covid variants... sensitive topic... ok, fine, back to Leaky ReLU). It solves the dying ReLU problem: by keeping a small positive slope in the negative region, neurons that receive negative inputs still get a small gradient instead of going permanently silent.

Comparison of ReLU (left) and Leaky ReLU (right)

Mathematically, the equation looks like this
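With a small constant α (commonly a value like 0.01):

f(x) = x if x > 0, and f(x) = αx if x ≤ 0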

Implementation in Keras is coming right below

Leaky ReLU in Keras
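Unlike the activations above, Leaky ReLU is typically added as its own layer rather than by a string name. A sketch, with an illustrative slope of 0.1 (the tf.keras default is 0.3):

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, input_shape=(4,)),  # no activation here
    keras.layers.LeakyReLU(alpha=0.1),        # slope for x < 0
])

# Negative inputs are scaled by alpha instead of being zeroed
x = tf.constant([-2.0, 0.0, 3.0])
print(keras.layers.LeakyReLU(alpha=0.1)(x))  # [-0.2  0.   3. ]
```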

SoftMax Activation Function

It's a combination of... let's guess: is it tanh? Hmm, not quite. ReLU? No. Its leaky counterpart? Also not quite. Ok, let's reveal it: it is a combination of many sigmoids. It determines relative probability.

In multiclass classification, it is the most commonly used activation for the last layer of the classifier. It gives the probability of each class with respect to the others, and those probabilities sum to 1.

Example of Softmax function

Mathematically, the equation looks like this

Equation of Softmax function
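For a vector z with components z_i:

softmax(z)_i = e^(z_i) / Σ_j e^(z_j)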

Implementation in Keras is given below

Softmax function in Keras
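A sketch of a 3-class classifier head; note that keras.activations.softmax expects at least a 2-D tensor (batch dimension first).

```python
import tensorflow as tf
from tensorflow import keras

# The last layer of a 3-class classifier
model = keras.Sequential([
    keras.layers.Dense(3, activation="softmax", input_shape=(4,)),
])

z = tf.constant([[1.0, 2.0, 3.0]])    # one sample, three logits
probs = keras.activations.softmax(z)
print(probs)                          # ~[[0.09  0.245 0.665]]
print(float(tf.reduce_sum(probs)))    # 1.0
```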

The whole notebook containing all the code used above

If you wanna contact me, let's connect on LinkedIn; the link is below.

https://www.linkedin.com/in/tripathiadityaprakash/
