Have you ever wondered how object detection helps in building self-driving cars, how facial recognition works on social media, or how diseases are detected from visual imagery in healthcare?

These things sound fascinating, and they are all made possible by convolutional neural networks (CNNs). A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology; it is also used to detect and classify objects in an image.

In this article, we are going to learn about convolutional neural networks and the layers that make them up.

## What is Convolutional Neural Network?

In deep learning, a Convolutional Neural Network (CNN), also known as a ConvNet, is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs are state-of-the-art models for image classification, segmentation, object detection, and many other image-processing tasks. A ConvNet takes an input image, assigns learnable weights and biases to various aspects of the image, and learns to differentiate one from another. The pre-processing required by a ConvNet is much lower than for other classification algorithms. The figure below shows how a CNN processes an input image and classifies the objects in it.

Technically, to train and test a deep learning CNN model, each input image is passed through a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers; a softmax function is then applied to classify the object with probabilistic values between 0 and 1.
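The pipeline above can be sketched end to end in plain NumPy. This is a toy forward pass only (no training), and the image, filter, and weight values below are random placeholders, not taken from the article's figure:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Element-wise: negatives become 0
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Non-overlapping size x size windows, keep the maximum of each
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy forward pass: 8x8 "image", one 3x3 filter, 3-class output.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
weights = rng.standard_normal((3, 9))   # dense layer: 3 classes, 9 flattened inputs

feature_map = convolve2d(image, kernel)   # (6, 6)
rectified = relu(feature_map)
pooled = max_pool(rectified)              # (3, 3)
flat = pooled.flatten()                   # (9,)
probs = softmax(weights @ flat)
print(probs)   # three probabilities that sum to 1
```

In a real CNN, the kernel and weight values are learned by backpropagation; libraries such as TensorFlow or PyTorch handle that for you.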

## How does CNN recognize images?

In a CNN, every image is represented as an array of pixel values. Consider the example below, which shows the representation of an image of the digit 8 and how the CNN recognizes it.
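To make the idea concrete, here is a tiny hand-made binary grid sketching the digit 8; the values are illustrative, not the article's actual figure. To the network, the "image" is just this array of numbers:

```python
import numpy as np

# A 5x5 binary grid roughly sketching the digit 8 (1 = dark pixel, 0 = light)
digit_8 = np.array([
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
])
print(digit_8.shape)   # (5, 5)
```

A real grayscale image works the same way, just with intensity values (e.g. 0 to 255) instead of 0s and 1s, and a color image adds a channel dimension.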

## Layers of Convolutional Neural Network

A CNN has multiple hidden layers that help extract information from an image.

The four important layers are mentioned below:

- Convolution layer
- ReLU layer
- Pooling layer
- Fully connected layer

Let’s discuss each layer in a little more detail, one by one:

##### Convolution Layer:

Convolution is the first layer, or rather the first step, used to extract features from an input image. It preserves the relationship between pixels by learning image features over small squares of input data. Convolution is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel). A convolution layer has several filters that perform the convolution operation, and every image is treated as a matrix of pixel values.

Consider the following 5×5 image whose pixel values are either 0 or 1, along with a 3×3 filter matrix. Slide the filter matrix over the image and compute the dot product at each position to get the convolved feature matrix.

Here, the 5×5 image matrix is convolved with the 3×3 filter matrix, and the result is called the **feature map**. Convolving an image with different filters can perform operations such as edge detection, blurring, and sharpening.
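The sliding-window computation can be written out directly in NumPy. Since the article's figure is not reproduced here, the pixel and filter values below are illustrative; the mechanics are the same:

```python
import numpy as np

# 5x5 binary image and 3x3 filter (illustrative values)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the 3x3 filter over the 5x5 image: output is 3x3
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        # Element-wise multiply the window by the filter, then sum
        feature_map[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

Each output cell is the dot product of one 3×3 window of the image with the filter, which is exactly the convolution operation described above (with stride 1 and no padding).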

##### ReLU layer:

ReLU stands for rectified linear unit. Once the feature maps are extracted, the next step is to pass them through a ReLU layer.

ReLU performs an element-wise operation that sets all negative pixel values to 0. It introduces non-linearity into the network, and the generated output is a **rectified feature map**. The image below shows the ReLU operation:

Other non-linear functions, such as tanh or sigmoid, can be used instead of ReLU, but ReLU is the most common choice because it generally performs better than the other two.

##### Pooling Layer:

The pooling layer reduces the number of parameters when the images are too large. Pooling, also called subsampling or down-sampling, reduces the dimensionality of each feature map while retaining the important information. Pooling can be of different types:

- Max Pooling :- Max pooling takes the largest element from each window of the rectified feature map.
- Average Pooling :- Average pooling takes the average of the elements in each window.
- Sum Pooling :- Sum pooling takes the sum of all elements in each window.
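All three variants can be computed on the same feature map with NumPy by reshaping it into non-overlapping 2×2 windows (the values below are illustrative):

```python
import numpy as np

# A 4x4 feature map (illustrative values)
fmap = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [1, 2, 9, 8],
    [0, 4, 3, 7],
])

# Split into non-overlapping 2x2 windows:
# axes are (row-block, row-within, col-block, col-within)
blocks = fmap.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))    # [[6 4]
                                        #  [4 9]]
avg_pooled = blocks.mean(axis=(1, 3))   # [[3.75 1.75]
                                        #  [1.75 6.75]]
sum_pooled = blocks.sum(axis=(1, 3))    # [[15  7]
                                        #  [ 7 27]]
```

Note how each variant halves both spatial dimensions (4×4 → 2×2) while summarizing every window with a single number.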

Pooling is applied independently to each feature map; the feature maps themselves, produced by the convolution filters, capture different parts of the image such as edges, corners, body, feathers, eyes, and beak.

##### Fully Connected Layer:

We flatten the matrix into a vector and feed it as input to the fully connected layer, which works like a regular neural network.

In the diagram, the feature map matrix is converted into a vector (x1, x2, x3, …). The fully connected layers combine these features together to create a model.
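Flattening and the fully connected step look like this in NumPy; the pooled values and weights below are made-up placeholders (a trained network would have learned them):

```python
import numpy as np

# A pooled 2x2 feature map (illustrative values)
pooled = np.array([[6, 4],
                   [4, 9]])

# Flatten into the vector (x1, x2, x3, x4)
x = pooled.flatten()                       # [6 4 4 9]

# Fully connected layer for 2 output classes: weights and bias
W = np.array([[ 0.1, -0.2, 0.05,  0.3],
              [-0.1,  0.4, 0.2,  -0.05]])
b = np.zeros(2)
scores = W @ x + b

# Softmax turns the scores into class probabilities
e = np.exp(scores - scores.max())
probs = e / e.sum()
print(probs)   # two probabilities that sum to 1
```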

Here’s how the structure of the convolutional neural network looks now:

Hope this article was helpful and informative.
