RNN(Recurrent Neural Network)

Before starting the article, I would like to ask you all a question: if I say “working with love I Freshlybuilt”, does this make any sense to you? Not at all. Now read this: “I love working with Freshlybuilt”. This makes perfect sense. We can see that a little jumble in the words made the sentence incoherent. Now, if even the human brain is confused about what it means, how can we expect a neural network to make sense of it?

Actually, there are many tasks in everyday life that get completely disrupted when their sequence is disturbed. There are also many cases wherein the sequence of information determines the event itself. For instance, time series data, where time defines the occurrence of events. If we are trying to use such data for any reasonable output, we need a network that has access to some prior knowledge about the data to completely understand it. This is where recurrent neural networks come into play.

What are Recurrent Neural Networks?

Traditional neural networks process an input and move on to the next one, disregarding its sequence. Recurrent Neural Networks are a class of Artificial Neural Networks that can process a sequence of inputs and retain their state while processing the next input in the sequence. Data such as time series have a sequential order that needs to be followed in order to be understood. In the illustration, we see that the neural network (hidden state) takes an input xt and outputs a value ht. The loop shows how the information is passed from one step to the next. The inputs are the individual words, and each one is passed on to the network F in the same order it is written (i.e. sequentially).

The RNN uses an architecture that is not dissimilar to that of the traditional NN. The difference is that the RNN introduces the concept of memory, which exists in the form of a different type of link. This addition allows for the analysis of sequential data (music, text or voice), which is something the traditional NN is incapable of. Also, traditional NNs are limited to a fixed-length input, whereas the RNN has no such restriction.

An RNN does not require linearity or model-order checking. It can automatically scan the whole dataset to try and predict the next sequence. As demonstrated in the image below, a neural network consists of 3 hidden layers with equal weights, biases and activation functions, made to predict the output. These hidden layers can then be merged to create a single recurrent hidden layer. A recurrent neuron now stores the inputs of all previous steps and merges that information with the input of the current step.

Types of RNN (Recurrent Neural Networks)
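To make the recurrence concrete, here is a minimal NumPy sketch of a single recurrent step. All names and sizes here are illustrative assumptions, and the weights are random, so this is a toy forward pass rather than a trained model; the point is that the same weights are reused at every time step, and each step's hidden state carries information from the previous one.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: combine the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3            # illustrative dimensions
W_xh = rng.normal(size=(input_size, hidden_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                 # initial hidden state
sequence = rng.normal(size=(5, input_size))  # 5 time steps of fake input
for x_t in sequence:                      # the same weights are reused at every step
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (3,)
```

After the loop, `h` summarizes the whole sequence: change the order of the inputs and the final state changes too, which is exactly the order sensitivity discussed above.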

RNNs come in different varieties that typically depend on the task. The type of an RNN is described by the number of inputs in relation to the number of outputs. The image below shows the types of RNNs. Let’s discuss the four different types one by one:

1. One-to-one, which is informally known as the Vanilla RNN. This variety has one input, such as a word or an image, and outputs a single token, such as a word or a Boolean value.
2. One-to-many, where one input is used to create several outputs.
3. Many-to-one, where several inputs are used to create a single output.
4. Many-to-many, where several inputs are analyzed to generate several outputs.
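The four patterns above differ only in when inputs are fed in and when outputs are read out. As a rough sketch (random weights, illustrative dimensions, no training), the same recurrent cell can serve many-to-many by emitting an output at every step, or many-to-one by keeping only the final state:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d_in, d_h = 6, 4, 3                    # 6 time steps, illustrative sizes
W_xh = rng.normal(size=(d_in, d_h))
W_hh = rng.normal(size=(d_h, d_h))

def run(xs):
    """Unroll the cell over a sequence, collecting the state at every step."""
    h, outs = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
        outs.append(h)
    return np.array(outs)

xs = rng.normal(size=(T, d_in))
many_to_many = run(xs)        # read an output at every step (e.g. translation)
many_to_one = run(xs)[-1]     # keep only the final state (e.g. classification)
# one-to-many would feed a single input once, then keep unrolling on the state.
print(many_to_many.shape, many_to_one.shape)  # (6, 3) (3,)
```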

Advantages of an RNN

1. It can model non-linear temporal/sequential relationships.
2. No need to specify lags to predict the next value, in comparison to an autoregressive process.

Disadvantages of an RNN

1. Not suited for predicting over long horizons

Vanishing Gradient Problem :- As more layers containing activation functions are added, the gradient of the loss function approaches zero. The gradient descent algorithm tries to find the global minimum of the network’s cost function. Shallow networks are not much affected by a gradient that is too small, but as the network grows with more hidden layers, the gradient can become too small for model training. Gradients of neural networks are found using the backpropagation algorithm, whereby you find the derivatives of the network. Using the chain rule, the derivatives of each layer are multiplied down the network. This is where the problem lies: with an activation function like the sigmoid, the gradient tends to shrink as the number of hidden layers increases.
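A quick numeric illustration of why this happens: the sigmoid’s derivative is at most 0.25, and backpropagation contributes roughly one such factor per layer via the chain rule, so the product shrinks rapidly with depth. The pre-activation value z = 0.5 below is an arbitrary illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), which is at most 0.25.
# Multiplying one such factor per layer drives the gradient toward zero.
z = 0.5
d = sigmoid(z) * (1 - sigmoid(z))   # a single layer's derivative factor
for depth in (1, 5, 20, 50):
    print(depth, d ** depth)        # shrinks rapidly as depth grows
```

For an unrolled RNN, "depth" is the number of time steps, which is why long-term dependencies are the first thing to suffer.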

Other RNN Architectures

As we saw, RNNs suffer from vanishing gradient problems when we ask them to handle long-term dependencies. They also become severely difficult to train as the number of parameters becomes extremely large. If we unroll the network, it becomes so huge that its convergence is a challenge. So we can use the other RNN architectures mentioned below to overcome this problem:

• LSTMs :- Usually called Long Short-Term Memory networks, they were introduced by Hochreiter & Schmidhuber. They are a special kind of RNN, capable of learning long-term dependencies. They work tremendously well on a large variety of problems, and are now widely used. LSTMs also have this chain-like structure, but the repeating module has a slightly different structure. Instead of having a single neural network layer, there are multiple layers, interacting in a very special way. They have an input gate, a forget gate and an output gate. We shall come up with a detailed article on LSTMs soon.
• GRUs :- Another efficient RNN architecture is the Gated Recurrent Unit, i.e. the GRU. They are a variant of LSTMs but are simpler in their structure and easier to train. Their success is primarily due to the gating network signals that control how the present input and the previous memory are used to update the current activation and produce the current state. These gates have their own sets of weights that are adaptively updated in the learning phase. We have just two gates here, the reset and the update gate.
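As a rough sketch of the gating idea (untrained random weights, all names and sizes illustrative), a single GRU step uses the update gate z to blend the previous state with a candidate state, and the reset gate r to control how much of the past feeds that candidate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step with the two gates from the text: update (z) and reset (r)."""
    z = sigmoid(x @ Wz + h @ Uz)             # update gate: how much to refresh
    r = sigmoid(x @ Wr + h @ Ur)             # reset gate: how much past to use
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_cand          # blend old state and candidate

rng = np.random.default_rng(2)
d_in, d_h = 4, 3                             # illustrative dimensions
params = [rng.normal(size=s) for s in [(d_in, d_h), (d_h, d_h)] * 3]
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):         # 5 time steps of fake input
    h = gru_step(x, h, *params)
print(h.shape)  # (3,)
```

When z stays near zero, the old state passes through almost unchanged, which is how gated cells preserve information over many steps and ease the vanishing-gradient problem.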

Applications of RNN

The beauty of recurrent neural networks lies in their diversity of application. RNNs have a great ability to deal with various input and output types. Recurrent Neural Networks can be used in a number of ways, such as:

Image Captioning

In this, an image is automatically given a caption based on what is shown. This can include complex actions, such as: “Fox jumping over dog”.
This task requires a one-to-many RNN. Let’s say we have an image for which we need a textual description. So we have a single input, the image, and a series or sequence of words as output. Here the image might be of a fixed size, but the output is a description of varying length. The image below shows an example of image captioning.

Sentiment Classification

This can be a task of simply classifying tweets into positive and negative sentiment. Depending on the complexity of the sentiment, this RNN may be of type many-to-one or many-to-many. If, for example, the sentiment is simply “positive” or “negative”, then there is only a single output. So here the input would be a tweet of varying length, while the output is of a fixed type and size. The image below shows an example of sentiment analysis.

Language Translation

In this, the written text or spoken words of one language serve as input, and a different language representing the same text is the output.
This is an example of a many-to-many RNN, where several words are analyzed as input and the output is also several words in length. This basically means that we have some text in a particular language, let’s say English, and we wish to translate it into French. Each language has its own semantics and would have varying lengths for the same sentence. So here the inputs as well as the outputs are of varying lengths.

Image Classification

In this, an image is examined and a single determination is made, such as “Daytime picture” versus “Nighttime picture”. This is an example of a one-to-one mapping: we classify an image and determine its category. There are also many other applications of RNNs, such as time series prediction, for example forecasting a stock price given a history of values. This task uses a many-to-one RNN, where many previous stock prices are used to predict a single future price. RNNs are also used broadly in text classification and sentence completion, outperforming other well-known algorithms such as the Support Vector Machine (SVM). Segmented handwriting recognition and speech recognition systems have also been successfully implemented using RNNs.

So RNNs can be used for mapping inputs to outputs of varying types, lengths and are fairly generalized in their application.

Hope this was informative and helpful.