Transfer Learning


Pre-trained models are commonly used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems from scratch. Deep convolutional neural network models may take days or even weeks to train on very large datasets.

A popular way to short-cut this process is transfer learning, in which a model developed for one task is reused as the starting point for a model on a second task. Top-performing models can be downloaded and used directly, or integrated into a new model for your own problems.

In this article, we are going to learn the concept of transfer learning and how we can use it to speed up training and improve the performance of our models.


What Is Transfer Learning?

Human learners appear to have inherent ways of transferring knowledge between tasks: when we encounter a new task, we apply relevant knowledge from previous learning experiences. Transfer learning works in the same manner.

In transfer learning, the knowledge of an already trained model is applied to a different but related problem. For example, we may learn about one set of visual categories, such as cats and dogs, in a first setting, then learn about a different set of visual categories, such as ants and wasps, in a second setting. Instead of starting the learning process from scratch, we start with patterns learned from solving the related task.

The weights in re-used layers may be used as the starting point for the training process and adapted in response to the new problem. This usage treats transfer learning as a type of weight initialization scheme. It is useful when the first, related problem has far more labeled data than the problem of interest, and the similarity in problem structure means the learned features are useful in both contexts. We try to transfer as much knowledge as possible from the task the model was previously trained on to the new task at hand.

Transfer learning is mostly used in computer vision and natural language processing tasks, such as sentiment analysis, because of the huge amount of computational power these tasks require.

Why Is It Used?

By now we understand the concept of transfer learning, but why is it used, and what benefits does it offer?

Transfer learning has several benefits, but the main advantages are saving training time, better performance of neural networks, and not needing a lot of data. Usually, a lot of data is needed to train a neural network from scratch, but access to that data isn’t always available; this is where transfer learning comes in handy.

With transfer learning, a model can be built with comparatively little training data because the model is already pre-trained. This is especially valuable in natural language processing, where expert knowledge is usually required to create large labeled datasets. Additionally, training time is reduced, because it can take days or even weeks to train a deep neural network from scratch on a complex task.

How Does It Work?

Neural networks usually detect edges in their earlier layers, shapes in their middle layers, and task-specific features in their later layers. In transfer learning, the early and middle layers are re-used and only the later layers are retrained. This lets the new model leverage the labeled data of the task the network was initially trained on.

Example: consider a model trained to recognize a backpack in an image, which will now be used to identify sunglasses. In its earlier layers, the model has already learned to recognize generic objects, so we only retrain the later layers, letting it learn what separates sunglasses from other objects.
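The backpack/sunglasses idea can be sketched in Keras. This is a minimal sketch, assuming TensorFlow/Keras is installed; the sunglasses dataset and the training call are hypothetical, and VGG16’s ImageNet weights are downloaded on first use:

```python
# Reuse a pre-trained VGG16: freeze its early/middle layers and
# retrain only a new task-specific head.
from tensorflow import keras

base = keras.applications.VGG16(
    weights="imagenet",        # layers already trained on ImageNet
    include_top=False,         # drop the original classifier head
    input_shape=(224, 224, 3),
)
base.trainable = False         # freeze the early/middle layers

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),  # new "sunglasses?" head
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(sunglasses_images, labels, epochs=5)  # hypothetical dataset
```

Only the final Dense layer’s weights are updated during training; everything learned on ImageNet stays intact.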

How to Use Pre-Trained Models?

We can summarize some of these usage patterns as follows:

Classifier: The pre-trained model is used directly to classify images.

Standalone Feature Extractor: The pre-trained model, or some portion of the model, is used to pre-process images and extract relevant features.

Integrated Feature Extractor: The pre-trained model, or some portion of the model, is integrated into a new model, but layers of the pre-trained model are frozen during training.

Weight Initialization: The pre-trained model, or some portion of the model, is integrated into a new model, and the layers of the pre-trained model are trained in concert with the new model.

Approaches to Transfer Learning

1. Train a Model to Reuse It

To better understand this approach, consider an example: imagine you want to solve task 1 but don’t have enough data to train a deep neural network. One way around this is to find a related task 2 with an abundance of data, train a deep neural network on task 2, and use that model as a starting point for solving task 1. Whether you need the whole model or only a few layers depends heavily on the problem you’re trying to solve.

If you have the same kind of input in both tasks, reusing the model directly and making predictions on your new input may be an option.
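The train-then-reuse idea can be sketched as follows, assuming TensorFlow/Keras; the data here is random and purely illustrative:

```python
import numpy as np
from tensorflow import keras

# Task 2: a related problem with an abundance of (synthetic) data.
task2 = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu", name="shared"),
    keras.layers.Dense(3, activation="softmax"),
])
task2.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
task2.fit(np.random.rand(512, 20), np.random.randint(0, 3, 512),
          epochs=1, verbose=0)

# Task 1: reuse the trained layer (and its weights) under a new head,
# instead of starting from scratch.
task1 = keras.Sequential([
    keras.Input(shape=(20,)),
    task2.get_layer("shared"),
    keras.layers.Dense(1, activation="sigmoid"),
])
```

The `shared` layer is the same object in both models, so task 1 begins from the weights learned on task 2.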


2. Use a Pre-Trained Model

In the second approach, we use a model that has already been pre-trained. A little research turns up a lot of these models; how many layers to reuse and how many to retrain depends on the problem. This type of transfer learning is the most commonly used throughout deep learning.

For example, Keras provides a number of pre-trained models that can be used for transfer learning, prediction, feature extraction and fine-tuning. Many research institutions also release trained models.
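Loading one of these models for direct prediction takes only a few lines. A sketch, assuming TensorFlow/Keras with network access (the ImageNet weights are downloaded on first use); the random array stands in for a real, loaded photo:

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, decode_predictions, preprocess_input)

model = InceptionV3(weights="imagenet")      # 1,000 ImageNet classes

x = preprocess_input(np.random.rand(1, 299, 299, 3) * 255.0)
preds = model.predict(x, verbose=0)
print(decode_predictions(preds, top=3)[0])   # top-3 (label, name, score)
```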


3. Feature Extraction

Another approach is to use deep learning to discover the best representation of your problem. This is also known as representation learning: finding the most important features, which can often yield much better performance than hand-designed representations.

A representation learning algorithm can discover a good combination of features within a very short time frame, even for complex tasks that would otherwise require a lot of human effort. Simply use the first layers to spot the right representation of features, but don’t use the output of the network, because it is too task-specific. Instead, feed data into your network and use one of the intermediate layers as the output layer. This layer can then be interpreted as a representation of the raw data.
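A minimal sketch of using an intermediate layer as the output, assuming TensorFlow/Keras; the small network and layer names are illustrative, not from this article:

```python
import numpy as np
from tensorflow import keras

full = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu", name="feat_1"),
    keras.layers.Dense(32, activation="relu", name="feat_2"),
    keras.layers.Dense(1, activation="sigmoid", name="task_head"),
])

# Re-wire the network so the intermediate layer, not the task-specific
# head, becomes the output.
encoder = keras.Model(inputs=full.inputs,
                      outputs=full.get_layer("feat_2").output)

features = encoder.predict(np.random.rand(5, 20), verbose=0)
print(features.shape)  # (5, 32): a 32-dimensional representation per sample
```

The `features` array can then be fed to a traditional algorithm (e.g. an SVM or gradient-boosted trees) in place of the raw inputs.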

This approach is mostly used in computer vision, because it can reduce the size of each example’s representation, which decreases computation time and makes the data more suitable for traditional algorithms as well.

Models for Transfer Learning

There are perhaps a dozen or more top-performing models that can be downloaded and used as the basis for related computer vision tasks.

Perhaps three of the more popular models are as follows:

VGG (e.g. VGG16 or VGG19).

GoogLeNet (e.g. InceptionV3).

Residual Network (e.g. ResNet50).

These models are widely used for transfer learning not only because of their performance, but also because they introduced specific architectural innovations: consistent and repeating structures (VGG), inception modules (GoogLeNet), and residual modules (ResNet). Inception-v3, for example, was trained for the ImageNet Large Scale Visual Recognition Challenge, in which participants had to classify images into 1,000 classes such as “zebra,” “Dalmatian” and “dishwasher.”

Keras provides access to a number of top-performing pre-trained models.

Microsoft also offers some pre-trained models, available for both R and Python development, through the MicrosoftML R package and the microsoftml Python package.

We will see a practical implementation of transfer learning in the next post.
