Transfer Learning Introduction | Nishant Nagpal | Medium
Introduction
Training a deep neural network is a time-consuming and computationally heavy task. Good results usually need a large training dataset, and training on a large dataset can take days. So how can we reach a good model in far fewer training iterations? One answer is transfer learning.
A model built for one task can be reused as the basis of a model for a similar task. For example, if we have to build an image classifier, we can reuse Google’s Inception model as the starting point instead of training from scratch. This can save a lot of time and money. Let’s look at the different approaches to reusing a source model when building a new model with transfer learning.
Source Model
Let’s first understand how a trained deep neural network holds information. Each hidden layer holds learned representations called features. The initial hidden layers hold very generic features of the data, and as the layers progress the features become more specific to the task. A trained model can be represented in two parts: the network configuration and the weights.
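To make the two-part view concrete, here is a minimal sketch in plain Python, not tied to any deep learning framework. The names `LayerConfig`, `Model`, and `init_weights`, the layer sizes, and the random initialisation are all assumptions made for illustration. It only shows that the configuration (which layers exist and their shapes) and the weights (the numbers learned for those layers) can be stored and reused independently.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class LayerConfig:
    """One entry of the network configuration (hypothetical structure)."""
    name: str
    n_inputs: int
    n_outputs: int


@dataclass
class Model:
    config: list                                  # ordered list of LayerConfig
    weights: dict = field(default_factory=dict)   # layer name -> weight matrix


def init_weights(config, seed=0):
    """Create a random weight matrix for every layer in the configuration."""
    rng = random.Random(seed)
    return {
        layer.name: [[rng.uniform(-1.0, 1.0) for _ in range(layer.n_outputs)]
                     for _ in range(layer.n_inputs)]
        for layer in config
    }


# A toy "source model"; the random weights stand in for trained ones.
config = [
    LayerConfig("hidden_1", 4, 8),
    LayerConfig("hidden_2", 8, 8),
    LayerConfig("output", 8, 3),
]
source_model = Model(config=config, weights=init_weights(config, seed=0))

# Same configuration, different weights: structurally identical, but a different model.
same_config_new_weights = Model(config=config, weights=init_weights(config, seed=1))

# Same configuration and a copy of the weights: a full reuse of the source model.
full_copy = Model(config=config, weights=copy.deepcopy(source_model.weights))
```

Each approach below is a different choice about which of these two parts to keep fixed and which to change.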
Approaches for transfer learning from source model
There are four approaches to reusing what the source model has learned:
- Freeze the configuration and weights from source model
 Use when the training dataset is small and the task is similar to the task of the source model
 
 In this approach we keep the entire network configuration as it is and freeze the weights of the hidden layers. Only the output layer is re-trained, with new weights, on the new dataset. This method is applied when we have a small dataset to re-train on, since a small dataset does not provide enough examples for the network to converge if the weights of all the hidden layers are updated.
- Freeze the configuration from source model
 Use when the training dataset is sufficiently large and the task is similar to the task of the source model
 
 In this approach the network configuration again remains the same. The only difference from the previous method is that the new dataset is large enough to re-train on, which allows the network to converge while the weights of all the layers are updated.
- Use the source model configuration as starting point and freeze the weights of hidden layers from source model
 Use when the training dataset is small and the new task needs to learn features beyond those of the source model’s task
 
 In this method we keep the network configuration of the pre-trained model but add a few more hidden layers at the end. The weights of the hidden layers from the pre-trained model remain the same, and we train only the additional layers. This method is also recommended when the new dataset is small.
- Use the configuration and weights from source model as starting point
 Use when the training dataset is sufficiently large and the new task needs to learn features beyond those of the source model’s task
 
 In this method we reuse the source model only as the starting point for training on the new task. We append a few more hidden layers to the network of the source model. Since we have a large dataset for the new task, we also update the weights of every layer that came from the source model.
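The four approaches differ in only two decisions: whether the hidden layers taken from the source model are frozen, and whether extra hidden layers are added. The sketch below encodes that choice in plain Python. The names `LayerPlan` and `plan_transfer`, the default of two extra layers, and the choice to treat the output layer as re-trained in every approach are assumptions made for illustration. It builds a per-layer plan and does not train anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPlan:
    name: str
    origin: str      # "source" = layer taken from the source model, "new" = added layer
    trainable: bool  # False means the layer's weights are frozen during re-training


def plan_transfer(source_layers, large_dataset, needs_extra_features, n_extra_layers=2):
    """Return a per-layer plan for one of the four approaches.

    source_layers: layer names of the source model, output layer last.
    large_dataset: True if the new dataset is sufficiently large.
    needs_extra_features: True if the new task needs features the source task did not.
    """
    *hidden, output = source_layers

    # Small dataset: freeze the source hidden layers. Large dataset: update them.
    plan = [LayerPlan(name, "source", trainable=large_dataset) for name in hidden]

    # Extra features needed: append new hidden layers, which are always trained.
    if needs_extra_features:
        plan += [LayerPlan(f"extra_{i + 1}", "new", True) for i in range(n_extra_layers)]

    # Assumption: the output layer is re-trained on the new dataset in every approach.
    plan.append(LayerPlan(output, "source", True))
    return plan


source_layers = ["hidden_1", "hidden_2", "output"]

approach_1 = plan_transfer(source_layers, large_dataset=False, needs_extra_features=False)
approach_2 = plan_transfer(source_layers, large_dataset=True, needs_extra_features=False)
approach_3 = plan_transfer(source_layers, large_dataset=False, needs_extra_features=True)
approach_4 = plan_transfer(source_layers, large_dataset=True, needs_extra_features=True)

for label, plan in [("1", approach_1), ("2", approach_2), ("3", approach_3), ("4", approach_4)]:
    summary = ", ".join(f"{p.name}={'train' if p.trainable else 'frozen'}" for p in plan)
    print(f"Approach {label}: {summary}")
```

In a real framework the `trainable` flag maps onto the framework's own freezing mechanism, for example setting `requires_grad = False` on a parameter in PyTorch or `layer.trainable = False` in Keras.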
 Reference -
 https://www.youtube.com/watch?v=yofjFQddwHE

