Summary: Universal Language Model Fine-Tuning (ULMFiT)

In early 2018, Jeremy Howard and Sebastian Ruder introduced one of the first methods to significantly improve the performance of deep learning on NLP tasks by leveraging transfer learning. Their findings were reported in their paper, Universal Language Model Fine-tuning for Text Classification (ULMFiT).

The idea behind transfer learning is simple: instead of training a model from scratch, a model trained on one task can be reused and fine-tuned to perform well on a related task. [Peter Martigny, 2018]
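As a rough illustration of that idea (not taken from the paper), the sketch below wraps a placeholder pretrained encoder, freezes its weights, and trains only a new task-specific head; the encoder, dimensions, and hyperparameters are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class TransferClassifier(nn.Module):
    """Generic transfer-learning pattern: frozen pretrained encoder + new task head."""
    def __init__(self, encoder: nn.Module, encoder_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():          # freeze the transferred weights
            p.requires_grad = False
        self.head = nn.Linear(encoder_dim, num_classes)  # trained from scratch

    def forward(self, x):
        feats = self.encoder(x)                      # reuse source-task representations
        return self.head(feats)

# Placeholder encoder standing in for a model pre-trained on a source task.
encoder = nn.Sequential(nn.Linear(300, 256), nn.ReLU())
model = TransferClassifier(encoder, encoder_dim=256, num_classes=2)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
```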

Pre-paper era

Common Technologies

Limitations

The paper

Concept in brief

The paper proposes using an AWD-LSTM model for transfer learning from a language modelling [source] task to the classification [target] task in the following manner (a code sketch follows the list):

1. General-domain LM pre-training: the language model is trained on a large general corpus (Wikitext-103) to capture general properties of language.

2. Target task LM fine-tuning: the pre-trained LM is fine-tuned on the target task's text, using discriminative fine-tuning and slanted triangular learning rates.

3. Target task classifier fine-tuning: a classifier head is added on top of the LM and fine-tuned with gradual unfreezing, again using discriminative fine-tuning and slanted triangular learning rates.
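For concreteness, here is a minimal sketch of these stages using the fastai library, which implements ULMFiT. The dataset, learning rates, and epoch counts below are illustrative choices, not the exact schedule reported in the paper.

```python
from fastai.text.all import *

# Stage 1 (general-domain LM pre-training) is covered by the AWD_LSTM weights
# that fastai ships pre-trained on Wikitext-103, so only stages 2 and 3 run here.
path = untar_data(URLs.IMDB)

# Stage 2: fine-tune the language model on the target task's text.
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=accuracy)
learn_lm.fit_one_cycle(1, 1e-2)        # one-cycle schedule (slanted-triangular style)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(3, 1e-3)
learn_lm.save_encoder('finetuned_lm')

# Stage 3: fine-tune the classifier, gradually unfreezing layer groups.
dls_clf = TextDataLoaders.from_folder(path, valid='test', text_vocab=dls_lm.vocab)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn_clf.load_encoder('finetuned_lm')
learn_clf.fit_one_cycle(1, 2e-2)
learn_clf.freeze_to(-2)                                       # gradual unfreezing
learn_clf.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))    # discriminative LRs
learn_clf.unfreeze()
learn_clf.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
```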

Experiments and Results

ULMFiT obtained the following reductions in error rate over the previous state of the art:

Dataset      Error-rate reduction
IMDb         22%
DBpedia      4.8%
Yelp-bi      18.2%
Yelp-full    2.0%

Note: these results were obtained using 10% of the training set, with error rates reported for unidirectional LMs; the classifier was fine-tuned for 50 epochs.
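To clarify what such a reduction means, the relative error-rate reduction is the drop in error divided by the previous error. The figures in the snippet below are hypothetical placeholders, not numbers from the paper.

```python
# Relative error-rate reduction over a previous state-of-the-art result.
# Hypothetical placeholder numbers, not results from the paper.
previous_error = 5.9   # % error of the prior state of the art
new_error = 4.6        # % error of the new model
reduction = (previous_error - new_error) / previous_error
print(f"{reduction:.1%}")  # -> 22.0%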

After the paper

With the emergence of better language models, it should be possible to further improve the performance of such transfer learning approaches.

The success achieved by ULMFiT spurred interest in applying transfer learning to NLP. In the months following the paper, several frameworks were proposed (especially fine-tuned language models); a few of the popular ones are ELMo, OpenAI's GPT, and BERT.