GPT-2 is good; now you will make it fast.
At this point you have a fine-tuned 774M variant of GPT-2, but it has some problems: it is formatted for TensorFlow 1.x, which has been deprecated, and generating text with it is slow.
To fix the formatting problems, we will use a script from Hugging Face. …
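As a rough sketch of what that conversion step looks like: the `transformers` library ships a TF-to-PyTorch weight loader for GPT-2, which its conversion script wraps. The exact script the tutorial uses may differ, and the paths below are hypothetical; point them at your own fine-tuned checkpoint.

```python
# A minimal sketch of the conversion, mirroring transformers'
# GPT-2 checkpoint conversion script. Requires both `transformers`
# and `tensorflow` installed; paths are hypothetical.
from transformers import GPT2Config, GPT2Model, load_tf_weights_in_gpt2

config = GPT2Config.from_pretrained("gpt2-large")  # the 774M architecture
model = GPT2Model(config)

# Copy the TF 1.x variables into the PyTorch module, then save them in
# the format the faster PyTorch tooling expects.
load_tf_weights_in_gpt2(model, config, "checkpoint/run1")
model.save_pretrained("gpt2-774M-pytorch")
```

From there, `GPT2LMHeadModel.from_pretrained("gpt2-774M-pytorch")` gives you a generation-ready model; the language-model head shares its weights with the token embeddings, so nothing is lost by saving the base model.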
Specialize GPT-2 for enhanced performance on any text
What is fine-tuning?
GPT-2 was trained on 40 gigabytes of text spanning many subjects. It is very good at generating text, but it can be improved by training it further on text specific to your application. This process, a form of transfer learning, is called fine-tuning.
The best way to go through this article is interactively:
If you think your data is clean, you haven’t looked at it hard enough.
In the next tutorial, you will fine-tune (train) GPT-2 on any topic you want with a single large text file or a folder containing many text files. In the example, I will work with a large selection of Pulitzer Prize-winning novels. You can select any text you would like as long as there is a lot of it and it is very clean. I would encourage you to make your own text file(s) in an area that interests you.
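To make that concrete, here is a hedged sketch of what the fine-tuning step can look like. The tutorial's own tooling may differ; this uses the `gpt-2-simple` package as one common wrapper around the TF 1.x codebase, and `novels.txt` is a hypothetical stand-in for your training file.

```python
# A hedged sketch of fine-tuning with the `gpt-2-simple` package;
# the training file name is hypothetical.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="774M")  # fetch the base checkpoint once

sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="novels.txt",  # your single large, clean text file
    model_name="774M",
    steps=1000,            # more steps: more adaptation, more risk of overfitting
)
gpt2.generate(sess)        # sample from the fine-tuned model
```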
Why should you use a lot of text?
Overfitting is more likely with GPT-2 since the model is so large (the 774M variant is ~6 GB), though it is less of a problem than in typical modeling situations. Generally, if a model overfits, it won't work well in the real world. If GPT-2 overfits, it will memorize sections, or all, of the training text; the more it overfits, the more it memorizes. If your model memorizes a bit of text, it is not ruined, it just does not generate as much original text. For some examples, see Gwern's blog. …
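One quick, hedged way to check for memorization (the helper below is hypothetical, not part of any tutorial): search a generated sample for long word sequences that appear verbatim in your training file.

```python
# Hypothetical memorization check: flag n-word sequences from a generated
# sample that appear verbatim in the training text.
def memorized_ngrams(sample: str, training_text: str, n: int = 8) -> list:
    words = sample.split()
    ngrams = {" ".join(words[i : i + n]) for i in range(len(words) - n + 1)}
    return [g for g in ngrams if g in training_text]

training_text = open("novels.txt", encoding="utf-8").read()  # hypothetical file
sample = "..."  # paste a generated sample here
hits = memorized_ngrams(sample, training_text)
print(f"{len(hits)} verbatim 8-grams found; many hits suggest memorization")
```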
Open your toolbox, fill it up…
In this article, I will talk about the resources we will use and point you to tutorials for them. Additionally, I will provide tutorials of my own for topics the existing ones don't cover.
Feel the burn …
The existing resources on GPT-2's architecture are very good, but they are written for researchers, so I will provide you with a tailored concept map of all the areas you will need to know before jumping in.
Areas that the reader should already know, i.e. areas I won't point you to resources for:
Prepare for brain melt in 3, 2, 1 …
All joking aside, this article and the next will be the most intellectually demanding in the series, but don't be discouraged. Learning a concept this complex is an iterative process. This article is intended to inform your intuition rather than go through every point in depth.
A quick note on the term GPT-2: depending on how technical a resource is, "GPT-2" can refer to different things. Some use it for the whole system that takes in some words and gives you more words. In this article, I will use GPT-2 to mean the part that takes in some words and generates a single word piece. My reasoning will become clearer as you learn more, i.e. trust me and read on. …
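To give some intuition for that "single word piece" framing, here is a minimal sketch using the Hugging Face `transformers` library (not the original TF 1.x code; the prompt is just an example). Each forward pass scores every possible next piece, and running that step in a loop is what turns GPT-2 into a text generator.

```python
# One forward pass predicts exactly one next word piece; generation is
# just this step in a loop. Uses the small "gpt2" model for speed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The Pulitzer Prize is", return_tensors="pt")
for _ in range(5):                    # generate five word pieces
    with torch.no_grad():
        logits = model(ids).logits    # a score for every possible next piece
    next_id = logits[0, -1].argmax()  # greedy: take the most likely one
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```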