
GPT-2 is good; now you will make it fast.

At this point you have a fine-tuned 774M variant of GPT-2, but it has some problems: it is formatted for TensorFlow 1.x, which has been deprecated, and generating text with it is slow.

To fix the formatting problems, we will use a conversion script from Hugging Face. …
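
As a rough illustration of what that conversion involves, here is a minimal sketch that mirrors what the Hugging Face conversion script does under the hood. The checkpoint directory checkpoint/run1, the hparams.json location, and the output folder name are assumptions; treat this as a sketch, not the exact script we will run.

```python
# Minimal sketch: load a TF 1.x GPT-2 checkpoint into PyTorch with Transformers.
# Requires both TensorFlow (to read the checkpoint) and PyTorch installed.
# All paths below are placeholders; point them at your own fine-tuned checkpoint.
from transformers import GPT2Config, GPT2Model, load_tf_weights_in_gpt2

tf_checkpoint_dir = "checkpoint/run1"  # assumed gpt-2-simple output directory

# The OpenAI-style hparams.json keys (n_embd, n_layer, n_head, n_ctx) line up with
# GPT2Config for the standard model sizes.
config = GPT2Config.from_json_file(f"{tf_checkpoint_dir}/hparams.json")

model = GPT2Model(config)                                   # empty PyTorch model of the same shape
load_tf_weights_in_gpt2(model, config, tf_checkpoint_dir)   # copy the TF variables across

model.save_pretrained("gpt2-774M-pytorch")                  # writes config.json + PyTorch weights
```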



Specialize GPT-2 for enhanced performance on any text

What is fine-tuning?
GPT-2 was trained on 40 gigabytes of text ranging across many subjects. It is very good at generating text, but it can be improved by further training it on text specific to your application. This process of adapting a pretrained model to new data is called fine-tuning, a form of transfer learning.

The best way to go through this article is interactively:

  1. Finetune with GPT-2 Simple — We will use this to train the 774M variant because this package is a little more efficient with memory (a minimal gpt-2-simple call is sketched after this list).
  2. Fine tune with Transformers’ Trainer utility — This tutorial is included because it is a better way to train GPT-2 if you can get access to enough memory: the Trainer utility is faster, and the model is already in PyTorch 1.x, so no conversion is needed. I tried running the 774M variant with 16 gigabytes of memory and still ran out of memory. …
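
For orientation before you open the tutorial, here is a minimal sketch of the gpt-2-simple route. The corpus file name and step count are placeholders, and in practice you would run this in a notebook with a GPU attached.

```python
# Minimal gpt-2-simple fine-tuning sketch (TensorFlow 1.x).
# "corpus.txt" and steps=1000 are placeholder values.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="774M")   # fetch the pretrained 774M weights
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              "corpus.txt",             # one large, clean text file
              model_name="774M",
              steps=1000)               # checkpoints land in checkpoint/run1 by default
gpt2.generate(sess)                     # sample from the fine-tuned model
```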


If you think your data is clean, you haven’t looked at it hard enough.

In the next tutorial, you will fine-tune (train) GPT-2 on any topic you want, using a single large text file or a folder containing many text files. In the example, I will work with a large selection of Pulitzer Prize-winning novels. You can select any text you like as long as there is a lot of it and it is very clean. I would encourage you to make your own text file(s) in an area that interests you.
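
If your material starts out as a folder of separate files, a short sketch like the one below will merge them into the single corpus file used for training. The folder name my_texts/ and the output name corpus.txt are placeholders.

```python
# Minimal sketch: merge a folder of .txt files into one training corpus.
# "my_texts/" and "corpus.txt" are placeholder names.
from pathlib import Path

parts = []
for path in sorted(Path("my_texts").glob("*.txt")):
    # Read each file, tolerating the occasional bad byte in scanned or OCR'd text.
    parts.append(path.read_text(encoding="utf-8", errors="ignore").strip())

Path("corpus.txt").write_text("\n\n".join(parts), encoding="utf-8")
print(f"Wrote corpus.txt from {len(parts)} files")
```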

Why should you use a lot of text?
Overfitting is more likely with GPT-2 because the model is so large (the 774M variant is ~6 GB), though it is less of a problem than in typical modeling situations. Generally, if a model overfits, it won’t work well in the real world. If GPT-2 overfits, it will memorize sections of the training text, or even all of it; the more it overfits, the more it memorizes. If your model memorizes a bit of text, it is not ruined, it just won’t generate as much original text. For some examples, see Gwern’s blog. …



Open your toolbox, fill it up…

In this article, I will talk about the resources we will use and direct you to tutorials. Additionally, I will provide some tutorials covering topics the built-in tutorials don’t.

Initial Setup:

  1. Unless you have 15 gigabytes of storage free on your Google Drive, you should set up another one to prevent storage issues (if you are working in Colab, see the Drive-mount sketch after this list).
  2. Go to this Google Drive folder. Download the folder “Everything GPT-2 Resources”. …
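
If you are following along in Google Colab (which the Drive-based setup assumes), mounting your Drive looks like the sketch below; /content/drive is Colab's standard mount point, and the folder path is a placeholder.

```python
# Mount Google Drive inside a Colab notebook so checkpoints persist between sessions.
from google.colab import drive

drive.mount("/content/drive")   # prompts for authorization the first time

# Placeholder path to the downloaded resources folder on your Drive.
RESOURCES_DIR = "/content/drive/My Drive/Everything GPT-2 Resources"
```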

Feel the burn …

The existing resources for GPT-2’s architecture are very good, but they are written for researchers, so I will provide you with a tailored concept map of the areas you will need to know before jumping in.

Areas the reader should already know, i.e. areas I won’t provide resources for:

  1. Linear Algebra — specifically matrix multiplication, vectors, and projections from one space onto another space with fewer dimensions
  2. Statistics — specifically probability distributions
  3. General Machine Learning Concepts — specifically supervised learning and unsupervised learning
  4. Neural Nets — general information about how each part works
  5. Training Neural Nets — general information about how training works, specifically gradient descent, optimizers, backpropagation, and updating weights. …



Prepare for brain melt in 3, 2, 1 …

All joking aside, this and the next article will be the most intellectually demanding in the series, but don’t be discouraged. Learning a concept this complex is an iterative process. This article is intended to inform your intuition rather than go through every point in depth.

A quick note on the term GPT-2: depending on how technical a resource is, GPT-2 can refer to different things. Some resources use it to mean the whole system that takes in some words and gives you more words. In this article, I will use GPT-2 to mean the part that takes in some words and generates a single word piece. My reasoning will become clearer as you learn more, i.e. trust me and read on. …
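
To make that distinction concrete, here is a minimal sketch using the Hugging Face PyTorch port and plain greedy decoding (both choices are mine, for illustration). The single model call is "GPT-2" in the sense used here; the surrounding loop is what turns it into a text generator.

```python
# Minimal sketch: GPT-2 proper scores ONE next word piece; looping makes it generate text.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The Pulitzer Prize is awarded for", return_tensors="pt")

for _ in range(20):                           # the loop, not the model, produces a passage
    with torch.no_grad():
        logits = model(input_ids)[0]          # scores for the next word piece at each position
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy: take the top-scoring piece
    input_ids = torch.cat([input_ids, next_id], dim=1)        # append it and go around again

print(tokenizer.decode(input_ids[0]))
```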



Why GPT-2?

What is Generative Pre-trained Transformer (GPT)-2?
GPT-2 is a state-of-the-art machine learning architecture released by OpenAI in 2019.

OpenAI just released GPT-3 as an API for a closed beta and has committed to public release by the end of 2020. …

About

Edward Girling

Mathematician, enjoys his knowledge distilled. Find my insight deep, my jokes laughable, my resources useful. Connect with me on Twitter @Rowlando_13.
