Everything GPT-2: 3. Tools

Open your toolbox, fill it up…

This article is part of a series on GPT-2. It’s best if you start at the beginning. The links are located at the bottom of the page.

In this article, I will talk about the resources you will use and direct you to tutorials. Some of the tutorials were built by others; some I built for this article series.

Initial Setup:

  1. Unless you have 15 gigabytes of storage free on your Google Drive, you should set up another Google Drive account to prevent storage issues.
  2. Go to this Google Drive folder and download the folder “Everything GPT-2 Resources”. It is about 12 gigabytes. Unzip it and re-upload it to your Google Drive. For all the tutorials to run without changes to the file paths, upload it to the “My Drive” folder.
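
Once the folder is uploaded, a notebook can confirm that it sits where the tutorials expect it. The following is a minimal sketch, not code from the tutorials themselves: the mount point `/content/drive` and the helper names are assumptions chosen for illustration, and the Drive top level is assumed to appear under the name “My Drive”.

```python
from pathlib import Path

# Assumption: Drive is mounted here and its top level is exposed as "My Drive".
MOUNT_POINT = Path("/content/drive")
RESOURCES_DIR = MOUNT_POINT / "My Drive" / "Everything GPT-2 Resources"


def mount_drive_if_in_colab() -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(str(MOUNT_POINT))
    return True


def resources_available() -> bool:
    """Report whether the resources folder is where the file paths point."""
    return RESOURCES_DIR.is_dir()


# Outside Colab the mount step is skipped and nothing is printed.
if mount_drive_if_in_colab() and not resources_available():
    print(f"Folder not found at {RESOURCES_DIR}; check the upload location.")
```

If the message appears, the folder was most likely uploaded into a subfolder rather than directly into “My Drive”.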


Google Colab Notebooks — A Colab Notebook is a customized Python Jupyter Notebook that you can run on Google’s servers for free. In a Jupyter Notebook, you can run code and display rich text in the same document, with each neatly divided into blocks. Furthermore, by using special symbols, you can access the underlying operating system (Linux, in this case) directly. If you are not familiar with Colab, complete the Google tutorials in the “Everything GPT-2 Resources” folder and this tutorial I made to fill in some holes. While you can run Colab Notebooks for free, to experiment with GPT-2 effectively you will need to sign up for Google Colab Pro. This is because GPT-2’s generated text only becomes impressive once you reach the 774 M variant, and training it consumes a lot of memory. It is well worth the 10 dollars a month for access to 30+ gigabytes of RAM.
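
The most common of those special symbols is the `!` prefix, which hands the rest of the line to the shell (for example, `!uname -s` in a notebook cell). Because `!` is notebook syntax rather than Python, here is a rough plain-Python equivalent using the standard library. The helper name `run_shell` is made up for this sketch.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a shell command and return its standard output as text."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout


# In a notebook cell you could write `!uname -s`; this is the plain-Python route.
print(run_shell("uname -s").strip())
```

In day-to-day notebook work the `!` form is shorter. The function form is handy when you want to capture the output in a variable and keep working with it in Python.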

Command Line (Command Line Interface, sometimes also “Shell”) — If you have not done much (or anything) with the command line, then look at my tutorial. I have found that most command line tutorials do a good job of explaining the mechanics (how to do something), but a poor job of explaining why you would do something in the command line or what the command line actually is. My tutorial provides a lot more of the rationale for use; specific mechanics are explained as they come up throughout the article series.

W3 Schools Python Reference — The best reference for getting an overview before reading the official Python documentation. It is very concise. If you are not fully comfortable with class and inheritance concepts, then go through those sections. You will need to have those concepts down pat to work with Hugging Face’s Transformers package. Caveat emptor: W3 Schools has a bad rap in the JavaScript world because they offer paid certifications that are basically useless, but I have found their Python documentation to be error-free and a good baseline. If you are uncomfortable with that, then RealPython is the next best thing, though their explanations can be a bit long.
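
As a quick self-check on classes and inheritance, here is a minimal sketch of the pattern: a base class holds the shared behaviour and a subclass overrides one method. `BaseModel` and `EchoModel` are hypothetical names invented for this illustration; they are not Transformers classes.

```python
class BaseModel:
    """Hypothetical base class holding behaviour shared by every model."""

    def __init__(self, name: str, n_layers: int) -> None:
        self.name = name
        self.n_layers = n_layers

    def describe(self) -> str:
        """Shared method that every subclass inherits unchanged."""
        return f"{self.name} with {self.n_layers} layers"

    def forward(self, tokens: list) -> list:
        """Placeholder that each subclass is expected to override."""
        raise NotImplementedError("subclasses decide how to process tokens")


class EchoModel(BaseModel):
    """Hypothetical subclass: inherits describe(), overrides forward()."""

    def __init__(self, n_layers: int) -> None:
        # super() runs the parent constructor so name and n_layers get set.
        super().__init__(name="echo", n_layers=n_layers)

    def forward(self, tokens: list) -> list:
        # Toy behaviour: return the tokens in reverse order.
        return list(reversed(tokens))


model = EchoModel(n_layers=2)
print(model.describe())
print(model.forward([1, 2, 3]))
```

If it is clear to you why `describe()` works on `EchoModel` even though the subclass never defines it, you have the concept you need.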

PyTorch — If you have not touched PyTorch prior to this, then complete their 60 Minute Blitz (realistically 4 hours). A quick note about PyTorch vs TensorFlow: generally speaking, there are a small number of differences, and they are disappearing over time. For this particular project, PyTorch is the better choice at this point because Transformers and ONNX have better support for PyTorch.

Transformers Python Package by Hugging Face — This is the best Python library for experimenting with transformers. You do not even have to know PyTorch or TensorFlow to use or train a model. Start at the beginning of the documentation and read through to “fine tuning with custom data sets”. If you are confused about something, then create a Colab notebook and run their sample code.
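
To show how little framework knowledge is needed to get started, here is a minimal sketch of text generation with the library’s `pipeline` helper. It assumes the `transformers` package and a backend such as PyTorch are installed and that the runtime can download the `gpt2` checkpoint. The wrapper name `generate_text` and its defaults are my own choices for illustration.

```python
def generate_text(prompt: str, max_length: int = 40) -> str:
    """Generate a continuation of `prompt` with the smallest GPT-2 checkpoint."""
    # Imported lazily so that defining the function does not need the package.
    from transformers import pipeline

    # "gpt2" is the smallest published GPT-2 checkpoint; weights are
    # downloaded the first time the pipeline is created.
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(prompt, max_length=max_length, num_return_sequences=1)
    # Each output is a dict; the full text sits under "generated_text".
    return outputs[0]["generated_text"]
```

In a Colab cell, `print(generate_text("Open your toolbox and"))` is enough to see a sample. Sampling is random, so the text will differ from run to run.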

Open Neural Network Exchange (ONNX) — ONNX and ONNX Runtime offer the best optimizations and support quantization. Go to the website and take a look around. Their documentation and tutorials are very lacking. I will provide better ones when we get to that point in the series.

Blogs Containing Others’ Experiences:
Max Woolf
Jeff Shek

Honorable Mention Software:
Max Woolf’s Package GPT-2 Simple — This package’s functionality is superseded by Hugging Face’s Transformers package in all but one way: it uses less memory to train GPT-2. We will make use of this to train the 774 M variant of GPT-2 using Google Colab.
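
A back-of-the-envelope estimate shows why memory is the constraint here. The sketch below assumes 32-bit floats and an Adam-style optimizer that keeps weights, gradients, and two moment estimates per parameter. It ignores activations and framework overhead, so the real requirement is higher, and it says nothing about how any particular package reduces the footprint.

```python
BYTES_PER_FP32 = 4
# Assumption: weights + gradients + two Adam moment estimates, all in fp32.
COPIES_FOR_ADAM_TRAINING = 4

# Nominal parameter counts of the four published GPT-2 sizes.
GPT2_PARAMETERS = {
    "124M": 124_000_000,
    "355M": 355_000_000,
    "774M": 774_000_000,
    "1558M": 1_558_000_000,
}


def weights_gb(n_parameters: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (10^9 bytes)."""
    return n_parameters * BYTES_PER_FP32 / 1e9


def training_floor_gb(n_parameters: int) -> float:
    """Lower bound for training memory, before activations are counted."""
    return weights_gb(n_parameters) * COPIES_FOR_ADAM_TRAINING


for variant, n_parameters in GPT2_PARAMETERS.items():
    print(
        f"{variant}: weights {weights_gb(n_parameters):.1f} GB, "
        f"training floor {training_floor_gb(n_parameters):.1f} GB"
    )
```

Under these assumptions the 774 M variant needs about 3.1 GB for the weights alone and about 12.4 GB before a single activation is stored.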

Additional Tools Required for Production:

Desktop or Laptop Computer — The best way to take a machine learning algorithm to production, short of writing a microservice (which is very involved), is to use Cortex. Cortex requires Docker, but you cannot install Docker from a Colab Notebook, so you need a computer.

Docker — Docker is used to send software to another computer to run. The software and all of its dependencies are packaged up and then sent as a whole unit for deployment in a Docker container on another machine. It is required for Cortex, but you won’t ever have to deal with it directly.

Cortex — Cortex is an open source alternative to AWS SageMaker. It is much cheaper (roughly 60 percent of the cost), and it does a great job of abstracting away much of the difficulty of deploying something computationally intensive at scale.

AWS — Cortex will take care of a lot of complexities, for example configuring Kubernetes and setting up logs, but it will do this through AWS. You will need an AWS account. Don’t worry: all of the usage for this tutorial should fall under AWS’s free tier. I will use AWS S3 to store the model files and download them to the API, AWS CloudWatch to view logs, and an AWS EC2 instance to generate the text.

Articles in the series:
Everything GPT-2: 0. Intro
Everything GPT-2: 1. Architecture Overview
Everything GPT-2: 2. Architecture In-Depth
Everything GPT-2: 3. Tools
Everything GPT-2: 4. Data Preparation
Everything GPT-2: 5. Fine-Tuning
Everything GPT-2: 6. Optimizations
Everything GPT-2: 7. Production

All resources for articles in the series are centralized in this Google Drive folder.

(Aside) All work and no play makes Jimmy a dull boy! Want to play with a nice implementation of GPT-2? Check out writeup.ai! Our final product will be an API that does all the text generation, but a website to connect it to is not provided, so don’t get too excited.

(shameless self-promotion) Have you found this valuable?
If no, then tweet at me with some feedback.
If yes, then send me a pence for my thoughts.

Mathematician, enjoys his knowledge distilled. Find my insight deep, my jokes laughable, my resources useful, connect with me on twitter @Rowlando_13