Edward Girling

The same thing goes by many names…

The Problem

Important things go by multiple names or even by codes. Think of states in the US: Virginia might go by Virginia, VA, or the State of Virginia. You can’t write a regex to solve this because the two-letter abbreviations are irregular. You could put a big dictionary of all the states in your code and use pandas.Series.map, but the dictionary is really data. In an ideal world, it would be separated from your code.

The Solution

Store the States and their various names in a csv, read in the csv, transform the csv into a dictionary, and use…
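A minimal sketch of this approach, under stated assumptions: the column names `alias` and `canonical` and the sample rows are hypothetical, and the CSV is held in a string here only so the example is self-contained. In practice the lookup table would live in its own file.

```python
import io

import pandas as pd

# Hypothetical lookup table; normally this would be a separate CSV file.
# The column names `alias` and `canonical` are assumptions for illustration.
csv_text = """alias,canonical
Virginia,Virginia
VA,Virginia
State of Virginia,Virginia
Maryland,Maryland
MD,Maryland
"""

lookup = pd.read_csv(io.StringIO(csv_text))

# Turn the two columns into a plain dict: alias -> canonical name.
alias_to_state = dict(zip(lookup["alias"], lookup["canonical"]))

raw = pd.Series(["VA", "State of Virginia", "MD", "Narnia"])

# Series.map looks each value up in the dict; values with no entry become NaN.
clean = raw.map(alias_to_state)
print(clean)
```

Because unmatched values come back as NaN, a quick `clean.isna()` check shows which names are still missing from the CSV.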


Not uniform as expected but Benford’s…

This article series will cover more obscure, but nonetheless useful mathematics laws.

At the bottom of the page are links to the resources folder containing all code and data and to other articles in the series. To run the code in the Google Colab notebooks, download the whole Google Drive folder (Mathematics Laws for Data Scientists), unzip it, and upload it to your Google Drive (so in Colab the path is ‘drive/my drive/Mathematics Laws for Data Scientists’). All data is included, and the code will run without any path alterations.

Benford’s Law

Benford’s law is also known…
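As a rough illustration of the law itself (not code from the article's notebooks): Benford's law says the leading digit d of many real-world numbers occurs with probability log10(1 + 1/d) rather than a uniform 1/9. The sketch below compares that expectation with the leading digits of the first 1000 powers of 2, a sequence chosen here only because it is known to follow the law closely. The helper names are hypothetical.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected frequency of a leading digit (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def first_digit(n: int) -> int:
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


# Example data chosen for illustration: the powers 2**1 through 2**1000.
values = [2 ** k for k in range(1, 1001)]

counts = {d: 0 for d in range(1, 10)}
for v in values:
    counts[first_digit(v)] += 1

# Observed share of each leading digit in the sample.
observed = {d: counts[d] / len(values) for d in counts}

for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

The two columns should track each other closely, with a leading 1 appearing roughly 30% of the time and a leading 9 under 5%.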


Efficient, pythonic code nicely packaged for consumption.

Python and many of its packages have incredible official documentation, but it can be hard to get through it. Meet Python Quick Tips! For elegant, generalizable solutions to data science problems, see Python Deep Tips (forthcoming).

Why take my word on it?
Don’t. Explore for yourself. Each tip has links to documentation and each block of code runs in a Jupyter Notebook.

Python Resources

  1. W3 Schools. W3 Schools documentation and tutorials are very light. They are a great primer to get your mind around a topic before trying out the official documentation.
  2. Real Python. Real…

One idea, three ways of looking at it.

For learning, no one method works for all people. This is especially true for tough concepts. Python decorators and Python regex come to mind.

Basic methods that always help learning and retention:

  1. Try, take a break, try again. Your brain is working on problems in the background. I have found that taking a break and exercising helps me a lot.
  2. Try, sleep on it, try again. When you sleep, your brain reinforces pathways that have recently formed.
  3. The more senses you engage while learning the better. As such, each…


Deploy GPT-2 as an Autoscaling Web API

Prerequisites:

  1. AWS: Free account with an administrator user set up for programmatic access. When you set up the account, you will create a root user. Use the root user to set up an administrator with programmatic access, and use that administrator for Cortex and for your own work. Set up at least one empty S3 storage bucket.
  2. Docker: Installed.
  3. Cortex: Installed per their website. Cortex basic tutorials completed. If you are running Windows, check out their Windows installation guide for running Cortex using WSL (Windows Subsystem for Linux) version 2. Note: If you install Ubuntu…

GPT-2 is good, now you will make it fast.

At this point you have a finetuned 774M variant of GPT-2, but it has two problems: it is formatted for Tensorflow 1.x, which has been deprecated, and generating text with it is slow.

To fix the formatting problems, we will use a script from Huggingface. The result will be a finetuned model formatted for Pytorch 1.x (Pytorch started with 0.x). I could have converted it to Tensorflow 2.x but ONNX has better support for Pytorch. …


Specialize GPT-2 for enhanced performance on any text

This article is part of a series on GPT-2. It’s best if you start at the beginning. The links are located at the bottom of the page.

What is fine-tuning?
GPT-2 was trained on 40 gigabytes of text ranging across many subjects. It is very good at generating text, but it can be improved by training it on text specific to its application. This process is called fine-tuning, and it is a form of transfer learning.

Prior to running either tutorial, see this article for setup. The best way to go through this article is interactively:

  1. Finetune with GPT-2…


If you think your data is clean, you haven’t looked at it hard enough.

This article is part of a series on GPT-2. It’s best if you start at the beginning. The links are located at the bottom of the page.

In the next tutorial, you will fine-tune (train) GPT-2 on any topic you want, using a single large text file or a folder containing many text files. In the example, I will work with a large selection of Pulitzer Prize-winning novels. You can select any text you would like as long as there is a…


Open your toolbox, fill it up…

This article is part of a series on GPT-2. It’s best if you start at the beginning. The links are located at the bottom of the page.

In this article, I will talk about the resources you will use and direct you to tutorials. Some of the tutorials were built by others; some I built for this article series.

Initial Setup:

  1. Unless you have 15 gigabytes of free storage on your Google Drive, you should set up another one to prevent storage issues.
  2. Go to this Google Drive folder. Download the folder “Everything GPT-2…

Feel the burn …

This article is part of a series on GPT-2. It’s best if you start at the beginning. The links are located at the bottom of the page.

The existing resources for GPT-2’s architecture are very good, but they are written for experienced scientists and developers. This article is a concept roadmap to make GPT-2 more accessible to technically minded people who have not had formal schooling in Natural Language Processing (NLP). It contains the best resources I discovered while learning about GPT-2.

Prerequisites:

  1. Linear Algebra — specifically matrix multiplication, vectors, and projections from one space onto…

Edward Girling

Mathematician, enjoys his knowledge distilled. Find my insight deep, my jokes laughable, my resources useful, connect with me on twitter @Rowlando_13
