The same thing goes by many names…

The Problem

The Solution

The GitHub Repo

The Documentation at Read the Docs

To follow the tutorial with a Google Colab notebook:

  1. Go to this Google Drive.
  2. Download “The Renamer” folder and unzip it.
  3. Upload it to the top level of your own Google Drive. All the data is included in the folder, so every notebook cell will work without path alterations.

This is the direct link to the Colab Notebook, but many of the cells will not run unless you have the data.

The code snippets included in this Medium article are just highlights from the Colab Notebook.

Example Problem

Example Data Source

Load and Explore the data

The data is a day-by-day count of various vaccination metrics. It covers the states along with other data. We only want the date, location, and people_fully_vaccinated_per_hundred columns. For the date, let's take just the 12th of each month, since the data starts on the 12th of January.
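As a rough illustration of that filtering step, here is a minimal pandas sketch. The tiny inline table stands in for the real download, its values are made up, and the extra total_vaccinations column is only an assumed example of a column we discard.

```python
import pandas as pd

# Tiny stand-in for the vaccination table; the values are made up for illustration.
raw = pd.DataFrame(
    {
        "date": ["2021-01-12", "2021-01-13", "2021-02-12", "2021-02-12"],
        "location": ["Alabama", "Alabama", "Alabama", "New York State"],
        "people_fully_vaccinated_per_hundred": [0.15, 0.19, 3.2, 4.1],
        "total_vaccinations": [78134, 84040, 600000, 2900000],  # assumed extra column
    }
)

# Keep only the three columns of interest.
columns = ["date", "location", "people_fully_vaccinated_per_hundred"]
df = raw[columns].copy()

# Parse the dates, then keep only rows that fall on the 12th of a month.
df["date"] = pd.to_datetime(df["date"])
monthly = df[df["date"].dt.day == 12].reset_index(drop=True)

print(monthly)
```

Filtering on `dt.day` keeps one snapshot per month without resampling, which is enough for a month-over-month plot.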

Make a state_ids.csv File

For our example, the steps are:

  1. In row one, enter the names of the columns.
  2. In row two, enter destination, single, and many.
  3. In column one, starting in row three, enter all the state abbreviations.
  4. In column two, starting in row three, enter all the full names of the states, with New York as New York State.
  5. In column three, starting in row three, add all forms of the state name.
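The layout described in these steps can be sketched with Python's csv module. This is only an illustration under assumptions: the column names in row one are hypothetical, and I assume each row from row three onward pairs one spelling in column three with the abbreviation and full name it belongs to. Check the state_ids.csv shipped with the tutorial for the exact arrangement.

```python
import csv
import tempfile
from pathlib import Path

# Row one: column names (hypothetical; choose whatever suits your data).
header = ["abbreviation", "full_name", "all_forms"]
# Row two: the role of each column, as described in step 2.
roles = ["destination", "single", "many"]
# Rows three onward. Assumption: one spelling per row in column three,
# alongside the abbreviation and full name it should map to.
rows = [
    ["AL", "Alabama", "Alabama"],
    ["AL", "Alabama", "AL"],
    ["NY", "New York State", "New York"],
    ["NY", "New York State", "New York State"],
    ["NY", "New York State", "NY"],
]

# Write into a temporary directory so nothing in the working folder is overwritten.
path = Path(tempfile.mkdtemp()) / "state_ids.csv"
with path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(roles)
    writer.writerows(rows)

# Read the file back to confirm the layout.
with path.open(newline="", encoding="utf-8") as f:
    written = list(csv.reader(f))
print(written[:3])
```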

The tedious part of typing out all the states is done for you. The screenshot captures only the top few rows and displays the index on the side (not part of the CSV), since I read it in as a pandas.DataFrame.

Build Dictionaries for Transform

Leave Some Data Alone or not?

Option 1 Leave Non-state Data Alone
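A minimal sketch of this option in plain Python, not The Renamer's own API: look each location up in a dictionary and fall back to the original value when there is no match. The dictionary contents and the non-state location names are illustrative assumptions.

```python
# Hypothetical lookup from any known spelling to the state abbreviation.
to_abbreviation = {"Alabama": "AL", "New York State": "NY", "New York": "NY"}

# Illustrative locations; the last two are assumed examples of non-state rows.
locations = ["Alabama", "New York State", "United States", "Indian Health Svc"]

# dict.get with the original value as the default leaves unknown entries untouched.
renamed = [to_abbreviation.get(name, name) for name in locations]
print(renamed)  # ['AL', 'NY', 'United States', 'Indian Health Svc']
```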

Option 2 Replace Non-state Data and Drop It
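A minimal sketch of this option in plain Python, not The Renamer's own API: map every unmatched location to a placeholder, then filter the placeholder out. The dictionary contents and the non-state location names are illustrative assumptions.

```python
# Hypothetical lookup from any known spelling to the state abbreviation.
to_abbreviation = {"Alabama": "AL", "New York State": "NY", "New York": "NY"}

# Illustrative locations; the last two are assumed examples of non-state rows.
locations = ["Alabama", "New York State", "United States", "Indian Health Svc"]

# Anything not in the lookup is replaced with None...
replaced = [to_abbreviation.get(name) for name in locations]

# ...and then dropped, leaving only rows that matched a state.
states_only = [abbr for abbr in replaced if abbr is not None]
print(states_only)  # ['AL', 'NY']
```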

Plot the Data

Why use it?

Screenshot of tests on my local machine for 13,000 rows, read in once and processed 3 times.

Can I install with pip?

Thanks for going through the tutorial. I hope The Renamer is useful to you.

