Since physical distancing began, I’ve spent much of my time monitoring and analyzing COVID-19 data. This includes working with colleagues in the Ontario Veterinary College to develop a dashboard to visualize and better understand the spread of COVID-19 in Canada.
While developing the dashboard (coded by PhD student Kurtis Sobkowich), I spent some time rediscovering several R packages that I hadn’t used in a while, had only used in a very limited manner, or had never used at all. These included ggplot2, tibble, leaflet, tidyverse, and shiny.
I reasoned with myself that this process of learning and relearning would be a productive use of my time, and that I might be able to use whatever came of the exercise for my Data Science class in the fall.
Mostly, however, it was a way to distract myself with code and data while physically distancing myself from the world.
One of the packages that I took some time to explore was the gganimate package. For much of the work that I do, animation is probably not necessary. However, animation can be a fantastic way to visualize data that changes over time. It’s also arguably more engaging than a static image.
Using data made available by Johns Hopkins University (here), I opted to recreate a type of animated bar chart that has been trending lately, one that shows the changing rank of the levels of a categorical variable based on the value of some other measured variable. You will probably recognize these plots from social media, where they have been used to track changes in population, GDP, and other such measures by country over time (see here for more examples). In some ways, the plots create a sense that the levels of the categorical variable are in a race to the top.
The data (including cumulative counts of cases, recovered, and dead) were downloaded, merged, and aggregated by date and country. I also standardized the names of several countries that were listed in more than one way in the data set. For example, South Korea was listed as both “Korea, South” and “Republic of Korea”. Similar changes were made to account for the use of “Iran” versus “Iran (Islamic Republic of)”, and “Mainland China” versus “China”. While these inconsistencies wouldn’t impact the ranks over time, leaving them uncorrected would mean that the labels on the bars would flicker from one version of a country name to another.
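A minimal base R sketch of this kind of clean-up is shown here. It is not the code used for the dashboard: the data frame `raw`, its column names, and the counts are all invented for illustration, and only the name variants come from the examples above.

```r
# Hypothetical daily records; column names and counts are made up.
raw <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-03-01",
                   "2020-03-02", "2020-03-02")),
  country = c("Korea, South", "Mainland China", "China",
              "Republic of Korea", "Iran (Islamic Republic of)"),
  cases = c(10, 100, 5, 12, 20),
  stringsAsFactors = FALSE
)

# Lookup table mapping each variant spelling to one canonical name.
canonical <- c(
  "Korea, South" = "South Korea",
  "Republic of Korea" = "South Korea",
  "Iran (Islamic Republic of)" = "Iran",
  "Mainland China" = "China"
)

# Replace known variants; leave every other name unchanged.
is_variant <- raw$country %in% names(canonical)
raw$country[is_variant] <- unname(canonical[raw$country[is_variant]])

# Sum the counts so each date/country pair appears exactly once.
clean <- aggregate(cases ~ date + country, data = raw, FUN = sum)
clean <- clean[order(clean$date, clean$country), ]
print(clean)
```

A named vector as a lookup table keeps all of the renaming rules in one place, so a newly discovered variant only needs one extra entry.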
I’m not going to describe how the animation itself was created, mainly because there is already a fantastic post about it here that provides full details on the code, and that I used to guide the creation of my animations. Other than updating the code for my data and variables, I didn’t change much.
With both code and data in hand, I decided to plot the cumulative number of confirmed cases of, and deaths due to, COVID-19 by country. The results can be found below.
Note that I have arbitrarily used data from February 29th, 2020 to present to create these animations – which is why China holds the number one spot for cases and deaths for so long. Of course, China is eventually overtaken by Italy (for both cases and deaths), then the United States (for cases, and based on today’s data, deaths as well).
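For readers curious about the data preparation behind a chart like this, the sketch below filters to the start date and ranks countries within each date, which is the per-frame ordering an animated bar chart needs. It is an illustration in base R, not the code behind these plots: the data frame `clean`, its counts, and the `top_n` cut-off are assumptions.

```r
# Hypothetical cumulative counts; values are invented for illustration.
clean <- data.frame(
  date = as.Date(rep(c("2020-02-28", "2020-02-29", "2020-03-01"), each = 3)),
  country = rep(c("China", "Italy", "United States"), times = 3),
  cases = c(50, 5, 1, 60, 10, 2, 65, 30, 40),
  stringsAsFactors = FALSE
)

start_date <- as.Date("2020-02-29")  # start date mentioned in the post
top_n <- 2                           # assumed number of bars kept per frame

# Keep only the dates from the chosen start date onward.
recent <- clean[clean$date >= start_date, ]

# Rank countries within each date; rank 1 is the largest count.
recent$rank <- ave(
  -recent$cases,
  as.character(recent$date),
  FUN = function(x) rank(x, ties.method = "first")
)

# Keep the top-ranked rows for each date (one animation frame per date).
frames <- recent[recent$rank <= top_n, ]
frames <- frames[order(frames$date, frames$rank), ]
print(frames)
```

Breaking ties with `ties.method = "first"` gives every country a distinct integer rank on each date, so two bars never compete for the same position.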
NOTE: Plots updated Sunday, April 12, 2020.
While both of these animations are interesting, it’s best not to infer too much from them for several reasons. The data are not adjusted for population size, and they are not aligned based on the time that the first case was identified in each country. The data do not indicate how each country has dealt with the pandemic, or when they might have put measures in place to stop the spread of the disease. That is, the animations tell a partial story at best.
What the data do convey is that this pandemic is very real, with both the total number of known cases and deaths growing quite rapidly over a short period of time.
As always, please stay safe and stay home.
[Update on Sunday, April 12, 2020] New plots: cases per million and deaths per million. For both plots, I’m only including countries that have a population of more than 4 million people. The population data come from Worldometers.info.
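The per-million adjustment can be sketched in base R as follows. This is an illustration only: the country labels, counts, and populations are invented, and the data frames `totals` and `population` are hypothetical stand-ins for the real case and population data.

```r
# Hypothetical totals and populations; all numbers are invented.
totals <- data.frame(
  country = c("A", "B", "C"),
  cases = c(2000, 500, 90),
  stringsAsFactors = FALSE
)
population <- data.frame(
  country = c("A", "B", "C"),
  population = c(40e6, 5e6, 3e5),
  stringsAsFactors = FALSE
)

min_population <- 4e6  # threshold from the post: more than 4 million people

# Join counts to populations by country name.
merged <- merge(totals, population, by = "country")

# Drop small countries, then scale counts to a per-million rate.
per_million <- merged[merged$population > min_population, ]
per_million$cases_per_million <-
  per_million$cases / per_million$population * 1e6
print(per_million)
```

The same calculation applies to deaths by swapping in a deaths column. Note that the join only works if the country names match in both tables, so any name clean-up has to happen first.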