Statistics In A Time Of Crisis

The last week has been strange, to say the least. Much like you, I’m left wondering how best to cope with a pandemic that has shifted our reality significantly.

For me, coping is obsessively monitoring the confirmed COVID-19 cases. As I wrote in another post, knowing the data behind something helps me to better understand the world around me. It often calms any fears or anxieties I might have because I’m able to put boundaries around the things that I don’t know by better describing the things that I do.

Yes, I’m actually writing that statistics is my prescription of choice to deal with my own anxieties. That, and making lists.

So what do I actually mean by “statistics is my prescription of choice”?

For me, it begins with finding data. As much data as possible. For COVID-19, I began with a website that has been posting regular updates about the spread of the disease [1]. This includes the number of cases, deaths, and recoveries globally, by country, and in some cases, by state or province. For those still infected, the website also identifies condition level: from mild to serious or critical.

To make these data useful to me, I created an Excel file where I recorded the daily numbers for several countries – Canada, the United States, Iran, Italy, and Spain – since March 1. I picked Canada for obvious reasons, and the United States given our proximity. Italy and Spain were selected because they’ve been seeing some rather unbelievable numbers. And I selected Iran simply because some of my students are Iranian.

On Sunday I developed a very simple R script to estimate the exponential rate of growth of new cases. Specifically, I was trying to estimate the values of $\alpha$ and $\beta$ that parameterize an exponential model for each country:

$$y = \alpha e^{\beta t},$$

where $y$ represents the number of new cases daily (per country or region of interest), and $t$ is the time in days since March 1. Taking the $\log$ of both sides converts this to a simple linear regression model, $\log(y) = \log(\alpha) + \beta t$. The slope $\beta$ is the daily growth rate on the log scale, so the day-over-day multiplicative change in the number of new cases is estimated by $e^{\beta}$.
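The log-linear fit described above can be sketched in a few lines. This is a minimal illustration in Python, not the R script from this post, and it runs on made-up counts rather than the real case data. The function name `fit_exponential` and the sample numbers are assumptions chosen for the example.

```python
import math

def fit_exponential(days, new_cases):
    """Fit y = alpha * exp(beta * t) by least squares on log(y).

    Returns (alpha, beta). Every count must be positive,
    because log(0) is undefined.
    """
    if len(days) != len(new_cases) or len(days) < 2:
        raise ValueError("need at least two (day, count) pairs")
    if any(y <= 0 for y in new_cases):
        raise ValueError("counts must be positive to take logs")

    log_y = [math.log(y) for y in new_cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_ly = sum(log_y) / n

    # Ordinary least squares for log(y) = log(alpha) + beta * t
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (ly - mean_ly) for t, ly in zip(days, log_y))
    beta = sxy / sxx
    log_alpha = mean_ly - beta * mean_t
    return math.exp(log_alpha), beta

# Hypothetical data generated from alpha = 2, beta = 0.2 (not real case counts)
days = list(range(10))
new_cases = [2.0 * math.exp(0.2 * t) for t in days]

alpha, beta = fit_exponential(days, new_cases)
daily_factor = math.exp(beta)  # multiplicative change in new cases per day
```

Because the sample data follow the model exactly, the fit recovers the parameters used to generate them. Real counts are noisy, and days with zero new cases would have to be handled separately before taking logs.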

For example, using the Canadian case counts (March 1 through March 17) we obtain a value for $e^{\beta}$ of 1.218597. In other words, new cases are increasing at an average rate of approximately 21.9% per day. Since the model estimated 63.37 new cases in Canada yesterday (the actual number of new cases was 63), it estimates today’s new cases to be 77.23. I’ve rounded this up to 78 in Table 1 because we can’t really have a fraction of a case.
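The arithmetic behind those numbers is short enough to check by hand. Here is a small Python sketch that uses only the two figures quoted above; the variable names are my own for this illustration.

```python
import math

growth_factor = 1.218597     # e^beta reported above for Canada
estimate_yesterday = 63.37   # model's estimate of yesterday's new cases

# A factor of 1.218597 per day is a (factor - 1) * 100 percent increase
percent_increase = (growth_factor - 1) * 100

# Tomorrow's estimate is today's estimate times the daily factor
estimate_today = estimate_yesterday * growth_factor

# Round up to a whole number of cases, as in Table 1
reported = math.ceil(estimate_today)

print(f"{percent_increase:.1f}% per day")
print(f"{estimate_today:.2f} new cases, reported as {reported}")
```

The product lands within a hundredth of the 77.23 quoted above. The small gap is what you would expect if the original calculation carried more decimal places than the rounded 63.37 shown here.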

How does Canada’s rate compare to other countries? Check out Table 1 for some numbers.

| Country | $\beta$ | $e^{\beta}$ | Estimated Daily New Case Count Increase (%) | Prediction & 95% Prediction Interval |
|---|---|---|---|---|
| Canada | 0.1977 | 1.2186 | 21.86% | 78 (9, 732) |
| United States | 0.3060 | 1.3579 | 35.79% | 3166 (1225, 8181) |
| Italy | 0.1549 | 1.1675 | 16.75% | 6698 (3091, 14515) |
| Spain | 0.2994 | 1.3490 | 34.90% | 5738 (1768, 18622) |
| Iran | 0.0596 | 1.0614 | 6.14% | 1542 (739, 3218) |

Table 1: Estimated increase in daily new case counts (%) by country.

Does this mean that Canada will have 78 new cases today? No. But the model gives us a sense of what might be coming. These data can also help us to better understand whether any of the things we are doing – such as social distancing – are having an effect (SPOILER ALERT: It will, so do it). Of course, we likely won’t begin to see the effects of social distancing for at least a week or two, since the World Health Organization has estimated the incubation period to be 1 to 14 days.

It’s also important to recognize that this is a very simple model that assumes a particular type of growth using a limited data set. Will the disease continue to spread in an exponential fashion forever? No. Does it consider other variables that might be contributing to or limiting the spread of disease? No. It’s a very simple analysis that provides some sense of what is happening out there in the real world. And for that reason I wouldn’t trust it beyond using it to understand what has already happened, and to understand if things are changing as a result of any of the interventions we now have in place.

And, even when things begin to shift due to mitigation measures – and I can’t stress this enough – it is always best to follow the advice of our public health professionals. That is, even if we see the growth rate decrease, it doesn’t mean the pandemic is over.

Regardless, being able to explore the data helps me better understand what’s happening. It gives me something to hold onto when everything around me seems to keep shifting. And in that way it helps me to assuage my anxiety and my fears, even if just for a moment.

Statistics in a time of crisis? For me, the answer is most definitely yes.

[1] While manually recording the numbers in an Excel file was fine, yesterday I stumbled on a GitHub data repository that is being used to fuel the COVID-19 dashboard at Johns Hopkins. I made a few simple updates to my R script so that it automatically downloads all of the available data (daily) and aggregates it by country. So much data!
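For the curious, here is a rough idea of what "aggregates it by country" can look like. This is a Python sketch rather than my R script, and it skips the download step entirely. It assumes a wide layout with one row per province, one column per date, and cumulative counts in the cells. The column names, the helper functions, and every number in the sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an assumed wide layout: one row per province,
# one column per date, cumulative confirmed counts in the cells.
SAMPLE_CSV = """Province/State,Country/Region,3/1/20,3/2/20,3/3/20
Region A,Country X,10,14,20
Region B,Country X,5,6,9
,Country Y,100,130,170
"""

def aggregate_by_country(csv_text, country_col="Country/Region",
                         skip_cols=("Province/State",)):
    """Sum the date columns over all rows that share a country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames
                 if c != country_col and c not in skip_cols]
    totals = defaultdict(lambda: [0] * len(date_cols))
    for row in reader:
        for i, col in enumerate(date_cols):
            totals[row[country_col]][i] += int(row[col])
    return date_cols, dict(totals)

def daily_new(cumulative):
    """Turn cumulative counts into day-over-day new counts."""
    return [later - earlier
            for earlier, later in zip(cumulative, cumulative[1:])]

dates, totals = aggregate_by_country(SAMPLE_CSV)
new_x = daily_new(totals["Country X"])
```

The differencing step matters because the exponential model above is fit to new daily cases, not running totals.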

