Using R to map COVID-19 data in France
The French government runs a website called data.gouv.fr with a lot of open-access data, including data on how the COVID-19 crisis impacts the French healthcare system. There, we can find a data set with the number of people in intensive care units for every French department.
This data set contains a lot of interesting information, including data on how the COVID-19 crisis affected the Grand Est region. Unfortunately, it is the kind of data set that is hard to make sense of at first glance.
covid_department_dataset <-
  readr::read_csv2("https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7")
## Warning: 58419 parsing failures.
## row col expected actual file
## 109384 HospConv 1/0/T/F/TRUE/FALSE 111 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109384 SSR_USLD 1/0/T/F/TRUE/FALSE 53 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109385 HospConv 1/0/T/F/TRUE/FALSE 52 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109385 SSR_USLD 1/0/T/F/TRUE/FALSE 23 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109386 HospConv 1/0/T/F/TRUE/FALSE 58 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## ...... ........ .................. ...... .............................................................................
## See problems(...) for more details.
head(covid_department_dataset)
## # A tibble: 6 x 10
##   dep    sexe jour        hosp   rea HospConv SSR_USLD autres   rad    dc
##   <chr> <dbl> <date>     <dbl> <dbl> <lgl>    <lgl>    <lgl>  <dbl> <dbl>
## 1 01        0 2020-03-18     2     0 NA       NA       NA         1     0
## 2 01        1 2020-03-18     1     0 NA       NA       NA         1     0
## 3 01        2 2020-03-18     1     0 NA       NA       NA         0     0
## 4 02        0 2020-03-18    41    10 NA       NA       NA        18    11
## 5 02        1 2020-03-18    19     4 NA       NA       NA        11     6
## 6 02        2 2020-03-18    22     6 NA       NA       NA         7     5
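The parsing failures above most likely come from type guessing: HospConv, SSR_USLD and autres are empty in the first rows, so readr treats them as logical columns and then cannot parse the numbers that show up later. A minimal sketch of one way around this, assuming we declare the column types ourselves, is shown below. The two data rows and the covid_sample name are made up for illustration; only the column names come from the real file.

```r
library(readr)

# Hypothetical two-row extract that mimics the layout of the real file:
# the three middle columns are empty at first and numeric later on.
csv_text <- paste(
  "dep;sexe;jour;hosp;rea;HospConv;SSR_USLD;autres;rad;dc",
  "01;0;2020-03-18;2;0;;;;1;0",
  "01;0;2021-02-01;90;12;111;53;4;800;150",
  sep = "\n"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Declare the types instead of letting readr guess them from the first rows.
covid_sample <- read_csv2(
  tmp,
  col_types = cols(
    dep = col_character(),                 # keeps the leading zero of "01"
    jour = col_date(format = "%Y-%m-%d"),
    .default = col_double()                # every other column is a count
  )
)

covid_sample$HospConv
```

The same col_types argument should work with the data.gouv.fr URL, since rea, the only column plotted in this post, is unaffected either way.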
To better visualize the data set, we could create a map that shows the COVID-19 data for each department. This kind of map has been used before, for example to show the US unemployment rate by state. Here, we will try to create a map of France where every department is a subplot. These subplots will be line graphs with the number of people in intensive care units on the y-axis and the date on the x-axis.
Fortunately, there’s an R package dedicated to the creation of such plots. The geofacet R package works with ggplot2 to create subplots arranged on a geographical grid. The only information it needs is the data set containing the data we want to map (which we already have) and a data set that lays out the geographical entities on a grid.
First, let’s start by loading a few packages that could be useful.
library(tidyverse)
library(geofacet)
library(googlesheets4)
library(gghighlight)
We will then create a data set with the geographical grid we want to use. For France, we can easily create the grid data set in Google Sheets and then import it into R. The data set is available here, and we can import it into R in two lines thanks to the googlesheets4 package (note that you will probably have to call the gs4_auth function at some point).
departments <-
  range_read("1hKY0kjzLm55q1R2jD7FmbRkC6XTb0El4oEbVvi98pwo",
             sheet = "departement")
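For readers who cannot open the sheet, here is a rough idea of what a geofacet grid looks like. A custom grid is a plain data frame with a code, a name, and a row and col position for each cell. The sketch below is a hypothetical three-department grid: the toy_grid name and the positions are made up for illustration and are not the layout stored in the Google Sheet.

```r
# Hypothetical three-department grid; the row/col positions are made up
# for illustration and are not the author's layout.
toy_grid <- data.frame(
  code = c("59", "75", "33"),             # department codes, kept as text
  name = c("Nord", "Paris", "Gironde"),
  row  = c(1, 2, 3),                      # 1 is the top row of the grid
  col  = c(2, 2, 1),                      # 1 is the leftmost column
  stringsAsFactors = FALSE
)

# Each cell of the grid can hold only one department.
stopifnot(!any(duplicated(toy_grid[c("row", "col")])))

toy_grid
```

If geofacet is installed, grid_preview(toy_grid) draws the layout, which is handy for checking positions before plotting any data.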
Now, because we’re greedy and because googlesheets4 is easy to use, we can decide to create a second data set so we can add each department’s regional information.
regions <-
  range_read("1hKY0kjzLm55q1R2jD7FmbRkC6XTb0El4oEbVvi98pwo",
             sheet = "region")
The wrangling we need to do to prepare the data set is minimal. We just have to remove the French departments that we don’t want to plot and add regional information to every remaining department.
covid_department_dataset <-
  covid_department_dataset %>%
  semi_join(departments,
            by = c("dep" = "code")) %>%
  left_join(regions,
            by = c("dep" = "code")) %>%
  filter(sexe == 0) # Because we don't want to make gender-based
                    # analysis, we keep the overall results
head(covid_department_dataset)
## # A tibble: 6 x 13
##   dep    sexe jour        hosp   rea HospConv SSR_USLD autres   rad    dc name
##   <chr> <dbl> <date>     <dbl> <dbl> <lgl>    <lgl>    <lgl>  <dbl> <dbl> <chr>
## 1 01        0 2020-03-18     2     0 NA       NA       NA         1     0 Ain
## 2 02        0 2020-03-18    41    10 NA       NA       NA        18    11 Ainse
## 3 03        0 2020-03-18     4     0 NA       NA       NA         1     0 Allier
## 4 04        0 2020-03-18     3     1 NA       NA       NA         2     0 Alpes…
## 5 05        0 2020-03-18     8     1 NA       NA       NA         9     0 Haute…
## 6 06        0 2020-03-18    25     1 NA       NA       NA        47     2 Alpes…
## # … with 2 more variables: region <chr>, inhabitant <dbl>
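To see what the two joins do, here is a small sketch on toy data. All three tables and their values are hypothetical. It also assumes, for the sake of the example, that an overseas department (971) is among those missing from the grid; the real sheet may exclude different departments.

```r
library(dplyr)

# Hypothetical stand-ins for the three data sets used above.
covid_toy   <- tibble(dep = c("02", "08", "971"), rea = c(10, 4, 3))
grid_toy    <- tibble(code = c("02", "08"), row = c(1, 1), col = c(1, 2))
regions_toy <- tibble(code   = c("02", "08", "971"),
                      region = c("Hauts-de-France", "Grand Est", "Guadeloupe"))

result <- covid_toy %>%
  # semi_join() only filters: it keeps the rows whose dep appears in the
  # grid and adds none of the grid's columns.
  semi_join(grid_toy, by = c("dep" = "code")) %>%
  # left_join() keeps every remaining row and adds the region column.
  left_join(regions_toy, by = c("dep" = "code"))

result
```

Here the row for 971 is dropped because it has no cell in the toy grid, and the result keeps only the dep, rea and region columns.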
Now, we just have to create the plot using the covid_department_dataset. Note that, to make the plot easier to interpret, we can use the gghighlight package, which works well with the ggplot2 faceting system.
covid_department_dataset %>%
  ggplot(aes(y = rea,
             x = jour,
             color = region,
             group = dep)) +
  geom_line() +
  gghighlight(unhighlighted_params = list(alpha = .15)) +
  facet_geo(~ dep,
            grid = departments) +
  labs(x = "",
       y = "",
       title = "COVID-19 impact on the French health care system",
       subtitle = "Number of people in intensive care units\ndue to COVID-19.",
       caption = "Source: data.gouv.fr") +
  theme(text = element_text(family = "Atlas Grotesk"),
        plot.caption = element_text(family = "Atlas Grotesk"),
        strip.background = element_blank(),
        panel.background = element_blank(),
        strip.text = element_text(family = "Atlas Grotesk",
                                  size = 6),
        axis.text = element_text(family = "Atlas Grotesk",
                                 size = 5),
        axis.text.x = element_text(angle = 45))