# Using R to map COVID-19 data in France

The French government runs data.gouv.fr, a website with a lot of open access data, including data on how the COVID-19 crisis impacts the French healthcare system. There, we can find a dataset with the number of people in intensive care units for every French department.

This data set contains a lot of interesting information, including data on how the COVID-19 crisis affected the Grand Est region. Unfortunately, it is the kind of data set that is hard to make sense of at first glance.

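# Call reconstructed from the warning below; read_delim() and delim = ";" are assumptions.
covid_department_dataset <- readr::read_delim("https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7", delim = ";")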

## Warning: 58419 parsing failures.
##    row      col           expected actual                                                                          file
## 109384 HospConv 1/0/T/F/TRUE/FALSE    111 'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109384 SSR_USLD 1/0/T/F/TRUE/FALSE    53  'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109385 HospConv 1/0/T/F/TRUE/FALSE    52  'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109385 SSR_USLD 1/0/T/F/TRUE/FALSE    23  'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## 109386 HospConv 1/0/T/F/TRUE/FALSE    58  'https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7'
## ...... ........ .................. ...... .............................................................................
## See problems(...) for more details.

head(covid_department_dataset)

## # A tibble: 6 x 10
##   dep    sexe jour        hosp   rea HospConv SSR_USLD autres   rad    dc
##   <chr> <dbl> <date>     <dbl> <dbl> <lgl>    <lgl>    <lgl>  <dbl> <dbl>
## 1 01        0 2020-03-18     2     0 NA       NA       NA         1     0
## 2 01        1 2020-03-18     1     0 NA       NA       NA         1     0
## 3 01        2 2020-03-18     1     0 NA       NA       NA         0     0
## 4 02        0 2020-03-18    41    10 NA       NA       NA        18    11
## 5 02        1 2020-03-18    19     4 NA       NA       NA        11     6
## 6 02        2 2020-03-18    22     6 NA       NA       NA         7     5
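
The parsing failures reported above most likely arise because readr guesses column types from the first rows, where HospConv, SSR_USLD and autres are all missing, so these columns are read as logical. One way around this is to declare the column types up front. The sketch below does so on a tiny made-up file rather than on the real one: the file content and the name toy_hospital_data are assumptions for illustration only.

```r
library(readr)

# Toy semicolon-delimited file mimicking the layout shown above (made-up values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "dep;sexe;jour;hosp;rea;HospConv;SSR_USLD;autres;rad;dc",
  "01;0;2020-03-18;2;0;NA;NA;NA;1;0",
  "01;0;2021-03-18;150;20;111;53;4;900;80"
), csv_path)

# Declare the types explicitly instead of relying on guessing:
# dep stays a character code, jour is a date, everything else is numeric.
toy_hospital_data <- read_delim(
  csv_path,
  delim = ";",
  col_types = cols(
    dep = col_character(),
    jour = col_date(format = "%Y-%m-%d"),
    .default = col_double()
  )
)
```

With explicit types, a value such as 111 in HospConv is kept as a number instead of being reported as a parsing failure.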


To better visualize the dataset, we could create a map that shows the COVID-19 data for each department. This kind of map has been used before, for example to show how the US unemployment rate varies across states. Here, we will try to create a map of France where every department is a subplot. These subplots will be line graphs with the number of people in intensive care units on the y-axis and the date on the x-axis.

Fortunately, there’s an R package dedicated to the creation of such plots. The geofacet R package works with ggplot2 to create subplots arranged according to a geographical grid. All it needs is the data set containing the data we want to map (which we already have) and a data set that lays out the geographical units on a grid.

First, let’s start by loading a few packages that could be useful.

library(tidyverse)
library(geofacet)
library(gghighlight)


We will then create a data set with the geographical grid we want to use. For France, we can easily create the grid data set in Google Sheets and then import it into R. The data set is available here, and we can import it into R in two lines thanks to the googlesheets4 package (note that you will probably have to call the sheets_auth function at some point; it is named gs4_auth in googlesheets4 0.2.0 and later).

# grid_sheet_url is a placeholder for the spreadsheet link, which is not preserved here.
departments <-
  googlesheets4::range_read(grid_sheet_url, # range_read() replaces the deprecated sheets_read()
                            sheet = "departement")
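
To make the shape of such a grid concrete, here is a minimal sketch with three made-up units instead of the French departments. As far as geofacet is concerned, a grid is a data frame with code, name, row and col columns, and the faceting variable in the data is matched against the grid. The names toy_grid, toy_data and toy_plot, and all the values, are assumptions for illustration.

```r
library(ggplot2)
library(geofacet)

# A grid places each unit in a cell: row 1 is the top, col 1 is the left.
toy_grid <- data.frame(
  code = c("A", "B", "C"),
  name = c("North", "West", "East"),
  row = c(1, 2, 2),
  col = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# One small time series per unit (made-up values).
toy_data <- data.frame(
  code = rep(c("A", "B", "C"), each = 3),
  day = rep(1:3, times = 3),
  value = c(1, 2, 3, 3, 2, 1, 2, 2, 2)
)

# facet_geo() arranges one panel per unit according to the grid.
toy_plot <- ggplot(toy_data, aes(x = day, y = value)) +
  geom_line() +
  facet_geo(~ code, grid = toy_grid)
```

Printing toy_plot draws the three panels in the positions given by the grid, and grid_preview(toy_grid) shows the layout on its own before any data is plotted.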


Now, because we’re greedy and because googlesheets4 is easy to use, we can create a second data set that adds each department’s regional information.

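# grid_sheet_url is a placeholder for the spreadsheet link, which is not preserved here.
regions <-
  googlesheets4::range_read(grid_sheet_url, # range_read() replaces the deprecated sheets_read()
                            sheet = "region")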


The wrangling we need to do to prepare the data set is minimal. We just have to remove the French departments that we don’t want to plot and add the regional information for every remaining department.

covid_department_dataset <-
  covid_department_dataset %>%
  # keep only the departments that appear in the grid
  semi_join(departments,
            by = c("dep" = "code")) %>%
  # add each department's regional information
  left_join(regions,
            by = c("dep" = "code")) %>%
  # because we don't want to make gender-based analysis,
  # we keep the overall results
  filter(sexe == 0)


## # A tibble: 6 x 13
##   dep    sexe jour        hosp   rea HospConv SSR_USLD autres   rad    dc name
##   <chr> <dbl> <date>     <dbl> <dbl> <lgl>    <lgl>    <lgl>  <dbl> <dbl> <chr>
## 1 01        0 2020-03-18     2     0 NA       NA       NA         1     0 Ain
## 2 02        0 2020-03-18    41    10 NA       NA       NA        18    11 Ainse
## 3 03        0 2020-03-18     4     0 NA       NA       NA         1     0 Allier
## 4 04        0 2020-03-18     3     1 NA       NA       NA         2     0 Alpes…
## 5 05        0 2020-03-18     8     1 NA       NA       NA         9     0 Haute…
## 6 06        0 2020-03-18    25     1 NA       NA       NA        47     2 Alpes…
## # … with 2 more variables: region <chr>, inhabitant <dbl>
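
The two joins play different roles: semi_join only filters rows (it keeps the departments found in the grid and adds no columns), whereas left_join adds the regional columns. The sketch below shows this on three made-up rows. The tables toy_hospital, toy_grid and toy_regions and their values are assumptions for illustration.

```r
library(dplyr)

# Made-up hospital counts for three departments.
toy_hospital <- tibble(
  dep = c("02", "67", "971"),
  sexe = c(0, 0, 0),
  rea = c(10, 25, 3)
)

# The grid only contains two of them.
toy_grid <- tibble(
  code = c("02", "67"),
  row = c(1, 1),
  col = c(1, 2)
)

# Regional information for every department.
toy_regions <- tibble(
  code = c("02", "67", "971"),
  region = c("Hauts-de-France", "Grand Est", "Guadeloupe")
)

toy_prepared <- toy_hospital %>%
  semi_join(toy_grid, by = c("dep" = "code")) %>%   # drops "971", adds no column
  left_join(toy_regions, by = c("dep" = "code"))    # adds the region column
```

Here toy_prepared has two rows (departments 02 and 67) and the columns dep, sexe, rea and region. The row and col columns of the grid are not carried over.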


Now, we just have to create the plot using the covid_department_dataset. Note that, to make the plot easier to interpret, we can use the gghighlight package, which works well with the ggplot2 faceting system: in each panel, it keeps the curves of the other departments in the background, faded out.

covid_department_dataset %>%
  ggplot(aes(y = rea,
             x = jour,
             color = region,
             group = dep)) +
  geom_line() +
  gghighlight(unhighlighted_params = list(alpha = .15)) +
  facet_geo(~ dep,
            grid = departments) +
  labs(x = "",
       y = "",
       title = "COVID-19 impacts on the French health care system",
       subtitle = "Number of people in intensive care units\ndue to COVID-19.",
       caption = "Source: data.gouv.fr") +
  # the "Atlas Grotesk" font must be installed locally to render as intended
  theme(text = element_text(family = "Atlas Grotesk"),
        plot.caption = element_text(family = "Atlas Grotesk"),
        strip.background = element_blank(),
        panel.background = element_blank(),
        strip.text = element_text(family = "Atlas Grotesk",
                                  size = 6),
        axis.text = element_text(family = "Atlas Grotesk",
                                 size = 5),
        axis.text.x = element_text(angle = 45))
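
To see what gghighlight contributes on its own, here is a smaller sketch that uses plain facet_wrap instead of facet_geo and made-up data. Called without a condition and combined with facets, gghighlight() keeps every curve in every panel but fades the ones that do not belong to that panel. The data frame toy_icu and its values are assumptions, and use_direct_label = FALSE is set here only to keep the toy plot free of line labels.

```r
library(ggplot2)
library(gghighlight)

# Made-up intensive care counts for three departments over four days.
toy_icu <- data.frame(
  dep = rep(c("02", "67", "75"), each = 4),
  day = rep(1:4, times = 3),
  rea = c(2, 5, 9, 12,
          4, 10, 30, 45,
          3, 8, 20, 38)
)

toy_highlight_plot <- ggplot(toy_icu, aes(x = day, y = rea, group = dep)) +
  geom_line() +
  # no condition: each panel highlights its own curve and fades the others
  gghighlight(unhighlighted_params = list(alpha = 0.15),
              use_direct_label = FALSE) +
  facet_wrap(~ dep)
```

Printing toy_highlight_plot gives three panels, each showing one department in full color against the faded curves of the other two, which makes it easy to compare a department with the rest.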


##### Cédric Batailler
###### Graduate student in Social Psychology

My research interests include social cognition, data visualization, and statistics.