The effect of a strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic

Nussli, Emanuel; Hediger, Simon; Spohn, Meta-Lina; Maathuis, Marloes H.

doi:10.1186/s41937-024-00119-0

Original article
Open access
Published: 12 February 2024

Emanuel Nussli¹^na1,
Simon Hediger ORCID: orcid.org/0000-0003-4825-220X²^na1,
Meta-Lina Spohn¹^na1 &
…
Marloes H. Maathuis¹

Swiss Journal of Economics and Statistics volume 160, Article number: 2 (2024)

Abstract

During several weeks in the second half of the year 2020, the cantons of Switzerland could choose to adopt the government-determined facial-mask policy, corresponding to mandatory facial-mask wearing on public transport, or a strict facial-mask policy, corresponding to mandatory facial-mask wearing on public transport and in all public or shared spaces where social distancing was not possible. We estimate the effect of introducing the strict facial-mask policy on the spread of COVID-19 in Switzerland during this first phase of the pandemic in 2020, using the cantonal heterogeneity in facial-mask policies. We adjust for social distancing behavior, weather, other non-pharmaceutical policies and further variables. We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the strict facial-mask policy is adopted.

1 Introduction

The coronavirus disease (COVID-19) pandemic presented large challenges to societies around the world. In the early pandemic in 2020, when the alpha variant of the SARS-CoV-2 virus was predominant, knowledge about the spread of the virus and about COVID-19 was scarce. In close collaboration with science, politicians and decision makers were trying to contain the spread of COVID-19 while avoiding unnecessary restrictions. Non-pharmaceutical interventions such as school closures, restrictions on public and private gatherings and enforcement of home office were employed.

In this paper, we focus on the effect of introducing a strict facial-mask policy on the containment of COVID-19 in Switzerland during the first phase of the pandemic in 2020. Studying the effect of the facial-mask policy is especially interesting as it is arguably one of the most debated policies. This might be partially due to the position that the Federal Office of Public Health of the Swiss Confederation (BAG) took in March 2020, communicating that healthy people do not need to wear facial masks.^{Footnote 1} A second reason for focusing on the facial-mask policy is that it is a relatively cheap and noninvasive policy when compared to other non-pharmaceutical interventions.

After the country-wide lockdown in Switzerland from mid-March to the end of April 2020, the federal government determined country-wide lower bounds on containment measures. The 26 Swiss cantons were given partial autonomy in introducing COVID-19 containment measures.^{Footnote 2} On July 6, 2020, wearing facial masks on public transport was made obligatory and thus formed the country-wide baseline for facial-mask policies.^{Footnote 3} Cantons could choose to enforce mandatory mask wearing on public transport and in all public or shared spaces where social distancing was not possible, which we henceforth refer to as the strict facial-mask policy. On October 19, 2020, the government enforced the strict facial-mask policy for all cantons. On December 21, 2020, vaccinations against COVID-19 were initiated in Switzerland, marking a massive change point in the pandemic. We thus consider the period from July 6, 2020, to December 20, 2020, as our period of analysis. During the whole period of analysis, a coordinated information campaign and international restrictions on entering the country were in place. On October 29, 2020, nationwide restrictions on public events were introduced by the government. On November 2, 2020, universities were closed in Switzerland.

We quantify the spread of COVID-19 by two different, but related response variables: the estimated effective reproductive number (Huisman et al., 2022) and the approximate weekly growth rate in supposed new infections (Chernozhukov et al., 2021).

To identify the effect of the strict facial-mask policy, we impose causal assumptions similar to Chernozhukov et al. (2021). We use a directed acyclic graph (DAG) to visualize the assumed causal relationships among the facial-mask policy variable, the response variable and different sets of control variables. By regressing the response variable on a suitable set of control variables determined in the DAG, we identify both the direct and total effect of the strict facial-mask policy variable on each of the two response variables. The direct effect, where direct means with respect to the variables we consider, captures changes in the response variable due to changes in the strict facial-mask policy variable, keeping the rest fixed. The total effect additionally captures changes in the response variable that are mediated through changes in the social distancing behavior.

We use publicly available data from different sources. The data have a balanced panel structure, as we have observations for each of the 26 cantons of Switzerland, measured during the 24 weeks considered. For both response variables, we assume a linear generating equation with a two-way error component, including a canton- and week-specific part. This model allows us to account for dependencies between the observations within cantons and weeks. Depending on the assumptions on the error components, a specific linear regression model is estimated with either a fixed-effects or random-effects approach to estimate the total and direct effect.

For both response variables with both fixed- and random-effects approaches, we obtain negative point estimates of both direct and total effect, most of them being significant. In other words, we estimate an expected reduction in the spread of COVID-19 in the early pandemic comparing the strict facial-mask policy to the government-determined country-wide baseline. We perform various sensitivity analyses to confirm the robustness of our results with respect to inevitable modeling choices.

To our knowledge, this is the first study that statistically analyses the effect of the strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic. Pleninger et al. (2022) analyse the combination of all COVID-19-related policies in Switzerland, as measured by the Stringency Index of the Konjunkturforschungsstelle (KOF). They do not examine the isolated effect of the strict facial-mask policy. For Switzerland and Germany, Huber and Langen (2020) study the impact of the timing of the non-pharmaceutical policy of lockdowns on COVID-19-related death and hospitalization rates. They find that an early introduction reduces said rates substantially.

In other countries, however, the effect of facial-mask policies has been studied. For the USA, Chernozhukov et al. (2021) study the effect of mandatory mask wearing at the workplace. They estimate a significant reduction of the approximate weekly growth rate in supposed new infections by around 0.1. For Germany, Mitze et al. (2020) find a 15–75% reduction of new cases 20 days after facial masks became mandatory in public transport and stores. Zhang et al. (2020) find that mandatory facial masks considerably slow down infection growth for the analysed entities of New York, Wuhan and Italy. There are studies confirming the functionality of facial masks in hindering transmission of viral droplets in laboratory settings (see e.g., Kähler and Hain (2020)). However, in observational settings it is the effect of facial-mask policies that is analysed, which includes mechanisms such as changes in risk-taking behavior and misuse of facial masks.

Direct comparison of our results to estimates in other countries and time spans is hard, due to different facial-mask policies and/or general differences between countries and their population. In essence, however, our findings support the existing literature regarding the sign and significance of the effect of a strict facial-mask policy on the spread of COVID-19.

The article is organized as follows. Section 2 explains the data, Sect. 3 explains the causal assumptions and the causal effects of interest, and Sect. 4 describes the methodology. Section 5 presents the results. Finally, we conclude in Sect. 6.

2 Data

Our data are measured in each of the $i=1,\ldots , N=26$ cantons in Switzerland in each of the $t=1, \ldots , T=24$ weeks in the period of analysis, ranging from July 6, 2020, until December 20, 2020. We use weekly data because data at the daily resolution would artificially increase the sample size with highly dependent observations, which would lead to faulty statistical inference. Our variable of interest, the so-called treatment variable, is the strict facial-mask policy variable, where 1 indicates the strict policy was in place while 0 indicates the government-determined country-wide baseline was in place. To quantify the spread of COVID-19, we use two different response variables: the first is the estimated effective reproductive number and the second is the approximate weekly growth rate in supposed new infections. As control variables, we only consider variables that varied between cantons and/or weeks in the period of analysis.

We can summarize the variables, observed in canton $i=1,\ldots ,N$ in week $t=1,\ldots , T$, into eleven groups:^{Footnote 4}

$Y_{i,t}$: response variable quantifying the spread of COVID-19 (estimated effective reproductive number or the approximate weekly growth rate in supposed new infections),
$M_{i,t}$: strict facial-mask policy variable (treatment variable),
$B_{i,t}$: (social distancing) behavior variable, quantified by financial transactions,
$\varvec{D}_{i}$: demographic variables that are canton-specific,
$H_{i,t}$: holiday indicator variable,
$\varvec{W}_{i,t}$: meteorological variables reflecting the weather situation,
$\varvec{P}_{i,t}$: non-pharmaceutical policy variables (excluding $M_{i,t}$),
$Y_{i,t'}$: response variable $Y_{i,t}$ lagged to the past with lag $t'$,
$\varvec{U}^1_{i}$: unmeasured and not further specified canton-specific variables,
$\varvec{U}^2_{t}$: unmeasured and not further specified week-specific variables,
$\varvec{U}^3_{i,t}$: unmeasured and not further specified variables that might vary between weeks and/or cantons.

Subsequently, we present the specific variables (with their short name in brackets) in each of the above categories (apart from $\varvec{U}^1_{i}$, $\varvec{U}^2_{t}$, and $\varvec{U}^3_{i,t}$). We present a list of these 16 variables with their sources and descriptive statistics in Table 1.

2.1 Response variables ($Y_{i,t}$)

The first response variable is the estimated effective reproductive number (r) of Huisman et al. (2022). The estimated effective reproductive number at day d is an estimate of the expected number of secondary infections at day d caused by a previously infected person. Its estimation involves multiple steps: 1) estimation of the number of newly infected people based on the number of newly confirmed cases, adjusting for reporting cycles and irregular reporting practices, 2) a deconvolution step using suitable delay distributions between transmission and reporting of the case to infer the actual infection incidence, 3) application of the EpiEstim method developed by Cori et al. (2013) to estimate the effective reproductive number from the time series of newly infected people. Only cases stemming from infections within Switzerland are used for estimation. In each canton i, to obtain an observation for week t, we average the daily values within week t. We denote this response variable by $Y_{i,t}=R_{i,t}$. We obtain the data from the Federal Office of Public Health of Switzerland.^{Footnote 5}

The second response variable is the same as the one used in Chernozhukov et al. (2021), the approximate weekly growth rate in supposed new infections from week $t-1$ to week t. To specify this response, we define for each canton i and week t

$$\begin{aligned} G_{i,t} := \ln \left( \frac{C_{i,t}}{C_{i,t-1}}\right) , \end{aligned}$$

where $C_{i,t}$ represents the number of reported new cases in canton i in week t. Due to the delay between the actual infection with the virus and the reporting of a new case, $G_{i,t}$ does not represent the pandemic situation in week t but that of a time period before t. Therefore, to obtain an approximation of the weekly growth rate in supposed new infections from week $t-1$ to week t, we need to use a future value of $G_{i,t}$. We employ the same time shift of two weeks to the future as Chernozhukov et al. (2021), resulting in the response variable $Y_{i,t}=G_{i, t+2}$ (growth.new.cases). We obtain the data on reported new cases from the Federal Office of Public Health of Switzerland.^{Footnote 6}
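The construction of this response can be sketched as follows. This is a minimal illustration, not the authors' code: the weekly case counts are hypothetical, and returning `None` for weeks with zero counts is an assumption, since the text does not say how such weeks are handled.

```python
import math

def weekly_log_growth(cases):
    """Return G_t = ln(C_t / C_{t-1}) per week; None where it is undefined."""
    growth = [None]  # the first week has no predecessor
    for prev, curr in zip(cases, cases[1:]):
        if prev > 0 and curr > 0:
            growth.append(math.log(curr / prev))
        else:
            growth.append(None)  # assumption: zero counts yield a missing value
    return growth

def shift_to_future(series, lead=2):
    """Response Y_t = G_{t+lead}; the last `lead` weeks have no response."""
    return series[lead:] + [None] * lead

cases = [10, 20, 40, 40, 20]   # hypothetical weekly counts for one canton
G = weekly_log_growth(cases)   # growth from week t-1 to week t
Y = shift_to_future(G, lead=2) # two-week lead, as described in the text
```

Here `Y[0]` equals `G[2]`, so the response attached to a week reflects case growth reported two weeks later.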

We plot both responses in Fig. 1 for all 26 cantons in the period of analysis. The plots and the Pearson correlation coefficients $\rho$ highlight that the two responses are similar but there is no one-to-one correspondence between the estimated effective reproductive number and the approximate weekly growth rate in supposed new infections from week $t-1$ to week t.

2.2 Strict facial-mask policy variable ($M_{i,t}$)

The strict facial-mask policy variable (facial.mask) is our treatment variable. In each canton i, it has a value of 1 if the strict policy is applied and a value of 0 if the government-determined baseline policy is applied. During the period of analysis, a total of 10 cantons deviate from the baseline policy by preemptively introducing the strict facial-mask policy. At a daily resolution, there are 1807 observations where the strict policy is in place and 2561 where the baseline policy is implemented. For more details, see Table 1 and Fig. 5 in Appendix A.

To obtain an observation for week t, we average the daily values within week t. We obtain the data from KOF, via their CRAN R-package kofdata (Bannert et al., 2022).
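The weekly aggregation can be illustrated with a short sketch. The daily indicator values below are hypothetical. Note that averaging a daily 0/1 indicator yields a fractional weekly value in any week in which the policy changes.

```python
def weekly_means(daily, days_per_week=7):
    """Average consecutive blocks of daily values into weekly values."""
    if len(daily) % days_per_week != 0:
        raise ValueError("daily series must cover complete weeks")
    return [
        sum(daily[k:k + days_per_week]) / days_per_week
        for k in range(0, len(daily), days_per_week)
    ]

# Hypothetical canton: baseline in week 1, switch on day 4 of week 2,
# strict policy throughout week 3.
daily_mask = [0] * 7 + [0, 0, 0, 1, 1, 1, 1] + [1] * 7
weekly_mask = weekly_means(daily_mask)
```

The same helper applies to the other daily series that the text aggregates by weekly averaging.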

2.3 Social distancing behavior variable ($B_{i,t}$)

As a proxy for social distancing behavior, we use household spending, similar to Pleninger et al. (2022). In each canton i, we consider the approximated growth rate of transactions in CHF with credit cards, debit cards and bank transfers from mobile phones of Swiss residents (growth.transactions). E-commerce is not considered.

We obtain the data from Monitoring Consumption Switzerland.^{Footnote 7}

2.4 Demographic variables ($\varvec{D}_{i}$)

Demographic variables of a canton i are given by population size (population) and the percentage of people with age $\ge 80$ years (perc.o80). Wheaton and Thompson (2020) show that infection growth is also strongly linked to residential density, that is the number of people per km$^2$ of settlement area (density), which we also consider. These three variables can be considered constant for all weeks t. We obtain the data from the Federal Statistical Office of Switzerland.^{Footnote 8}

2.5 Holiday indicator ($H_{i,t}$)

In each canton i the daily holiday indicator (holiday) has a value of 1 if the majority of public schools in the canton are on holiday and 0 otherwise. To obtain an observation for week t, we average the daily values within week t.

We obtain the holiday data from the cantonal education departments.^{Footnote 9}

2.6 Meteorological variables ($\varvec{W}_{i,t}$)

Zoran et al. (2020) suggest that weather conditions are closely linked to the spread of COVID-19. In particular, they find that dry air supports the transmission of COVID-19. Their findings are supported by Zhu et al. (2020) and Fattorini and Regoli (2020). To incorporate these effects, we assemble weather data from a total of 100 weather stations from SwissMetNet,^{Footnote 10} not including stations on mountains. For each canton, we compute average daily weather values by weighting observations of stations by the population size of the respective municipality. Lastly, the canton of Basel-Stadt is mapped to the weather of canton Basel-Land, Appenzell Innerrhoden and Appenzell Ausserrhoden to St.Gallen and Nidwalden to Obwalden, due to the lack of suitable stations. In doing so, we get a characterization of the daily weather, quantified by the number of minutes of sunshine (sunshine), the mean air temperature in $^\circ \text {C}$ (temperature), and the relative humidity in $\%$ (humidity). To obtain an observation for week t, we average the daily values within week t.
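The population weighting of station observations can be sketched as below. The station values and municipality populations are hypothetical, and the function name is ours.

```python
def population_weighted_mean(values, populations):
    """Weight each station's value by the population of its municipality."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, populations)) / total

# Hypothetical canton with two stations on one day.
temperatures = [18.0, 22.0]    # mean air temperature in degrees Celsius
populations = [30000, 10000]   # population of each station's municipality
canton_temperature = population_weighted_mean(temperatures, populations)
```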

2.7 Non-pharmaceutical policy variables ($\varvec{P}_{i,t}$)

The KOF Stringency Plus Index (Pleninger et al., 2022), the Government Response Index and the Economic Support Index (Hale et al., 2021),^{Footnote 11} compose different sets of policy variables into one index with the aim of reflecting the stringency of a government in regards to COVID-19 policies.

We do not use these indices, but use the policy variables directly, where we only consider those that vary at least across cantons or across weeks over the period of analysis. In each canton i, these policy variables are daily indicators for workplace closings (work.closing), school closings (school.closings), restrictions on gatherings (rest.gatherings), cancelation of public events (canc.events), and testing policy (testing.policy).^{Footnote 12} These indicators have 2 to 5 levels, where a higher level indicates a stricter policy. To obtain an observation for week t, we average the daily values within week t. We obtain the data from KOF, provided through their CRAN R-package kofdata (Bannert et al., 2022).

2.8 Lagged response variables as covariates ($Y_{i,t'}$)

We consider a lagged response variable as covariate (Y.lagged): we include the most recent value of the response variable that is known in week t, which summarizes the information about the pandemic situation that is available and communicated to the public in week t. Knowledge about the current pandemic situation strongly drives the policy decisions and the behavior of the population.

If we consider the weekly average estimated effective reproductive number as response variable, that is $Y_{i,t}=R_{i,t}$, the information variable is given by $Y_{i,t'} = R_{i,t-3}$, which corresponds to the estimated effective reproductive number of three weeks ago. This lag is due to time delays between the infection, the start of the symptoms and the report of a case, such that in week t only the value of three weeks ago is known.

If we consider the approximate weekly growth rate in supposed new infections from week $t-1$ to week t as response variable, that is $Y_{i,t}=G_{i,t+2}$, the information variable is given by $Y_{i,t'} = G_{i,t}$, a value that we assume to be readily available in week t.

3 Causal assumptions and effects

We assume a directed acyclic graph (DAG) among the eleven sets of variables, which is a graphical model displaying the causal relationships among the variables. Based on this DAG we identify the direct and total effect of the treatment variable on the response variables. Our DAG is based on the DAG of Chernozhukov et al. (2021), while we adapt the causal structure to our setting.

The DAG^{Footnote 13} is displayed in Fig. 2. The causal relationships between the variables are assumed to be the same for both response variables. The gray-colored nodes represent the strict facial-mask policy, our treatment variable, and the response variable. The white nodes represent the covariates. A directed edge $A\rightarrow B$ between nodes A and B represents a causal relationship, where a change in A results in a change in B.

The measured (= observed) covariates are displayed within black circles. Note that all variables, apart from $Y_{i,t'}$, are indexed by the same week t. Further, we do not allow for spillover effects between cantons. The three types of unmeasured covariates $\varvec{U}^1_{i}$, $\varvec{U}^2_{t}$, and $\varvec{U}^3_{i,t}$ are displayed within light gray circles. Their relations to the other groups of variables vary with the modeling approach: we always allow for an unmeasured common cause $\varvec{U}^3_{i,t}$ of $M_{i,t}$ and $\varvec{P}_{i,t}$ (displayed with solid light gray edges). We also always allow for unmeasured causes of $Y_{i,t}$ that are constant within weeks or cantons, given by $\varvec{U}^1_{i}$ and $\varvec{U}^2_{t}$ (displayed with solid light gray edges into $Y_{i,t}$). However, only in the fixed-effects model, described in Sect. 4, do we allow $\varvec{U}^1_{i}$ and $\varvec{U}^2_{t}$ to also be causes of the other input variables (displayed by dashed light gray edges), such that they constitute unobserved confounders between $M_{i,t}$ and $Y_{i,t}$.

The strict facial-mask policy variable $M_{i,t}$ is assumed to influence the response variable directly or indirectly. The blue edge, $M_{i,t} \rightarrow Y_{i,t}$, represents the direct effect. The path in orange, $M_{i,t} \rightarrow B_{i,t} \rightarrow Y_{i,t}$, represents the indirect effect, the effect of $M_{i,t}$ on the spread of COVID-19 through its effect on the mediator^{Footnote 14} $B_{i,t}$. The sum of the direct and indirect effects results in the total effect of the strict facial-mask policy variable on the response variable.

The orange edge $M_{i,t} \rightarrow B_{i,t}$, which is part of the indirect effect, corresponds to alterations in social distancing behavior of the public in canton i in week t due to changes in $M_{i,t}$. For example, some people might increase social contacts because the obligation to wear a mask gives them a feeling of security. Note that we assume that the behavior variable in week t is only affected by the policy value in week t and not by past policy values. We argue this is justified since the salience of the pandemic, which could be represented by past policy values, is already represented by the information variable $Y_{i,t'}$, as described in Sect. 2.8. In addition, the time period we investigate corresponds to the very early stage of the pandemic, thus most people were responding promptly to changes in the policy, that is their behavior in week t was only affected by the policy in week t.

We now give a more formal definition of the total, direct and indirect causal effect. For notational simplicity, we use the following abbreviation for a univariate random variable $V_{i,t}$ under a do-intervention (Pearl, 1995) on the facial-mask policy $M_{i,t}$ (treatment variable)

$$\begin{aligned} V_{i,t}(\tau ) :=V_{i,t}(\text {do}(M_{i,t}= \tau )), \end{aligned}$$

where $\tau \in \{0,1\}$. The total causal effect (Pearl, 2000) of the change of the treatment variable $M_{i,t}$ from 0 to 1 on the response $Y_{i,t}$ is given by

$$\begin{aligned} \text {TCE}:=&{\mathbb E}\left[ Y_{i,t}(1)\right] - {\mathbb E}\left[ Y_{i,t}(0)\right] . \end{aligned}$$

Note that the total causal effect describes the effect of $M_{i,t}$ on $Y_{i,t}$ considering all causal paths from $M_{i,t}$ to $Y_{i,t}$. Recall that $Y_{i,t}$ is the response that has already been shifted to the future. In our DAG there is a mediator $B_{i,t}$ between $M_{i,t}$ and $Y_{i,t}$, such that the total causal effect can be decomposed into the sum of a direct and an indirect causal effect (Pearl, 2001). Explicitly, the total causal effect can be written as the sum of the total natural indirect effect (TNIE) (short name indirect effect) and the pure natural direct effect (PNDE) (short name direct effect) (Daniel et al., 2015), that is

$$\begin{aligned} \text {TCE}&= {\mathbb E}\left[ Y_{i,t}(1)\right] - {\mathbb E}\left[ Y_{i,t}(0)\right] \\&= {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(1))\right] - {\mathbb E}\left[ Y_{i,t}(0, B_{i,t}(0))\right] \\&= \left( {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(1))\right] - {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(0))\right] \right) \\&+ \left( {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(0))\right] - {\mathbb E}\left[ Y_{i,t}(0, B_{i,t}(0))\right] \right) \\&= \text {TNIE} + \text {PNDE}. \end{aligned}$$

In the next section, where we assume a linear model, we also identify the TCE, TNIE, and PNDE with (products of) model coefficients. We will ultimately estimate the total effect as well as the direct effect through linear regressions, which is possible by using valid adjustment sets.
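The decomposition TCE = TNIE + PNDE is a telescoping identity in the nested potential outcomes $Y_{i,t}(\tau , B_{i,t}(\tau '))$ and does not rely on linearity. The following sketch checks this numerically. The structural functions, their coefficients and the unit-level noise terms are entirely hypothetical, and the outcome function deliberately contains an interaction term.

```python
def mediator(m, u):
    """Hypothetical structural function for B under do(M = m)."""
    return 0.5 - 0.2 * m + u

def outcome(m, b, e):
    """Hypothetical structural function for Y given M = m and B = b."""
    return 1.0 - 0.3 * m + 0.4 * b + 0.1 * m * b + e

# Hypothetical units, each with fixed noise terms (u for B, e for Y).
units = [(-0.1, 0.05), (0.0, -0.02), (0.2, 0.01)]

def mean_nested(m_for_y, m_for_b):
    """Sample analogue of E[Y(m_for_y, B(m_for_b))]."""
    values = [outcome(m_for_y, mediator(m_for_b, u), e) for u, e in units]
    return sum(values) / len(values)

tce = mean_nested(1, 1) - mean_nested(0, 0)
tnie = mean_nested(1, 1) - mean_nested(1, 0)   # mediator changes, treatment fixed at 1
pnde = mean_nested(1, 0) - mean_nested(0, 0)   # treatment changes, mediator fixed at B(0)

assert abs(tce - (tnie + pnde)) < 1e-12
```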

Table 1 Short name, description, descriptive statistics and data source for all the variables and responses used in the analysis. For more details regarding the interpretation of the values of the non-pharmaceutical policy variables, see the description under https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md, last visited August 21, 2023

4 Methodology

In the following, for cantons $i=1,\ldots ,N$ and weeks $t=1,\ldots ,T$ let

$$\begin{aligned} Y_{i,t}&= \delta _1 M_{i,t}+\varvec{\delta }_2^{\top } \varvec{V}_{i,t} + \delta _3 B_{i,t} +\epsilon _{i,t}, \\ B_{i,t}&= \delta _4 M_{i,t}+\varvec{\delta }_5^{\top } \varvec{\tilde{V}}_{i,t} +\nu _{i,t}, \end{aligned} \tag{1}$$

be the two generating equations of interest of the structural equation model (SEM) (Pearl, 2009) compatible with our DAG. The set $\varvec{V}_{i,t}$ is the parent set of $Y_{i,t}$ without $B_{i,t}$ and $M_{i,t}$, $\varvec{\tilde{V}}_{i,t}$ is the parent set of $B_{i,t}$ without $M_{i,t}$, and $\epsilon _{i,t}$ and $\nu _{i,t}$ are error terms with expectation zero. The next small lemma allows us to identify the TCE, TNIE and PNDE in our context via the regression coefficients in Equations (1).

Lemma 4.1

The TNIE and the PNDE can be expressed in terms of the regression coefficients in Equations (1) via

$$\begin{aligned} \text {TNIE}&= \delta _3\delta _4, \ \text {and}\\ \text {PNDE}&= \delta _1. \end{aligned}$$

The proof of Lemma 4.1 is given in Appendix D. By the decomposition $\text {TCE}=\text {TNIE}+\text {PNDE}$ and Lemma 4.1 we get that $\text {TCE} = \delta _3\delta _4 + \delta _1$.
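As an informal plausibility check (this is not the proof from Appendix D), substituting the second equation of (1) into the first gives the reduced form

$$\begin{aligned} Y_{i,t}&= \delta _1 M_{i,t}+\varvec{\delta }_2^{\top } \varvec{V}_{i,t} + \delta _3 \left( \delta _4 M_{i,t}+\varvec{\delta }_5^{\top } \varvec{\tilde{V}}_{i,t} +\nu _{i,t}\right) +\epsilon _{i,t} \\&= (\delta _1 + \delta _3\delta _4) M_{i,t}+\varvec{\delta }_2^{\top } \varvec{V}_{i,t} + \delta _3\varvec{\delta }_5^{\top } \varvec{\tilde{V}}_{i,t} + (\delta _3\nu _{i,t} +\epsilon _{i,t}). \end{aligned}$$

The coefficient on $M_{i,t}$ collects the direct path ($\delta _1$) and the path through the mediator ($\delta _3\delta _4$), which matches $\text {TCE} = \delta _3\delta _4 + \delta _1$.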

Instead of estimating TNIE and PNDE directly via estimating the coefficients in Equations (1) and applying Lemma 4.1, we specify the following linear regression model with a two-way error component

$$\begin{aligned} Y_{i,t} =\theta M_{i,t}+\varvec{\beta }^{\top } \varvec{Z}_{i,t}+ \alpha _i+\gamma _t +\epsilon _{i,t}, \end{aligned} \tag{2}$$

where $\varvec{Z}_{i,t}$ is a row vector of covariates and $\alpha _i$ and $\gamma _t$ are explained in the next paragraph. If $\varvec{Z}_{i,t}$ is a valid adjustment set (Shpitser, 2012; Perković et al., 2018) for the effect of $M_{i,t}$ on $Y_{i,t}$, then $\theta$ is equal to the total effect (TCE). We include all parents of $Y_{i,t}$ except $B_{i,t}$ and $M_{i,t}$ in $\varvec{Z}_{i,t}$, which is a valid adjustment set for the total effect.^{Footnote 15} If $\varvec{Z}_{i,t}$ is the parent set of $Y_{i,t}$ except $M_{i,t}$, then $\theta$ is equal to the direct effect (PNDE). Thus, in our setting, the set $\varvec{Z}_{i,t}$ we use to identify the direct effect is given by the union of $B_{i,t}$ and the valid adjustment set used to identify the total effect. Hence, $\theta$ is our target of inference, either with the interpretation of the direct or the total effect.

To relate Model (2) to the DAG, $\alpha _i$ summarizes the effects of $\varvec{U}_i^1$ on $Y_{i,t}$, and $\gamma _t$ similarly summarizes the effects of $\varvec{U}_t^2$ on $Y_{i,t}$. Depending on the assumptions on the error components $\alpha _i$, $\gamma _t$ and $\epsilon _{i,t}$, Model (2) can be handled with either a fixed-effects or random-effects approach.

Generally, the fixed-effects approach is more robust than the random-effects approach, while the latter is more efficient in case all assumptions are met. We briefly outline both approaches in the upcoming sections; for more details, see, for example, Hansen (2022). The suitability of the linearity assumption in Equation (2) is assessed via Tukey-Anscombe plots (residual vs fitted values).

4.1 Fixed-effects approach

The fixed-effects approach assumes that the stochastic structure of $\alpha _i$ and $\gamma _t$ is unknown and possibly arbitrarily correlated with $M_{i,t}$ and $\varvec{Z}_{i,t}$. In this case, we call $\alpha _i$ an unobserved cantonal fixed effect and $\gamma _t$ an unobserved weekly fixed effect. The incorporation of fixed effects accounts for unobserved common causes of the treatment and response variable that are either canton-specific, but invariant across weeks, or week-specific, but invariant across cantons.

In particular, variables that are constant across cantons are national variables. In other words, by applying the fixed-effects approach we can control for national contextual information such as the total number of new cases in the whole country.

The variance-covariance structure of the error terms $\epsilon _{i,t}$ can take many forms; see the upcoming Sect. 4.1.1. However, $\epsilon _{i,t}$ are always supposed to satisfy the exogeneity assumption,

$$\begin{aligned} \mathbb {E}[\epsilon _{i,t}\mid M_{i,t},\varvec{Z}_{i,t}, \alpha _i, \gamma _t] = 0, \end{aligned} \tag{3}$$

for all $i=1,\ldots ,N$ and $t=1,\ldots , T$. This assumption implies no further unobserved confounding apart from $\alpha _i$ and $\gamma _t$. To eliminate $\alpha _i$ and $\gamma _t$, we apply the two-way within transformation,

$$\begin{aligned} \ddot{u}_{i,t} :=&\ u_{i,t}-\frac{1}{N}\sum _{i=1}^N u_{i,t}-\frac{1}{T}\sum _{t=1}^{T} u_{i,t} + \frac{1}{TN}\sum _{i=1}^{N}\sum _{t=1}^{T}u_{i,t}, \end{aligned} \tag{4}$$

to $u_{i,t} \in \{Y_{i,t}, M_{i,t}, \varvec{Z}_{i,t}, \alpha _i, \gamma _t, \epsilon _{i,t} \}$ of Model (2) and obtain the following equation

$$\begin{aligned} \ddot{Y}_{i,t}=\theta \ddot{M}_{i,t} + \varvec{\beta }^\top \ddot{\varvec{Z}}_{i,t} + \ddot{\epsilon }_{i,t}, \end{aligned} \tag{5}$$

where the interpretation of $\theta$ remains as in Model (2). Finally, we estimate the coefficient $\theta$ by estimating the whole coefficient vector $\varvec{\eta } = (\theta , \varvec{\beta })$, using Ordinary Least Squares (OLS). In the following, we use the acronym FE for this approach.
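The two-way within transformation followed by OLS can be sketched with NumPy as follows. This is an illustration under simplifying assumptions, not the authors' implementation: the data are simulated, there is a single hypothetical covariate, and the model is noise-free so that the coefficients are recovered exactly.

```python
import numpy as np

def two_way_within(u):
    """Apply the two-way within transformation to an (N, T) array."""
    week_mean = u.mean(axis=0, keepdims=True)         # (1/N) sum over cantons
    canton_mean = u.mean(axis=1, keepdims=True)       # (1/T) sum over weeks
    grand_mean = u.mean(axis=(0, 1), keepdims=True)   # (1/(TN)) double sum
    return u - week_mean - canton_mean + grand_mean

rng = np.random.default_rng(0)
N, T = 26, 24
alpha = rng.normal(size=(N, 1))                # canton effects (simulated)
gamma = rng.normal(size=(1, T))                # week effects (simulated)
M = (rng.random((N, T)) < 0.4).astype(float)   # hypothetical treatment
Z = rng.normal(size=(N, T))                    # one hypothetical covariate
theta, beta = -0.1, 0.5                        # hypothetical true coefficients
Y = theta * M + beta * Z + alpha + gamma       # noise-free for illustration

# The transformation removes alpha_i + gamma_t exactly.
assert np.allclose(two_way_within(alpha + gamma), 0.0)

# OLS on the transformed, stacked data.
X = np.column_stack([two_way_within(M).ravel(), two_way_within(Z).ravel()])
y = two_way_within(Y).ravel()
eta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In this noise-free setting `eta_hat` reproduces `(theta, beta)` up to floating-point error, even though `alpha` and `gamma` were never observed by the regression.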

Apart from the basic OLS estimate, we also compute a debiased estimate of $\theta$, $\hat{\theta }_{BC}$, by cross-over Jackknife bias correction (Chernozhukov et al., 2021; Chen et al., 2020, 2019). We employ this method as the estimation of dynamic linear panel models (i.e., including lagged instances of the response variable as covariates) using the fixed-effects estimator potentially yields a bias. The debiased estimate is given by

$$\begin{aligned} \hat{\theta }_{BC}=2\hat{\theta }-(\hat{\theta }_{S_1} +\hat{\theta }_{S_2})/2, \end{aligned}$$

where $\hat{\theta }$ is the OLS regression coefficient based on the entire sample and $\hat{\theta }_{S_j}$ is the estimated coefficient computed on the sub-sample $S_j$, $j=1,2$. The sub-samples $S_1$ and $S_2$ are defined, as in Chernozhukov et al. (2021), by

$$\begin{aligned} S_1 :=&\{(i,t):i\le \lceil N/2\rceil , t\le \lceil T/2\rceil \}\\ \cup \ {}&\{(i,t):i\ge \lceil N/2+1\rceil ,t\ge \lceil T/2+1\rceil \} \end{aligned}$$

and

$$\begin{aligned} S_2 :=&\{(i,t):i\le \lceil N/2\rceil , t\ge \lceil T/2\rceil \}\\ \cup \ {}&\{(i,t):i\ge \lceil N/2+1\rceil ,t\le \lceil T/2+1\rceil \}, \end{aligned}$$

respecting the natural ordering of the weeks. Since there is no natural ordering of the cantons, we repeat the above procedure 500 times, where each time the cantons are randomly permuted. The final estimate is then the average of the 500 debiased estimates. In the following, we use the acronym DFE for this debiased fixed-effects approach.
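The debiasing procedure can be sketched as follows. The masks transcribe the inequalities for $S_1$ and $S_2$ literally as printed above, using 1-based indices. The argument `estimate` is a hypothetical user-supplied function that returns the fixed-effects estimate of $\theta$ on the cells selected by a boolean mask; it is not part of any library.

```python
import math
import numpy as np

def crossover_masks(N, T):
    """Boolean (N, T) masks for S_1 and S_2, following the printed definitions."""
    i = np.arange(1, N + 1)[:, None]   # canton index, 1-based
    t = np.arange(1, T + 1)[None, :]   # week index, 1-based
    n_lo, n_hi = math.ceil(N / 2), math.ceil(N / 2 + 1)
    t_lo, t_hi = math.ceil(T / 2), math.ceil(T / 2 + 1)
    s1 = ((i <= n_lo) & (t <= t_lo)) | ((i >= n_hi) & (t >= t_hi))
    s2 = ((i <= n_lo) & (t >= t_lo)) | ((i >= n_hi) & (t <= t_hi))
    return s1, s2

def debias(theta_full, theta_s1, theta_s2):
    """theta_BC = 2 * theta - (theta_S1 + theta_S2) / 2."""
    return 2 * theta_full - (theta_s1 + theta_s2) / 2

def jackknife_fe(estimate, Y, M, n_perm=500, seed=0):
    """Average the debiased estimate over random canton orderings."""
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    s1, s2 = crossover_masks(N, T)
    full = np.ones((N, T), dtype=bool)
    draws = []
    for _ in range(n_perm):
        p = rng.permutation(N)         # cantons have no natural ordering
        Yp, Mp = Y[p], M[p]
        draws.append(debias(estimate(Yp, Mp, full),
                            estimate(Yp, Mp, s1),
                            estimate(Yp, Mp, s2)))
    return float(np.mean(draws))
```

With the inequalities transcribed literally and $N=26$, $T=24$, the two masks together cover every canton-week cell, and the boundary weeks belong to both sub-samples.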

We now detail the specific sets of control variables $\varvec{Z}_{i,t}$, which depend on whether we aim at estimating the direct or the total effect of the strict facial-mask policy on the spread of COVID-19. Due to the within transformation (4), apart from the unobservable $\alpha _i$ and $\gamma _t$, also all observable week-constant variables, such as policy indicators that do not vary over the period of analysis, and canton-constant variables, such as population or density, drop out of $\varvec{Z}_{i,t}$. In the case of the direct effect, we must thus regress the response variable on all its remaining parents, i.e., $\varvec{Z}_{i,t}= (B_{i,t}, H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})$. In the case of the total effect, we need to remove the variable $B_{i,t}$ from the set, and obtain the valid adjustment set $\varvec{Z}_{i,t}= ( H_{i,t}, \varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})$. Concretely, the following variables are contained in each category:

$B_{i,t}$: growth.transactions,
$H_{i,t}$: holiday,
$\varvec{W}_{i,t}$: sunshine, temperature and humidity,
$\varvec{P}_{i,t}$: work.closing, rest.gatherings and canc.events,

and the variable $Y_{i,t'}$ is specific for each of the two response variables, see Sect. 2.8.

4.1.1 Construction of confidence intervals

To construct $95\%$-confidence intervals for the coefficient $\theta$, we use the normal approximation,

$$\begin{aligned} CI_{95\%} :=\left[ \hat{\theta } \pm 1.96\sqrt{\widehat{\text {Var}}(\hat{\theta })}\right] , \end{aligned} \tag{6}$$

where $\hat{\theta }$ is the first entry in the estimated coefficient vector $\hat{\varvec{\eta }} = (\hat{\theta }, \hat{\varvec{\beta }})$, obtained either through the FE or DFE approach, and $\widehat{\text {Var}}(\hat{\theta })$ is the corresponding estimated variance. The estimation of the variance requires careful consideration due to the panel structure of our data.

In the following, let

$$\begin{aligned}&\varvec{X}_{i,t} :=(M_{i,t},\varvec{Z}_{i,t})^{\top } \ \in \ \mathbb {R}^{P \times 1} \end{aligned}$$

be the observed covariates of canton i and week t, where $P :=1+ |\varvec{Z}_{i,t}|$. Further let

$$\begin{aligned} \varvec{X} :=\begin{pmatrix} \varvec{X}_{1,1}^{\top } \\ \varvec{X}_{1,2}^{\top } \\ \vdots \\ \varvec{X}_{1,T}^{\top } \\ \varvec{X}_{2,1}^{\top } \\ \vdots \\ \varvec{X}_{N,T}^{\top } \end{pmatrix} \ \in \ \mathbb {R}^{NT \times P} \end{aligned}$$

be the stacked covariate matrix. The conditional variance-covariance matrix of $\hat{\varvec{\eta }}$ can be written as

$$\begin{aligned} \text {Var}(\hat{\varvec{\eta }} \mid \varvec{X}) = \varvec{Q}^{-1} \varvec{\Omega } \varvec{Q}^{-1}, \end{aligned} \tag{7}$$

where

$$\begin{aligned}\varvec{Q} :=\frac{1}{NT} \varvec{X}^\top \varvec{X}, \end{aligned}$$

and

$$\begin{aligned} \varvec{\Omega } :=\frac{1}{(NT)^2}\varvec{X}^\top \text {Var}(\varvec{\epsilon }) \varvec{X}, \end{aligned} \tag{8}$$

with

$$\begin{aligned} \varvec{\epsilon } :=\big (&\epsilon _{1,1}, \epsilon _{1,2}, \ldots , \epsilon _{1,T},\\&\epsilon _{2,1},\ldots , \epsilon _{2,T}, \ldots , \epsilon _{N,T} \big )^{\top }. \end{aligned}$$

We denote by $\hat{\varvec{\epsilon }}$ the empirical residuals obtained through the FE or DFE approach.

The variance of $\hat{\varvec{\eta }}$ is then estimated by plugging in an estimate of $\varvec{\Omega }$ into Equation (7), resulting in $\widehat{\text {Var}}(\hat{\varvec{\eta }}) =\varvec{Q}^{-1} \hat{\varvec{\Omega }} \varvec{Q}^{-1}$.^{Footnote 16}
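To make the plug-in construction concrete, the following Python sketch computes the sandwich variance of Equation (7) and the interval of Equation (6) for the first coefficient. It is an illustration under stated assumptions rather than the implementation used for the analysis: the covariates and response are taken to be already within-transformed, and $\text {Var}(\varvec{\epsilon })$ is replaced by the simplest heteroskedasticity-robust plug-in, a diagonal matrix of squared residuals without the HC3 leverage correction. The function name `sandwich_ci` is hypothetical.

```python
import numpy as np

def sandwich_ci(X, y, z=1.96):
    """OLS estimate and normal-approximation CI for the first coefficient.

    X: (NT, P) stacked covariates, y: (NT,) response, both already demeaned.
    """
    n_obs = X.shape[0]
    eta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficient vector
    resid = y - X @ eta                       # empirical residuals
    Q = X.T @ X / n_obs
    # Plug-in for Omega: Var(eps) replaced by diag(resid^2)
    Omega = (X * resid[:, None] ** 2).T @ X / n_obs**2
    Q_inv = np.linalg.inv(Q)
    var = Q_inv @ Omega @ Q_inv               # Q^{-1} Omega Q^{-1}
    se = float(np.sqrt(var[0, 0]))
    theta = float(eta[0])
    return theta, (theta - z * se, theta + z * se)
```

Only the line that builds `Omega` changes when a different estimator of $\varvec{\Omega }$ is plugged in; the outer factors $\varvec{Q}^{-1}$ stay the same.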

We use the following seven estimators for $\varvec{\Omega }$ (with their short name in brackets):

1) Heteroscedastic-Robust (HC3)
2) One-Way Clustering on Canton (Canton)
3) One-Way Clustering on Week (Week)
4) Two-Way Clustering on Canton and Week (Canton-Week)
5) Newey-West (NW) (Newey and West, 1987)
6) Chiang-Hansen (CH) (Chiang et al., 2022)
7) Informal Own Specification (Own) (motivated by Colella et al., 2019)

The estimators correspond to different assumptions on the structure of $\textrm{Cov}(\epsilon _{i,t},\epsilon _{j,s})$ for $i,j=1,\ldots , N$ and $t,s=1,\ldots , T$. These assumptions reflect the clustered and/or heteroskedastic and/or autocorrelated nature of the error terms. The details can be found in Appendix C.

4.2 Random-effects approach

The random-effects approach assumes that the components of the error, $\alpha _i$, $\gamma _t$, and $\epsilon _{i,t}$, satisfy the following exogeneity assumptions

$$\begin{aligned}&\mathbb {E}\left[ \alpha _i\mid M_{i,t}, \varvec{Z}_{i,t}\right] =0,\\&\mathbb {E}\left[ \gamma _t\mid M_{i,t}, \varvec{Z}_{i,t}\right] =0,\\&\mathbb {E}\left[ \epsilon _{i,t}\mid M_{i,t}, \varvec{Z}_{i,t}\right] =0, \end{aligned} \tag{9}$$

for all $i=1,\ldots ,N$ and $t=1,\ldots , T$. These assumptions imply that $\alpha _i$, $\gamma _t$, and $\epsilon _{i,t}$ are uncorrelated with $M_{i,t}$ and $\varvec{Z}_{i,t}$, which amounts to the strong assumption of no unobserved confounding. In particular, in contrast to the fixed-effects approach, the random-effects approach does not control for unobserved week- or canton-specific confounders. The correlation within weeks and within cantons in the composite error $v_{i,t} = \alpha _i + \gamma _t + \epsilon _{i,t}$ is accounted for by the Feasible Generalized Least Squares (FGLS) approach, where we assume the following structure of the variance-covariance matrix,

$$\begin{aligned} \text {Cov}(v_{i,t},v_{j,s}) = \left\{ \begin{array}{lr} \sigma ^{\alpha }_i + \sigma ^{\gamma }_i + \sigma ^{\epsilon }_i, &{} i=j, \ t=s, \\ \sigma ^{\alpha }_i + \sigma ^{\gamma }_i, &{} i=j, \ t\ne s, \\ 0, &{} \text {otherwise} \end{array}\right\} , \end{aligned}$$

where $\sigma _i^{\alpha }>0$, $\sigma _i^{\gamma }>0$ and $\sigma _i^{\epsilon }>0$.

With this approach we again obtain an estimator of the whole coefficient vector $\varvec{\eta } = (\theta , \varvec{\beta })$ and extract the estimator of $\theta$. To construct $95\%$-confidence intervals for $\theta$, we again use Equation (6), where we apply Formula (7) with a plug-in estimator of $\varvec{\Omega }$. In the following, we use the acronym RE for the point estimator as well as the confidence interval of this approach.
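The covariance structure displayed above can be sketched in Python as follows. The sketch makes two simplifying assumptions that are not part of our analysis: the variance components are common to all cantons instead of canton-specific, and they are treated as known, whereas FGLS estimates them from the data in a first step. The function names `re_covariance` and `gls` are hypothetical.

```python
import numpy as np

def re_covariance(n, t, s_alpha, s_gamma, s_eps):
    """Covariance matrix of the stacked composite error (canton-major order).

    Within a canton: s_alpha + s_gamma off the diagonal and
    s_alpha + s_gamma + s_eps on the diagonal; zero across cantons.
    """
    block = (s_alpha + s_gamma) * np.ones((t, t)) + s_eps * np.eye(t)
    return np.kron(np.eye(n), block)

def gls(X, y, V):
    """Generalized least squares estimate for a given error covariance V."""
    V_inv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
```

The block-diagonal form follows the stacking order of $\varvec{X}$, in which all weeks of canton 1 come first, followed by all weeks of canton 2, and so on.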

In contrast to the fixed-effects approach, the variables $\varvec{D}_{i}$, which are among the parents of the response variable, do not drop out of $\varvec{Z}_{i,t}$. Furthermore, the set of policy variables now includes testing.policy, which was dropped in the FE approach as it is constant within cantons over the period of analysis and is thus eliminated by the within transformation. In the case of the direct effect we obtain $\varvec{Z}_{i,t}= (B_{i,t},\varvec{D}_{i},H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})$. In the case of the total effect, we need to remove the variable $B_{i,t}$ from the set, and obtain the valid adjustment set $\varvec{Z}_{i,t}= (\varvec{D}_{i},H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})$. Concretely, the following variables are contained in each category:

$B_{i,t}$: growth.transactions,
$\varvec{D}_{i}$: population, density and perc.O80,
$H_{i,t}$: holiday,
$\varvec{W}_{i,t}$: sunshine, temperature and humidity,
$\varvec{P}_{i,t}$: work.closing, school.closing, rest.gatherings, canc.events and testing.policy,

and the variable $Y_{i,t'}$ is specific for each of the two response variables, see Sect. 2.8.

4.3 Sensitivity analysis

We perform an extensive sensitivity analysis to investigate the robustness of our results toward changes in our methodology and data preparation. We consider further altering the confidence interval construction, extending the set of information variables, joining half-cantons into one respective canton, changing the timing of information variables, discarding outliers, varying the time period, performing the doubly robust double machine learning estimation, and using the lag-1 response variable as a further covariate.

4.4 Implementation

We implement the methodology in the software R, using the packages plm, sandwich, lmtest and DoubleML.

5 Results

For each of the three modeling approaches (fixed effects FE, debiased fixed effects DFE, and random effects RE), we distinguish between the direct and total effect of introducing a strict facial-mask policy (as compared to the government-determined country-wide baseline policy) on the two response variables. This results in 12 estimated effects along with their (various) confidence intervals.

Figure 3 shows the results for the direct effect of the strict facial-mask policy. We see that all 95% confidence intervals apart from two lie to the left of zero. Thus, the direct effect of the strict facial-mask policy was estimated to be significantly negative in almost all modeling approaches, implying a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined country-wide baseline to the strict facial-mask policy. Figure 4 shows the results for the total effect of the strict facial-mask policy, and we see that, apart from the RE approach, the overall picture is very similar to the direct effect. In the model with response variable r, the point estimators of both the direct and total effect lie between $-0.22$ and $-0.16$. In the model with response variable growth.new.cases, the point estimators of both the direct and total effect lie between $-0.29$ and $-0.17$.

The fact that the estimated direct and total effects are very similar suggests that either the facial-mask policy worked mainly through the direct path by reducing the transmissibility of COVID-19, or the behavioral variable growth.transactions is not capturing the important changes in social distancing behavior. It is plausible that the latter is at least part of the explanation, as it is for example unclear how this variable can reflect changes in behavior in private spaces. In fact, Chernozhukov et al. (2021), who employ a closely related empirical approach for the U.S., do not find the indirect effect to be significant either.

Overall, the FE and the DFE approach provide very similar results. This indicates that the dynamic structure of the panel model does not induce a large estimation bias. The substantial difference, however, between the RE and FE approach suggests that controlling only for demographic variables to capture time-invariant cantonal information as in the RE model is insufficient. Other unobserved canton- or week-specific confounders, for which we control with the fixed-effects approach, seem to play an important role.

For all 12 modeling approaches, the Tukey-Anscombe plots (residuals vs fitted values), displayed in Fig. 6 in Appendix B, show no evidence against the assumption of linearity. The results of the extensive sensitivity analysis are shown in Table 2 in Appendix E. In line with our main analyses, in all of the cases considered, we obtain a negative point estimate of the total causal effect, ranging from $-1.08$ to $-0.04$. The estimate is deemed significantly different from 0 at $\alpha =0.05$ in 31 out of the $41\ (76\%)$ sensitivity analyses conducted.

6 Conclusion

We analyse the effect of the strict facial-mask policy on the spread of COVID-19 during the early phase of the pandemic in Switzerland, using the cantonal heterogeneity in facial-mask policies from July 2020 to December 2020. The obligation to wear a facial mask in public transportation formed the government-determined country-wide baseline for facial-mask policies. The strict facial-mask policy corresponds to mandatory mask wearing on public transport and in all public or shared spaces where social distancing is not possible.

We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined country-wide baseline to the strict facial-mask policy. Importantly, we do not investigate whether the estimated effect sizes are relevant in any given social context.

The correctness of the causal assumptions is crucial to the whole analysis. Hence, the results should be treated with caution and interpreted in light of the mostly untestable assumptions inherent in our modeling approaches, described in Sects. 3 and 4. In particular, we emphasize that the assumption of no unmeasured confounding imposed by the RE approach is very delicate and most likely does not hold. As such, one can consider the RE approach more as a sensitivity check. It is also important to stress that in an observational study like the one at hand it is almost impossible to control for all confounders that vary with weeks and cantons. It is highly likely that our effects of interest are confounded by further unobserved social, cultural and economic traits that may differ between cantons and weeks. Implementation of and compliance with non-pharmaceutical policies like the strict facial-mask policy are subject to cultural norms, political backgrounds and defiance against political authorities and policy makers, to name just a few such factors.

Furthermore, our results are conditional on the characteristics of the time period between July and December 2020 during the early pandemic, when the alpha variant of the SARS-CoV-2 virus was predominant and no vaccinations were available yet. Also, only parts of the three seasons summer, autumn, and winter are represented in the data. However, even though it is hard to directly compare our results to results in other countries, with other facial-mask policies and other time periods, our findings are largely in line with those of other research groups (see, e.g., Chernozhukov et al. (2021), Mitze et al. (2020), and Pleninger et al. (2022)).

Availability of data and materials

https://github.com/enussl/Facial-Mask-Policy-COVID-19.

Notes

This was in alignment with the perspective of the World Health Organization (WHO). In July 2020, this recommendation was overthrown due to increasing scientific evidence for the effectiveness of facial masks.
See https://www.bag.admin.ch/bag/de/home/das-bag/aktuell/medienmitteilungen.msg-id-79522.html, last visited August 21, 2023, for further details.
See https://www.admin.ch/gov/de/start/dokumentation/medienmitteilungen.msg-id-79711.html, last visited August 21, 2023, for further details.
As convention, we use bold letters for multivariate variables and normal letters for univariate variables.
See https://opendata.swiss/en/dataset/covid-19-schweiz, last visited August 21, 2023, for further details.
See https://opendata.swiss/en/dataset/covid-19-schweiz, last visited August 21, 2023, for further details.
A project by the universities of St. Gallen and Lausanne, see https://monitoringconsumption.com/, last visited August 21, 2023, for further details.
See https://www.bfs.admin.ch/bfs/en/home/statistics/regional-statistics/regional-portraits-key-figures/cantons.assetdetail.20784336.html, last visited August 21, 2023, for further details.
See https://www.edk.ch/en/education-system/websites-of-the-cantons, last visited August 21, 2023, for further details.
See https://www.meteoschweiz.admin.ch/wetter/messsysteme/bodenstationen/automatisches-messnetz.html, last visited August 21, 2023, for further details.
See https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md, last visited August 21, 2023, for all indicators used to calculate the index.
Indicator-coding: https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md, last visited August 21, 2023.
Note that it is not a DAG in the strict sense, since certain nodes represent groups of variables. We allow the variables within such groups to be arbitrarily causally related so that there are no cycles. An edge to or from such a group of variables indicates that we allow such an edge for each variable in the group.
A mediator between A and B is an intermediate variable that lies on a causal path between A and B. A path from node A to node B is called a causal path if all edges on the path point toward B. For a recent exposition of this topic, see Robins et al. (2022).
For further details about controlling for observed covariates, we refer to the following literature Huber (2014); Tchetgen Tchetgen and Shpitser (2012); Imai et al. (2010).
Note that estimates of $\text {Var}(\hat{\theta })$ based on $\hat{\theta }$ remain valid for the debiased estimator $\hat{\theta }_{BC}$ (Chen et al., 2019).

Abbreviations

BAG: Bundesamt für Gesundheit
CI: Confidence interval
DAG: Directed acyclic graph
DFE: Debiased fixed effects
DML: Double machine learning
FE: Fixed effects
FGLS: Feasible generalized least squares
KOF: Konjunkturforschungsstelle
OLS: Ordinary least squares
PNDE: Pure natural direct effect
RE: Random effects
SEM: Structural equation model
TCE: Total causal effect
TNIE: Total natural indirect effect

References

Bannert, M., & Thoeni, S. Kofdata: Get Data from the ’KOF Datenservice’ API. (2022). R package version 0.2. https://CRAN.R-project.org/package=kofdata
Chen, S., Chernozhukov, V., & Fernández-Val, I. (2019). Mastering panel metrics: Causal impact of democracy on growth. In AEA Papers and Proceedings, 109
Chen, S., Chernozhukov, V., Fernandez-Val, I., Kasahara, H., & Schrimpf, P. (2020). Cross-over jackknife bias correction for non-stationary nonlinear panel data. Forthcoming
Chernozhukov, V., Kasahara, H., & Schrimpf, P. (2021). Causal impact of masks, policies, behavior on early Covid-19 pandemic in the U.S. Journal of Econometrics, 220, 23–62.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21
Chiang, H. D., Hansen, B. E., & Sasaki, Y. (2022). Standard errors for two-way clustering with serially correlated time effects. arXiv preprint arXiv:2201.11304
Colella, F., Lalive, R., Sakalli, S. O., & Thoenig, M. (2019). Inference with arbitrary clustering. IZA Discussion Paper No. 12584. SSRN: https://ssrn.com/abstract=3449578
Cori, A., Ferguson, N. M., Fraser, C., & Cauchemez, S. (2013). A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology, 178, 1505–1512.
Daniel, R. M., De Stavola, B. L., Cousens, S. N., & Vansteelandt, S. (2015). Causal mediation analysis with multiple mediators. Biometrics, 71(1), 1–14.
Fattorini, D., & Regoli, F. (2020). Role of the chronic air pollution levels in the Covid-19 outbreak risk in Italy. Environmental Pollution, 264, 114732.
Hale, T., Angrist, N., Goldszmidt, R., Kira, B., Petherick, A., Phillips, T., Webster, S., Cameron-Blake, E., Hallas, L., Majumdar, S., & Tatlow, H. (2021). A global panel database of pandemic policies (Oxford Covid-19 government response tracker). Nature Human Behaviour, 5, 529–538.
Hansen, B. (2022). Econometrics. Princeton: Princeton University Press.
Huber, M. (2014). Identifying causal mechanisms (primarily) based on inverse probability weighting. Journal of Applied Econometrics, 29(6), 920–943.
Huber, M., & Langen, H. (2020). Timing matters: The impact of response measures on Covid-19-related hospitalization and death rates in Germany and Switzerland. Swiss Journal of Economics and Statistics, 156
Huisman, J. S., Scire, J., Angst, D. C., Li, J., Neher, R. A., Maathuis, M. H., Bonhoeffer, S., & Stadler, T. (2022). Estimation and worldwide monitoring of the effective reproductive number of SARS-COV-2. Elife, 11, e71345.
Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71.
Kähler, C. J., & Hain, R. (2020). Fundamental protective mechanisms of face masks against droplet infections. Journal of Aerosol Science, 148.
Mitze, T., Kosfeld, R., Rode, J., & Walde, K. (2020). Face masks considerably reduce Covid-19 cases in Germany. Proceedings of the National Academy of Sciences of the United States of America, 117.
Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–688.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. UAI’01, pp. 411–420. Morgan Kaufmann Publishers Inc., San Francisco
Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.
Pearl, J., et al. (2000). Models, reasoning and inference (Vol. 19(2), p. 3). Cambridge: Cambridge University Press.
Perković, E., Textor, J., Kalisch, M., & Maathuis, M. H. (2018). Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. Journal of Machine Learning Research, 18(220), 1–62.
Pleninger, R., Streicher, S., & Sturm, J.-E. (2022). Do Covid-19 containment measures work? Evidence from Switzerland. Swiss Journal of Economics and Statistics, 158, 5.
Robins, J. M., Richardson, T. S., & Shpitser, I. (2022). An interventionist approach to mediation analysis
Shpitser, I. (2012). Appendum to “on the validity of covariate adjustment for estimating causal effects”. Personal Communication
Tchetgen Tchetgen, E. J., & Shpitser, I. (2012). Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics, 40(3), 1816.
Wheaton, W. C., & Thompson, A. K. (2020). The geography of Covid-19 growth in the US: Counties and metropolitan areas. SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3570540
Zhang, R., Li, Y., Zhang, A. L., Wang, Y., & Molina, M. J. (2020). Identifying airborne transmission as the dominant route for the spread of Covid-19. In Proceedings of the National Academy of Sciences of the United States of America (Vol. 117)
Zhu, Y., Xie, J., Huang, F., & Cao, L. (2020). Association between short-term exposure to air pollution and Covid-19 infection: Evidence from China. Science of the Total Environment, 727, 138704.
Zoran, M. A., Savastru, R. S., Savastru, D. M., & Tautan, M. N. (2020). Assessing the relationship between surface levels of pm2.5 and pm10 particulate matter impact on covid-19 in Milan, Italy. Science of the Total Environment, 738, 139825.


Acknowledgements

We thank the participants of the 2023 annual congress of the Swiss Society of Economics and Statistics (SSES/SGVS) in Neuchâtel for their valuable comments.

Funding

Not applicable.

Author information

Emanuel Nussli, Simon Hediger and Meta-Lina Spohn contributed equally to this work.

Authors and Affiliations

Seminar for Statistics, ETH Zurich, Zurich, Switzerland
Emanuel Nussli, Meta-Lina Spohn & Marloes H. Maathuis
Department of Economics, University of Zurich, Zurich, Switzerland
Simon Hediger

Contributions

E.N., S.H. and M-L.S. contributed equally to the manuscript. M.H.M. advised the creation of this manuscript with her statistical expertise. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Simon Hediger.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Evolution of the strict facial-mask policy

Appendix B: Tukey-Anscombe plots (residuals vs fitted values)

Appendix C: Details on the confidence interval construction

Subsequently, we present the seven estimators for $\varvec{\Omega }$, defined in (8) (with their short name in brackets).

1) Heteroscedastic-Robust (HC3) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ iff $i=j$ and $s=t$):
$$\begin{aligned} \hat{\varvec{\Omega }}_1 :=\frac{1}{(NT)^2}\sum _{i=1}^N\sum _{t=1}^T \varvec{X}_{i,t}\varvec{X}_{i,t}^\top \tilde{\epsilon }_{i,t}^2, \end{aligned}$$
where the residual $\tilde{\epsilon }_{i,t}$ is given by the classical HC3 representation
$$\begin{aligned} \tilde{\epsilon }_{i,t} :=\frac{\hat{\epsilon }_{i,t}}{(1-\varvec{X}_{i,t}^{\top }(\varvec{X}^{\top }\varvec{X})^{-1}\varvec{X}_{i,t})}. \end{aligned}$$
2) One-Way Clustering on Canton (Canton) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ iff $i=j$):
$$\begin{aligned} \hat{\varvec{\Omega }}_2 :=\frac{1}{(NT)^2}\sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top , \end{aligned}$$
where $\hat{\varvec{R}}_i :=\sum _{t=1}^T \varvec{X}_{i,t}\hat{\epsilon }_{i,t}.$
3) One-Way Clustering on Week (Week) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ iff $t=s$):
$$\begin{aligned} \hat{\varvec{\Omega }}_3 :=\frac{1}{(NT)^2}\sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top , \end{aligned} \tag{10}$$
where $\hat{\varvec{S}}_t :=\sum _{i=1}^N \varvec{X}_{i,t}\hat{\epsilon }_{i,t}.$
4) Two-Way Clustering on Canton and Week (Canton-Week) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ iff $t=s$ or $i=j$):
$$\begin{aligned} \hat{\varvec{\Omega }}_4&:=\frac{1}{(NT)^2} \biggl ( \sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top + \sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top \\&- \sum _{i=1}^N\sum _{t=1}^T\varvec{X}_{i,t}\varvec{X}_{i,t}^\top \hat{\epsilon }_{i,t}^2 \biggr ). \end{aligned} \tag{11}$$
5) Newey-West (NW) (Newey and West, 1987) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ iff $i=j$, where $\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})$ is decreasing with $|t-s|$ increasing):
$$\begin{aligned} \hat{\varvec{\Omega }}_5&:=\frac{1}{(NT)^2} \biggl ( \sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top + \sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top \\&- \sum _{i=1}^N\sum _{t=1}^T\varvec{X}_{i,t}\varvec{X}_{i,t}^\top \hat{\epsilon }_{i,t}^2 \\&+ \sum _{m=1}^{M}w(m,M)( \hat{\varvec{G}}_m+\hat{\varvec{G}}_m^{\top } \\&-\hat{\varvec{H}}_m-\hat{\varvec{H}}_m^{\top } )\biggr ), \end{aligned} \tag{12}$$
where $\hat{\varvec{G}}_m :=\sum _{t=1}^{T-m}\hat{\varvec{S}}_t \hat{\varvec{S}}_{t+m}^{\top },$ $\hat{\varvec{H}}_m :=\sum _{i=1}^N\sum _{t=1}^{T-m}\varvec{X}_{i,t}\hat{\epsilon }_{i,t}\varvec{X}_{i,t+m}^{\top }\hat{\epsilon }_{i,t+m}$, $w(m,M)=1-m/(M+1)$ are triangular weights and $M=\lfloor T^{1/4} \rfloor$.
6) Chiang-Hansen (CH) (Chiang et al., 2022) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$, where $\textrm{Cov}(\epsilon _{i,t},\epsilon _{j,s})$ for arbitrary $i\ne j$ is decreasing with $|t-s|$ increasing): The estimator $\hat{\varvec{\Omega }}_6$ is given by Equation (12), where $w(m, M)$ are the triangular weights as in NW and $M$ is data driven.
7) Informal Own Specification (Own) (motivated by Colella et al., 2019) ($\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0$ if $i=j$ or $i$ and $j$ are neighboring cantons, where $\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})$ is decreasing with $|t-s|$ increasing):
$$\begin{aligned} \hat{\varvec{\Omega }}_7 :=\frac{1}{(NT)^2} \sum _{i=1}^N\sum _{t=1}^{T}\sum _{j=1}^N\sum _{s=1}^{T} \omega _{itjs}V_{itjs}, \end{aligned}$$
where
$$\begin{aligned} V_{itjs} :=\varvec{X}_{i,t}\hat{\epsilon }_{i,t}\hat{\epsilon }_{j,s}\varvec{X}_{j,s}^{\top }, \end{aligned}$$
and the weights $\omega _{itjs}$ specify the dependence between two error terms $\epsilon _{i,t}$ and $\epsilon _{j,s}$ and are given by
$$\begin{aligned} \omega _{itjs} :=\left\{ \begin{array}{lr} 1, &{} i=j, \ t=s,\\ \lambda _{ij}0.5^{\mid t-s \mid }, &{} \text {otherwise} \end{array}\right\} , \end{aligned}$$
and
$$\begin{aligned} \lambda _{ij} :=\left\{ \begin{array}{lr} 1, &{} i=j, \\ 0.5, &{} i,j \ \text {neighbors}, \\ 0, &{} \text {otherwise} \end{array}\right\} . \end{aligned}$$
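As an illustration of how the clustered estimators relate to each other, the following Python sketch computes $\hat{\varvec{\Omega }}_2$, $\hat{\varvec{\Omega }}_3$ and $\hat{\varvec{\Omega }}_4$ from a covariate array and a residual array. It is a sketch under the assumption of a balanced panel stored as arrays of shape (N, T, P) and (N, T), not the code used for the analysis, and the function name `cluster_omegas` is hypothetical.

```python
import numpy as np

def cluster_omegas(X, resid):
    """One-way (canton, week) and two-way clustered estimators of Omega.

    X: (N, T, P) covariates, resid: (N, T) empirical residuals.
    """
    n, t, p = X.shape
    scores = X * resid[:, :, None]      # X_{i,t} * eps_hat_{i,t}
    R = scores.sum(axis=1)              # (N, P): R_i, summed over weeks
    S = scores.sum(axis=0)              # (T, P): S_t, summed over cantons
    scale = (n * t) ** 2
    omega_canton = R.T @ R / scale      # sum_i R_i R_i^T
    omega_week = S.T @ S / scale        # sum_t S_t S_t^T
    flat = scores.reshape(n * t, p)
    omega_diag = flat.T @ flat / scale  # sum_{i,t} X X^T eps_hat^2
    # Two-way clustering: the diagonal terms are counted twice, so subtract once
    omega_two_way = omega_canton + omega_week - omega_diag
    return omega_canton, omega_week, omega_two_way
```

The subtraction in the last step corresponds to the third sum in Equation (11): the terms with $i=j$ and $t=s$ appear in both one-way estimators and would otherwise enter twice.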

Appendix D: Proof of Lemma 4.1

Lemma D.1

(Restatement of Lemma 4.1) The TNIE and the PNDE can be expressed in terms of the regression coefficients in Equations (1) via

$$\begin{aligned} \text {TNIE}&= \delta _3\delta _4, \ \text {and}\\ \text {PNDE}&= \delta _1. \\ \end{aligned}$$

Proof

Plugging the expressions in Equations (1) into the mediator equation for both do-interventions, respectively, we get

$$\begin{aligned} B_{i,t}(1) = \delta _4 +\varvec{\delta _5}^{\top } \varvec{\tilde{V}}_{i,t} +\nu _{i,t} \end{aligned}$$

and

$$\begin{aligned} B_{i,t}(0) = \varvec{\delta _5}^{\top } \varvec{\tilde{V}}_{i,t} +\nu _{i,t}. \end{aligned}$$

Using the definitions of TNIE and PNDE we get

$$\begin{aligned} \text {TNIE}&= {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(1))\right] - {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(0))\right] \\&= \delta _1 + \varvec{\delta _2}^{\top } {\mathbb E}\left[ \varvec{V}_{i,t}\right] + {\mathbb E}\left[ \epsilon _{i,t}\right] + \\&\quad \delta _3 \left( \delta _4 +\varvec{\delta _5}^{\top } {\mathbb E}\left[ \varvec{\tilde{V}}_{i,t}\right] +{\mathbb E}\left[ \nu _{i,t}\right] \right) - \\&\quad \bigg (\delta _1 + \varvec{\delta _2}^{\top } {\mathbb E}\left[ \varvec{V}_{i,t}\right] +{\mathbb E}\left[ \epsilon _{i,t}\right] + \\&\quad \delta _3 \left( \varvec{\delta _5}^{\top } {\mathbb E}\left[ \varvec{\tilde{V}}_{i,t}\right] +{\mathbb E}\left[ \nu _{i,t}\right] \right) \bigg ) \\&= \delta _3\delta _4, \end{aligned}$$

and

$$\begin{aligned} \text {PNDE}&= \left( {\mathbb E}\left[ Y_{i,t}(1, B_{i,t}(0))\right] - {\mathbb E}\left[ Y_{i,t}(0, B_{i,t}(0))\right] \right) \\&= \delta _1 + \varvec{\delta _2}^{\top } {\mathbb E}\left[ \varvec{V}_{i,t}\right] + {\mathbb E}\left[ \epsilon _{i,t}\right] + \\&\quad \delta _3 \left( \varvec{\delta _5}^{\top } {\mathbb E}\left[ \varvec{\tilde{V}}_{i,t}\right] +{\mathbb E}\left[ \nu _{i,t}\right] \right) - \\&\quad \bigg ( \varvec{\delta _2}^{\top } {\mathbb E}\left[ \varvec{V}_{i,t}\right] +{\mathbb E}\left[ \epsilon _{i,t}\right] + \\&\quad \delta _3 \left( \varvec{\delta _5}^{\top } {\mathbb E}\left[ \varvec{\tilde{V}}_{i,t}\right] +{\mathbb E}\left[ \nu _{i,t}\right] \right) \bigg ) \\&= \delta _1. \end{aligned}$$

$\square$
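The identities of the lemma can also be checked numerically. The Python sketch below assumes the linear structural equations implied by the proof, $B_{i,t} = \delta _4 M_{i,t} + \varvec{\delta _5}^{\top } \varvec{\tilde{V}}_{i,t} + \nu _{i,t}$ and $Y_{i,t} = \delta _1 M_{i,t} + \varvec{\delta _2}^{\top } \varvec{V}_{i,t} + \delta _3 B_{i,t} + \epsilon _{i,t}$, with scalar covariates and standard normal noise chosen purely for illustration. The function names and all parameter values are hypothetical.

```python
import random

def mediator(m, v_tilde, nu, d4, d5):
    """Mediator B(m) under the intervention do(M = m)."""
    return d4 * m + d5 * v_tilde + nu

def outcome(m, b, v, eps, d1, d2, d3):
    """Response Y(m, b) for treatment m and mediator value b."""
    return d1 * m + d2 * v + d3 * b + eps

def natural_effects(d1, d2, d3, d4, d5, n_draws=1000, seed=1):
    """Sample averages of the TNIE and PNDE contrasts with shared noise terms."""
    rng = random.Random(seed)
    tnie = pnde = 0.0
    for _ in range(n_draws):
        v, v_tilde, eps, nu = (rng.gauss(0.0, 1.0) for _ in range(4))
        b1 = mediator(1, v_tilde, nu, d4, d5)
        b0 = mediator(0, v_tilde, nu, d4, d5)
        y11 = outcome(1, b1, v, eps, d1, d2, d3)
        y10 = outcome(1, b0, v, eps, d1, d2, d3)
        y00 = outcome(0, b0, v, eps, d1, d2, d3)
        tnie += y11 - y10   # Y(1, B(1)) - Y(1, B(0))
        pnde += y10 - y00   # Y(1, B(0)) - Y(0, B(0))
    return tnie / n_draws, pnde / n_draws
```

Because the covariates and noise terms cancel in each contrast, the averages agree with $\delta _3\delta _4$ and $\delta _1$ up to floating-point error for any number of draws.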

Appendix E: Sensitivity analysis

We perform an extensive sensitivity analysis to investigate the robustness of our results. In the following, we explain the robustness checks conducted and subsequently present the results in Table 2. We restrict the robustness checks to the total effect of the strict facial-mask policy on both response variables and do not consider the direct effect.

Table 2 Results of the sensitivity analyses for the estimation of the total effect for both responses r and growth.new.cases. For the FE, DFE and double machine learning approach, the $95\%$-confidence intervals ($CI_{95\%}$) are constructed via two-way clustering on the canton and week (Canton-Week). For the RE approach, we compute the standard errors as described in Sect. 4.2. For the sake of comparison, we add the main results of the main text at the top of the table


E.1 Alterations to confidence interval construction

For the point estimators of the FE and DFE approaches, described in Sects. 4.1 and 4.2, we compute the standard errors using one-way clustering on the month instead of the week in Formula (10) (Month FE and Month DFE) as well as two-way clustering on the canton and the month in Formula (11) (Canton-Month FE and Canton-Month DFE).

E.2 Alterations to the data and point estimation

The upcoming sections describe changes to the data and point estimation. For all these adaptations we construct the confidence intervals with one method only: For the FE and DFE approaches we compute the standard errors using the two-way clustering on the canton and week (Canton-Week), given in Formula (11), to construct the confidence intervals. For the RE approach, we compute the standard errors as described in Sect. 4.2.

E.2.1 Additional information variables

For the RE approach and the approximate weekly growth rate in supposed new infections as response variable we include additional covariates as motivated by Chernozhukov et al. (2021). To describe them, consider the following new definitions,

$$\begin{aligned} &G^{\prime}_{i,t} :=\log \left( C_{i,t}\right) , \\&G_{t}.\text {nat} :=\log \left( \frac{C_{t}.\text{nat}}{C_{t-1}.\text {nat}}\right) , \\&G^{\prime}_{t}.\text{nat} :=\log \left( C_{t}.\text{nat}\right) ,\end{aligned}$$

where $C_{i,t}$ represents the number of new confirmed cases in canton i in week t and $C_{t}.\text {nat}$ represents the national number of new confirmed cases in week t. We use $Y_{i,t}=G_{i,t+2}$ and the corresponding set $\varvec{Y}_{i,t'}= (G_{i, t+1}, G_{i,t},G'_{i,t},G_{t}.\text {nat},G'_{t}.\text {nat} )$. As an additional information variable, we also add $\log \left( T_{i,t}\right)$ to the adjustment set $\varvec{Z}_{i,t}$, where $T_{i,t}$ represents the number of COVID-19 tests performed in canton i in week t. Note that this is only done for the RE model, as the variables at the national level are omitted in the FE and DFE approaches through the within transformation.
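The definitions above can be sketched in a few lines of Python. The sketch assumes that the national count $C_{t}.\text{nat}$ is the sum of the cantonal counts and that all counts are strictly positive (the logarithm is undefined otherwise). The function name and the toy counts are hypothetical.

```python
import math

def information_variables(cases):
    """cases maps a canton to its list of weekly new confirmed cases C_{i,t}.

    Returns (G'_{i,t} per canton, G_t.nat, G'_t.nat). The first entry of
    G_t.nat is None because no previous week is available.
    """
    n_weeks = len(next(iter(cases.values())))
    # Assumption: the national count is the sum over all cantons.
    national = [sum(c[t] for c in cases.values()) for t in range(n_weeks)]
    log_cases = {k: [math.log(v) for v in c] for k, c in cases.items()}
    log_national = [math.log(v) for v in national]
    growth_national = [None] + [
        math.log(national[t] / national[t - 1]) for t in range(1, n_weeks)
    ]
    return log_cases, growth_national, log_national

# Toy counts (made up): two cantons, three weeks, cases doubling each week.
cases = {"A": [10, 20, 40], "B": [30, 60, 120]}
log_cases, growth_national, log_national = information_variables(cases)
print(growth_national)
```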

1.2.2 Half-Cantons

Each pair of half-cantons (Basel-Landschaft and Basel-Stadt, Appenzell-Innerrhoden and Appenzell-Ausserrhoden, and Obwalden and Nidwalden) is combined into one canton by taking the average of the two observations. The only exceptions are growth.new.cases and growth.transactions, for which we sum the number of new cases or transactions, respectively, over the two half-cantons and then calculate the approximate growth rates as before. We perform the analysis for the FE, DFE and RE approaches.
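A minimal sketch of this merging rule, assuming that the approximate weekly growth rate is the log ratio of consecutive weekly counts (by analogy with the definition of $G_{t}.\text{nat}$ above). The covariate name `temperature`, the function name and the numbers are hypothetical.

```python
import math

def merge_half_cantons(a, b):
    """Combine two half-cantons given as dicts of weekly lists.

    Covariates are averaged; case counts are summed first and the growth
    rate is computed from the summed counts.
    """
    cases = [ca + cb for ca, cb in zip(a["cases"], b["cases"])]
    temperature = [(ta + tb) / 2 for ta, tb in zip(a["temperature"], b["temperature"])]
    # Assumed growth definition: log(C_t / C_{t-1}); undefined in the first week.
    growth = [None] + [math.log(cases[t] / cases[t - 1]) for t in range(1, len(cases))]
    return {"cases": cases, "temperature": temperature, "growth.new.cases": growth}

# Toy data (made up) for two half-cantons over three weeks.
half_1 = {"cases": [4, 6, 12], "temperature": [18.0, 16.0, 12.0]}
half_2 = {"cases": [6, 14, 28], "temperature": [20.0, 18.0, 14.0]}
merged = merge_half_cantons(half_1, half_2)
print(merged)
```

Summing before taking the growth rate matters: averaging the two half-canton growth rates would weight a small and a large half-canton equally.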

1.2.3 Timing of information variables

We examine the influence of the lag of the information variable that is part of the lagged response variable $Y_{i,t'}$. For the response r, we change the lag of the information variable from $t'=3$ to $t'=2$, resulting in the lagged response variable $Y_{i,t'} = R_{i,t-2}$. For the response growth.new.cases, we change the lag of the information variable from $t'=2$ to $t'=3$, resulting in the lagged response variable $Y_{i,t'} = G_{i,t-1}$. We perform the analysis for the FE, DFE and RE approaches.

1.2.4 Outliers

We fit the FE approach as described in Sect. 4.1, compute the Cook’s distance for each observation and exclude observations with a corresponding Cook’s distance $>4\times (NT)^{-1}$. We then refit the model using the FE approach based on the reduced sample. In the model where $\texttt {r}$ is the response, 28 observations are excluded. When $\texttt {growth.new.cases}$ is the response, 27 observations are excluded. Since the calculation of the Cook’s distance in the DFE and RE approaches is not straightforward, we restrict this robustness check to the FE model.
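The exclusion rule can be illustrated on a much simpler model than the FE specification. The Python sketch below computes Cook's distances for an ordinary least squares fit with an intercept and a single regressor, and flags observations whose distance exceeds 4 divided by the number of observations (which corresponds to $NT$ in the panel setting). The model, the function names and the data are assumptions made for illustration only.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation in the OLS fit y = a + b * x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

def keep_mask(x, y):
    """True for observations with Cook's distance at most 4 / n."""
    threshold = 4 / len(x)
    return [d <= threshold for d in cooks_distances(x, y)]

# Toy data (made up): a perfect line except for the last observation.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 30]
keep = keep_mask(x, y)
x_reduced = [xi for xi, k in zip(x, keep) if k]
y_reduced = [yi for yi, k in zip(y, keep) if k]
print(keep)
```

The model would then be refitted on `x_reduced` and `y_reduced`, mirroring the refit on the reduced sample described above.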

1.2.5 Very short sample period

We restrict the period of analysis to the time between August 21, 2020, and October 19, 2020. On August 21, 2020, the canton of Neuchâtel was the first canton to introduce the strict facial-mask policy. On October 19, 2020, the federal government enforced the strict facial-mask policy nationwide. This period is very short, with only $T=9$ weeks, which constitutes a problem for the DFE and RE approaches. We thus perform the analysis only for the FE approach.

1.2.6 Short sample period

We restrict the period of analysis to the time between July 6, 2020, and October 18, 2020. During this time window, the cantons were free to choose between the strict facial-mask policy and the government-determined country-wide baseline policy. With only $T=15$ weeks, this period is also short.
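As a small check of the window length, the Python sketch below restricts a hypothetical weekly panel to the weeks lying entirely between July 6, 2020, and October 18, 2020. Indexing the panel by Monday week-start dates is an assumption made for illustration; the function name is hypothetical.

```python
from datetime import date, timedelta

def restrict_to_window(week_starts, start, end):
    """Keep the weeks (given by their start date) lying entirely in [start, end]."""
    return [w for w in week_starts
            if w >= start and w + timedelta(days=6) <= end]

# Hypothetical panel index: 30 consecutive weeks starting Monday, June 1, 2020.
week_starts = [date(2020, 6, 1) + timedelta(weeks=k) for k in range(30)]

short_period = restrict_to_window(week_starts, date(2020, 7, 6), date(2020, 10, 18))
print(len(short_period))
```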

1.2.7 Double machine learning approach

We relax the assumption of a linear regression model to a partially linear regression model, where the effect of the adjustment set $\varvec{Z}_{i,t}$ on $Y_{i,t}$ is nonparametric. We use the adjustment set of the RE approach. Estimation is done via the double machine learning framework (Chernozhukov et al., 2018), which is a doubly robust method. This approach assumes the following model,

$$\begin{aligned}&M_{i,t} = m(\varvec{Z}_{i,t})+\nu _{i,t},\\&Y_{i,t} = \theta M_{i,t} + g(\varvec{Z}_{i,t})+\epsilon _{i,t}, \end{aligned}$$

where $\mathbb {E}[\nu _{i,t}\mid \varvec{Z}_{i,t}] = \mathbb {E}[\epsilon _{i,t}\mid M_{i,t},\varvec{Z}_{i,t}]=0.$ We learn the functions $m(\cdot )$ and $g(\cdot )$ with random forests. See Chernozhukov et al. (2018) for more details. We implement the procedure with the R-package DoubleML.
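To make the estimation idea concrete, the following Python sketch implements a cross-fitted "partialling out" estimator of $\theta$ for the partially linear model above. It is not the DoubleML implementation: a k-nearest-neighbour average on a scalar covariate stands in for the random forests, the nuisance function learned for the response is $\mathbb{E}[Y\mid Z]$ rather than $g$ itself, and the data are simulated. All function names, tuning values and the true effect of $-0.5$ are assumptions for illustration.

```python
import math
import random

def knn_mean(train_z, train_v, z0, k):
    """Average of the k training values whose z is closest to z0."""
    nearest = sorted(range(len(train_z)), key=lambda j: abs(train_z[j] - z0))[:k]
    return sum(train_v[j] for j in nearest) / len(nearest)

def dml_partially_linear(y, m, z, n_folds=2, k=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*M + g(Z) + eps."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_res, m_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        tz = [z[j] for j in train]
        ty = [y[j] for j in train]
        tm = [m[j] for j in train]
        for j in fold:
            # Residualise Y and M on Z using learners fitted on the other folds.
            y_res[j] = y[j] - knn_mean(tz, ty, z[j], k)
            m_res[j] = m[j] - knn_mean(tz, tm, z[j], k)
    # Regress the Y residuals on the M residuals (no intercept).
    return sum(a * b for a, b in zip(m_res, y_res)) / sum(a * a for a in m_res)

# Simulated data with a nonlinear confounder and true theta = -0.5.
rng = random.Random(1)
n = 400
z = [rng.uniform(-2, 2) for _ in range(n)]
m = [math.sin(zi) + rng.gauss(0, 1) for zi in z]
y = [-0.5 * mi + zi ** 2 + rng.gauss(0, 0.5) for mi, zi in zip(m, z)]

theta_hat = dml_partially_linear(y, m, z)
print(round(theta_hat, 2))
```

Cross-fitting (predicting each observation with learners trained on the other folds) is what keeps the flexible nuisance estimates from biasing the final regression.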

1.2.8 Lag-1 response variable as covariate

In addition to the information variable $Y_{i,t'}$, we consider for both response variables an additional lag of the response variable as covariate. We include the lag-1 response variable $Y_{i,t-1}$ in the models, since for both response variables we observe a possibly nonzero autocorrelation at lag one for some cantons.
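The lag-one autocorrelation mentioned here can be computed per canton as in the following Python sketch. It uses the standard sample autocorrelation estimator; the function name and the example series are hypothetical, and how the autocorrelation was assessed in the analysis itself is not specified in the text.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation at lag one of a single canton's response series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / denom

# Toy series (made up): a steadily increasing response.
trend = [1, 2, 3, 4, 5]
rho_trend = lag1_autocorrelation(trend)
print(rho_trend)
```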

1.3 E.3: Results of the sensitivity analysis

We show the results of all considered sensitivity analyses in Table 2. All point estimates of the total effect are negative, ranging from $-1.08$ to $-0.04$. The estimate is significantly different from 0 at the $5\%$ level in 31 of the 41 ($76\%$) sensitivity analyses conducted.

We provide some general remarks on the results. As stressed earlier, the RE approach cannot control for unobserved confounding and is therefore less trustworthy than the FE and DFE approaches. In some of the sensitivity analyses, the DFE approach produces estimates that differ substantially from those of the FE approach. Due to the properties discussed in Sect. 4.1, the DFE approach is more trustworthy. For the DML approach, we obtain much larger estimated effect sizes of the total effect. However, the uncertainty is very large, so the results are not significant. Using this methodology, we also cannot control for unmeasured confounding. As there is no clear pattern apparent in the Tukey-Anscombe plots in Fig. 6, we suspect that the difference in point estimates between the linear methods and the nonlinear DML methodology is mostly driven by the latter's inability to control for unobserved confounding, and not by an underlying nonlinear relationship between the facial-mask policy and the response variables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nussli, E., Hediger, S., Spohn, ML. et al. The effect of a strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic. Swiss J Economics Statistics 160, 2 (2024). https://doi.org/10.1186/s41937-024-00119-0

Received: 12 December 2022
Accepted: 06 January 2024
Published: 12 February 2024
DOI: https://doi.org/10.1186/s41937-024-00119-0
