The effect of a strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic
Swiss Journal of Economics and Statistics volume 160, Article number: 2 (2024)
Abstract
During several weeks in the second half of 2020, the cantons of Switzerland could choose to adopt the government-determined facial-mask policy, corresponding to mandatory facial-mask wearing on public transport, or a strict facial-mask policy, corresponding to mandatory facial-mask wearing on public transport and in all public or shared spaces where social distancing was not possible. We estimate the effect of introducing the strict facial-mask policy on the spread of COVID-19 in Switzerland during this first phase of the pandemic in 2020, using the cantonal heterogeneity in facial-mask policies. We adjust for social distancing behavior, weather, other non-pharmaceutical policies and further variables. We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the strict facial-mask policy is adopted.
1 Introduction
The coronavirus disease (COVID-19) pandemic presented large challenges to societies around the world. In the early pandemic in 2020, when the alpha variant of the SARS-CoV-2 virus was predominant, knowledge about the spread of the virus and about COVID-19 was scarce. In close collaboration with scientists, politicians and decision makers tried to contain the spread of COVID-19 while avoiding unnecessary restrictions. Non-pharmaceutical interventions such as school closures, restrictions on public and private gatherings and enforcement of home office were employed.
In this paper, we focus on the effect of introducing a strict facial-mask policy on the containment of COVID-19 in Switzerland during the first phase of the pandemic in 2020. Studying the effect of the facial-mask policy is especially interesting as it is arguably one of the most debated policies. This might be partially due to the position that the Federal Office of Public Health of the Swiss Confederation (BAG) took in March 2020, communicating that healthy people do not need to wear facial masks.^{Footnote 1} A second reason for focusing on the facial-mask policy is that it is a relatively cheap and non-invasive policy when compared to other non-pharmaceutical interventions.
After the countrywide lockdown in Switzerland from mid-March to the end of April 2020, the federal government determined countrywide lower bounds on containment measures. The 26 Swiss cantons were given partial autonomy in introducing COVID-19 containment measures.^{Footnote 2} On July 6, 2020, wearing facial masks on public transport was made obligatory and thus formed the countrywide baseline for facial-mask policies.^{Footnote 3} Cantons could choose to enforce mandatory mask wearing on public transport and in all public or shared spaces where social distancing was not possible, which we henceforth refer to as the strict facial-mask policy. On October 19, 2020, the government enforced the strict facial-mask policy for all cantons. On December 21, 2020, vaccinations against COVID-19 were initiated in Switzerland, marking a massive change point in the pandemic. We thus consider the period from July 6, 2020, to December 20, 2020, as our period of analysis. During the whole period of analysis, a coordinated information campaign and international restrictions on entering the country were in place. On October 29, 2020, nationwide restrictions on public events were introduced by the government. On November 2, 2020, universities were closed in Switzerland.
We quantify the spread of COVID-19 by two different but related response variables: the estimated effective reproductive number (Huisman et al., 2022) and the approximate weekly growth rate in supposed new infections (Chernozhukov et al., 2021).
To identify the effect of the strict facial-mask policy, we impose causal assumptions similar to Chernozhukov et al. (2021). We use a directed acyclic graph (DAG) to visualize the assumed causal relationships among the facial-mask policy variable, the response variable and different sets of control variables. By regressing the response variable on a suitable set of control variables determined in the DAG, we identify both the direct and total effect of the strict facial-mask policy variable on each of the two response variables. The direct effect, where direct means with respect to the variables we consider, captures changes in the response variable due to changes in the strict facial-mask policy variable, keeping the rest fixed. The total effect additionally captures changes in the response variable that are mediated through changes in the social distancing behavior.
We use publicly available data from different sources. The data have a balanced panel structure, as we have observations for each of the 26 cantons of Switzerland, measured during the 24 weeks considered. For both response variables, we assume a linear generating equation with a two-way error component, consisting of a canton-specific and a week-specific part. This model allows us to account for dependencies between the observations within cantons and weeks. Depending on the assumptions on the error components, a specific linear regression model is estimated with either a fixed-effects or random-effects approach to estimate the total and direct effect.
For both response variables with both fixed- and random-effects approaches, we obtain negative point estimates of both direct and total effect, most of them being significant. In other words, we estimate an expected reduction in the spread of COVID-19 in the early pandemic comparing the strict facial-mask policy to the government-determined countrywide baseline. We perform various sensitivity analyses to confirm the robustness of our results with respect to inevitable modeling choices.
To our knowledge, this is the first study that statistically analyses the effect of the strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic. Pleninger et al. (2022) analyse the combination of all COVID-19-related policies in Switzerland, as measured by the Stringency Index of the Konjunkturforschungsstelle (KOF). They do not examine the isolated effect of the strict facial-mask policy. For Switzerland and Germany, Huber and Langen (2020) study the impact of the timing of the non-pharmaceutical policy of lockdowns on COVID-19-related death and hospitalization rates. They find that an early introduction reduces said rates substantially.
In other countries, however, the effect of facial-mask policies has been studied. For the USA, Chernozhukov et al. (2021) study the effect of mandatory mask wearing at the workplace. They estimate a significant reduction of the approximate weekly growth rate in supposed new infections by around 0.1. For Germany, Mitze et al. (2020) find a 15–75% reduction of new cases 20 days after the introduction of mandatory facial masks in public transport and stores. Zhang et al. (2020) find that mandatory facial masks considerably slow down infection growth for the analysed entities of New York, Wuhan and Italy. There are studies confirming the functionality of facial masks in hindering transmission of viral droplets in laboratory settings (see e.g., Kähler and Hain (2020)). However, in observational settings it is the effect of facial-mask policies that is analysed, which includes mechanisms such as changes in risk-taking behavior and misuse of facial masks.
Direct comparison of our results to estimates in other countries and time spans is hard, due to different facial-mask policies and/or general differences between countries and their population. In essence, however, our findings support the existing literature regarding the sign and significance of the effect of a strict facial-mask policy on the spread of COVID-19.
The article is organized as follows. Section 2 explains the data, Sect. 3 explains the causal assumptions and the causal effects of interest, and Sect. 4 describes the methodology. Section 5 presents the results. Finally, we conclude in Sect. 6.
2 Data
Our data are measured in each of the \(i=1,\ldots , N=26\) cantons in Switzerland in each of the \(t=1, \ldots , T=24\) weeks in the period of analysis, ranging from July 6, 2020, until December 20, 2020. We use weekly data because data at the daily resolution would artificially increase the sample size with highly dependent observations, which would lead to faulty statistical inference. Our variable of interest, the so-called treatment variable, is the strict facial-mask policy variable, where 1 indicates the strict policy was in place while 0 indicates the government-determined countrywide baseline was in place. To quantify the spread of COVID-19, we use two different response variables: the first is the estimated effective reproductive number and the second is the approximate weekly growth rate in supposed new infections. Among the control variables, we only consider variables that varied between cantons and/or weeks in the period of analysis.
We can summarize the variables, observed in canton \(i=1,\ldots ,N\) in week \(t=1,\ldots , T\), into eleven groups:^{Footnote 4}

\(Y_{i,t}\): response variable quantifying the spread of COVID-19 (estimated effective reproductive number or the approximate weekly growth rate in supposed new infections),

\(M_{i,t}\): strict facial-mask policy variable (treatment variable),

\(B_{i,t}\): (social distancing) behavior variable, quantified by financial transactions,

\(\varvec{D}_{i}\): demographic variables that are cantonspecific,

\(H_{i,t}\): holiday indicator variable,

\(\varvec{W}_{i,t}\): meteorological variables reflecting the weather situation,

\(\varvec{P}_{i,t}\): nonpharmaceutical policy variables (excluding \(M_{i,t}\)),

\(Y_{i,t'}\): response variable \(Y_{i,t}\) lagged to the past with lag \(t'\),

\(\varvec{U}^1_{i}\): unmeasured and not further specified canton-specific variables,

\(\varvec{U}^2_{t}\): unmeasured and not further specified week-specific variables,

\(\varvec{U}^3_{i,t}\): unmeasured and not further specified variables that might vary between weeks and/or cantons.
Subsequently, we present the specific variables (with their short name in brackets) in each of the above categories (apart from \(\varvec{U}^1_{i}\), \(\varvec{U}^2_{t}\), and \(\varvec{U}^3_{i,t}\)). We present a list of these 16 variables with their sources and descriptive statistics in Table 1.
2.1 Response variables (\(Y_{i,t}\))
The first response variable is the estimated effective reproductive number (r) of Huisman et al. (2022). The estimated effective reproductive number at day d is an estimate of the expected number of secondary infections at day d caused by a previously infected person. Its estimation involves multiple steps: 1) estimation of the number of newly infected people based on the number of newly confirmed cases, adjusting for reporting cycles and irregular reporting practices, 2) a deconvolution step using suitable delay distributions between transmission and reporting of the case to infer the actual infection incidence, 3) application of the EpiEstim method developed by Cori et al. (2013) to estimate the effective reproductive number from the time series of newly infected people. Only cases stemming from infections within Switzerland are used for estimation. In each canton i, to obtain an observation for week t, we average the daily values within week t. We denote this response variable by \(Y_{i,t}=R_{i,t}\). We obtain the data from the Federal Office of Public Health of Switzerland.^{Footnote 5}
The second response variable is the same as the one used in Chernozhukov et al. (2021), the approximate weekly growth rate in supposed new infections from week \(t-1\) to week t. To specify this response, we define for each canton i and week t
\[ G_{i,t} := \log (C_{i,t}) - \log (C_{i,t-1}), \]
where \(C_{i,t}\) represents the number of reported new cases in canton i in week t. Due to the delay between the actual infection with the virus and the reporting of a new case, \(G_{i,t}\) does not represent the pandemic situation in week t but that of a time period before t. Therefore, to obtain an approximation of the weekly growth rate in supposed new infections from week \(t-1\) to week t, we need to use a future value of \(G_{i,t}\). We employ the same time shift of two weeks to the future as Chernozhukov et al. (2021), resulting in the response variable \(Y_{i,t}=G_{i, t+2}\) (growth.new.cases). We obtain the data on reported new cases from the Federal Office of Public Health of Switzerland.^{Footnote 6}
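The construction of this response can be sketched in a few lines of Python. The sketch assumes that the growth rate is the log-difference of weekly case counts, the convention of Chernozhukov et al. (2021); the function names and the case counts are hypothetical, and zero counts are not handled.

```python
import math


def weekly_growth(cases):
    """Return G_t = log(C_t) - log(C_{t-1}); undefined (None) for the first week."""
    growth = [None]
    for prev, curr in zip(cases, cases[1:]):
        # Assumes strictly positive counts; log(0) would raise an error.
        growth.append(math.log(curr) - math.log(prev))
    return growth


def shifted_response(growth, lead=2):
    """Return Y_t = G_{t+lead}; the last `lead` weeks have no response."""
    return [growth[t + lead] if t + lead < len(growth) else None
            for t in range(len(growth))]


# Hypothetical weekly case counts for a single canton.
cases = [100, 120, 150, 150, 120]
growth = weekly_growth(cases)
response = shifted_response(growth)
```

With a lead of two weeks, the response of week 0 is the growth rate observed between weeks 1 and 2, and the last two weeks of the series carry no response value.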
We plot both responses in Fig. 1 for all 26 cantons in the period of analysis. The plots and the Pearson correlation coefficients \(\rho\) highlight that the two responses are similar but there is no one-to-one correspondence between the estimated effective reproductive number and the approximate weekly growth rate in supposed new infections from week \(t-1\) to week t.
2.2 Strict facial-mask policy variable (\(M_{i,t}\))
The strict facial-mask policy variable (facial.mask) is our treatment variable. In each canton i, it has a value of 1 if the strict policy is applied and a value of 0 if the government-determined baseline policy is applied. During the period of analysis, a total of 10 cantons deviate from the baseline policy by preemptively introducing the strict facial-mask policy. At a daily resolution, there are 1807 observations where the strict policy is in place and 2561 where the baseline policy is implemented. For more details, see Table 1 and Fig. 5 in Appendix A.
To obtain an observation for week t, we average the daily values within week t. We obtain the data from KOF, via their CRAN R package kofdata (Bannert et al., 2022).
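The weekly aggregation of a daily indicator can be illustrated as follows. This is a minimal sketch that assumes the daily values are stored as a flat list covering whole weeks; the function name and the toy data are hypothetical.

```python
def weekly_means(daily_values, days_per_week=7):
    """Average consecutive blocks of daily values into weekly values."""
    if len(daily_values) % days_per_week != 0:
        raise ValueError("expected a whole number of weeks")
    return [
        sum(daily_values[start:start + days_per_week]) / days_per_week
        for start in range(0, len(daily_values), days_per_week)
    ]


# Toy example: the strict policy starts on the fourth day of the second week.
daily_mask = [0] * 7 + [0, 0, 0, 1, 1, 1, 1] + [1] * 7
weekly_mask = weekly_means(daily_mask)
```

A weekly value strictly between 0 and 1 then marks a week in which the policy changed; here the second week takes the value 4/7 because the strict policy was in force on four of its seven days.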
2.3 Social distancing behavior variable (\(B_{i,t}\))
As a proxy for social distancing behavior, we use household spending, similar to Pleninger et al. (2022). In each canton i, we consider the approximated growth rate of transactions in CHF with credit cards, debit cards and bank transfers from mobile phones of Swiss residents (growth.transactions). E-commerce is not considered.
We obtain the data from Monitoring Consumption Switzerland.^{Footnote 7}
2.4 Demographic variables (\(\varvec{D}_{i}\))
Demographic variables of a canton i are given by population size (population) and the percentage of people with age \(\ge 80\) years (perc.o80). Wheaton and Thompson (2020) show that infection growth is also strongly linked to residential density, that is the number of people per km\(^2\) of settlement area (density), which we also consider. These three variables can be considered constant for all weeks t. We obtain the data from the Federal Statistical Office of Switzerland.^{Footnote 8}
2.5 Holiday indicator (\(H_{i,t}\))
In each canton i the daily holiday indicator (holiday) has a value of 1 if the majority of public schools in the canton are on holiday and 0 otherwise. To obtain an observation for week t, we average the daily values within week t.
We obtain the holiday data from the cantonal education departments.^{Footnote 9}
2.6 Meteorological variables (\(\varvec{W}_{i,t}\))
Zoran et al. (2020) suggest that weather conditions are closely linked to the spread of COVID-19. In particular, they find that dry air supports the transmission of COVID-19. Their findings are supported by Zhu et al. (2020) and Fattorini and Regoli (2020). To incorporate these effects, we assemble weather data from a total of 100 weather stations from SwissMetNet,^{Footnote 10} not including stations on mountains. For each canton, we compute average daily weather values by weighting observations of stations by the population size of the respective municipality. Lastly, due to the lack of suitable stations, the canton of Basel-Stadt is mapped to the weather of the canton of Basel-Land, Appenzell Innerrhoden and Appenzell Ausserrhoden to St. Gallen, and Nidwalden to Obwalden. In doing so, we get a characterization of the daily weather, quantified by the number of minutes of sunshine (sunshine), the mean air temperature in \(^\circ \text {C}\) (temperature), and the relative humidity in \(\%\) (humidity). To obtain an observation for week t, we average the daily values within week t.
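The population weighting of station readings can be sketched as below. The function name, the temperatures and the population figures are hypothetical and only illustrate the weighting scheme described above.

```python
def population_weighted_mean(station_values, populations):
    """Weighted mean of station readings, weighted by municipality population."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive number")
    return sum(v * p for v, p in zip(station_values, populations)) / total


# Hypothetical daily mean temperatures (degrees C) at three stations of one canton.
temperatures = [18.0, 20.0, 14.0]
populations = [50_000, 30_000, 20_000]
canton_temperature = population_weighted_mean(temperatures, populations)
```

In this toy example the cantonal value is 17.8 degrees C, closer to the reading of the most populous municipality than the unweighted mean of about 17.3 degrees C.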
2.7 Non-pharmaceutical policy variables (\(\varvec{P}_{i,t}\))
The KOF Stringency Plus Index (Pleninger et al., 2022), the Government Response Index and the Economic Support Index (Hale et al., 2021)^{Footnote 11} each combine a different set of policy variables into one index, with the aim of reflecting the stringency of a government with regard to COVID-19 policies.
We do not use these indices, but use the policy variables directly, where we only consider those that vary at least across cantons or across weeks over the period of analysis. In each canton i, these policy variables are daily indicators for workplace closings (work.closing), school closings (school.closings), restrictions on gatherings (rest.gatherings), cancelation of public events (canc.events), and testing policy (testing.policy).^{Footnote 12} These indicators have 2 to 5 levels, where a higher level indicates a stricter policy. To obtain an observation for week t, we average the daily values within week t. We obtain the data from KOF, provided through their CRAN R package kofdata (Bannert et al., 2022).
2.8 Lagged response variables as covariates (\(Y_{i,t'}\))
We consider a lagged response variable as covariate (Y.lagged): We include the lagged response variable of which the value is known in week t, summarizing the information about the pandemic situation that is available and communicated to the public in week t. Knowledge about the current pandemic situation strongly drives the policy decisions and the behavior of the population.
If we consider the weekly average estimated effective reproductive number as response variable, that is \(Y_{i,t}=R_{i,t}\), the information variable is given by \(Y_{i,t'} = R_{i,t-3}\), which corresponds to the estimated effective reproductive number of three weeks ago. This lag is due to time delays between the infection, the start of the symptoms and the report of a case, such that in week t only the value of three weeks ago is known.
If we consider the approximate weekly growth rate in supposed new infections from week \(t-1\) to week t as response variable, that is \(Y_{i,t}=G_{i,t+2}\), the information variable is given by \(Y_{i,t'} = G_{i,t}\), a value that we assume to be readily available in week t.
3 Causal assumptions and effects
We assume a directed acyclic graph (DAG) among the eleven sets of variables, which is a graphical model displaying the causal relationships among the variables. Based on this DAG we identify the direct and total effect of the treatment variable on the response variables. Our DAG is based on the DAG of Chernozhukov et al. (2021), while we adapt the causal structure to our setting.
The DAG^{Footnote 13} is displayed in Fig. 2. The causal relationships between the variables are assumed to be the same for both response variables. The gray-colored nodes represent the strict facial-mask policy, our treatment variable, and the response variable. The white nodes represent the covariates. A directed edge \(A\rightarrow B\) between nodes A and B represents a causal relationship, where a change in A results in a change in B.
The measured (= observed) covariates are displayed within black circles. Note that all variables, apart from \(Y_{i,t'}\), are indexed by the same week t. Further, we do not allow for spillover effects between cantons. The three types of unmeasured covariates \(\varvec{U}^1_{i}\), \(\varvec{U}^2_{t}\), and \(\varvec{U}^3_{i,t}\) are displayed within light gray circles. Their relations to the other groups of variables vary with the modeling approach: We always allow for an unmeasured common cause \(\varvec{U}^3_{i,t}\) of \(M_{i,t}\) and \(\varvec{P}_{i,t}\) (displayed with solid light gray edges). We also always allow for unmeasured causes of \(Y_{i,t}\) that are constant within weeks or cantons, given by \(\varvec{U}^1_{i}\) and \(\varvec{U}^2_{t}\) (displayed with solid light gray edges into \(Y_{i,t}\)). However, only in the fixed-effects model, described in Sect. 4, do we allow \(\varvec{U}^1_{i}\) and \(\varvec{U}^2_{t}\) to also be causes of the other input variables (displayed by dashed light gray edges), such that they constitute unobserved confounders between \(M_{i,t}\) and \(Y_{i,t}\).
The strict facial-mask policy variable \(M_{i,t}\) is assumed to influence the response variable directly or indirectly. The blue edge, \(M_{i,t} \rightarrow Y_{i,t}\), represents the direct effect. The path in orange, \(M_{i,t} \rightarrow B_{i,t} \rightarrow Y_{i,t}\), represents the indirect effect, the effect of \(M_{i,t}\) on the spread of COVID-19 through its effect on the mediator^{Footnote 14} \(B_{i,t}\). The sum of the direct and indirect effects results in the total effect of the strict facial-mask policy variable on the response variable.
The orange edge \(M_{i,t} \rightarrow B_{i,t}\), which is part of the indirect effect, corresponds to alterations in the social distancing behavior of the public in canton i in week t due to changes in \(M_{i,t}\). For example, some people might increase social contacts because the obligation to wear a mask gives them a feeling of security. Note that we assume that the behavior variable in week t is only affected by the policy value in week t and not by past policy values. We argue this is justified since the salience of the pandemic, which could be represented by past policy values, is already represented by the information variable \(Y_{i,t'}\), as described in Sect. 2.8. In addition, the time period we investigate corresponds to the very early stage of the pandemic, thus most people were responding promptly to changes in the policy, that is their behavior in week t was only affected by the policy in week t.
We now give a more formal definition of the total, direct and indirect causal effect. For notational simplicity, we use the following abbreviation for a univariate random variable \(V_{i,t}\) under a do-intervention (Pearl, 1995) on the facial-mask policy \(M_{i,t}\) (treatment variable)
where \(\tau \in \{0,1\}\). The total causal effect (Pearl, 2000) of the change of the treatment variable \(M_{i,t}\) from 0 to 1 on the response \(Y_{i,t}\) is given by
Note that the total causal effect describes the effect of \(M_{i,t}\) on \(Y_{i,t}\) considering all causal paths from \(M_{i,t}\) to \(Y_{i,t}\). Recall that \(Y_{i,t}\) is the response that is already shifted into the future. In our DAG there is a mediator \(B_{i,t}\) between \(M_{i,t}\) and \(Y_{i,t}\), such that the total causal effect can be decomposed into the sum of a direct and an indirect causal effect (Pearl, 2001). Explicitly, the total causal effect can be written as the sum of the total natural indirect effect (TNIE) (short name indirect effect) and the pure natural direct effect (PNDE) (short name direct effect) (Daniel et al., 2015), that is
In the next section, where we assume a linear model, we also identify the TCE, TNIE, and PNDE with (products of) model coefficients. We will ultimately estimate the total effect as well as the direct effect through linear regressions, which is possible by using valid adjustment sets.
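For orientation, the three effects can be written in the usual nested-counterfactual form of the mediation literature. The notation \(Y_{i,t}(m, b)\) and \(B_{i,t}(m)\) for the response and the mediator under interventions is an assumption made for this sketch and need not coincide with the abbreviation introduced above.

```latex
\begin{align*}
\text{TCE}  &= \mathbb{E}\bigl[Y_{i,t}(1, B_{i,t}(1))\bigr] - \mathbb{E}\bigl[Y_{i,t}(0, B_{i,t}(0))\bigr],\\
\text{PNDE} &= \mathbb{E}\bigl[Y_{i,t}(1, B_{i,t}(0))\bigr] - \mathbb{E}\bigl[Y_{i,t}(0, B_{i,t}(0))\bigr],\\
\text{TNIE} &= \mathbb{E}\bigl[Y_{i,t}(1, B_{i,t}(1))\bigr] - \mathbb{E}\bigl[Y_{i,t}(1, B_{i,t}(0))\bigr].
\end{align*}
```

Adding the last two lines, the term \(\mathbb{E}[Y_{i,t}(1, B_{i,t}(0))]\) cancels, which gives the decomposition of the TCE into the TNIE and the PNDE.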
4 Methodology
In the following, for cantons \(i=1,\ldots ,N\) and weeks \(t=1,\ldots ,T\) let
be the two generating equations of interest of the structural equation model (SEM) (Pearl, 2009) compatible with our DAG. The set \(\varvec{V}_{i,t}\) is the parent set of \(Y_{i,t}\) without \(B_{i,t}\) and \(M_{i,t}\), \(\varvec{\tilde{V}}_{i,t}\) is the parent set of \(B_{i,t}\) without \(M_{i,t}\), and \(\epsilon _{i,t}\) and \(\nu _{i,t}\) are error terms with expectation zero. The next small lemma allows us to identify the TCE, TNIE and PNDE in our context via the regression coefficients in Equations (1).
Lemma 4.1
The TNIE and the PNDE can be expressed in terms of the regression coefficients in Equations (1) via
\[ \text {TNIE} = \delta _3\delta _4 \quad \text {and} \quad \text {PNDE} = \delta _1. \]
The proof of Lemma 4.1 is given in Appendix D. By the decomposition \(\text {TCE}=\text {TNIE}+\text {PNDE}\) and Lemma 4.1 we get that \(\text {TCE} = \delta _3\delta _4 + \delta _1\).
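Instead of estimating TNIE and PNDE directly via estimating the coefficients in Equations (1) and applying Lemma 4.1, we specify the following linear regression model with a two-way error component
\[ Y_{i,t} = \theta M_{i,t} + \varvec{Z}_{i,t}\varvec{\beta } + \alpha _i + \gamma _t + \epsilon _{i,t}, \tag{2} \]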
where \(\varvec{Z}_{i,t}\) is a row vector of covariates and \(\alpha _i\) and \(\gamma _t\) are explained in the next paragraph. If \(\varvec{Z}_{i,t}\) is a valid adjustment set (Shpitser, 2012; Perković et al., 2018) for the effect of \(M_{i,t}\) on \(Y_{i,t}\), then \(\theta\) is equal to the total effect (TCE). We include all parents of \(Y_{i,t}\) except \(B_{i,t}\) and \(M_{i,t}\) in \(\varvec{Z}_{i,t}\), which is a valid adjustment set for the total effect.^{Footnote 15} If \(\varvec{Z}_{i,t}\) is the parent set of \(Y_{i,t}\) except \(M_{i,t}\), then \(\theta\) is equal to the direct effect (PNDE). Thus, in our setting, the set \(\varvec{Z}_{i,t}\) we use to identify the direct effect is given by the union of \(B_{i,t}\) and the valid adjustment set used to identify the total effect. Hence, \(\theta\) is our target of inference, either with the interpretation of the direct or the total effect.
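The role of the mediator in the adjustment set can be illustrated with a small simulation. The sketch below assumes a toy linear model without further covariates in which \(\delta _4\) is the effect of the treatment on the mediator, \(\delta _3\) the effect of the mediator on the response and \(\delta _1\) the direct effect; this labelling, the coefficient values and the function names are hypothetical and are not taken from our data.

```python
import random


def slope_simple(x, y):
    """OLS slope of y on x, with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_adjusted(x1, x2, y):
    """OLS coefficient of x1 when regressing y on x1 and x2, with intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # First component of the solution of the 2x2 normal equations.
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


rng = random.Random(0)
delta_1, delta_3, delta_4 = -0.3, 0.5, -0.4  # hypothetical coefficients
n = 20_000
M = [rng.randint(0, 1) for _ in range(n)]                # randomized treatment
B = [delta_4 * m + rng.gauss(0, 1) for m in M]           # mediator
Y = [delta_1 * m + delta_3 * b + rng.gauss(0, 1) for m, b in zip(M, B)]

total_hat = slope_simple(M, Y)        # mediator left out: targets the total effect
direct_hat = slope_adjusted(M, B, Y)  # mediator included: targets the direct effect
```

In this toy model the regression without the mediator targets \(\delta _1 + \delta _3\delta _4 = -0.5\), whereas the regression that includes the mediator targets \(\delta _1 = -0.3\).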
To relate Model (2) to the DAG, \(\alpha _i\) summarizes the effects of \(\varvec{U}_i^1\) on \(Y_{i,t}\) and, similarly, \(\gamma _t\) summarizes the effects of \(\varvec{U}_t^2\) on \(Y_{i,t}\). Depending on the assumptions on the error components \(\alpha _i\), \(\gamma _t\) and \(\epsilon _{i,t}\), Model (2) can be handled with either a fixed-effects or random-effects approach.
Generally, the fixed-effects approach is more robust than the random-effects approach, while the latter is more efficient in case all assumptions are met. We briefly outline both approaches in the upcoming sections; for more details, see, for example, Hansen (2022). The suitability of the linearity assumption in Equation (2) is assessed via Tukey-Anscombe plots (residuals vs fitted values).
4.1 Fixed-effects approach
The fixed-effects approach assumes that the stochastic structure of \(\alpha _i\) and \(\gamma _t\) is unknown and possibly arbitrarily correlated with \(M_{i,t}\) and \(\varvec{Z}_{i,t}\). In this case, we call \(\alpha _i\) an unobserved cantonal fixed effect and \(\gamma _t\) an unobserved weekly fixed effect. The incorporation of fixed effects accounts for unobserved common causes of the treatment and response variable that are either canton-specific, but invariant across weeks, or week-specific, but invariant across cantons.
In particular, variables that are constant across cantons are national variables. In other words, by applying the fixed-effects approach we can control for national contextual information such as the total number of new cases in the whole country.
The variance-covariance structure of the error terms \(\epsilon _{i,t}\) can take many forms; see the upcoming Sect. 4.1.1. However, \(\epsilon _{i,t}\) are always supposed to satisfy the exogeneity assumption,
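for all \(i=1,\ldots ,N\) and \(t=1,\ldots , T\). This assumption implies no further unobserved confounding apart from \(\alpha _i\) and \(\gamma _t\). To eliminate \(\alpha _i\) and \(\gamma _t\), we apply the two-way within transformation,
\[ \ddot{u}_{i,t} := u_{i,t} - \bar{u}_{i\cdot } - \bar{u}_{\cdot t} + \bar{u}_{\cdot \cdot }, \tag{4} \]
with \(\bar{u}_{i\cdot }\), \(\bar{u}_{\cdot t}\) and \(\bar{u}_{\cdot \cdot }\) denoting the canton mean, the week mean and the overall mean, to \(u_{i,t} \in \{Y_{i,t}, M_{i,t}, \varvec{Z}_{i,t}, \alpha _i, \gamma _t, \epsilon _{i,t} \}\) of Model (2) and obtain the following equation
\[ \ddot{Y}_{i,t} = \theta \ddot{M}_{i,t} + \ddot{\varvec{Z}}_{i,t}\varvec{\beta } + \ddot{\epsilon }_{i,t}, \tag{5} \]
where the interpretation of \(\theta\) remains as in Model (2). Finally, we estimate the coefficient \(\theta\) by estimating the whole coefficient vector \(\varvec{\eta } = (\theta , \varvec{\beta })\), using Ordinary Least Squares (OLS). In the following, we use the acronym FE for this approach.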
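That the two-way within transformation removes additive canton and week effects can be checked numerically. The sketch below uses the standard form of the transformation for a balanced panel (subtract the canton mean and the week mean, add back the overall mean); the function name and the toy effects are hypothetical.

```python
def two_way_within(u):
    """Two-way within transformation of a balanced panel u[i][t].

    Returns u[i][t] - (mean over t for unit i) - (mean over i for period t)
    + (overall mean).
    """
    n, t_len = len(u), len(u[0])
    row_means = [sum(row) / t_len for row in u]
    col_means = [sum(u[i][t] for i in range(n)) / n for t in range(t_len)]
    grand_mean = sum(row_means) / n
    return [[u[i][t] - row_means[i] - col_means[t] + grand_mean
             for t in range(t_len)] for i in range(n)]


# Hypothetical additive effects alpha_i + gamma_t for 3 units and 4 periods.
alpha = [1.0, -2.0, 0.5]
gamma = [0.0, 3.0, -1.0, 2.0]
additive = [[a + g for g in gamma] for a in alpha]
transformed = two_way_within(additive)
```

Every entry of `transformed` is zero up to floating-point error, which is why neither the unobserved effects nor observed variables that are constant within cantons or within weeks survive the transformation.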
Apart from the basic OLS estimate, we also compute a debiased estimate of \(\theta\), \(\hat{\theta }_{BC}\), by cross-over jackknife bias correction (Chernozhukov et al., 2021; Chen et al., 2020, 2019). We employ this method because estimating dynamic linear panel models (i.e., models that include lagged instances of the response variable as covariates) with the fixed-effects estimator potentially yields a bias. The debiased estimate is given by
\[ \hat{\theta }_{BC} = 2\hat{\theta } - \frac{1}{2}\left( \hat{\theta }_{S_1} + \hat{\theta }_{S_2}\right) , \]
where \(\hat{\theta }\) is the OLS regression coefficient based on the entire sample and \(\hat{\theta }_{S_j}\) is the estimated coefficient computed on the subsample \(S_j\), \(j=1,2\). The subsamples \(S_1\) and \(S_2\) are defined, as in Chernozhukov et al. (2021), by
and
respecting the natural ordering of the weeks. Since there is no natural ordering of the cantons, we repeat the above procedure 500 times, where each time the cantons are randomly permuted. The final estimate is then the average of the 500 debiased estimates. In the following, we use the acronym DFE for this debiased fixed-effects approach.
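The permutation-averaged bias correction can be sketched as follows. The split used here is an assumption for illustration: the first subsample pairs the first half of the (permuted) cantons with the first half of the weeks and the second half of the cantons with the second half of the weeks, and the second subsample is the complement. The `estimator` argument is a hypothetical callable that maps a list of (canton, week) cells to a coefficient estimate.

```python
import random


def crossover_subsamples(cantons, weeks):
    """Split a balanced panel into two cross-over subsamples (assumed split)."""
    c_half, w_half = len(cantons) // 2, len(weeks) // 2
    c1, c2 = cantons[:c_half], cantons[c_half:]
    w1, w2 = weeks[:w_half], weeks[w_half:]  # weeks keep their natural order
    s1 = [(i, t) for i in c1 for t in w1] + [(i, t) for i in c2 for t in w2]
    s2 = [(i, t) for i in c1 for t in w2] + [(i, t) for i in c2 for t in w1]
    return s1, s2


def debiased_estimate(estimator, cantons, weeks, n_perm=500, seed=0):
    """Average 2*theta_full - (theta_S1 + theta_S2)/2 over canton permutations."""
    rng = random.Random(seed)
    theta_full = estimator([(i, t) for i in cantons for t in weeks])
    total = 0.0
    for _ in range(n_perm):
        shuffled = cantons[:]
        rng.shuffle(shuffled)  # cantons have no natural order
        s1, s2 = crossover_subsamples(shuffled, weeks)
        total += 2 * theta_full - 0.5 * (estimator(s1) + estimator(s2))
    return total / n_perm


cantons = list(range(26))
weeks = list(range(24))
s1, s2 = crossover_subsamples(cantons, weeks)
```

With 26 cantons and 24 weeks, each subsample contains 312 of the 624 cells, and every canton and every week appears in both subsamples.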
We now detail the specific sets of control variables \(\varvec{Z}_{i,t}\), which depend on whether we aim at estimating the direct or the total effect of the strict facial-mask policy on the spread of COVID-19. Due to the within transformation (4), apart from the unobservable \(\alpha _i\) and \(\gamma _t\), also all observable week-constant variables, such as policy indicators that do not vary over the period of analysis, and canton-constant variables, such as population or density, drop out of \(\varvec{Z}_{i,t}\). In the case of the direct effect, we must thus regress the response variable on all its remaining parents, i.e., \(\varvec{Z}_{i,t}= (B_{i,t}, H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})\). In the case of the total effect, we need to remove the variable \(B_{i,t}\) from the set, and obtain the valid adjustment set \(\varvec{Z}_{i,t}= ( H_{i,t}, \varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})\). Concretely, the following variables are contained in each category:

\(B_{i,t}\): growth.transactions,

\(H_{i,t}\): holiday,

\(\varvec{W}_{i,t}\): sunshine, temperature and humidity,

\(\varvec{P}_{i,t}\): work.closing, rest.gatherings and canc.events,
and the variable \(Y_{i,t'}\) is specific for each of the two response variables, see Sect. 2.8.
4.1.1 Construction of confidence intervals
To construct \(95\%\) confidence intervals for the coefficient \(\theta\), we use the normal approximation,
\[ \hat{\theta } \pm 1.96 \sqrt{\widehat{\text {Var}}(\hat{\theta })}, \tag{6} \]
where \(\hat{\theta }\) is the first entry in the estimated coefficient vector \(\hat{\varvec{\eta }} = (\hat{\theta }, \hat{\varvec{\beta }})\), obtained either through the FE or DFE approach, and \(\widehat{\text {Var}}(\hat{\theta })\) is the corresponding estimated variance. The estimation of the variance requires careful consideration due to the panel structure of our data.
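As a small numerical illustration, the interval can be computed from a point estimate and its estimated variance as follows. The function name and the input values are hypothetical and are not results of our analysis.

```python
import math
from statistics import NormalDist


def normal_ci(theta_hat, var_hat, level=0.95):
    """Confidence interval for theta based on the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * math.sqrt(var_hat)
    return theta_hat - half_width, theta_hat + half_width


# Hypothetical point estimate and variance, for illustration only.
lower, upper = normal_ci(-0.10, 0.0016)
```

Here the standard error is 0.04, so the interval is roughly (-0.178, -0.022); it excludes zero, which is the sense in which an estimate is called significant at the 5% level.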
Let in the following
be the observed covariates of canton i and week t, where \(P :=1+ |\varvec{Z}_{i,t}|\) and \(|\varvec{Z}_{i,t}|\) denotes the number of covariates in \(\varvec{Z}_{i,t}\). Further let
be the stacked covariate matrix. The conditional variance-covariance matrix of \(\hat{\varvec{\eta }}\) can be written as
where
and
with
We denote by \(\hat{\varvec{\epsilon }}\) the empirical residuals obtained through the FE or DFE approach.
The variance of \(\hat{\varvec{\eta }}\) is then estimated by plugging an estimate of \(\varvec{\Omega }\) into Equation (7), resulting in \(\widehat{\text {Var}}(\hat{\varvec{\eta }}) =\varvec{Q}^{-1} \hat{\varvec{\Omega }} \varvec{Q}^{-1}\).^{Footnote 16}
We use the following seven estimators for \(\varvec{\Omega }\) (with their short name in brackets):

1) Heteroscedasticity-Robust (HC3)

2) One-Way Clustering on Canton (Canton)

3) One-Way Clustering on Week (Week)

4) Two-Way Clustering on Canton and Week (Canton-Week)

5) Newey-West (NW) (Newey and West, 1987)

6) Chiang-Hansen (CH) (Chiang et al., 2022)

7) Informal Own Specification (Own) (motivated by Colella et al. (2019))
The estimators correspond to different assumptions on the structure of \(\textrm{Cov}(\epsilon _{i,t},\epsilon _{j,s})\) for \(i,j=1,\ldots , N\) and \(t,s=1,\ldots , T\). These assumptions reflect the clustered and/or heteroskedastic and/or autocorrelated nature of the error terms. The details can be found in Appendix C.
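To illustrate how a clustering assumption enters the sandwich form, the following sketch computes a one-way cluster-robust variance for a single regressor without intercept, as one would have after a within transformation. It is a simplified illustration without small-sample correction and not the implementation used for our results; the function name and the data are hypothetical.

```python
def cluster_robust_variance(x, resid, clusters):
    """Sandwich variance Q^{-1} * Omega * Q^{-1} for a single regressor.

    Q is the sum of x_i^2 and Omega is the sum over clusters of the squared
    within-cluster score sums (sum of x_i * e_i). No small-sample correction.
    """
    q = sum(v * v for v in x)
    scores = {}
    for v, e, g in zip(x, resid, clusters):
        scores[g] = scores.get(g, 0.0) + v * e
    omega = sum(s * s for s in scores.values())
    return omega / (q * q)


# Hypothetical regressor values, residuals and canton labels.
x = [1.0, -1.0, 2.0, -2.0]
resid = [0.5, -0.5, 1.0, 1.0]
canton = ["ZH", "ZH", "BE", "BE"]

var_canton = cluster_robust_variance(x, resid, canton)
var_singletons = cluster_robust_variance(x, resid, list(range(len(x))))
```

Treating every observation as its own cluster gives the basic heteroscedasticity-robust (HC0-type) variance, which differs from the HC3 variant listed above by a leverage correction. In this toy example the canton-clustered variance is smaller because the scores within the second cluster cancel.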
4.2 Random-effects approach
The random-effects approach assumes that the components of the error, \(\alpha _i\), \(\gamma _t\), and \(\epsilon _{i,t}\), satisfy the following exogeneity assumptions
for all \(i=1,\ldots ,N\) and \(t=1,\ldots , T\). These assumptions imply that \(\alpha _i\), \(\gamma _t\), and \(\epsilon _{i,t}\) are uncorrelated with \(M_{i,t}\) and \(\varvec{Z}_{i,t}\), which implies the strong assumption of no unobserved confounding. In particular, in contrast to the fixed-effects approach, the random-effects approach does not control for unobserved week- or canton-specific confounders. The correlation within weeks and within cantons in the composite error \(v_{i,t} = \alpha _i + \gamma _t + \epsilon _{i,t}\) is accounted for by the Feasible Generalized Least Squares (FGLS) approach, where we assume the following structure of the variance-covariance matrix,
where \(\sigma _i^{\alpha }>0\), \(\sigma _i^{\gamma }>0\) and \(\sigma _i^{\epsilon }>0\).
With this approach we again obtain an estimator of the whole coefficient vector \(\varvec{\eta } = (\theta , \varvec{\beta })\) and extract the estimator of \(\theta\). To construct \(95\%\)-confidence intervals for \(\theta\), we again use Equation (6), where we apply Formula (7) with a plug-in estimator of \(\varvec{\Omega }\). In the following, we use the acronym RE for the point estimator as well as the confidence interval of this approach.
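As an illustration of how a composite error \(v_{i,t} = \alpha _i + \gamma _t + \epsilon _{i,t}\) induces correlation within cantons and within weeks, the sketch below builds the covariance matrix of a two-way error-components model. It assumes mutually uncorrelated components with common variances, which is a standard textbook form and not necessarily the exact structure used in the paper; the function name and the variance values are hypothetical.

```python
def composite_error_cov(N, T, var_alpha, var_gamma, var_eps):
    """Covariance matrix of v_{i,t} = alpha_i + gamma_t + eps_{i,t}.

    Observations are ordered canton by canton: (0,0), (0,1), ..., (N-1,T-1).
    Assumes uncorrelated components with common variances (an assumption
    made for this sketch).
    """
    index = [(i, t) for i in range(N) for t in range(T)]
    cov = []
    for (i, t) in index:
        row = []
        for (j, s) in index:
            c = 0.0
            if i == j:
                c += var_alpha          # shared canton effect
            if t == s:
                c += var_gamma          # shared week effect
            if i == j and t == s:
                c += var_eps            # idiosyncratic error
            row.append(c)
        cov.append(row)
    return cov

# Hypothetical variances for a 2-canton, 2-week panel.
cov = composite_error_cov(N=2, T=2, var_alpha=1.0, var_gamma=0.5, var_eps=0.25)
```

In an FGLS procedure the three variances would be estimated from residuals and the resulting matrix used to weight the regression.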
In contrast to the fixed-effects approach, the variables \(\varvec{D}_{i}\), which are among the parents of the response variable, do not drop out of \(\varvec{Z}_{i,t}\). Furthermore, the set of policy variables now includes testing.policy, which was dropped in the FE approach as it is constant within cantons over the period of analysis and is thus eliminated by the within transformation. In the case of the direct effect we obtain \(\varvec{Z}_{i,t}= (B_{i,t},\varvec{D}_{i},H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})\). In the case of the total effect, we need to remove the variable \(B_{i,t}\) from the set, and obtain the valid adjustment set \(\varvec{Z}_{i,t}= (\varvec{D}_{i},H_{i,t},\varvec{W}_{i,t}, \varvec{P}_{i,t}, Y_{i,t'})\). Concretely, the following variables are contained in each category:

\(B_{i,t}\): growth.transactions,

\(\varvec{D}_{i}\): population, density and perc.O80,

\(H_{i,t}:\) holiday,

\(\varvec{W}_{i,t}\): sunshine, temperature and humidity,

\(\varvec{P}_{i,t}\): work.closing, school.closing, rest.gatherings, canc.events and testing.policy,
and the variable \(Y_{i,t'}\) is specific for each of the two response variables, see Sect. 2.8.
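The reason why canton-constant variables such as testing.policy or the demographic variables cannot be used in the FE approach can be seen from a small sketch of the canton-wise within transformation. The canton codes and values below are hypothetical, and only the canton demeaning step is shown.

```python
def within_transform(values_by_canton):
    """Subtract each canton's time average from its weekly values."""
    demeaned = {}
    for canton, series in values_by_canton.items():
        mean = sum(series) / len(series)
        demeaned[canton] = [v - mean for v in series]
    return demeaned

# A variable that is constant within each canton (hypothetical coding).
testing_policy = {"ZH": [2, 2, 2], "GE": [1, 1, 1]}
# A variable that varies over the weeks (hypothetical values).
temperature = {"ZH": [18.0, 20.0, 16.0], "GE": [21.0, 21.0, 24.0]}

demeaned_policy = within_transform(testing_policy)
demeaned_temp = within_transform(temperature)
```

After the transformation the constant variable is identically zero and carries no information, whereas the time-varying variable keeps its within-canton variation.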
4.3 Sensitivity analysis
We perform an extensive sensitivity analysis to investigate the robustness of our results toward changes in our methodology and data preparation. We consider further altering the confidence interval construction, extending the set of information variables, joining half-cantons into one respective canton, changing the timing of information variables, discarding outliers, varying the time period, performing the doubly robust double machine learning estimation, and using the lag-1 response variable as a further covariate.
4.4 Implementation
We implement the methodology in the software R, using the packages plm, sandwich, lmtest and DoubleML.
5 Results
For each of the three modeling approaches (fixed effects FE, debiased fixed effects DFE, and random effects RE), we distinguish between the direct and total effect of introducing a strict facial-mask policy (as compared to the government-determined countrywide baseline policy) on the two response variables. This results in 12 estimated effects along with their (various) confidence intervals.
Figure 3 shows the results for the direct effect of the strict facial-mask policy. We see that all 95% confidence intervals apart from two lie to the left of zero. Thus, the direct effect of the strict facial-mask policy was estimated to be significantly negative in almost all modeling approaches, implying a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined countrywide baseline to the strict facial-mask policy. Figure 4 shows the results for the total effect of the strict facial-mask policy, and we see that, apart from the RE approach, the overall picture is very similar to the direct effect. In the model with response variable r, the point estimators of both the direct and total effect lie between \(-0.22\) and \(-0.16\). In the model with response variable growth.new.cases, the point estimators of both the direct and total effect lie between \(-0.29\) and \(-0.17\).
The fact that the estimated direct and total effects are very similar suggests that either the facial-mask policy worked mainly through the direct path by reducing the transmissibility of COVID-19, or the behavioral variable growth.transactions is not capturing the important changes in social distancing behavior. It is plausible that the latter is at least part of the explanation, as it is, for example, unclear how this variable can reflect changes in behavior in private spaces. In fact, Chernozhukov et al. (2021), who employ a closely related empirical approach for the U.S., do not find the indirect effect to be significant either.
Overall, the FE and the DFE approach provide very similar results. This indicates that the dynamic structure of the panel model does not induce a large estimation bias. The substantial difference, however, between the RE and FE approach suggests that controlling only for demographic variables to capture time-invariant cantonal information, as in the RE model, is insufficient. Other unobserved canton- or week-specific confounders, for which we control with the fixed-effects approach, seem to play an important role.
For all 12 modeling approaches, the Tukey-Anscombe plots (residuals vs fitted values), displayed in Fig. 6 in Appendix B, show no evidence against the assumption of linearity. The results of the extensive sensitivity analysis are shown in Table 2 in Appendix E. In line with our main analyses, in all of the cases considered, we obtain a negative point estimate of the total causal effect, ranging from \(-1.08\) to \(-0.04\). The estimate is deemed significantly different from 0 at \(\alpha =0.05\) in 31 out of the \(41\ (76\%)\) sensitivity analyses conducted.
6 Conclusion
We analyse the effect of the strict facial-mask policy on the spread of COVID-19 during the early phase of the pandemic in Switzerland, using the cantonal heterogeneity in facial-mask policies from July 2020 to December 2020. The obligation to wear a facial mask in public transportation formed the government-determined countrywide baseline for facial-mask policies. The strict facial-mask policy corresponds to mandatory mask wearing on public transport and in all public or shared spaces where social distancing is not possible.
We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined countrywide baseline to the strict facial-mask policy. Importantly, we do not investigate whether the estimated effect sizes are relevant in any given social context.
The correctness of the causal assumptions is crucial to the whole analysis. Hence, the results should be treated with caution and interpreted in light of the mostly untestable assumptions inherent in our modeling approaches, described in Sects. 3 and 4. In particular, we emphasize that the assumption of no unmeasured confounding imposed by the RE approach is very delicate and most likely does not hold. As such, one can consider the RE approach more as a sensitivity check. It is also important to stress that in an observational study like the one at hand it is almost impossible to control for all confounders that vary with weeks and cantons. It is highly likely that our effects of interest are confounded by further unobserved social, cultural and economic traits that may differ between cantons and weeks. Implementation of and compliance with nonpharmaceutical policies like the strict facial-mask policy are subject to cultural norms, political backgrounds and defiance against political authorities and policy makers, to name just a few examples of such factors.
Furthermore, our results are conditional on the characteristics of the time period between July and December 2020 during the early pandemic, when the alpha variant of the SARS-CoV-2 virus was predominant and no vaccinations were available yet. Also, only parts of the three seasons summer, autumn, and winter are represented in the data. However, even though it is hard to directly compare our results to results in other countries, with other facial-mask policies and other time periods, our findings are largely in line with those of other research groups (see, e.g., Chernozhukov et al. (2021), Mitze et al. (2020), and Pleninger et al. (2022)).
Availability of data and materials
Notes
This was in alignment with the perspective of the World Health Organization (WHO). In July 2020, this recommendation was revised due to increasing scientific evidence for the effectiveness of facial masks.
See https://www.bag.admin.ch/bag/de/home/dasbag/aktuell/medienmitteilungen.msgid79522.html, last visited August 21, 2023, for further details.
See https://www.admin.ch/gov/de/start/dokumentation/medienmitteilungen.msgid79711.html, last visited August 21, 2023, for further details.
By convention, we use bold letters for multivariate variables and normal letters for univariate variables.
See https://opendata.swiss/en/dataset/covid19schweiz, last visited August 21, 2023, for further details.
See https://opendata.swiss/en/dataset/covid19schweiz, last visited August 21, 2023, for further details.
A project by the universities of St. Gallen and Lausanne, see https://monitoringconsumption.com/, last visited August 21, 2023, for further details.
See https://www.bfs.admin.ch/bfs/en/home/statistics/regionalstatistics/regionalportraitskeyfigures/cantons.assetdetail.20784336.html, last visited August 21, 2023, for further details.
See https://www.edk.ch/en/educationsystem/websitesofthecantons, last visited August 21, 2023, for further details.
See https://www.meteoschweiz.admin.ch/wetter/messsysteme/bodenstationen/automatischesmessnetz.html, last visited August 21, 2023, for further details.
See https://github.com/OxCGRT/covidpolicytracker/blob/master/documentation/index_methodology.md last visited August 21, 2023, for all indicators used to calculate the index.
Indicatorcoding: https://github.com/OxCGRT/covidpolicytracker/blob/master/documentation/codebook.md, last visited August 21, 2023.
Note that it is not a DAG in the strict sense, since certain nodes represent groups of variables. We allow the variables within such groups to be arbitrarily causally related so that there are no cycles. An edge to or from such a group of variables indicates that we allow such an edge for each variable in the group.
A mediator between A and B is an intermediate variable that lies on a causal path between A and B. A path from node A to node B is called a causal path if all edges on the path point toward B. For a recent exposition of this topic, see Robins et al. (2022).
Note that estimates of \(\text {Var}(\hat{\theta })\) based on \(\hat{\theta }\) remain valid for the debiased estimator \(\hat{\theta }_{BC}\) (Chen et al., 2019).
Abbreviations
 BAG:

Bundesamt für Gesundheit
 CI:

Confidence interval
 DAG:

Directed acyclic graph
 DFE:

Debiased fixed effects
 DML:

Double machine learning
 FE:

Fixed effects
 FGLS:

Feasible generalized least squares
 KOF:

Konjunkturforschungsstelle
 OLS:

Ordinary least squares
 PNDE:

Pure natural direct effect
 RE:

Random effects
 SEM:

Structural equation model
 TCE:

Total causal effect
 TNIE:

Total natural indirect effect
References
Bannert, M., & Thoeni, S. Kofdata: Get Data from the ’KOF Datenservice’ API. (2022). R package version 0.2. https://CRAN.Rproject.org/package=kofdata
Chen, S., Chernozhukov, V., & FernándezVal, I. (2019). Mastering panel metrics: Causal impact of democracy on growth. In AEA Papers and Proceedings, 109
Chen, S., Chernozhukov, V., FernandezVal, I., Kasahara, H., & Schrimpf, P. (2020). Crossover jackknife bias correction for nonstationary nonlinear panel data. Forthcoming
Chernozhukov, V., Kasahara, H., & Schrimpf, P. (2021). Causal impact of masks, policies, behavior on early Covid19 pandemic in the U.S. Journal of Econometrics, 220, 23–62.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21
Chiang, H. D., Hansen, B. E., & Sasaki, Y. (2022). Standard errors for twoway clustering with serially correlated time effects. arXiv preprint arXiv:2201.11304
Colella, F., Lalive, R., Sakalli, S. O., & Thoenig, M. (2019). Inference with arbitrary clustering. IZA Discussion Paper No. 12584. SSRN: https://ssrn.com/abstract=3449578
Cori, A., Ferguson, N. M., Fraser, C., & Cauchemez, S. (2013). A new framework and software to estimate timevarying reproduction numbers during epidemics. American Journal of Epidemiology, 178, 1505–1512.
Daniel, R. M., De Stavola, B. L., Cousens, S. N., & Vansteelandt, S. (2015). Causal mediation analysis with multiple mediators. Biometrics, 71(1), 1–14.
Fattorini, D., & Regoli, F. (2020). Role of the chronic air pollution levels in the Covid19 outbreak risk in Italy. Environmental Pollution, 264, 114732.
Hale, T., Angrist, N., Goldszmidt, R., Kira, B., Petherick, A., Phillips, T., Webster, S., CameronBlake, E., Hallas, L., Majumdar, S., & Tatlow, H. (2021). A global panel database of pandemic policies (Oxford Covid19 government response tracker). Nature Human Behaviour, 5, 529–538.
Hansen, B. (2022). Econometrics. Princeton: Princeton University Press.
Huber, M. (2014). Identifying causal mechanisms (primarily) based on inverse probability weighting. Journal of Applied Econometrics, 29(6), 920–943.
Huber, M., & Langen, H. (2020). Timing matters: The impact of response measures on Covid19related hospitalization and death rates in Germany and Switzerland. Swiss Journal of Economics and Statistics, 156
Huisman, J. S., Scire, J., Angst, D. C., Li, J., Neher, R. A., Maathuis, M. H., Bonhoeffer, S., & Stadler, T. (2022). Estimation and worldwide monitoring of the effective reproductive number of SARSCOV2. Elife, 11, e71345.
Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71.
Kähler, C. J., & Hain, R. (2020). Fundamental protective mechanisms of face masks against droplet infections. Journal of Aerosol Science, 148.
Mitze, T., Kosfeld, R., Rode, J., & Walde, K. (2020). Face masks considerably reduce Covid19 cases in Germany. Proceedings of the National Academy of Sciences of the United States of America, 117.
Newey, W. K., & West, K. D. (1987). A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–688.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. UAI’01, pp. 411–420. Morgan Kaufmann Publishers Inc., San Francisco
Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.
Pearl, J., et al. (2000). Models, reasoning and inference (Vol. 19(2), p. 3). Cambridge: Cambridge University Press.
Perković, E., Textor, J., Kalisch, M., & Maathuis, M. H. (2018). Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. Journal of Machine Learning Research, 18(220), 1–62.
Pleninger, R., Streicher, S., & Sturm, J.E. (2022). Do Covid19 containment measures work? Evidence from Switzerland. Swiss Journal of Economics and Statistics, 158, 5.
Robins, J. M., Richardson, T. S., & Shpitser, I. (2022). An interventionist approach to mediation analysis
Shpitser, I. (2012). Appendum to “on the validity of covariate adjustment for estimating causal effects”. Personal Communication
Tchetgen Tchetgen, E. J., & Shpitser, I. (2012). Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics, 40(3), 1816.
Wheaton, W. C., & Thompson, A. K. (2020). The geography of Covid19 growth in the US: Counties and metropolitan areas. SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3570540
Zhang, R., Li, Y., Zhang, A. L., Wang, Y., & Molina, M. J. (2020). Identifying airborne transmission as the dominant route for the spread of Covid19. In Proceedings of the National Academy of Sciences of the United States of America (Vol. 117)
Zhu, Y., Xie, J., Huang, F., & Cao, L. (2020). Association between shortterm exposure to air pollution and Covid19 infection: Evidence from China. Science of the Total Environment, 727, 138704.
Zoran, M. A., Savastru, R. S., Savastru, D. M., & Tautan, M. N. (2020). Assessing the relationship between surface levels of pm2.5 and pm10 particulate matter impact on covid19 in Milan, Italy. Science of the Total Environment, 738, 139825.
Acknowledgements
We thank the participants of the 2023 annual congress of the Swiss Society of Economics and Statistics (SSES/SGVS) in Neuchâtel for their valuable comments.
Funding
Not applicable.
Contributions
E.N., S.H. and ML.S. contributed equally to the manuscript. M.H.M. advised the creation of this manuscript with her statistical expertise. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Evolution of the strict facialmask policy
Appendix B: TukeyAnscombe plots (residuals vs fitted values)
Appendix C: Details on the confidence interval construction
Subsequently, we present the seven estimators for \(\varvec{\Omega }\), defined in (8) (with their short name in brackets).

1) Heteroscedastic-Robust (HC3) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) iff \(i=j\) and \(s=t\)):
$$\begin{aligned} \hat{\varvec{\Omega }}_1 :=\frac{1}{(NT)^2}\sum _{i=1}^N\sum _{t=1}^T \varvec{X}_{i,t}\varvec{X}_{i,t}^\top \tilde{\epsilon }_{i,t}^2, \end{aligned}$$
where the residual \(\tilde{\epsilon }_{i,t}\) is given by the classical HC3 representation
$$\begin{aligned} \tilde{\epsilon }_{i,t} :=\frac{\hat{\epsilon }_{i,t}}{1-\varvec{X}_{i,t}^{\top }(\varvec{X}^{\top }\varvec{X})^{-1}\varvec{X}_{i,t}}. \end{aligned}$$
2) One-Way Clustering on Canton (Canton) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) iff \(i=j\)):
$$\begin{aligned} \hat{\varvec{\Omega }}_2 :=\frac{1}{(NT)^2}\sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top , \end{aligned}$$
where \(\hat{\varvec{R}}_i :=\sum _{t=1}^T \varvec{X}_{i,t}\hat{\epsilon }_{i,t}.\)

3) One-Way Clustering on Week (Week) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) iff \(t=s\)):
$$\begin{aligned} \hat{\varvec{\Omega }}_3 :=\frac{1}{(NT)^2}\sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top , \end{aligned}$$(10)
where \(\hat{\varvec{S}}_t :=\sum _{i=1}^N \varvec{X}_{i,t}\hat{\epsilon }_{i,t}.\)

4) Two-Way Clustering on Canton and Week (Canton-Week) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) iff \(t=s\) or \(i=j\)):
$$\begin{aligned} \hat{\varvec{\Omega }}_4&:=\frac{1}{(NT)^2} \biggl ( \sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top + \sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top \nonumber \\&\quad - \sum _{i=1}^N\sum _{t=1}^T\varvec{X}_{i,t}\varvec{X}_{i,t}^\top \hat{\epsilon }_{i,t}^2 \biggr ). \end{aligned}$$(11)
5) Newey-West (NW) (Newey and West, 1987) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) iff \(i=j\), where \(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\) is decreasing with \(|t-s|\) increasing):
$$\begin{aligned} \hat{\varvec{\Omega }}_5&:=\frac{1}{(NT)^2} \biggl ( \sum _{i=1}^N \hat{\varvec{R}}_i\hat{\varvec{R}}_i^\top + \sum _{t=1}^T \hat{\varvec{S}}_t\hat{\varvec{S}}_t^\top \nonumber \\&\quad - \sum _{i=1}^N\sum _{t=1}^T\varvec{X}_{i,t}\varvec{X}_{i,t}^\top \hat{\epsilon }_{i,t}^2 \nonumber \\&\quad + \sum _{m=1}^{M}w(m,M)( \hat{\varvec{G}}_m+\hat{\varvec{G}}_m^{\top } \nonumber \\&\quad - \hat{\varvec{H}}_m - \hat{\varvec{H}}_m^{\top } )\biggr ), \end{aligned}$$(12)
where \(\hat{\varvec{G}}_m :=\sum _{t=1}^{T-m}\hat{\varvec{S}}_t \hat{\varvec{S}}_{t+m}^{\top },\) \(\hat{\varvec{H}}_m :=\sum _{i=1}^N\sum _{t=1}^{T-m}\varvec{X}_{i,t}\hat{\epsilon }_{i,t}\varvec{X}_{i,t+m}^{\top }\hat{\epsilon }_{i,t+m}\), \(w(m,M)=1-m/(M+1)\) are triangular weights and \(M=\lfloor T^{1/4} \rfloor\).

6) Chiang-Hansen (CH) (Chiang et al., 2022) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\), where \(\textrm{Cov}(\epsilon _{i,t},\epsilon _{j,s})\) for arbitrary \(i\ne j\) is decreasing with \(|t-s|\) increasing): The estimator \(\hat{\varvec{\Omega }}_6\) is given by Equation (12), where w(m, M) are the triangular weights as in NW and M is data driven.

7) Informal Own Specification (Own) (motivated by Colella et al., 2019) (\(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\ne 0\) if \(i=j\) or i and j are neighboring cantons, where \(\textrm{Cov}(\epsilon _{i,t}, \epsilon _{j,s})\) is decreasing with \(|t-s|\) increasing):
$$\begin{aligned} \hat{\varvec{\Omega }}_7 :=\frac{1}{(NT)^2} \sum _{i=1}^N\sum _{t=1}^{T}\sum _{j=1}^N\sum _{s=1}^{T} \omega _{itjs}V_{itjs}, \end{aligned}$$
where
$$\begin{aligned} V_{itjs} :=\varvec{X}_{i,t}\hat{\epsilon }_{i,t}\hat{\epsilon }_{j,s}\varvec{X}_{j,s}^{\top }, \end{aligned}$$
and the weights \(\omega _{itjs}\) specify the dependence between two error terms \(\epsilon _{i,t}\) and \(\epsilon _{j,s}\) and are given by
$$\begin{aligned} \omega _{itjs} :=\left\{ \begin{array}{lr} 1, &{} i=j, \ t=s,\\ \lambda _{ij}0.5^{\mid t-s \mid }, &{} \text {otherwise} \end{array}\right\} , \end{aligned}$$
and
$$\begin{aligned} \lambda _{ij} :=\left\{ \begin{array}{lr} 1, &{} i=j, \\ 0.5, &{} i,j \ \text {neighbors}, \\ 0, &{} \text {otherwise} \end{array}\right\} . \end{aligned}$$
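The two weighting schemes that appear in the estimators above can be written down directly. The sketch below computes the triangular weights \(w(m,M)\) with \(M=\lfloor T^{1/4} \rfloor\) and the weights \(\omega _{itjs}\) of the Own specification. The number of weeks and the neighbor map are hypothetical inputs chosen for illustration, and the function names are not taken from any package.

```python
import math

def bartlett_weights(T):
    """Triangular weights w(m, M) = 1 - m / (M + 1) for m = 1, ..., M,
    with bandwidth M = floor(T^(1/4))."""
    M = math.floor(T ** 0.25)
    return [1.0 - m / (M + 1) for m in range(1, M + 1)]

def own_weight(i, t, j, s, neighbors):
    """Weight omega_{itjs}: lambda_{ij} * 0.5^{|t - s|}.

    lambda is 1 for the same canton, 0.5 for neighboring cantons and 0
    otherwise. For i == j and t == s this evaluates to 1, matching the
    first case of the definition.
    """
    if i == j:
        lam = 1.0
    elif j in neighbors.get(i, set()):
        lam = 0.5
    else:
        lam = 0.0
    return lam * 0.5 ** abs(t - s)

# Hypothetical panel length and a partial, illustrative neighbor map.
weights = bartlett_weights(T=26)
nb = {"ZH": {"AG"}, "AG": {"ZH"}, "GE": set()}

same_cell = own_weight("ZH", 3, "ZH", 3, nb)      # same canton, same week
same_canton = own_weight("ZH", 3, "ZH", 5, nb)    # same canton, two weeks apart
neighbor = own_weight("ZH", 3, "AG", 4, nb)       # neighbors, one week apart
unrelated = own_weight("ZH", 3, "GE", 3, nb)      # not neighbors
```

Both schemes down-weight pairs of observations that are further apart in time; the Own specification additionally restricts cross-canton dependence to neighbors.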
Appendix D: Proof of Lemma 4.1
Lemma D.1
(Restatement of Lemma 4.1) The TNIE and the PNDE can be expressed in terms of the regression coefficients in Equations (1) via
Proof
Plugging the expressions in Equations (1) into the mediator equation for both dointerventions, respectively, we get
and
Using the definitions of TNIE and PNDE we get
and
\(\square\)
Appendix E: Sensitivity analysis
We perform an extensive sensitivity analysis to investigate the robustness of our results. In the following, we explain the robustness checks conducted and subsequently present the results in Table 2. We restrict the robustness checks to the total effect of the strict facial-mask policy on both response variables and do not consider the direct effect.
E.1 Alterations to confidence interval construction
For the point estimators of the FE and DFE approaches, described in Sects. 4.1 and 4.2, we compute the standard errors using one-way clustering on the month instead of the week in Formula (10) (Month FE and Month DFE) as well as two-way clustering on the canton and the month in Formula (11) (Canton-Month FE and Canton-Month DFE).
E.2 Alterations to the data and point estimation
The upcoming sections describe changes to the data and point estimation. For all these adaptations we construct the confidence intervals with one method only: For the FE and DFE approaches we compute the standard errors using the two-way clustering on the canton and week (Canton-Week), given in Formula (11), to construct the confidence intervals. For the RE approach, we compute the standard errors as described in Sect. 4.2.
1.2.1 Additional information variables
For the RE approach and the approximate weekly growth rate in supposed new infections as response variable we include additional covariates as motivated by Chernozhukov et al. (2021). To describe them, consider the following new definitions,
where \(C_{i,t}\) represents the number of new confirmed cases in canton i in week t and \(C_{t}.\text {nat}\) represents the national number of new confirmed cases in week t. We use \(Y_{i,t}=G_{i,t+2}\), and the corresponding set \(\varvec{Y}_{i,t'}= (G_{i, t+1}, G_{i,t},G'_{i,t},G_{t}.\text {nat},G'_{t}.\text {nat} )\). To the adjustment set \(\varvec{Z}_{i,t}\) we also add \(\log \left( T_{i,t}\right)\), where \(T_{i,t}\) represents the number of COVID-19 tests performed in canton i in week t, as an additional information variable. Note that this is only done for the RE model as the variables at the national level are omitted in the FE and DFE approaches through the within transformation.
1.2.2 Half-Cantons
The observations from the half-cantons Basel-Landschaft and Basel-Stadt, Appenzell Innerrhoden and Appenzell Ausserrhoden, and Obwalden and Nidwalden are each combined into one canton by taking the average of the two observations. The only exceptions are the calculation of growth.new.cases and growth.transactions, where we sum the number of new cases or transactions, respectively, from the two half-cantons, and then calculate the approximate growth rates as before. We perform the analysis for the FE, DFE and RE approaches.
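The two merging rules can be sketched as follows. The approximate growth rate is taken here to be the difference of logarithms between consecutive weeks, which is an assumption of this sketch (the paper's definition is given in its Sect. 2.8); function names and numbers are hypothetical.

```python
import math

def merge_average(series_a, series_b):
    """Combine two half-canton series by averaging week by week."""
    return [(a + b) / 2.0 for a, b in zip(series_a, series_b)]

def merged_growth(counts_a, counts_b):
    """Sum the weekly counts of both half-cantons first, then compute the
    growth rate as a log difference (assumed definition for this sketch)."""
    total = [a + b for a, b in zip(counts_a, counts_b)]
    return [math.log(total[t]) - math.log(total[t - 1]) for t in range(1, len(total))]

# Hypothetical weekly values for two half-cantons.
averaged = merge_average([10.0, 12.0], [14.0, 16.0])   # e.g. a weather variable
growth = merged_growth([10, 20], [30, 60])             # e.g. new confirmed cases
```

Summing before taking growth rates weights each half-canton by its case count, whereas averaging two separately computed growth rates would give both halves equal weight regardless of size.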
1.2.3 Timing of information variables
We examine the influence of the lag of the information variable that is part of the lagged response variable \(Y_{i,t'}\). For the response r, we change the lag of the information variable from \(t'=3\) to \(t'=2\), resulting in the lagged response variable \(Y_{i,t'} = R_{i,t-2}\). For the response growth.new.cases, we change the lag of the information variable from \(t'=2\) to \(t'=3\), resulting in the lagged response variable \(Y_{i,t'} = G_{i,t-1}\). We perform the analysis for the FE, DFE and RE approaches.
1.2.4 Outliers
We fit the FE approach as described in Sect. 4.1, compute the Cook’s distance for each observation and exclude observations with a corresponding Cook’s distance \(>4\times (NT)^{-1}\). We then refit the model using the FE approach based on the reduced sample. In the model where \(\texttt {r}\) is the response, 28 observations are excluded. When \(\texttt {growth.new.cases}\) is the response, 27 observations are excluded. Since the calculation of the Cook’s distance in the DFE and RE approaches is not straightforward, we restrict this robustness check to the FE model.
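The exclusion rule can be illustrated on a simple linear regression with an intercept and one covariate, where leverages have a closed form. This is a simplified stand-in for the FE model: the data are made up, and the sample size n plays the role of NT in the threshold.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation of a simple OLS fit y ~ 1 + x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)            # residual variance
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverages h_i
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]

# Hypothetical data: roughly y = 2x, with the last observation an outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1, 30.0]

distances = cooks_distances(x, y)
threshold = 4.0 / len(x)  # the 4 / (NT) rule with n observations
flagged = [k for k, d in enumerate(distances) if d > threshold]
```

Observations in `flagged` would be dropped before refitting the model on the reduced sample.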
1.2.5 Very short sample period
We restrict the period of analysis to the time between August 21, 2020, and October 19, 2020. On August 21, 2020, the canton of Neuchâtel was the first canton to introduce the strict facial-mask policy. On October 19, 2020, the federal government enforced the strict facial-mask policy nationwide. This period is very short with only \(T=9\) weeks, which constitutes a problem for the DFE and RE approaches. We thus perform the analysis only for the FE approach.
1.2.6 Short sample period
We restrict the period of analysis to the time between July 6, 2020, and October 18, 2020. During this time window, the cantons were free to choose between the strict facial-mask policy and the government-determined countrywide baseline policy. With only \(T=15\) weeks, this period is also short.
1.2.7 Double machine learning approach
We relax the assumption of a linear regression model to a partially linear regression model, where the effect of the adjustment set \(\varvec{Z}_{i,t}\) on \(Y_{i,t}\) is nonparametric. We use the adjustment set of the RE approach. Estimation is done via the double machine learning framework (Chernozhukov et al., 2018), which is a doubly robust method. This approach assumes the following model,
$$\begin{aligned} Y_{i,t}&= \theta M_{i,t} + g(\varvec{Z}_{i,t}) + \epsilon _{i,t}, \\ M_{i,t}&= m(\varvec{Z}_{i,t}) + \nu _{i,t}, \end{aligned}$$
where \(\mathbb {E}[\nu _{i,t}\mid \varvec{Z}_{i,t}] = \mathbb {E}[\epsilon _{i,t}\mid M_{i,t},\varvec{Z}_{i,t}]=0.\) We learn the functions \(m(\cdot )\) and \(g(\cdot )\) with random forests. See Chernozhukov et al. (2018) for more details. We implement the procedure with the R package DoubleML.
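The partialling-out idea behind double machine learning can be sketched with cross-fitting on simulated data. To keep the example self-contained, the random forests are replaced by simple least-squares lines as nuisance learners, and the data-generating process (one control, true effect of -0.2) is entirely hypothetical; this is not the DoubleML API and not the paper's analysis.

```python
import random

def fit_line(z, v):
    """Least-squares fit v ~ a + b * z; returns (a, b)."""
    n = len(z)
    z_bar, v_bar = sum(z) / n, sum(v) / n
    szz = sum((a - z_bar) ** 2 for a in z)
    slope = sum((a - z_bar) * (b - v_bar) for a, b in zip(z, v)) / szz
    return v_bar - slope * z_bar, slope

def dml_partial_out(y, m, z, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in
    Y = theta * M + g(Z) + eps,  M = m(Z) + nu."""
    n = len(y)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    res_y, res_m = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Nuisance fits E[Y | Z] and E[M | Z] on the training folds only.
        a_y, b_y = fit_line([z[i] for i in train], [y[i] for i in train])
        a_m, b_m = fit_line([z[i] for i in train], [m[i] for i in train])
        # Residualize the held-out fold with the out-of-fold fits.
        for i in held_out:
            res_y[i] = y[i] - (a_y + b_y * z[i])
            res_m[i] = m[i] - (a_m + b_m * z[i])
    # Regress the outcome residuals on the treatment residuals.
    return sum(rm * ry for rm, ry in zip(res_m, res_y)) / sum(rm * rm for rm in res_m)

# Simulated data with a known effect (hypothetical).
rng = random.Random(0)
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]
y = [-0.2 * mi + 1.0 * zi + rng.gauss(0.0, 0.1) for mi, zi in zip(m, z)]

theta_hat = dml_partial_out(y, m, z)
```

With flexible learners such as random forests in place of `fit_line`, the same two residualization steps allow \(g\) and \(m\) to be nonlinear while the final step remains a simple regression of residuals on residuals.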
1.2.8 Lag-1 response variable as covariate
In addition to the information variable \(Y_{i,t'}\), we consider for both response variables an additional lag of the response variable as covariate. We include the lag-1 response variable \(Y_{i,t-1}\) in the models, since for both response variables we observe a possibly nonzero autocorrelation at lag one for some cantons.
E.3 Results of the sensitivity analysis
We show the results of all considered sensitivity analyses in Table 2. We obtain a negative point estimate of the total effect, ranging from \(-1.08\) to \(-0.04\). The estimate is deemed significantly different from 0 at the \(5\%\) level in 31 out of the \(41\ (76\%)\) sensitivity analyses conducted.
We provide some general remarks on the results. As stressed earlier, the RE approach cannot control for unobserved confounding and is therefore less trustworthy than the FE and DFE approaches. In some of the sensitivity analyses, the DFE approach produces estimates that vary substantially from the FE approach. Due to the properties discussed in Sect. 4.1, the DFE approach is more trustworthy. For the DML approach, we obtain much larger estimated effect sizes of the total effect. However, the uncertainty is very large, so the results are not significant. Using this methodology, we also cannot control for unmeasured confounding. As there is no clear pattern apparent in the Tukey-Anscombe plots in Fig. 6, we suspect that the difference in point estimates between the linear methods and the nonlinear DML methodology is mostly driven by the latter’s inability to control for unobserved confounding, and not by an underlying nonlinear relationship between the facial-mask policy and the response variables.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nussli, E., Hediger, S., Spohn, ML. et al. The effect of a strict facialmask policy on the spread of COVID19 in Switzerland during the early phase of the pandemic. Swiss J Economics Statistics 160, 2 (2024). https://doi.org/10.1186/s41937024001190
DOI: https://doi.org/10.1186/s41937024001190