The effect of a strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic

During several weeks in the second half of the year 2020, the cantons of Switzerland could choose to adopt the government-determined facial-mask policy, corresponding to mandatory facial-mask wearing on public transport, or a strict facial-mask policy, corresponding to mandatory facial-mask wearing on public transport and in all public or shared spaces where social distancing was not possible. We estimate the effect of introducing the strict facial-mask policy on the spread of COVID-19 in Switzerland during this first phase of the pandemic in 2020, using the cantonal heterogeneity in facial-mask policies. We adjust for social distancing behavior, weather, other non-pharmaceutical policies and further variables. We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the strict facial-mask policy is adopted.


Introduction
The coronavirus disease (COVID-19) pandemic presented large challenges to societies around the world. In the early pandemic in 2020, when the alpha variant of the SARS-CoV-2 virus was predominant, knowledge about the spread of the virus and about COVID-19 was scarce. In close collaboration with science, politicians and decision makers were trying to contain the spread of COVID-19 while avoiding unnecessary restrictions. Non-pharmaceutical interventions such as school closures, restrictions on public and private gatherings and enforcement of home office were employed.
In this paper, we focus on the effect of introducing a strict facial-mask policy on the containment of COVID-19 in Switzerland during the first phase of the pandemic in 2020. Studying the effect of the facial-mask policy is especially interesting as it is arguably one of the most debated policies. This might be partially due to the position that the Federal Office of Public Health of the Swiss Confederation (BAG) took in March 2020, communicating that healthy people do not need to wear facial masks.$^{1}$ A second reason for focusing on the facial-mask policy is that it is a relatively cheap and noninvasive policy when compared to other non-pharmaceutical interventions.
After the country-wide lockdown in Switzerland from mid-March to the end of April 2020, the federal government determined country-wide lower bounds on containment measures. The 26 Swiss cantons were given partial autonomy in introducing COVID-19 containment measures.$^{2}$ On July 6, 2020, wearing facial masks on public transport was made obligatory and thus formed the country-wide baseline for facial-mask policies.$^{3}$ Cantons could choose to enforce mandatory mask wearing on public transport and in all public or shared spaces where social distancing was not possible, which we henceforth refer to as the strict facial-mask policy. On October 19, 2020, the government enforced the strict facial-mask policy for all cantons. On December 21, 2020, vaccinations against COVID-19 were initiated in Switzerland, marking a massive change point in the pandemic. We thus consider the period from July 6, 2020, to December 20, 2020, as our period of analysis. During the whole period of analysis, a coordinated information campaign and international restrictions on entering the country were in place. On October 29, 2020, nationwide restrictions on public events were introduced by the government. On November 2, 2020, universities were closed in Switzerland.
We quantify the spread of COVID-19 by two different, but related response variables: the estimated effective reproductive number (Huisman et al., 2022) and the approximate weekly growth rate in supposed new infections (Chernozhukov et al., 2021).
To identify the effect of the strict facial-mask policy, we impose causal assumptions similar to Chernozhukov et al. (2021). We use a directed acyclic graph (DAG) to visualize the assumed causal relationships among the facial-mask policy variable, the response variable and different sets of control variables. By regressing the response variable on a suitable set of control variables determined in the DAG, we identify both the direct and total effect of the strict facial-mask policy variable on each of the two response variables. The direct effect, where direct means with respect to the variables we consider, captures changes in the response variable due to changes in the strict facial-mask policy variable, keeping the rest fixed. The total effect additionally captures changes in the response variable that are mediated through changes in the social distancing behavior.
We use publicly available data from different sources. The data have a balanced panel structure, as we have observations for each of the 26 cantons of Switzerland, measured during the 24 weeks considered. For both response variables, we assume a linear generating equation with a two-way error component, including a canton- and week-specific part. This model allows us to account for dependencies between the observations within cantons and weeks. Depending on the assumptions on the error components, a specific linear regression model is estimated with either a fixed-effects or random-effects approach to estimate the total and direct effect.
For both response variables with both fixed- and random-effects approaches, we obtain negative point estimates of both direct and total effect, most of them being significant. In other words, we estimate an expected reduction in the spread of COVID-19 in the early pandemic comparing the strict facial-mask policy to the government-determined country-wide baseline. We perform various sensitivity analyses to confirm the robustness of our results with respect to inevitable modeling choices.
To our knowledge, this is the first study that statistically analyses the effect of the strict facial-mask policy on the spread of COVID-19 in Switzerland during the early phase of the pandemic. Pleninger et al. (2022) analyse the combination of all COVID-19-related policies in Switzerland, as measured by the Stringency Index of the Konjunkturforschungsstelle (KOF). They do not examine the isolated effect of the strict facial-mask policy. For Switzerland and Germany, Huber and Langen (2020) study the impact of the timing of the non-pharmaceutical policy of lockdowns on COVID-19-related death and hospitalization rates. They find that an early introduction reduces said rates substantially.
In other countries, however, the effect of facial-mask policies has been studied. For the USA, Chernozhukov et al. (2021) study the effect of mandatory mask wearing at the workplace. They estimate a significant reduction of the approximate weekly growth rate in supposed new infections by around 0.1. For Germany, Mitze et al. (2020) find a 15-75% reduction of new cases 20 days after the introduction of mandatory facial masks in public transport and stores. Zhang et al. (2020) find that mandatory facial masks considerably slow down infection growth for the analysed entities of New York, Wuhan and Italy. There are studies confirming the functionality of facial masks in hindering transmission of viral droplets in laboratory settings (see, e.g., Kähler and Hain (2020)). However, in observational settings it is the effect of facial-mask policies that is analysed, which includes mechanisms such as changes in risk-taking behavior and misuse of facial masks.
Direct comparison of our results to estimates in other countries and time spans is hard, due to different facial-mask policies and/or general differences between countries and their populations. In essence, however, our findings support the existing literature regarding the sign and significance of the effect of a strict facial-mask policy on the spread of COVID-19.
The article is organized as follows. Section 2 explains the data, Sect. 3 explains the causal assumptions and the causal effects of interest, and Sect. 4 describes the methodology. Section 5 presents the results. Finally, we conclude in Sect. 6.

Data
Our data are measured in each of the $i = 1, \ldots, N = 26$ cantons in Switzerland in each of the $t = 1, \ldots, T = 24$ weeks in the period of analysis, ranging from July 6, 2020, until December 20, 2020. We use weekly data because data at the daily resolution would artificially increase the sample size with highly dependent observations, which would lead to faulty statistical inference. Our variable of interest, the so-called treatment variable, is the strict facial-mask policy variable, where 1 indicates the strict policy was in place while 0 indicates the government-determined country-wide baseline was in place. To quantify the spread of COVID-19, we use two different response variables: the first is the estimated effective reproductive number and the second is the approximate weekly growth rate in supposed new infections. Among the control variables, we only consider variables that varied between cantons and/or weeks in the period of analysis.
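We can summarize the variables, observed in canton $i = 1, \ldots, N$ in week $t = 1, \ldots, T$, into eleven groups:
• $Y_{i,t}$: response variables,
• $M_{i,t}$: strict facial-mask policy variable,
• $B_{i,t}$: social distancing behavior variable,
• $D_i$: demographic variables,
• $H_{i,t}$: holiday indicator,
• $W_{i,t}$: meteorological variables,
• $P_{i,t}$: non-pharmaceutical policy variables,
• $Y_{i,t}'$: lagged response variables as covariates,
• $U^1_i$, $U^2_t$, and $U^3_{i,t}$: three types of unmeasured covariates.
Subsequently, we present the specific variables (with their short name in brackets) in each of the above categories (apart from $U^1_i$, $U^2_t$, and $U^3_{i,t}$). We present a list of these 16 variables with their sources and descriptive statistics in Table 1.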

Response variables ($Y_{i,t}$)
The first response variable is the estimated effective reproductive number (r) of Huisman et al. (2022). The estimated effective reproductive number at day $d$ is an estimate of the expected number of secondary infections at day $d$ caused by a previously infected person. Its estimation involves multiple steps: 1) estimation of the number of newly infected people based on the number of newly confirmed cases, adjusting for reporting cycles and irregular reporting practices, 2) a deconvolution step using suitable delay distributions between transmission and reporting of the case to infer the actual infection incidence, 3) application of the EpiEstim method developed by Cori et al. (2013) to estimate the effective reproductive number from the time series of newly infected people. Only cases stemming from infections within Switzerland are used for estimation. In each canton $i$, to obtain an observation for week $t$, we average the daily values within week $t$. We denote this response variable by $Y_{i,t} = R_{i,t}$. We obtain the data from the Federal Office of Public Health of Switzerland.$^{5}$
The second response variable is the same as the one used in Chernozhukov et al. (2021), the approximate weekly growth rate in supposed new infections from week $t - 1$ to week $t$. To specify this response, we define for each canton $i$ and week $t$ the weekly growth rate $G_{i,t}$ of reported new cases, where $C_{i,t}$ represents the number of reported new cases in canton $i$ in week $t$. Due to the delay between the actual infection with the virus and the reporting of a new case, $G_{i,t}$ does not represent the pandemic situation in week $t$ but that of a time period before $t$. Therefore, to obtain an approximation of the weekly growth rate in supposed new infections from week $t - 1$ to week $t$, we need to use a future value of $G_{i,t}$. We employ the same time shift of two weeks to the future as Chernozhukov et al. (2021), resulting in the response variable $Y_{i,t} = G_{i,t+2}$ (growth.new.cases). We obtain the data on reported new cases from the Federal Office of Public Health of Switzerland.$^{6}$
We plot both responses in Fig. 1 for all 26 cantons in the period of analysis. The plots and the Pearson correlation coefficients $\rho$ highlight that the two responses are similar, but there is no one-to-one correspondence between the estimated effective reproductive number and the approximate weekly growth rate in supposed new infections from week $t - 1$ to week $t$.
Fig. 1 Time series of the two response variables, the estimated effective reproductive number $R_{i,t}$ and the approximate weekly growth rate in supposed new infections from week $t - 1$ to week $t$, $G_{i,t+2}$, between July 6, 2020, and December 20, 2020, for each of the 26 cantons. For each canton, we report the Pearson correlation coefficient $\rho$ between the two time series
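As a minimal sketch of how the second response could be computed, the snippet below assumes that the growth rate is the log-difference of weekly case counts, $G_{i,t} = \log C_{i,t} - \log C_{i,t-1}$, in analogy to Chernozhukov et al. (2021); this form, the function names and the example counts are assumptions made for illustration, and zero counts would need a treatment that the text does not specify.

```python
import math

def weekly_growth(cases):
    """Log-difference growth G_t = log(C_t) - log(C_{t-1}) for t = 1, ..., len(cases) - 1.

    Assumption: all weekly counts are strictly positive.
    """
    return [math.log(cases[t]) - math.log(cases[t - 1]) for t in range(1, len(cases))]

def lead(series, k):
    """Shift a series k steps into the future: entry t of the result is series[t + k]."""
    return series[k:]

# Hypothetical weekly counts of reported new cases for one canton (weeks 0..4).
cases = [100, 120, 150, 150, 90]
growth = weekly_growth(cases)   # growth[j] is G for week j + 1
response = lead(growth, 2)      # response for week t is G of week t + 2
```

The two-week lead shortens the usable series by two weeks at the end of the observation window, which is why case counts beyond the last analysed week are needed to fill the final responses.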

Strict facial-mask policy variable ($M_{i,t}$)
The strict facial-mask policy variable (facial.mask) is our treatment variable. In each canton $i$, it has a value of 1 if the strict policy is applied and a value of 0 if the government-determined baseline policy is applied. During the period of analysis, a total of 10 cantons deviate from the baseline policy by preemptively introducing the strict facial-mask policy. At a daily resolution, there are 1807 observations where the strict policy is in place and 2561 where the baseline policy is implemented. For more details, see Table 1 and Fig. 5 in Appendix A.
To obtain an observation for week $t$, we average the daily values within week $t$. We obtain the data from KOF, via their CRAN R-package kofdata (Bannert et al., 2022).
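The weekly aggregation used for the daily variables can be sketched as follows. The function name and the example series are hypothetical; the point is only that a daily 0/1 indicator becomes a fraction between 0 and 1 in a week during which the policy changes.

```python
def weekly_means(daily, days_per_week=7):
    """Average consecutive blocks of `days_per_week` daily values."""
    if len(daily) % days_per_week != 0:
        raise ValueError("daily series must cover whole weeks")
    return [
        sum(daily[start:start + days_per_week]) / days_per_week
        for start in range(0, len(daily), days_per_week)
    ]

# Hypothetical canton: baseline policy in week 1, strict policy introduced
# on the fifth day of week 2, strict policy throughout week 3.
daily_mask = [0] * 7 + [0, 0, 0, 0, 1, 1, 1] + [1] * 7
weekly_mask = weekly_means(daily_mask)
```

In this example the second week takes the value 3/7, so the weekly treatment variable is not strictly binary in transition weeks.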

Social distancing behavior variable ($B_{i,t}$)
As a proxy for social distancing behavior, we use household spending, similar to Pleninger et al. (2022). In each canton $i$, we consider the approximated growth rate of transactions in CHF with credit cards, debit cards and bank transfers from mobile phones of Swiss residents (growth.transactions). E-commerce is not considered.
We obtain the data from Monitoring Consumption Switzerland.$^{7}$

Demographic variables ($D_i$)
Demographic variables of a canton $i$ are given by population size (population) and the percentage of people with age $\geq 80$ years (perc.o80). Wheaton and Thompson (2020) show that infection growth is also strongly linked to residential density, that is, the number of people per km$^2$ of settlement area (density), which we also consider. These three variables can be considered constant for all weeks $t$. We obtain the data from the Federal Statistical Office of Switzerland.$^{8}$

Holiday indicator ($H_{i,t}$)
In each canton i the daily holiday indicator (holiday) has a value of 1 if the majority of public schools in the canton are on holiday and 0 otherwise.To obtain an observation for week t, we average the daily values within week t.
We obtain the holiday data from the cantonal education departments.$^{9}$

Meteorological variables ($W_{i,t}$)
Zoran et al. (2020) suggest that weather conditions are closely linked to the spread of COVID-19. In particular, they find that dry air supports the transmission of COVID-19. Their findings are supported by Zhu et al. (2020) and Fattorini and Regoli (2020). To incorporate these effects, we assemble weather data from a total of 100 weather stations from SwissMetNet,$^{10}$ not including stations on mountains. For each canton, we compute average daily weather values by weighting observations of stations by the population size of the respective municipality. Lastly, due to the lack of suitable stations, the canton of Basel-Stadt is mapped to the weather of the canton of Basel-Land, Appenzell Innerrhoden and Appenzell Ausserrhoden are mapped to St. Gallen, and Nidwalden is mapped to Obwalden. In doing so, we get a characterization of the daily weather, quantified by the number of minutes of sunshine (sunshine), the mean air temperature in °C (temperature), and the relative humidity in % (humidity). To obtain an observation for week $t$, we average the daily values within week $t$.
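The population weighting of station measurements described above can be sketched as a weighted mean. The function name, the two stations and their municipality populations are hypothetical and serve only to show the arithmetic.

```python
def population_weighted_mean(values, populations):
    """Weighted mean of station values, weights = population of the station's municipality."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, populations)) / total

# Hypothetical canton with two stations: daily mean temperature in degrees Celsius
# and population of the municipality in which each station is located.
station_temperature = [20.0, 10.0]
municipality_population = [3000, 1000]
canton_temperature = population_weighted_mean(station_temperature, municipality_population)
```

Here the larger municipality dominates the cantonal value (17.5 rather than the unweighted 15.0), so the cantonal weather reflects conditions where most residents live.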

Non-pharmaceutical policy variables ($P_{i,t}$)
The KOF Stringency Plus Index (Pleninger et al., 2022), the Government Response Index and the Economic Support Index (Hale et al., 2021)$^{11}$ combine different sets of policy variables into one index with the aim of reflecting the stringency of a government with regard to COVID-19 policies.
We do not use these indices, but use the policy variables directly, where we only consider those that vary at least across cantons or across weeks over the period of analysis. In each canton $i$, these policy variables are daily indicators for workplace closings (work.closing), school closings (school.closings), restrictions on gatherings (rest.gatherings), cancelation of public events (canc.events), and testing policy (testing.policy).$^{12}$ These indicators have 2 to 5 levels, where a higher level indicates a stricter policy. To obtain an observation for week $t$, we average the daily values within week $t$. We obtain the data from KOF, provided through their CRAN R-package kofdata (Bannert et al., 2022).

Lagged response variables as covariates ($Y_{i,t}'$)
We consider a lagged response variable as covariate (Y.lagged): we include the lagged response variable whose value is known in week $t$, summarizing the information about the pandemic situation that is available and communicated to the public in week $t$. Knowledge about the current pandemic situation strongly drives the policy decisions and the behavior of the population.
If we consider the weekly average estimated effective reproductive number as response variable, that is $Y_{i,t} = R_{i,t}$, the information variable is given by $Y_{i,t}' = R_{i,t-3}$, which corresponds to the estimated effective reproductive number of three weeks ago. This lag is due to time delays between the infection, the start of the symptoms and the report of a case, such that in week $t$ only the value of three weeks ago is known.
If we consider the approximate weekly growth rate in supposed new infections from week $t - 1$ to week $t$ as response variable, that is $Y_{i,t} = G_{i,t+2}$, the information variable is given by $Y_{i,t}' = G_{i,t}$, a value that we assume to be readily available in week $t$.

Causal assumptions and effects
We assume a directed acyclic graph (DAG) among the eleven sets of variables, which is a graphical model displaying the causal relationships among the variables. Based on this DAG, we identify the direct and total effect of the treatment variable on the response variables. Our DAG is based on the DAG of Chernozhukov et al. (2021), while we adapt the causal structure to our setting.
The DAG$^{13}$ is displayed in Fig. 2. The causal relationships between the variables are assumed to be the same for both response variables. The gray-colored nodes represent the strict facial-mask policy, our treatment variable, and the response variable. The white nodes represent the covariates. A directed edge $A \to B$ between nodes $A$ and $B$ represents a causal relationship, where a change in $A$ results in a change in $B$.
The measured (= observed) covariates are displayed within black circles. Note that all variables, apart from $Y_{i,t}'$, are indexed by the same week $t$. Further, we do not allow for spillover effects between cantons. The three types of unmeasured covariates $U^1_i$, $U^2_t$, and $U^3_{i,t}$ are displayed within light gray circles. Their relations to the other groups of variables vary with the modeling approach: we always allow for an unmeasured common cause $U^3_{i,t}$ of $M_{i,t}$ and $P_{i,t}$ (displayed with solid light gray edges). We also always allow for unmeasured causes of $Y_{i,t}$ that are constant within weeks or cantons, given by $U^1_i$ and $U^2_t$ (displayed with solid light gray edges into $Y_{i,t}$). However, only in the fixed-effects model, described in Sect. 4, do we allow $U^1_i$ and $U^2_t$ to also be causes of the other input variables (displayed by dashed light gray edges), such that they constitute unobserved confounders between $M_{i,t}$ and $Y_{i,t}$.
The strict facial-mask policy variable $M_{i,t}$ is assumed to influence the response variable directly or indirectly. The blue edge, $M_{i,t} \to Y_{i,t}$, represents the direct effect. The path in orange, $M_{i,t} \to B_{i,t} \to Y_{i,t}$, represents the indirect effect, the effect of $M_{i,t}$ on the spread of COVID-19 through its effect on the mediator$^{14}$ $B_{i,t}$. The sum of the direct and indirect effects results in the total effect of the strict facial-mask policy variable on the response variable.
The orange edge $M_{i,t} \to B_{i,t}$, which is part of the indirect effect, corresponds to alterations in social distancing behavior of the public in canton $i$ in week $t$ due to changes in $M_{i,t}$. For example, some people might increase social contacts because the obligation to wear a mask gives them a feeling of security. Note that we assume that the behavior variable in week $t$ is only affected by the policy value in week $t$ and not by past policy values. We argue this is justified since the salience of the pandemic, which could be represented by past policy values, is already represented by the information variable $Y_{i,t}'$, as described in Sect. 2.8. In addition, the time period we investigate corresponds to the very early stage of the pandemic; thus, most people were responding promptly to changes in the policy, that is, their behavior in week $t$ was only affected by the policy in week $t$.
Fig. 2 Assumed DAG on the eleven sets of variables. The blue edge represents the direct effect of the strict facial-mask policy $M_{i,t}$ on the response variable $Y_{i,t}$. The blue edge in conjunction with the path along the orange edges represents the total effect of the strict facial-mask policy $M_{i,t}$ on the response variable $Y_{i,t}$
We now give a more formal definition of the total, direct and indirect causal effect. For notational simplicity, we abbreviate a univariate random variable $V_{i,t}$ under a do-intervention (Pearl, 1995) that sets the facial-mask policy $M_{i,t}$ (treatment variable) to the value $\tau$, where $\tau \in \{0, 1\}$. The total causal effect (Pearl, 2000) of the change of the treatment variable $M_{i,t}$ from 0 to 1 on the response $Y_{i,t}$ is the difference between the expected response under the intervention $M_{i,t} = 1$ and the expected response under the intervention $M_{i,t} = 0$. Note that the total causal effect describes the effect of $M_{i,t}$ on $Y_{i,t}$ considering all causal paths from $M_{i,t}$ to $Y_{i,t}$.
Recall that $Y_{i,t}$ is the response that is already shifted into the future. In our DAG there is a mediator $B_{i,t}$ between $M_{i,t}$ and $Y_{i,t}$, such that the total causal effect can be decomposed into the sum of a direct and an indirect causal effect (Pearl, 2001). Explicitly, the total causal effect (TCE) can be written as the sum of the total natural indirect effect (TNIE) (short name indirect effect) and the pure natural direct effect (PNDE) (short name direct effect) (Daniel et al., 2015), that is, TCE = TNIE + PNDE. In the next section, where we assume a linear model, we also identify the TCE, TNIE, and PNDE with (products of) model coefficients. We will ultimately estimate the total effect as well as the direct effect through linear regressions, which is possible by using valid adjustment sets.
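For concreteness, the standard counterfactual definitions of these three quantities (Pearl, 2001; Daniel et al., 2015) can be sketched as follows. The nested notation $Y_{i,t}(\tau, B_{i,t}(\tau'))$, denoting the response when the policy is set to $\tau$ while the behavior variable takes the value it would attain under policy $\tau'$, is an assumption introduced here for illustration and need not match the notation of the original equations.

```latex
\begin{align*}
\mathrm{TCE}  &= E\big[Y_{i,t}(1, B_{i,t}(1))\big] - E\big[Y_{i,t}(0, B_{i,t}(0))\big], \\
\mathrm{PNDE} &= E\big[Y_{i,t}(1, B_{i,t}(0))\big] - E\big[Y_{i,t}(0, B_{i,t}(0))\big], \\
\mathrm{TNIE} &= E\big[Y_{i,t}(1, B_{i,t}(1))\big] - E\big[Y_{i,t}(1, B_{i,t}(0))\big].
\end{align*}
```

Adding the last two lines, the term $E[Y_{i,t}(1, B_{i,t}(0))]$ cancels, which recovers the decomposition of the total causal effect into TNIE and PNDE.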

Methodology
In the following, for cantons $i = 1, \ldots, N$ and weeks $t = 1, \ldots, T$, we refer to the two generating equations of interest of the structural equation model (SEM) (Pearl, 2009) as Equations (1). Lemma 4.1 expresses TNIE and PNDE in terms of the coefficients of these equations; its proof is given in Appendix D. By the decomposition TCE = TNIE + PNDE and Lemma 4.1, we get that $\mathrm{TCE} = \delta_3 \delta_4 + \delta_1$.
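To illustrate how a product of coefficients arises in such a linear model, consider the following minimal sketch. The assignment of $\delta_1$ to the edge $M_{i,t} \to Y_{i,t}$, of $\delta_3$ to $B_{i,t} \to Y_{i,t}$ and of $\delta_4$ to $M_{i,t} \to B_{i,t}$ is an assumption chosen so that the result agrees with $\mathrm{TCE} = \delta_3 \delta_4 + \delta_1$. The remainder terms $r^{Y}_{i,t}$ and $r^{B}_{i,t}$, which collect all other parents and errors and are assumed not to be affected by the intervention on $M_{i,t}$, are hypothetical placeholders. As before, $Y_{i,t}(\tau, B_{i,t}(\tau'))$ denotes the response under policy $\tau$ with the behavior variable at the value it would take under policy $\tau'$.

```latex
\begin{align*}
Y_{i,t} &= \delta_1 M_{i,t} + \delta_3 B_{i,t} + r^{Y}_{i,t}, \qquad
B_{i,t} = \delta_4 M_{i,t} + r^{B}_{i,t}, \\
Y_{i,t}(\tau, B_{i,t}(\tau')) &= \delta_1 \tau + \delta_3 \big(\delta_4 \tau' + r^{B}_{i,t}\big) + r^{Y}_{i,t}, \\
\mathrm{PNDE} &= E\big[Y_{i,t}(1, B_{i,t}(0))\big] - E\big[Y_{i,t}(0, B_{i,t}(0))\big] = \delta_1, \\
\mathrm{TNIE} &= E\big[Y_{i,t}(1, B_{i,t}(1))\big] - E\big[Y_{i,t}(1, B_{i,t}(0))\big] = \delta_3 \delta_4 .
\end{align*}
```

Under these assumptions the remainder terms cancel in both differences, so the direct effect is a single coefficient and the indirect effect is the product of the two coefficients along the mediated path.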
Instead of estimating TNIE and PNDE directly via estimating the coefficients in Equations (1) and applying Lemma 4.1, we specify the following linear regression model with a two-way error component
$$Y_{i,t} = \theta M_{i,t} + Z_{i,t} \beta + \alpha_i + \gamma_t + \epsilon_{i,t}, \tag{2}$$
where $Z_{i,t}$ is a row vector of covariates and $\alpha_i$ and $\gamma_t$ are explained in the next paragraph. If $Z_{i,t}$ is a valid adjustment set (Shpitser, 2012; Perković et al., 2018) for the effect of $M_{i,t}$ on $Y_{i,t}$, then $\theta$ is equal to the total effect (TCE). We include all parents of $Y_{i,t}$ except $B_{i,t}$ and $M_{i,t}$ in $Z_{i,t}$, which is a valid adjustment set for the total effect.$^{15}$ If $Z_{i,t}$ is the parent set of $Y_{i,t}$ except $M_{i,t}$, then $\theta$ is equal to the direct effect (PNDE). Thus, in our setting, the set $Z_{i,t}$ we use to identify the direct effect is given by the union of $B_{i,t}$ and the valid adjustment set used to identify the total effect. Hence, $\theta$ is our target of inference, either with the interpretation of the direct or the total effect.
To relate Model (2) to the DAG, $\alpha_i$ summarizes the effects of $U^1_i$ on $Y_{i,t}$, and similarly $\gamma_t$ summarizes the effects of $U^2_t$ on $Y_{i,t}$. Depending on the assumptions on the error components $\alpha_i$, $\gamma_t$ and $\epsilon_{i,t}$, Model (2) can be handled with either a fixed-effects or random-effects approach.
Generally, the fixed-effects approach is more robust than the random-effects approach, while the latter is more efficient in case all assumptions are met. We briefly outline both approaches in the upcoming sections; for more details, see, for example, Hansen (2022). The suitability of the linearity assumption in Equation (2) is assessed via Tukey-Anscombe plots (residuals vs. fitted values).

Fixed-effects approach
The fixed-effects approach assumes that the stochastic structure of $\alpha_i$ and $\gamma_t$ is unknown and possibly arbitrarily correlated with $M_{i,t}$ and $Z_{i,t}$. In this case, we call $\alpha_i$ an unobserved cantonal fixed effect and $\gamma_t$ an unobserved weekly fixed effect. The incorporation of fixed effects accounts for unobserved common causes of the treatment and response variable that are either canton-specific, but invariant across weeks, or week-specific, but invariant across cantons.
In particular, variables that are constant across cantons are national variables.In other words, by applying the fixed-effects approach we can control for national contextual information such as the total number of new cases in the whole country.
The variance-covariance structure of the error terms $\epsilon_{i,t}$ can take many forms; see Sect. 4.1.1. However, the $\epsilon_{i,t}$ are always assumed to satisfy an exogeneity assumption for all $i = 1, \ldots, N$ and $t = 1, \ldots, T$. This assumption implies no further unobserved confounding apart from $\alpha_i$ and $\gamma_t$. To eliminate $\alpha_i$ and $\gamma_t$, we apply the two-way within transformation (4) to each $u_{i,t} \in \{Y_{i,t}, M_{i,t}, Z_{i,t}, \alpha_i, \gamma_t, \epsilon_{i,t}\}$ of Model (2) and obtain a transformed regression equation in which the interpretation of $\theta$ remains as in Model (2). Finally, we estimate the coefficient $\theta$ by estimating the whole coefficient vector $\eta = (\theta, \beta)$ using Ordinary Least Squares (OLS). In the following, we use the acronym FE for this approach.
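As an illustration of why this transformation removes the canton and week effects, the sketch below implements the textbook two-way within transformation for a balanced panel, $u_{i,t} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u$. That this matches the exact form of Equation (4) is an assumption, and the function name and example values are hypothetical.

```python
def two_way_within(u):
    """Two-way within transformation of a balanced panel u[i][t].

    Subtracts the unit (row) mean and the period (column) mean and adds back the grand mean.
    """
    n_units, n_periods = len(u), len(u[0])
    row_mean = [sum(row) / n_periods for row in u]
    col_mean = [sum(u[i][t] for i in range(n_units)) / n_units for t in range(n_periods)]
    grand_mean = sum(row_mean) / n_units
    return [
        [u[i][t] - row_mean[i] - col_mean[t] + grand_mean for t in range(n_periods)]
        for i in range(n_units)
    ]

# Hypothetical additive effects: alpha_i for 3 units, gamma_t for 4 periods.
alpha = [1.0, -2.0, 0.5]
gamma = [0.0, 3.0, -1.0, 2.0]
additive = [[a + g for g in gamma] for a in alpha]
transformed = two_way_within(additive)  # every entry is zero up to rounding
```

Any variable that is purely a sum of a unit-constant and a period-constant part is mapped to zero, which is also why canton-constant and week-constant covariates cannot be kept in the transformed regression.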
Apart from the basic OLS estimate, we also compute a debiased estimate of $\theta$, $\hat\theta_{BC}$, by cross-over Jackknife bias correction (Chernozhukov et al., 2021; Chen et al., 2019, 2020). We employ this method because the estimation of dynamic linear panel models (i.e., models including lagged instances of the response variable as covariates) using the fixed-effects estimator potentially yields a bias. The debiased estimate is given by
$$\hat\theta_{BC} = 2\hat\theta - \tfrac{1}{2}\big(\hat\theta_{S_1} + \hat\theta_{S_2}\big),$$
where $\hat\theta$ is the OLS regression coefficient based on the entire sample and $\hat\theta_{S_j}$ is the estimated coefficient computed on the sub-sample $S_j$, $j = 1, 2$. The sub-samples $S_1$ and $S_2$ are defined, as in Chernozhukov et al. (2021), by a cross-over split of the cantons and weeks into halves, respecting the natural ordering of the weeks. Since there is no natural ordering of the cantons, we repeat the above procedure 500 times, where each time the cantons are randomly permuted. The final estimate is then the average of the 500 debiased estimates. In the following, we use the acronym DFE for this debiased fixed-effects approach.
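The cross-over split and the bias-correction step can be sketched as follows. The set definitions follow our reading of the cross-over Jackknife in Chernozhukov et al. (2021) and should be treated as an assumption, as should the function names; the coefficient values passed to the correction are arbitrary illustrative numbers, not estimates from the data.

```python
import math

def crossover_split(n_units, n_periods):
    """Return the cross-over sub-samples (S1, S2) as sets of 1-based (unit, period) pairs.

    S1: first half of units in first half of periods, plus second half in second half.
    S2: first half of units in second half of periods, plus second half in first half.
    """
    upper_i, upper_t = math.ceil(n_units / 2), math.ceil(n_periods / 2)
    lower_i, lower_t = n_units // 2 + 1, n_periods // 2 + 1
    s1, s2 = set(), set()
    for i in range(1, n_units + 1):
        for t in range(1, n_periods + 1):
            if (i <= upper_i and t <= upper_t) or (i >= lower_i and t >= lower_t):
                s1.add((i, t))
            if (i <= upper_i and t >= lower_t) or (i >= lower_i and t <= upper_t):
                s2.add((i, t))
    return s1, s2

def debias(theta_full, theta_s1, theta_s2):
    """Jackknife bias correction: 2 * full-sample estimate minus the mean of the sub-sample estimates."""
    return 2 * theta_full - (theta_s1 + theta_s2) / 2

s1, s2 = crossover_split(26, 24)         # panel dimensions of the article
theta_bc = debias(1.0, 1.2, 1.1)         # arbitrary illustrative inputs
```

With 26 cantons and 24 weeks both dimensions are even, so the two sub-samples are disjoint, each contains half of the 624 canton-week observations, and every canton and every week appears in both.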
We now detail the specific sets of control variables $Z_{i,t}$, which depend on whether we aim at estimating the direct or the total effect of the strict facial-mask policy on the spread of COVID-19. Due to the within transformation (4), apart from the unobservable $\alpha_i$ and $\gamma_t$, all observable week-constant variables, such as policy indicators that do not vary over the period of analysis, and canton-constant variables, such as population or density, also drop out of $Z_{i,t}$. In the case of the direct effect, we must thus regress the response variable on all its remaining parents, i.e., $Z_{i,t} = (B_{i,t}, H_{i,t}, W_{i,t}, P_{i,t}, Y_{i,t}')$. In the case of the total effect, we need to remove the variable $B_{i,t}$ from the set, and obtain the valid adjustment set $Z_{i,t} = (H_{i,t}, W_{i,t}, P_{i,t}, Y_{i,t}')$. Concretely, the following variables are contained in each category:
• $B_{i,t}$: growth.transactions,
• $H_{i,t}$: holiday,
• $W_{i,t}$: sunshine, temperature and humidity,
• $P_{i,t}$: work.closing, school.closing, rest.gatherings and canc.events,
and the variable $Y_{i,t}'$ is specific for each of the two response variables, see Sect. 2.8.

Construction of confidence intervals
To construct 95%-confidence intervals for the coefficient $\theta$, we use the normal approximation
$$\hat\theta \pm \Phi^{-1}(0.975) \sqrt{\widehat{\mathrm{Var}}(\hat\theta)}, \tag{6}$$
where $\Phi^{-1}(0.975) \approx 1.96$ is the 97.5% quantile of the standard normal distribution, $\hat\theta$ is the first entry in the estimated coefficient vector $\hat\eta = (\hat\theta, \hat\beta)$, obtained either through the FE or DFE approach, and $\widehat{\mathrm{Var}}(\hat\theta)$ is the corresponding estimated variance. The estimation of the variance requires careful consideration due to the panel structure of our data.
In the following, we collect the observed covariates of canton $i$ and week $t$, that is $M_{i,t}$ and $Z_{i,t}$, in a vector of length $P := 1 + |Z_{i,t}|$, and stack these vectors over all cantons and weeks into a covariate matrix. The conditional variance-covariance matrix of $\hat\eta$ can be written in the sandwich form
$$\mathrm{Var}(\hat\eta) = Q^{-1} \Omega Q^{-1}, \tag{7}$$
where the matrices $Q$ and $\Omega$ are built from the stacked covariate matrix and the error terms $\epsilon_{i,t}$. We denote by $\hat\epsilon$ the empirical residuals obtained through the FE or DFE approach.
The variance of $\hat\eta$ is then estimated by plugging in an estimate $\hat\Omega$ of $\Omega$ into Equation (7), resulting in $\widehat{\mathrm{Var}}(\hat\eta) = Q^{-1} \hat\Omega Q^{-1}$.$^{16}$ We use the following seven estimators for $\Omega$ (with their short name in brackets):
1) Heteroscedastic-Robust (HC3)
2) One-Way Clustering on Canton (Canton)
3) One-Way Clustering on Week (Week)
4) Two-Way Clustering on Canton and Week (Canton-Week)
5) Newey-West (NW) (Newey and West, 1987)
6) Chiang-Hansen (CH) (Chiang et al., 2022)
7) Informal Own Specification (Own) (motivated by Colella et al. (2019))
The estimators correspond to different assumptions on the structure of $\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s})$ for $i, j = 1, \ldots, N$ and $t, s = 1, \ldots, T$. These assumptions reflect the clustered and/or heteroskedastic and/or autocorrelated nature of the error terms. The details can be found in Appendix C.
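To make the sandwich form and the normal-approximation interval concrete, the following sketch treats the simplest possible case: a single regressor without intercept and one-way clustering, without any finite-sample correction. It is an illustration under these simplifying assumptions, not the estimators of Appendix C; the function names and the toy inputs are hypothetical.

```python
from statistics import NormalDist

def clustered_variance(x, resid, cluster):
    """One-way cluster-robust sandwich variance for a single regressor without intercept.

    Q = sum of x^2; Omega = sum over clusters of (sum of x * residual within the cluster)^2;
    the variance estimate is Omega / Q^2 (no finite-sample correction).
    """
    q = sum(v * v for v in x)
    scores = {}
    for xv, ev, g in zip(x, resid, cluster):
        scores[g] = scores.get(g, 0.0) + xv * ev
    omega = sum(s * s for s in scores.values())
    return omega / (q * q)

def normal_ci(estimate, variance, level=0.95):
    """Confidence interval based on the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width

# Toy inputs: residuals share their sign within each cluster.
x = [1.0, 1.0, 1.0, 1.0]
resid = [1.0, 1.0, -1.0, -1.0]
var_clustered = clustered_variance(x, resid, ["a", "a", "b", "b"])
var_independent = clustered_variance(x, resid, [0, 1, 2, 3])
ci = normal_ci(0.0, 1.0)
```

In this toy example the clustered variance is twice the variance obtained when every observation forms its own cluster, which shows how ignoring positive within-cluster dependence would make confidence intervals too narrow.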

Random-effects approach
The random-effects approach assumes that the components of the error, $\alpha_i$, $\gamma_t$, and $\epsilon_{i,t}$, satisfy exogeneity assumptions for all $i = 1, \ldots, N$ and $t = 1, \ldots, T$. These assumptions imply that $\alpha_i$, $\gamma_t$, and $\epsilon_{i,t}$ are uncorrelated with $M_{i,t}$ and $Z_{i,t}$, which implies the strong assumption of no unobserved confounding. In particular, in contrast to the fixed-effects approach, the random-effects approach does not control for unobserved week- or canton-specific confounders. The correlation within weeks and within cantons in the composite error $v_{i,t} = \alpha_i + \gamma_t + \epsilon_{i,t}$ is accounted for by the Feasible Generalized Least Squares (FGLS) approach, where we assume a structured variance-covariance matrix of the composite error with strictly positive variance components for $\alpha_i$, $\gamma_t$ and $\epsilon_{i,t}$. With this approach we again obtain an estimator of the whole coefficient vector $\eta = (\theta, \beta)$ and extract the estimator of $\theta$. To construct 95%-confidence intervals for $\theta$, we again use Equation (6), where we apply Formula (7) with a plug-in estimator of $\Omega$. In the following, we use the acronym RE for the point estimator as well as the confidence interval of this approach.
In contrast to the fixed-effects approach, the variables $D_i$, which are part of the parents of the response variable, do not drop out of $Z_{i,t}$. Furthermore, the set of policy variables now includes testing.policy, which was dropped in the FE approach as it is constant within cantons over the period of analysis and is thus eliminated by the within transformation. In the case of the direct effect we obtain $Z_{i,t} = (B_{i,t}, D_i, H_{i,t}, W_{i,t}, P_{i,t}, Y_{i,t}')$. In the case of the total effect, we need to remove the variable $B_{i,t}$ from the set, and obtain the valid adjustment set $Z_{i,t} = (D_i, H_{i,t}, W_{i,t}, P_{i,t}, Y_{i,t}')$. Concretely, the following variables are contained in each category:
• $B_{i,t}$: growth.transactions,
• $D_i$: population, density and perc.O80,
• $H_{i,t}$: holiday,
• $W_{i,t}$: sunshine, temperature and humidity,
• $P_{i,t}$: work.closing, school.closing, rest.gatherings, canc.events and testing.policy,
and the variable $Y_{i,t}'$ is specific for each of the two response variables, see Sect. 2.8.
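The four covariate sets described for the fixed-effects and random-effects approaches can be summarized in a small lookup. The variable short names are taken from the text; the function name, the group keys and the returned list structure are hypothetical choices made for this sketch.

```python
# Variable short names per group, as listed in the text.
GROUPS = {
    "B": ["growth.transactions"],
    "D": ["population", "density", "perc.O80"],
    "H": ["holiday"],
    "W": ["sunshine", "temperature", "humidity"],
    "P": ["work.closing", "school.closing", "rest.gatherings", "canc.events", "testing.policy"],
    "Y_lagged": ["Y.lagged"],
}

def adjustment_set(approach, effect):
    """Covariates Z for approach in {"FE", "RE"} and effect in {"direct", "total"}."""
    if approach not in ("FE", "RE") or effect not in ("direct", "total"):
        raise ValueError("unknown approach or effect")
    groups = ["H", "W", "P", "Y_lagged"]
    if approach == "RE":
        groups = ["D"] + groups      # canton-constant variables stay in the RE model
    if effect == "direct":
        groups = ["B"] + groups      # the mediator is included only for the direct effect
    names = [name for g in groups for name in GROUPS[g]]
    if approach == "FE":
        # testing.policy is eliminated by the within transformation
        names = [name for name in names if name != "testing.policy"]
    return names

fe_total = adjustment_set("FE", "total")
re_direct = adjustment_set("RE", "direct")
```

The only difference between the direct-effect and total-effect sets is the behavior variable, and the only differences between the two approaches are the demographic variables and testing.policy.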

Sensitivity analysis
We perform an extensive sensitivity analysis to investigate the robustness of our results toward changes in our methodology and data preparation.We consider further altering the confidence interval construction, extending the set of information variables, joining half-cantons into one respective canton, changing the timing of information variables, discarding outliers, varying the time period, performing the doubly robust double machine learning estimation, and using the lag-1 response variable as a further covariate.

Implementation
We implement the methodology in the software R, using the packages plm, sandwich, lmtest and DoubleML.

Results
For each of the three modeling approaches (fixed effects FE, debiased fixed effects DFE, and random effects RE), we distinguish between the direct and total effect of introducing a strict facial-mask policy (as compared to the government-determined country-wide baseline policy) on the two response variables. This results in 12 estimated effects along with their (various) confidence intervals.
Figure 3 shows the results for the direct effect of the strict facial-mask policy. We see that all 95% confidence intervals apart from two lie to the left of zero. Thus, the direct effect of the strict facial-mask policy was estimated to be significantly negative in almost all modeling approaches, implying a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined country-wide baseline to the strict facial-mask policy. Figure 4 shows the results for the total effect of the strict facial-mask policy, and we see that, apart from the RE approach, the overall picture is very similar to the direct effect. In the model with response variable r, the point estimators of both the direct and total effect lie between −0.22 and −0.16. In the model with response variable growth.new.cases, the point estimators of both the direct and total effect lie between −0.29 and −0.17.
The fact that the estimated direct and total effects are very similar suggests that either the facial-mask policy worked mainly through the direct path by reducing the transmissibility of COVID-19, or the behavioral variable growth.transactions is not capturing the important changes in social distancing behavior. It is plausible that the latter is at least part of the explanation, as it is for example unclear how this variable can reflect changes in behavior in private spaces. In fact, Chernozhukov et al. (2021), who employ a closely related empirical approach for the U.S., do not find the indirect effect to be significant either.
Overall, the FE and the DFE approaches provide very similar results. This indicates that the dynamic structure of the panel model does not induce a large estimation bias. The substantial difference between the RE and FE approaches, however, suggests that controlling only for demographic variables to capture time-invariant cantonal information, as in the RE model, is insufficient. Other unobserved canton- or week-specific confounders, for which we control with the fixed-effects approach, seem to play an important role.
For all 12 modeling approaches, the Tukey-Anscombe plots (residuals vs fitted values), displayed in Fig. 6 in Appendix B, show no evidence against the assumption of linearity. The results of the extensive sensitivity analysis are shown in Table 2 in Appendix E. In line with our main analyses, in all of the cases considered, we obtain a negative point estimate of the total causal effect, ranging from −1.08 to −0.04. The estimate is deemed significantly different from 0 at α = 0.05 in 31 out of the 41 (76%) sensitivity analyses conducted.

Conclusion
We analyse the effect of the strict facial-mask policy on the spread of COVID-19 during the early phase of the pandemic in Switzerland, using the cantonal heterogeneity in facial-mask policies from July 2020 to December 2020. The obligation to wear a facial mask in public transportation formed the government-determined country-wide baseline for facial-mask policies. The strict facial-mask policy corresponds to mandatory mask wearing on public transport and in all public or shared spaces where social distancing is not possible.
We estimate a significant reduction in the expected spread of COVID-19 in the early pandemic if the facial-mask policy is changed from the government-determined country-wide baseline to the strict facial-mask policy. Importantly, we do not investigate whether the estimated effect sizes are relevant in any given social context.
The correctness of the causal assumptions is crucial to the whole analysis. Hence, the results should be treated with caution and interpreted in light of the mostly untestable assumptions inherent in our modeling approaches, described in Sects. 3 and 4. In particular, we emphasize that the assumption of no unmeasured confounding imposed using the RE approach is very delicate and most likely does not hold. As such, one can consider the RE approach more as a sensitivity check. It is also important to stress that in an observational study like the one at hand it is almost impossible to control for all confounders that vary with weeks and cantons. It is highly likely that our effects of interest are confounded by further unobserved social, cultural and economic traits that may differ between cantons and weeks. Implementation of and compliance with non-pharmaceutical policies like the strict facial-mask policy are subject to cultural norms, political backgrounds and defiance against political authorities and policy makers, to name just a few examples of such factors.
Furthermore, our results are conditional on the characteristics of the time period between July and December 2020 during the early pandemic, when the alpha variant of the SARS-CoV-2 virus was predominant and no vaccinations were available yet. Also, only parts of the three seasons summer, autumn, and winter are represented in the data. However, even though it is hard to directly compare our results to results in other countries, with other facial-mask policies and other time periods, our findings are largely in line with those of other research groups (see, e.g., Chernozhukov et al. (2021), Mitze et al. (2020), and Pleninger et al. (2022)).

Fig. 3 Point estimators and 95%-confidence intervals ($CI_{95\%}$) for the direct effect of the strict facial-mask policy variable (facial.mask) on the estimated effective reproductive number (r) or the approximate weekly growth rate in supposed new infections (growth.new.cases) for different modeling approaches (described in Sect. 4): the fixed-effects approach (FE), a debiased variant of the fixed-effects approach (DFE), and a random-effects approach (RE). For FE and DFE, we construct seven different confidence intervals reflecting different assumptions on the dependencies in the data, as described in Sect. 4 and Appendix C

Fig. 4 Point estimators and 95%-confidence intervals ($CI_{95\%}$) for the total effect of the strict facial-mask policy variable (facial.mask) on the estimated effective reproductive number (r) or the approximate weekly growth rate in supposed new infections (growth.new.cases) for different modeling approaches (described in Sect. 4): the fixed-effects approach (FE), a debiased variant of the fixed-effects approach (DFE), and a random-effects approach (RE). For FE and DFE, we construct seven different confidence intervals reflecting different assumptions on the dependencies in the data, as described in Sect. 4 and Appendix C
Appendix A: Evolution of the strict facial-mask policy

Fig. 5 Evolution of the strict facial-mask policy during the second half of 2020: A value of 0 corresponds to the government-determined baseline policy while a value of 1 indicates a strict facial-mask policy as described in Sect. 2.2. The red line denotes the government-determined baseline policy

where $\hat{R}_i := \sum_{t=1}^{T} X_{i,t} \hat{\epsilon}_{i,t}$.

3) One-Way Clustering on Week (Week) ($\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s}) \neq 0$ iff $t = s$): where $\hat{S}_t := \sum_{i=1}^{N} X_{i,t} \hat{\epsilon}_{i,t}$.

4) Two-Way Clustering on Canton and Week (Canton-Week) ($\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s}) \neq 0$ iff $t = s$ or $i = j$):

5) Newey-West (NW) (Newey and West, 1987) ($\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s}) \neq 0$ iff $i = j$, where $\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s})$ is decreasing with $|t - s|$ increasing): where $w(m, M) = 1 - m/(M+1)$ are triangular weights and $M = \lfloor T^{1/4} \rfloor$.

6) Chiang-Hansen (CH) (Chiang et al., 2022) ($\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s}) \neq 0$, where $\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s})$ for arbitrary $i \neq j$ is decreasing with $|t - s|$ increasing): The estimator for this specification is given by Equation (12), where $w(m, M)$ are the triangular weights as in NW and $M$ is data driven.

7) Informal Own Specification (Own) (motivated by Colella et al., 2019) ($\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s}) \neq 0$ iff $i = j$ or $i$ and $j$ are neighboring cantons, where $\mathrm{Cov}(\epsilon_{i,t}, \epsilon_{j,s})$ is decreasing with $|t - s|$ increasing): where the weights $\omega_{itjs}$ specify the dependence between two error terms $\epsilon_{i,t}$ and $\epsilon_{j,s}$.

the calculation of growth.new.cases and growth.transactions, where we sum the number of new cases or transactions, respectively, from the two half-cantons, and then calculate the approximate growth rates as before. We perform the analysis for the FE, DFE and RE approaches.

Timing of information variables
We examine the influence of the lag of the information variable that is part of the lagged response variable $Y_{i,t'}$. For the response r, we change the lag of the information variable from $t' = 3$ to $t' = 2$, resulting in the lagged response variable $Y_{i,t'} = R_{i,t-2}$. For the response growth.new.cases, we change the lag of the information variable from $t' = 2$ to $t' = 3$, resulting in the lagged response variable $Y_{i,t'} = G_{i,t-1}$. We perform the analysis for the FE, DFE and RE approaches.

Table 2 Results of the sensitivity analyses for the estimation of the total effect for both responses r and growth.new.cases. For the FE, DFE and double machine learning approach, the 95%-confidence intervals ($CI_{95\%}$) are constructed via two-way clustering on the canton and week (Canton-Week). For the RE approach, we compute the standard errors as described in Sect. 4.2. For the sake of comparison, we add the main results of the main text at the top of the table

Outliers
We fit the FE approach as described in Sect. 4.1, compute the Cook's distance for each observation and exclude observations with a corresponding Cook's distance exceeding $4 \times (NT)^{-1}$. We then refit the model using the FE approach based on the reduced sample. In the model where r is the response, 28 observations are excluded. When growth.new.cases is the response, 27 observations are excluded. Since the calculation of the Cook's distance in the DFE and RE approaches is not straightforward, we restrict this robustness check to the FE model.
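The exclusion rule can be sketched on a toy example. The following is a minimal illustration for a simple linear regression with one regressor, not the FE model of Sect. 4.1; the data and the function names `cooks_distances` and `flag_influential` are hypothetical. The cutoff 4 divided by the number of observations plays the role of $4 \times (NT)^{-1}$.

```python
def cooks_distances(x, y):
    """Cook's distance of every observation in the OLS fit y = a + b * x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    # Leverage (hat-matrix diagonal) for simple linear regression.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]

def flag_influential(x, y):
    """Indices whose Cook's distance exceeds 4 / (number of observations)."""
    threshold = 4 / len(x)
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > threshold]

# Hypothetical data: an exact line y = 1 + 2x with one corrupted last point.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[9] = 40.0

flagged = flag_influential(x, y)
kept = [i for i in range(len(x)) if i not in flagged]
```

The model would then be refitted on the observations in `kept`, mirroring the refit on the reduced sample described above.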

Very short sample period
We restrict the period of analysis to the time between August 21, 2020, and October 19, 2020. On August 21, 2020, the canton of Neuchâtel was the first canton to introduce the strict facial-mask policy. On October 19, 2020, the federal government enforced the strict facial-mask policy nationwide. This period is very short with only T = 9 weeks, which constitutes a problem for the DFE and RE approaches. We thus perform the analysis only for the FE approach.

Short sample period
We restrict the period of analysis to the time between July 6, 2020, and October 18, 2020. During this time window, the cantons were free to choose between the strict facial-mask policy and the government-determined country-wide baseline policy. With only T = 15 weeks, this period is also short.

Double machine learning approach
We relax the assumption of a linear regression model to a partially linear regression model, where the effect of the adjustment set $Z_{i,t}$ on $Y_{i,t}$ is nonparametric. We use the adjustment set of the RE approach. Estimation is done via the double machine learning framework (Chernozhukov et al., 2018), which is a doubly robust method. This approach assumes the following model,

$$M_{i,t} = m(Z_{i,t}) + \nu_{i,t}, \qquad Y_{i,t} = \theta M_{i,t} + g(Z_{i,t}) + \epsilon_{i,t},$$

where $E[\nu_{i,t} \mid Z_{i,t}] = E[\epsilon_{i,t} \mid M_{i,t}, Z_{i,t}] = 0$. We learn the functions $m(\cdot)$ and $g(\cdot)$ with random forests. See Chernozhukov et al. (2018) for more details. We implement the procedure with the R-package DoubleML.
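The idea behind estimating $\theta$ in the partially linear model can be sketched with a cross-fitted partialling-out estimator. This is a minimal illustration under stated assumptions and not the analysis pipeline: it uses simulated data with a scalar $Z$ and a continuous $M$, replaces the random forests by a simple bin-mean learner, and estimates $\ell(Z) = E[Y \mid Z]$ instead of $g(Z)$ directly. All names (`fit_bin_means`, `dml_partialling_out`, `theta_true`) are hypothetical.

```python
import math
import random

def fit_bin_means(z, target, n_bins):
    """Stand-in nuisance learner: mean of target within equal-width bins of z in [0, 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for zi, ti in zip(z, target):
        b = min(int(zi * n_bins), n_bins - 1)
        sums[b] += ti
        counts[b] += 1
    overall = sum(target) / len(target)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda zi: means[min(int(zi * n_bins), n_bins - 1)]

def dml_partialling_out(z, m, y, n_folds=5, n_bins=20):
    """Cross-fitted residual-on-residual estimate of theta."""
    n = len(y)
    res_m = [0.0] * n
    res_y = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        # Nuisance functions are learned on the training folds only.
        m_hat = fit_bin_means([z[i] for i in train], [m[i] for i in train], n_bins)
        l_hat = fit_bin_means([z[i] for i in train], [y[i] for i in train], n_bins)
        for i in test:
            res_m[i] = m[i] - m_hat(z[i])
            res_y[i] = y[i] - l_hat(z[i])
    return sum(a * b for a, b in zip(res_m, res_y)) / sum(a * a for a in res_m)

# Simulated data from the partially linear model with a known theta.
rng = random.Random(0)
theta_true = -0.2
z = [rng.random() for _ in range(2000)]
m = [zi ** 2 + rng.gauss(0, 1) for zi in z]
y = [theta_true * mi + math.sin(2 * math.pi * zi) + rng.gauss(0, 0.5)
     for zi, mi in zip(z, m)]

theta_hat = dml_partialling_out(z, m, y)
```

Cross-fitting keeps the nuisance estimates independent of the observations they are evaluated on, so errors in the learned functions enter the estimate of $\theta$ only through their product.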

Lag-1 response variable as covariate
In addition to the information variable $Y_{i,t'}$, we consider for both response variables an additional lag of the response variable as covariate. We include the lag-1 response variable $Y_{i,t-1}$ in the models, since for both response variables we observe a possibly nonzero autocorrelation at lag one for some cantons.
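A lag-one autocorrelation check of this kind can be sketched as follows. The function `lag1_autocorrelation` and the two toy series are hypothetical and stand in for the weekly response series of a single canton; the sketch uses the usual sample autocorrelation and makes no claim about the exact diagnostic used in the analysis.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation at lag one of a single time series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / denom

# Hypothetical weekly series for two cantons.
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]         # persistent series
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign-flipping series

r_trending = lag1_autocorrelation(trending)
r_alternating = lag1_autocorrelation(alternating)
```

A value clearly away from zero for some cantons is the kind of pattern that motivates adding $Y_{i,t-1}$ as a covariate.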

E.3: Results sensitivity analysis
We show the results of all considered sensitivity analyses in Table 2. We obtain a negative point estimate of the total effect, ranging from −1.08 to −0.04. The estimate is deemed significantly different from 0 at the 5% level in 31 out of the 41 (76%) sensitivity analyses conducted.
We provide some general remarks on the results. As stressed earlier, the RE approach cannot control for unobserved confounding and is therefore less trustworthy than the FE and DFE approaches. In some of the sensitivity analyses, the DFE approach produces estimates that vary substantially from the FE approach. Due to the properties discussed in Sect. 4.1, the DFE approach is more trustworthy. For the DML approach, we obtain much larger estimated effect sizes of the total effect. However, the uncertainty is very large, so the results are not significant. Using this methodology, we also cannot control for unmeasured confounding. As there is no clear pattern apparent in the Tukey-Anscombe plots in Fig. 6, we suspect that the difference in point estimates between the linear methods and the nonlinear DML methodology is mostly driven by the latter's inability to control for unobserved confounding, and not by an underlying nonlinear relationship between the facial-mask policy and the response variables.


Table 1 Short name, description, descriptive statistics and data source for all the variables and responses used in the analysis. For more details regarding the interpretation of the values of the non-pharmaceutical policy variables, see the description under https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md, last visited August 21, 2023

compatible with our DAG. The set $V_{i,t}$ is the parent set of $Y_{i,t}$ without $B_{i,t}$ and $M_{i,t}$, $\tilde{V}_{i,t}$ is the parent set of $B_{i,t}$ without $M_{i,t}$, and $\epsilon_{i,t}$ and $\nu_{i,t}$ are error terms with expectation zero. The next small lemma allows us to identify the TCE, TNIE and PNDE in our context via the regression coefficients in Equations (1). The TNIE and the PNDE can be expressed in terms of the regression coefficients in Equations (1) via

• $B_{i,t}$: growth.transactions,
• $H_{i,t}$: holiday,
• $W_{i,t}$: sunshine, temperature and humidity,
• $P_{i,t}$: work.closing, rest.gatherings and canc.events,
and the variable $Y_{i,t'}$ is specific for each of the two response variables, see Sect. 2.8.