Partial identification of nonlinear peer effects models with missing data
Swiss Journal of Economics and Statistics volume 158, Article number: 15 (2022)
Abstract
This paper examines inference on social interactions models in the presence of missing data on outcomes. In these models, missing data on outcomes imply an incomplete data problem on both the endogenous variable and the regressors. However, getting a sharp estimate of the partially identified coefficients is computationally difficult. Using a monotonicity property of the peer effects and a mean independence condition of individual decisions on the missing data, I show partial identification results for the binary choice peer effect model. A Monte Carlo exercise then summarizes the computational time and the accuracy performance of the interval estimators under some calibrations.
Introduction
In models of social interactions, the individual behavior depends both on individual characteristics and on aggregate characteristics of members of the group of which the agent is a member (Advani & Malde, 2018), integrating sociological concepts and economic thinking (Blume et al. 2010). Important applications of peer effects models have been developed for education (Sacerdote, 2001, Cipollone & Rosolia, 2007, Lalive & Cattaneo, 2009, Sojourner, 2013, Ammermueller & Pischke, 2009, Madeira, 2018), health behaviors (Bruhin et al., 2020, Bailey et al., 2021), employment (Roth, 2020) or migration (Slotwinski et al., 2019).
This work analyzes inference on nonlinear peer effects models in the presence of missing data. There are many situations (for example, drug use, teenage risk profiles, sexual behavior) where respondents might not be willing to reveal their personal experience, creating problems of missing data in the study of social interactions in these settings. Most social interaction studies use the average outcome of each group as an explanatory variable; therefore, missing outcome data imply that we face both a problem of missing outcome values and an undetermined regressor, aggravating the identification problem. It is, therefore, important to extend the robustness of the social interaction estimators to scenarios of missing data.
In the linear case, Manski (1993, 2000) showed that it is difficult to distinguish between the effects of endogenous social interactions and the impact of measures of exogenous group quality. Several works analyze identification of peer effects in the linear case (Advani & Malde, 2018, Sojourner, 2013, Ammermueller & Pischke, 2009). These works analyze partial and point identification of the linear peer effects model with missing data on outcomes. Sojourner (2013) shows that if individuals are randomly assigned to each group, then it is possible to point-identify the true coefficient for the peer effects variable. Ammermueller and Pischke (2009) show that missing data on peers create measurement error for the group variables and, using an analysis similar to Hausman (2001), find upper and lower bounds for the true peer effect coefficient of the linear model. The authors then apply an instrument for the peer effects variable to obtain point identification.
However, several economic decisions such as discrete choices require nonlinear models (Blume et al., 2010). Nonlinear settings for peer effects include smoking behavior (Krauth, 2006), high school truancy, cell phone ownership (Kooreman & Soetevent, 2007) or college life (Sacerdote, 2001). Brock and Durlauf (2007) present a very general model of peer effects in a discrete choice setting, showing that it is possible to identify asymptotically both exogenous and endogenous peer effects under the assumption of random group assignment and no missing data.
I extend the identification results of Brock and Durlauf (2002, 2007) to the case of missing outcomes. Using an incomplete data approach proposed by Horowitz and Manski (2006), it is possible to get sharp bounds for the coefficients of this model with missing data, but this method can be time-consuming for larger peer groups. Therefore, I propose an estimator to obtain nonsharp bounds for this model based on the interval regressors approach of Manski and Tamer (2002). My suggested approach shows that the Manski and Tamer (2002) approach can easily be extended to a case with both interval regressors and missing outcomes. If a discrete choice model verifies three important properties, namely interval values (I), mean independence (MI), and monotonicity (M), then it is possible to obtain nonsharp bounds for the true coefficients of the model. The interval values (I) regressor assumption is trivially satisfied by discrete choice models with peer effects, since the average of the discrete choices in a peer group is bounded between 0 and 1. The mean independence (MI) assumption is also quite natural in the peer effects model, since it implies that the width of the identification interval does not matter if one conditions on the true value of the average outcome. This assumption appears natural if the agents know the true values of the average choices in their peer groups even if the econometrician only observes the group with some missing data. The third assumption, monotonicity (M), implies that the average outcome of each agent is increasing with the average group outcome. This assumption is trivially satisfied in the parametric discrete choice models, and it can also be consistent with many semiparametric or nonparametric models. A minimum distance estimator is proposed. I also propose a bootstrap method to estimate confidence intervals for the true coefficients.
A similar estimator can be easily applied to any parametric model with missing outcomes and interval regressors.
I then show a set of Monte Carlo exercises with fully observed information to characterize the accuracy of the peer effects estimators even if the identification assumptions are satisfied. The Monte Carlo exercises include a wide range of different group sizes and different sample sizes for both the logit and the linear case. The Monte Carlo simulations include estimators for the cases of closed peer groups (groups in which all members are peers of each other) and nonclosed groups (with each individual having peers from outside the group). Furthermore, I consider the case in which the individual is part of his own peer group and the case in which the individual is not part of its own peer group. The linear case is only shown for nonclosed groups (which is required for identification, as shown in Bramoullé, Djebbari and Fortin 2009).
I then apply the Manski–Tamer and Horowitz–Manski estimators to the logit peer effects models in the presence of missing outcomes. The results show that the Manski–Tamer estimator can be hundreds of times faster than the Horowitz–Manski estimator even with just a few missing values such as 10 missing outcomes. The computation time of the Horowitz–Manski estimator could be much larger with a few additional missing observations.
This work focuses on the case in which missing information on missing outcomes also implies missing information or an interval regressor for the peer effect in order to be clear about this effect. This approach could also be easily generalized to other cases that also include missing control variables for the peer group members and which would also imply missing regressors or interval regressors. The case for other missing regressors would merely imply more combinations of possible datasets for the missing values for the Horowitz and Manski (2006) and additional interval regressors for the Manski and Tamer (2002) approaches suggested in this article.
This article is organized as follows: Section 2 shows how the interval regressors approach of Manski and Tamer (2002) can be easily extended for a case that also has missing outcomes. Section 3 explains the calibration of the Monte Carlo exercises. Section 4 then summarizes the Monte Carlo results in the absence of missing data. The section starts by showing that the exogenous coefficients (given by the constant, exogenous variable affecting individual behavior, contextual effects group variable) have a fast convergence to the true parameter values, whether the model has endogenous peer effects or not. The same simulations show that the endogenous peer effect coefficient has a much slower convergence to its true value, presenting a high bias and standard deviation, even without any missing data. Section 5 shows the Monte Carlo exercises with missing data, analyzing the performance of the Horowitz–Manski and Manski–Tamer approaches. The results show that the Horowitz and Manski (2006) approach presents a considerable computational time. The section also summarizes the estimated interval results for all the coefficients, including both the endogenous peer effects parameter and the exogenous coefficients parameters. Finally, Sect. 6 summarizes the main results and an appendix shows the proofs of the main propositions.
Identification of discrete choice models with peer effects
A parametric discrete choice model
Let \(y_{i}\in \{0,1\}\) represent the outcome of individual i, \(X_{i}\in R^{K}\) the individual exogenous variables, \(g=1,\ldots ,G\) the groups, and \(Y_{g}\in R^{Q}\) the set of exogenous variables for each group. I represent average group behavior as \(p_{i,g}=\dfrac{1}{n_{g(i)}} \sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}=1)\), where \(n_{g(i)}=\sum \limits _{j=1}^{n}1(j\in g(i))\) is the number of people in agent i’s group.
Brock and Durlauf (2002) presented a parametric multinomial model of choice in the presence of social interactions, giving conditions for identification in the presence of fully observed data. Individual choice is determined by latent utility, \(V_{i}=h_{i}+Jp_{i,g(i)}-\epsilon _{i}\). The term \(\epsilon _{i}\) represents an idiosyncratic term (such as an individual taste factor) unobserved by the econometrician. \(\epsilon _{i}\) has a known monotonic parametric distribution, \(F_{\epsilon }(.)\). In this specification, \(h_{i}\) represents the components of utility affected only by exogenous variables, \(h_{i}=k+bX_{i}+dY_{g(i)}\). The observable group variables \(Y_{g(i)}\) for the contextual peer effects, correlated group effects or neighborhood variables (Manski, 1993) can include the mean values of the individual variables of the other group members. One can further specify \(Y_{g(i)}=\dfrac{1}{n_{g(i)}} \sum \limits _{j=1,j\in g(i)}^{n}X_{j}\) in the case of individuals that are part of their own peer group, or alternatively, \(Y_{g(i)}=\dfrac{1}{n_{g(i)}-1}\sum \limits _{j=1,j\in g(i),j\ne i}^{n}X_{j}\) in the case in which individual i is excluded from its own peer effect. In this model, the probability of choosing a positive outcome is given by:
\[\Pr (y_{i}=1\mid X_{i},Y_{g(i)},p_{i,g})=F_{\epsilon }(h_{i}+Jp_{i,g})=F_{\epsilon }(k+bX_{i}+dY_{g(i)}+Jp_{i,g}).\qquad (1)\]
It is possible that several values of \(p_{i,g}\) might solve expression (1) due to multiple equilibria corresponding to self-consistent behaviors in the population (Brock & Durlauf, 2002).
This discrete choice model is quite parsimonious and includes all the main features of peer effects models. The term \(Y_{g}\) is usually interpreted as “contextual group effects” (Manski, 1993), meaning the gains each member of the group has due to exogenous characteristics of the group. For example, students of a certain school could be doing well because the school has good facilities and teachers. The term \(p_{i,g}\) represents the “endogenous group effect” since it represents the feedback effect that group performance has on each one of its members. In this case, students of a certain school may be more likely to apply for college because the other students are also applying.
Brock and Durlauf (2002) show that this model is point-identified under two conditions:

(i)
\(X_{i}\), \(Y_{g(i)}\), \(p_{i,g}\) are not collinear and \(Y_{g}\) has unbounded support;

(ii)
\(\epsilon _{i}\) are independently and identically distributed across individuals and are independent of \(X_{i}\) and \(Y_{g(i)}\).
The independence assumption of \(\epsilon _{i}\) can be relaxed when the group g(i) of each individual is not entirely closed (for example, your neighbors have neighbors that are not neighbors of you). In this case, the “peers of your peers” provide extra variation that can be used for identification (Bramoullé et al. 2009). For now, this article will keep the assumption that the peer groups are closed and therefore \(j\in g(i)\) implies that \(i\in g(j)\).
Partial identification in the case of missing outcome information
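If all the data are observed, one can estimate the parameters by using a maximum likelihood estimator (MLE):
\[{\hat{\theta }}=\arg \max _{\theta }\sum \limits _{i=1}^{N}y_{i}\ln (F_{\epsilon }(h_{i}+Jp_{i,g}))+(1-y_{i})\ln (1-F_{\epsilon }(h_{i}+Jp_{i,g})).\]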
Now assume \(z\in \{0,1\}\) indicates whether y is unobserved (\(z=0\)) or observed (\(z=1\)). For simplicity, I assume that \(X_{i}\) and \(Y_{g(i)}\) are always observed, but the missing information on some outcomes \(y_{i}\) implies that \(p_{i,g}\) is not point-identified, although \(p_{i,g}\) can be bounded within a sharp interval. The number of missing observations in the sample is given by \(n_{z=0} =\sum \limits _{j=1}^{n}1(z_{j}=0)\). Let \(m(i)=\sum \limits _{j=1}^{i}1(z_{j}=0)\) denote the order of \(y_{i}\) in the sample of observations with missing values, with \(m(i)=\varnothing\) if \(z_{i}=1\). I define \(y_{z=0}\equiv \{y_{m} ,(m=1,\ldots ,n_{z=0})\}\) as the vector collecting all missing outcome values in the sample. This vector belongs to the space given by \(\Xi \equiv \{0,1\}^{n_{z=0}}\). Let \(a\in \Xi\) be a feasible vector for the missing outcome values. I define \(y_{i}^{a}=y_{i}\) if \(z_{i}=1\) and \(y_{i} ^{a}=a(m(i))\) if \(z_{i}=0\). In the same way, I define \(p_{i,g}^{a}=\dfrac{1}{n_{g(i)}}\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}^{a}=1)\) as the group average outcome under the vector of missing values \(y_{z=0}=a\).
Let \(H(\theta )\) be the identified set of \(\theta\). Elements of this set can be identified by repeatedly plugging in feasible values for the missing data, \(a\in \Xi\), and computing the parameters of interest. We can therefore study the set of values consistent with the observed data and the assumed model given any feasible distribution of the missing data. For a specific combination of the missing data values, \(y_{z=0}=a\in \Xi\), one can estimate the coefficients \({\hat{\theta }}(a)=\,\arg \max _{\theta } {\displaystyle \sum \limits _{i=1}^{N}} y_{i}^{a}\ln (F_{\epsilon }(h_{i}+Jp_{i,g}^{a}))+(1-y_{i}^{a})\ln (1-F_{\epsilon }(h_{i}+Jp_{i,g}^{a}))\). It is therefore possible to obtain \(H(\theta )\) as \({\hat{H}}(\theta )\equiv \{{\hat{\theta }}(a),\) for all \(a\in \Xi \}\). For finite samples, it is possible to obtain confidence intervals for the true coefficient parameters \(\theta\) by using the bootstrap procedure of Imbens and Manski (2004). Another alternative is to get confidence intervals for the identified set, \(H(\theta )\), by using a subsampling procedure described in Chernozhukov et al. (2007).
This plug-in strategy is fairly general and easy to implement. It can essentially work on any model that is proven to be identified and solvable. The difficulty of this approach, however, is that it is computationally demanding. Notice that if \(\Pr (z=0)>0\), then the number of feasible vectors \(a\in \Xi\) increases exponentially with N. Therefore, as N grows large this approach may require a very large number of computations of \({\hat{\theta }}(a)\) to obtain a good estimate of \(H(\theta )\).
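The enumeration behind this plug-in strategy can be sketched as follows. This is a minimal illustration, not the paper's Stata implementation: the function name `imputed_group_means` and the toy data are assumptions, and the sketch only builds the candidate group averages \(p_{i,g}^{a}\) (with each individual counted in its own group) rather than re-estimating \({\hat{\theta }}(a)\) for each candidate.

```python
from itertools import product


def imputed_group_means(y, z, group):
    """Yield (a, p_a) for every feasible imputation a of the missing outcomes.

    y     : list of 0/1 outcomes (the value is ignored wherever z == 0)
    z     : list of flags, 1 = outcome observed, 0 = outcome missing
    group : list of group labels, one per individual
    All names here are hypothetical, chosen for illustration only.
    """
    missing = [i for i, zi in enumerate(z) if zi == 0]
    # One candidate vector a for each element of {0,1}^(number missing).
    for a in product((0, 1), repeat=len(missing)):
        y_a = list(y)
        for pos, i in enumerate(missing):
            y_a[i] = a[pos]
        totals, sizes = {}, {}
        for yi, g in zip(y_a, group):
            totals[g] = totals.get(g, 0) + yi
            sizes[g] = sizes.get(g, 0) + 1
        # Own-inclusive group average under imputation a.
        p_a = [totals[g] / sizes[g] for g in group]
        yield a, p_a


# Toy data: two groups of three, one missing outcome in each group.
y = [1, 0, 0, 1, 0, 0]
z = [1, 1, 0, 1, 0, 1]
group = ["g1", "g1", "g1", "g2", "g2", "g2"]
candidates = list(imputed_group_means(y, z, group))
n_candidates = len(candidates)  # 2 ** (number of missing outcomes)
```

Each candidate would then be passed to the estimator of \(\theta\); the number of candidates doubles with every additional missing outcome, which is the computational burden described above.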
Here, I show that less computationally intensive strategies are possible. Note that the Brock and Durlauf model is monotonic in the group outcome, \(p_{i,g}\). Also, the regressor \(p_{i,g}\) of each individual can be estimated to be inside an interval. Assume for simplicity we keep the group intervals \(p_{i,g} \in [p_{i,g}^{L},p_{i,g}^{U}]\) fixed, but we allow the values of each missing individual outcome \(y_{i}\mid z_{i}=0\) to vary within \(\{0,1\}\). This approach gives us nonsharp bounds for the coefficients \(\theta\), since we are not taking into account that changing the individual \(y_{i}\mid z_{i}=0\) also has an effect on the interval of \(p_{i,g}\). However, together with the monotonicity property of the peer effects model it is possible to specify a convenient estimator for these nonsharp bounds. For this reason, I generalize the approach of Manski and Tamer (2002) to the case of both missing outcomes and interval regressors.
Using monotonicity and mean independence assumptions
In the discrete choice case, outcomes y are bounded between 0 and 1. This guarantees the interval property (I) of the regressor, \(p_{i,g}\), so we have \(p_{i,g}\in [0,1]\). It is easy to specify sharp bounds for the group average, \(p_{i,g}=E[y\mid g(i)]\). Let us define \(p_{g(i)}^{L}=E_{L}[y\mid g(i)]=E[y\mid g,z=1]P(z=1\mid g)\) and \(p_{g(i)}^{U}=E_{U}[y\mid g(i)]=E[y\mid g,z=1]P(z=1\mid g)+P(z=0\mid g)\). Then, the law of total probability gives us sharp bounds for the value of the group average, \(p_{i,g}\), as expressed in Proposition (1):
Proposition 1
\(p_{g(i)}^{L}\le p_{i,g}\le p_{g(i)}^{U}\).
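As a small numerical illustration of these bounds, the sketch below computes the sample analogs of \(p_{g(i)}^{L}\) and \(p_{g(i)}^{U}\) for a single group: the lower bound treats every missing outcome as 0 and the upper bound treats every missing outcome as 1. The function name `group_average_bounds` and the toy data are assumptions made for illustration.

```python
def group_average_bounds(y, z):
    """Sample analogs of the bounds on the group average for one group.

    y : list of 0/1 outcomes for the group (value ignored where z == 0)
    z : list of flags, 1 = outcome observed, 0 = outcome missing
    """
    n = len(y)
    observed_ones = sum(yi for yi, zi in zip(y, z) if zi == 1)
    n_missing = sum(1 for zi in z if zi == 0)
    # E[y | g, z=1] P(z=1 | g): missing outcomes counted as 0.
    p_lower = observed_ones / n
    # Adds P(z=0 | g): missing outcomes counted as 1.
    p_upper = (observed_ones + n_missing) / n
    return p_lower, p_upper


# Group of four with one missing outcome (third entry is a placeholder).
p_lower, p_upper = group_average_bounds([1, 0, 0, 1], [1, 1, 0, 1])
```

The width of the interval equals the share of missing outcomes in the group, so the bounds collapse to a point when the group has no missing data.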
Now, I denote \(V_{x}^{g}=(p_{i,g},Y_{g},x)\), \(W_{x}^{g}=(p_{g(i)}^{L} ,p_{g(i)}^{U},Y_{g},x)\), \(W_{0}^{g}=(p_{g(i)}^{L},Y_{g},x)\), and \(W_{1} ^{g}=(p_{g(i)}^{U},Y_{g},x)\). I will show that under certain conditions it is possible to obtain a partial interval for \(E[y\mid V_{x}^{g}]\) by using \(E[y\mid W_{0}^{g}]\) and \(E[y\mid W_{1}^{g}]\).
Note that the model defined in expression (1) is mean independent of the missing data properties \(z_{j}\). This guarantees the following mean independence (MI) property:
(MI) \(F_{\epsilon }(.\mid W_{x}^{g},p_{i,g})=F_{\epsilon }(.\mid p_{i,g} ,Y_{g},x)=F_{\epsilon }(.)\), where the first equality is given by expression (1) and the second one is obtained after applying assumption (ii), which specifies \(\epsilon _{i}\) as independently and identically distributed across individuals and independent of \(X_{i}\) and \(Y_{g(i)}\).
The MI assumption is not testable, but seems realistic under many scenarios. If individuals actually observe average group behavior and act using this knowledge, then individual outcomes are not affected by missing data. For example, teenagers may know how many smokers or drug users exist in their group, even if the researcher does not. However, the MI assumption could be invalid if the individuals react more or less to the choices of their unreported peers.
Expression (1) also has the group endogenous effect, J, specified as a constant parameter. Since J is constant and \(F_{\epsilon }(.)\) is monotonic, this guarantees the monotonicity (M) property of \(E[y\mid V_{x}^{g}]\) in all its arguments. Therefore, the IMMI (interval, monotonicity, and mean independence property) described in Manski and Tamer (2002) holds for this model.
By applying the law of total probability, we get sharp bounds for \(E[y\mid W_{x}^{g}]\).
Proposition 2
\(E_{L}[y\mid W_{x}^{g}]\le E[y\mid W_{x}^{g}]\le E_{U}[y\mid W_{x}^{g}]\), where \(E_{L}[y\mid W_{x}^{g}]=E[y\mid W_{x}^{g},z=1]\) \(P(z=1\mid W_{x}^{g})\) and \(E_{U}[y\mid W_{x}^{g}]=E[y\mid W_{x}^{g},z=1]\) \(P(z=1\mid W_{x}^{g})+P(z=0\mid W_{x}^{g})\).
Proposition 3 shows that it is possible to achieve exact identification of our parameters if and only if some groups in the population have no missing data at all.
Proposition 3
Let \(\theta \equiv (k,b,d,J)\) and \(c\equiv (c_{1},c_{2} ,c_{3},c_{4})\in C\) . Denote also \(h_{i}^{c}=c_{1}+c_{2}^{\prime }X_{i} +c_{3}^{\prime }Y_{g(i)}\) , \(V_{i}^{c,U}=h_{i}^{c}+c_{4}p_{g(i)}^{U}\) and \(V_{i}^{c,L}=h_{i}^{c}+c_{4}p_{g(i)}^{L}\). Let
Then, \(\theta\) is only identified relative to c if and only if \(P[V(c)]>0\).
Now, I characterize the identification region and present a minimum distance estimator for the parameters’ identification set in the presence of missing outcome data. Lemma 1 forms the basis for the estimator. It shows that there is a parametric solution \(\theta\) to the problem that fits within the bounds of the data moments (\(E_{L}[y\mid W_{x}^{g}]\), \(E_{U}[y\mid W_{x}^{g}]\)), and therefore the solution set is nonempty. It then characterizes the solution as a convex interval with a given expression that can be estimated. Convexity of the interval solution matters because it implies that the problem is well behaved, and many optimization methods for econometric estimators only work with convex regions. Convexity implies that, for instance, if \(\theta _{L}\) and \(\theta _{U}\) (with \(\theta _{U}\ge \theta _{L}\)) are solutions, then any other parameter given by \(\theta ^{*}=\theta _{L}(1-\alpha )+\alpha \theta _{U}\), with \(\alpha \in \left[ 0,1\right]\), is also a solution. This makes it convenient for optimization methods because it implies that there is a single convex identified interval, with \(\theta ^{*} \in \left[ \theta _{L},\theta _{U}\right]\). If the identified sets were not convex, then empirical researchers could find several unconnected regions with gaps in between. It would even be difficult to determine whether the entire identified set had been found by the researcher, since there could be more identified regions in other areas.
Lemma 1
The identification region for \(\theta\) is nonempty, convex and equivalent to
where \(F_{\epsilon }(W_{1,i}^{g})=F_{\epsilon }(h_{i}+Jp_{g(i)}^{U} )=F_{\epsilon }(k+b^{\prime }X_{i}+d^{\prime }Y_{g(i)}+Jp_{g(i)}^{U})\) and \(F_{\epsilon }(W_{0,i}^{g})=F_{\epsilon }(h_{i}+Jp_{g(i)}^{L})=F_{\epsilon }(k+b^{\prime }X_{i}+d^{\prime }Y_{g(i)}+Jp_{g(i)}^{L})\) .
Proposition 4 gives the basic finding on identification of parametric regression models. It suggests that the identification region can be found by a minimum distance estimator \(H_{N}(\theta )\), which uses the empirical analogs of the bounds \(E_{L}^{N}[y\mid W_{x,i}^{g}]\) and \(E_{U}^{N}[y\mid W_{x,i} ^{g}]\) in order to find the identified set for \(\theta\). Note that it is relevant for the minimum distance optimizer to search for the parameter values that fit within the intervals given by \(E_{L}^{N}[y\mid W_{x,i}^{g}]\) and \(E_{U}^{N}[y\mid W_{x,i}^{g}]\). In general, the researcher cannot just obtain two estimators given by replacing the missing values with zeros or replacing the missing values with ones, because that could imply establishing that a correlation between \(p_{g(i)}^{U}\) and the other variables (\(X_{i}\) and \(Y_{g(i)}\)) is either very low or very high in order to explain a large amount of zeros and ones. The minimum distance estimator must therefore search for the parameter values until the region that fits the empirical analogs is found.
Proposition 4
A suggested estimator for the identification region \(H(\theta )\) would be
where \(E_{L}^{N}[y\mid W_{x,i}^{g}]\) and \(E_{U}^{N}[y\mid W_{x,i}^{g}]\) are consistent estimators of \(E_{L}[y\mid W_{x,i}^{g}]\) and \(E_{U}[y\mid W_{x,i}^{g}]\), respectively. It is possible to take into account finite-sample error in the estimates of these intervals by using the bootstrap technique described by Imbens and Manski (2004). A similar identification strategy is possible for the semiparametric peer effects model developed in Brock and Durlauf (2007), although such a discontinuous estimator does not have a convenient asymptotic distribution and therefore does not allow one to obtain a finite-sample confidence interval by using the bootstrap method of Imbens and Manski (2004).
Horowitz and Manski’s estimator for functionals of incomplete data
Let us define a parametric estimator
with \(H_{i}=(X_{i},Y_{g(i)},w_{i})\) being the vector of control variables that are completely observed. Again assume \(y_{i,z=1}\) is observed and \(y_{i,z=0}\) is not observed. The estimator obtained from the true dataset can be expressed as:
The econometrician does not observe the values of \(y_{i,z=0}\), but it knows that each value of y belongs to a finite set \(\Upsilon\) with V elements. One possible estimator can be obtained from one of the possible ways of imputing the missing dataset:
with \(v_{i,z=0}\in \Upsilon\) being a specified value for each possible missing observation i. Since there are V possibilities for each \(y_{i,z=0}\), there are \(MD=V^{N_{z=0}}\) possible datasets that could be validly imputed, with \(N_{z=0}= {\textstyle \sum _{i=1}^{N}} 1(z_{i}=0)\) denoting the number of missing observations. Then, if one computes the estimator \(\theta _{c}\) across all the possible imputations for \(y_{i,z=0}\), the econometrician will find that \(\theta \in \left\{ \theta _{1} ,\ldots ,\theta _{c},\ldots ,\theta _{MD}\right\}\); therefore, a sharp interval for \(\theta\) can be obtained as:
This sharp interval, given by the infimum and supremum of the estimates across all the possible realizations of the dataset, can be easily applied to any parametric estimator (Horowitz & Manski, 2006), such as the parametric discrete choice model or the linear model of peer effects presented above. Note that the number of possible estimates MD increases rapidly with just a few missing observations. For instance, in the discrete choice setting (\(V=2\), since y can be 0 or 1), the number of possible datasets would reach \(MD=2^{15}=32768\) with just 15 missing observations.
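The infimum and supremum over all imputed datasets can be sketched in a few lines. To keep the example self-contained, a simple OLS slope stands in for the parametric estimator; this is an assumption made purely for illustration (the article applies the idea to the logit and linear peer effects models), and the names `ols_slope` and `sharp_interval` are hypothetical.

```python
from itertools import product


def ols_slope(x, y):
    """Slope of a least-squares line of y on x (stand-in estimator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def sharp_interval(x, y, z, estimator, support=(0, 1)):
    """Min and max of the estimator over every imputation of missing y.

    Returns (lower, upper, number_of_datasets), where the number of
    datasets is len(support) ** (number of missing outcomes).
    """
    missing = [i for i, zi in enumerate(z) if zi == 0]
    estimates = []
    for values in product(support, repeat=len(missing)):
        y_c = list(y)
        for i, v in zip(missing, values):
            y_c[i] = v
        estimates.append(estimator(x, y_c))
    return min(estimates), max(estimates), len(estimates)


x = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]  # entries with z == 0 are placeholders
z = [1, 0, 1, 0]
lower, upper, n_datasets = sharp_interval(x, y, z, ols_slope)
```

With two missing binary outcomes the loop visits 4 datasets; with 15 it would visit 32768, each requiring a full re-estimation.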
In the appendix, I show that similar econometric approaches can be applied to linear social interactions’ models that have bounded outcomes, assuming that identification is obtained through the observation of nonclosed groups.
Monte Carlo exercises
The simulated models
The discrete choice model of social interactions for simulation \(s=1,\ldots ,S\) is obtained as follows. The discrete peer effects model of each simulation s is specified to be a logit, \(\Pr (y_{i}(s)=1)=\Lambda (k+bX_{i}(s)+dY_{g(i)} (s)+Jp_{i,g}(s))\), with \(\Lambda (x)=\frac{\exp (x)}{1+\exp (x)}\). For simplicity, the simulations consider that the coefficients \(\left\{ k,b,d,J\right\}\) are constants for all simulations and that all the groups are the same size, that is, \(n_{g(i)}=n_{g}\). Each simulated observation is obtained with the specified set of coefficients: \(k=2\), \(b=1\), \(d=0.5\), \(J=0.5\). The observable control variables \(X_{i}(s)\) and \(Y_{g(i)}(s)\) are simulated as independent pseudo-standard normal numbers, with \(X_{i}(s)\) having different values for each individual i in each simulation s and \(Y_{g(i)}(s)\) having different values for each group g in each simulation s. For each simulation s, each observation is then obtained from pseudo-uniform numbers: \(y_{i}(s)=1(\varepsilon _{i}(s)\le k+bX_{i} (s)+dY_{g(i)}(s)+Jp_{i,g}(s))\), with \(\varepsilon _{i}(s)\) being pseudo-standard logistic random numbers with mean 0 and standard deviation \(\pi /\sqrt{3}\).
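One way to turn pseudo-uniform numbers into standard logistic draws is the inverse-CDF transform \(\varepsilon =\ln (u/(1-u))\), which is sketched below together with the threshold rule for \(y_{i}(s)\). This is an illustrative assumption about the mechanics, not the article's Stata code; in particular, the peer average `p_ig` is taken as given here, whereas in the closed-group model it depends jointly on the outcomes of the group. All function names are hypothetical.

```python
import math


def logistic_cdf(x):
    """Lambda(x) = exp(x) / (1 + exp(x)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_draw(u):
    """Map a uniform number u in (0, 1) to a standard logistic draw."""
    return math.log(u / (1.0 - u))


def simulate_outcome(k, b, d, J, x_i, y_g, p_ig, u):
    """y = 1(epsilon <= k + b*x + d*Y + J*p), with p treated as given."""
    index = k + b * x_i + d * y_g + J * p_ig
    return 1 if logistic_draw(u) <= index else 0


# Calibration from the text: k=2, b=1, d=0.5, J=0.5.
y_low_u = simulate_outcome(2, 1, 0.5, 0.5, 0.0, 0.0, 0.0, u=0.5)
y_high_u = simulate_outcome(2, 1, 0.5, 0.5, 0.0, 0.0, 0.0, u=0.95)
```

Because \(\Pr (\varepsilon \le x)=\Lambda (x)\) under this transform, the threshold rule reproduces the logit choice probability.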
The exercises consider an alternative with closed groups, with \(p_{i,g} (s)\) given entirely by the endogenous decisions of the members of each group \(g=1,\ldots ,G\), and an alternative with nonclosed groups with each individual i reporting a value \(w_{i}(s)\) for the peer effect of the members outside the group. The exercise considers \(w_{i}(s)\) as given by a pseudo-uniform number independent across i and s. Furthermore, the peer effects of the observed group members are considered in two versions, with the first one considering the individual as part of its own peer group, \(\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}(s)=1)/n_{g}\), while a second version considers that the individual is not part of its own peer effect, \(\sum \limits _{j=1,j\in g(i),j\ne i}^{n}1(y_{j}(s)=1)/(n_{g}-1)\). The reason for these two alternatives is that considering the individual as part of its own peer effect introduces an obvious problem, since \(p_{i,g}(s)\) includes \(y_{i}(s)\), which is a function of the unobserved idiosyncratic error \(\varepsilon _{i}(s)\). Therefore, it is likely that estimators that consider individuals as part of their own peer effect present a bias due to the control variable being correlated with the unobserved error (Wooldridge 2010).
Therefore, the variable \(p_{i,g}(s)\) is implemented in four alternatives:

(i)
Closed groups with individual i as part of his own peer group, \(p_{i,g}(s)=\dfrac{\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}(s)=1)}{n_{g}}\);

(ii)
Closed groups with individuals excluded from their own peer group, \(p_{i,g}(s)=\dfrac{\sum \limits _{j=1,j\in g(i),j\ne i}^{n}1(y_{j}(s)=1)}{n_{g}-1}\);

(iii)
Nonclosed groups with individual i as part of his own peer group and with the outside peer group of the same size as the group g, \(p_{i,g} (s)=\dfrac{n_{g}w_{i}(s)+\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}(s)=1)}{n_{g}+n_{g}}\);

(iv)
Nonclosed groups with individual i excluded from its own peer group and with the outside peer group of the same size as the group g, \(p_{i,g}(s)=\dfrac{n_{g}w_{i}(s)+\sum \limits _{j=1,j\in g(i),j\ne i} ^{n}1(y_{j}(s)=1)}{n_{g}+n_{g}-1}\).
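The four alternatives for \(p_{i,g}(s)\) differ only in whether individual i is counted and whether an outside peer value \(w_{i}(s)\) is added, so they can be written as one small helper. The sketch below is illustrative; the function name `peer_average` and its arguments are assumptions, not taken from the article's code.

```python
def peer_average(y, i, w_i=None, include_self=True):
    """Peer average for individual i in a group with outcomes y.

    y            : list of 0/1 outcomes of the n_g group members
    i            : position of the individual within the group
    w_i          : outside peer value in [0, 1]; None means a closed group
    include_self : whether individual i counts as its own peer
    """
    n_g = len(y)
    inside = sum(y) if include_self else sum(y) - y[i]
    n_inside = n_g if include_self else n_g - 1
    if w_i is None:
        # Alternatives (i) and (ii): closed groups.
        return inside / n_inside
    # Alternatives (iii) and (iv): the outside group has n_g members.
    return (n_g * w_i + inside) / (n_g + n_inside)


y = [1, 0, 1, 1]
p_i = peer_average(y, 0)                                   # (i)
p_ii = peer_average(y, 0, include_self=False)              # (ii)
p_iii = peer_average(y, 0, w_i=0.5)                        # (iii)
p_iv = peer_average(y, 0, w_i=0.5, include_self=False)     # (iv)
```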
The Monte Carlo exercises consider several combinations of group size with \(n_{g}=5,10,25\) members and several numbers of groups with \(G=50,100,200,500,1000,2500\). The total sample size in terms of individuals is given by \(N=G\times n_{g}\). Since some exercises take a long time (in particular, the Horowitz–Manski estimator takes longer than the others as the number of missing values grows), all the Monte Carlo exercises are done with just 50 simulations, \(s=1,\ldots ,S\), with \(S=50\).
To summarize the results from the Monte Carlo simulations, I denote \(\theta\) as the vector with the true value of the parameters, \(\theta =\left\{ k,b,d,J\right\}\), while \({\hat{\theta }}_{s}\) denotes the estimate obtained in each simulation. The average estimate across all the simulations is obtained as \({\bar{\theta }}=\frac{ {\textstyle \sum _{s}} {\hat{\theta }}_{s}}{S}\). The mean bias is therefore computed as \(\bar{\theta }-\theta\), while the standard deviation (STD) is given by \(\sqrt{\frac{ {\textstyle \sum _{s}} ({\hat{\theta }}_{s}-{\bar{\theta }})^{2}}{S-1}}\) and the mean absolute deviation (MAD) is \(\frac{ {\textstyle \sum _{s}} \left| {\hat{\theta }}_{s}-\theta \right| }{S}\). The MAD can be a better measure of the small sample performance of the estimators than the STD, especially because it is possible that some estimators have a considerable bias and the bias effect is not part of the STD.
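The three summary statistics can be computed directly from the list of simulation estimates for one coefficient. The sketch below uses made-up estimates (not results from the article) chosen so that the estimator is biased: the STD stays small because it is centered on the average estimate, while the MAD, centered on the true value, picks up the bias. The function name `monte_carlo_summary` is hypothetical.

```python
import math


def monte_carlo_summary(estimates, theta):
    """Return (mean bias, STD, MAD) for one coefficient.

    estimates : list of estimates, one per simulation
    theta     : true parameter value
    """
    S = len(estimates)
    mean_estimate = sum(estimates) / S
    bias = mean_estimate - theta
    # STD is centered on the average estimate, with S - 1 in the denominator.
    std = math.sqrt(sum((e - mean_estimate) ** 2 for e in estimates) / (S - 1))
    # MAD is centered on the true value, so it also reflects the bias.
    mad = sum(abs(e - theta) for e in estimates) / S
    return bias, std, mad


# Illustrative numbers only: estimates cluster around 1.0, truth is 0.5.
bias, std, mad = monte_carlo_summary([0.9, 1.1, 1.0], theta=0.5)
```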
All the Monte Carlo exercises were performed on a notebook with an Intel Core i7-9750H 2.60 GHz processor, with 24.0 GB of RAM, 6 physical cores and 12 logical processors. The codes were implemented with a Stata 15.1 MP6 software license. All the codes are publicly available in the Mendeley Data repository: https://data.mendeley.com/datasets/zsbxdmhtj9/1.
Calibrating the missing observations
The missing observations are specified in terms of the number of missing outcomes (\(y_{i}(s)\)) in each simulation s. I create independent pseudouniform numbers \(zu_{i}(s)\), and then for each simulation s specify as missing observations those with the m lowest values of \(zu_{i}(s)\). For simplicity, all the other variables are observed (for instance, a variable X for family education or house type could be observed from administrative data), except for the endogenous variable \(y_{i}(s)\). I choose this option instead of a probability, because the Horowitz–Manski estimator would require a number of \(MD=V^{N_{z=0}}\) possible datasets, with V being the possible values of \(y_{i}(s)\) and \(N_{z=0}\) being the number of missing observations. This implies that even a small number of observations such as 15 would reach \(MD=2^{15}=32768\) possible datasets and a very large computational time. For this reason, I prefer to specify the number of missing values, rather than a probability of missing outcomes which would result in a random number of missing outcomes for each simulation. In the case of the logit model, I will show Monte Carlo exercises with 5 and 10 missing values.
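The rule of marking the m observations with the lowest uniform draws as missing can be sketched as follows. This is an illustration under stated assumptions rather than the article's Stata code; the function name `missing_indicator` and the seed are hypothetical.

```python
import random


def missing_indicator(n, m, rng):
    """Return z with exactly m zeros (missing) among n observations.

    Each observation receives an independent uniform draw zu_i, and the
    m observations with the lowest draws are flagged as missing (z = 0).
    """
    zu = [rng.random() for _ in range(n)]
    lowest = sorted(range(n), key=lambda i: zu[i])[:m]
    z = [1] * n
    for i in lowest:
        z[i] = 0
    return z


# 50 groups of 5 members, 10 missing outcomes; the seed is arbitrary.
z = missing_indicator(250, 10, random.Random(0))
```

Fixing the count m, rather than a missingness probability, keeps the number of datasets \(2^{m}\) faced by the Horowitz–Manski estimator identical across simulations.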
Monte Carlo exercises without missing data
This section starts by presenting the results of the Monte Carlo exercises without missing data. Table 1 summarizes the mean bias, standard deviation (STD) and mean absolute deviation (MAD) of the estimated coefficients, excluding the J endogenous effect. I compare the logit model with only contextual group effects (that is, assuming \(J=0\)) with the logit endogenous peer effects model with closed groups (as suggested in Brock and Durlauf (2002)), although with the individuals excluded from their own peer effect. The contextual-effects-only model can be seen as a more traditional model, since there is no endogenous control variable and no correlation among individuals apart from the observable group effect \(Y_{g(i)}(s)\). The results show that the logit with only contextual effects converges quite quickly to the truth and the estimator presents accurate values even with just 5 members per group and 50 groups (therefore a total sample of 250 observations). The logit model with both endogenous and contextual effects also converges somewhat quickly toward the true values of the parameters. The same pattern appears with the logit model with nonclosed groups, which is shown in Table 2.
Table 3 shows how important it is, for the estimation of the logit endogenous peer effects coefficient (J), to exclude individuals from their own peer group g(i), and whether it is helpful to include peers from outside the group (nonclosed groups, which are essential for identification of the linear model). Models 1 and 2 show the case of closed groups, with individuals respectively excluded from and included in their own peer group. Models 3 and 4 show the case of nonclosed groups, again with individuals respectively excluded from and included in their own peer group. The results show that it is very important to exclude individuals from their own peer effect in order to estimate J, because models M2 and M4 present large values for the mean bias and mean absolute deviation (MAD), with such values falling slowly as the number of group members increases (a larger group reduces the weight \(\frac{1}{n_{g}}\) of the individual in its own group, besides increasing the sample size) and as the number of groups increases (which increases the sample size). This shows it is not advisable in practice for empirical researchers to include individuals as part of their own peer group, even if the model is identified in theory. Both M1 and M3 present accurate estimates, since both models exclude individuals from their own peer group. However, M3 also includes peer effects from outside the group \(\left(p_{i,g} (s)=\dfrac{n_{g}w_{i}(s)+\sum \limits _{j=1,j\in g(i),j\ne i}^{n}1(y_{j} (s)=1)}{n_{g}+n_{g}-1}\right)\). The Monte Carlo exercise reveals that including peer effects outside of the group (model M3) can increase the mean bias, standard deviation (STD) and mean absolute deviation (MAD) for small sample sizes, such as just 50 groups. However, the model with nonclosed groups (M3) can present a lower bias for larger sample sizes, although with a larger standard deviation.
It is only for large sample sizes (a group size of 25 members and 500 groups or more, which implies a sample size of 12,500 observations or more) that the nonclosed groups model M3 presents a lower mean absolute deviation relative to the closed group model M1. This makes sense, since the additional control variable (the outside peer effects \(w_{i}(s)\)) represents an additional source of identification, but it also increases the dispersion in individual and group outcomes.
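The four peer-group definitions compared in Table 3 (M1 to M4) can be sketched in a few lines of Python. The nonclosed-group formulas follow the expressions for \(p_{i,g}(s)\) given in the text; the closed-group versions (the group average with or without individual i) are an assumption of this sketch, and the function name `peer_average` is hypothetical.

```python
def peer_average(y, w=None, exclude_self=True):
    """Peer variable p_{i,g} for every member i of one group.

    y: list of binary outcomes of the n_g group members.
    w: list of outside-peer averages w_i (nonclosed groups), or None
       for closed groups.
    exclude_self: drop individual i from his or her own peer group.
    """
    n_g = len(y)
    total = sum(y)
    result = []
    for i in range(n_g):
        inside_sum = total - y[i] if exclude_self else total
        inside_n = n_g - 1 if exclude_self else n_g
        if w is None:
            # Closed group: average over the inside peers only.
            result.append(inside_sum / inside_n)
        else:
            # Nonclosed group: n_g outside peers with average w_i.
            result.append((n_g * w[i] + inside_sum) / (n_g + inside_n))
    return result


y = [1, 0, 1, 1, 0]
w = [0.5] * 5
m1 = peer_average(y)                         # closed, self excluded
m2 = peer_average(y, exclude_self=False)     # closed, self included
m3 = peer_average(y, w)                      # nonclosed, self excluded
m4 = peer_average(y, w, exclude_self=False)  # nonclosed, self included
```

In M2 and M4 the individual's own outcome enters the regressor with weight \(1/n_{g}\) or \(1/(2n_{g})\), which is the mechanical source of the bias discussed above.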
Monte Carlo exercises with missing data
Time performance of the interval estimators
This section summarizes the Monte Carlo results of the Horowitz–Manski and Manski–Tamer type of estimators. Table 4 compares the average computational time of each estimator. Note that the 5 missing observations correspond to a very small probability of missing outcomes, ranging from just 0.01% in the large sample cases to a maximum of 2% for the smallest samples. In the case of 7 and 10 missing observations, the corresponding probability of missing outcomes ranges from 0.06% to 2.8% and from 0.02% to 4%, respectively. These are very low probabilities of missing data, since it is quite common to find survey datasets with more than 4% of missing data. For the case of just 5 missing observations, the Horowitz–Manski type of estimator for the logit model takes between 0.7 and 30.6 seconds on average across all simulations, while the Manski–Tamer type of estimator takes between 0.6 and 19.6 seconds, which can be 50% faster in some cases. For 10 missing observations, the difference in time performance between the two estimators grows much larger, with the Horowitz–Manski type of estimator taking between 23 and 966 seconds, while the Manski–Tamer estimator keeps about the same computational time as with just 5 missing observations, with an average time between 0.5 and 19.7 seconds. The conclusion is that the number of combinations required to compute the Horowitz–Manski type of estimator increases exponentially with the number of missing observations (\(MD=V^{N_{z=0}}\)), while for the Manski–Tamer type of estimator the computational time remains similar even as the number of missing outcomes increases.
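The source of this difference in cost can be illustrated on the simplest object involved, the average outcome of a group with missing binary outcomes. The sketch below is not either of the estimators of the paper: it only contrasts an enumeration over all \(2^{m}\) imputations with the direct worst-case interval, using hypothetical function names.

```python
from itertools import product


def bounds_by_enumeration(observed, n_missing):
    """Min and max group mean over all 2**n_missing imputations."""
    n = len(observed) + n_missing
    s = sum(observed)
    means = [(s + sum(fill)) / n
             for fill in product((0, 1), repeat=n_missing)]
    return min(means), max(means)


def bounds_direct(observed, n_missing):
    """Same interval obtained by setting all missing values to 0 or to 1."""
    n = len(observed) + n_missing
    s = sum(observed)
    return s / n, (s + n_missing) / n


observed = [1, 0, 1, 1, 0, 1, 0]
print(bounds_by_enumeration(observed, 3))  # 8 imputations evaluated
print(bounds_direct(observed, 3))          # 2 evaluations, same interval
```

The enumeration visits \(2^{m}\) cases while the direct interval needs two evaluations whatever the value of m, which mirrors the pattern of computational times in Table 4.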
Intervals of the interval estimators
Now, I summarize the mean intervals across all simulations of the Horowitz–Manski and the Manski–Tamer estimators around the true parameter values. Table 5 shows the mean intervals for the case of the logit model with 5 missing values. For the parameters k, b and d, the Horowitz–Manski type of interval estimator almost always contains the true parameter value in its average interval, although the intervals can be large in small samples such as 50 groups. However, for the endogenous peer effects coefficient J, the Horowitz–Manski type of estimator often gives a biased interval that does not contain the true parameter value, as shown for the simulations with group sizes of 10 and 25 for samples with 500 groups or more. The Manski–Tamer estimator always has a larger interval than the Horowitz–Manski estimator, especially for small samples such as 50 groups, but this difference becomes quite small for a number of groups of 100 or more. The bounds of the Horowitz–Manski and the Manski–Tamer estimators tend to be reasonably small for samples with 1000 or 2500 groups, although with a significant bias for the J parameter.
It is problematic that in a few cases the bounds of the Horowitz–Manski and Manski–Tamer estimators do not include the true parameter value for the endogenous peer effect parameter J. This happens only for large peer groups (a group size of 10 or 25 members) and only for a large number of groups (500 groups or more). It is not easy to clarify why this inconsistency of the interval estimators is happening, but the previous literature shows three factors that complicate the estimation of discrete choice models, particularly those with correlated observations. One factor is that all the nonlinear models (which include the logit model) have a certain degree of bias in finite samples and this appears in the Monte Carlo exercises (Wooldridge, 2010). A second factor is that this small sample inconsistency of the discrete choice model is further exacerbated in settings with panel data (Heckman, 1981, Honoré & Tamer, 2006; see the Notes section). A third factor is that the literature shows that misclassification of dependent variables in a discrete-response model causes inconsistent coefficient estimates (Hausman et al., 1998). This is a very close example to the setting of this paper, since the interval estimators work by trying several possible options for the missing outcomes and the endogenous group averages, which is in effect working with many samples that are misclassified and only a single sample that represents the true outcomes.
It also happens sometimes for the other parameters k, b and d that the lower bound \({\bar{\theta }}_{\min }\) excludes the true parameter value, but the estimated interval is always very close to the true value and only fails to contain the true value by a small amount of 0.01 or less. Therefore, the estimated intervals of both the Horowitz–Manski and Manski–Tamer estimators appear to be valid.
The pattern is similar with 10 missing observations, as summarized in Table 6. The estimated intervals of the Horowitz–Manski and Manski–Tamer approaches tend to contain the true parameter value for the parameters k, b and d, and the intervals—while large with small samples such as 50 groups—tend to fall quickly as the sample sizes grow. The Manski–Tamer approach provides very similar bounds, except for low sample sizes such as 50 groups with a group size of 5 members. All the estimated intervals are bigger than in the case of the 5 missing observations, as expected. For the J parameter of endogenous social interactions, the intervals can be quite big in small sample sizes with just 50 and 100 groups, even for groups with 25 members. It is also found that the estimated intervals do not contain the true J parameter for the cases of samples with 1000 and 2500 groups, although the width of the intervals falls with the sample size. In general, all the estimated intervals are larger with 10 missing values (in Table 6) relative to just 5 missing values (Table 5) as expected, but with bigger differences for the small samples such as 50 and 100 groups.
Conclusions and possible extensions
This paper examines partial inference of the peer effects models in the presence of missing outcome data, with a special focus on the binary choice case. Most peer effects models use the average outcome of each group as an explanatory variable; therefore, missing outcome data imply that we face both a problem of missing outcome values and an undetermined regressor. Having information on the bounds of the outcome variable can, however, help us get partial identification bounds for the parameters (Manski & Tamer, 2002, Horowitz & Manski, 2006). I use this information to obtain identification of a family of parametric binary choice models with peer effects (Brock & Durlauf, 2002; 2007, Blume et al., 2010), although a similar approach can be suggested for the linear peer effects model for the case in which identification can be obtained through nonclosed peer groups. Other extensions of these results can easily be made by including a more general multinomial setting or semiparametric discrete choice peer effect models (Blume et al., 2010).
For the case of bounded variables, sharp bounds can be obtained for all group variables and outcomes by plugging in all possible combinations of values of the missing variables (Horowitz & Manski, 2006). This method, however, is computationally difficult to implement, since the number of potential combinations increases exponentially with the number of missing observations and therefore quickly becomes a heavy computational exercise even for datasets of moderate size. An attractive alternative, however, can be developed by noticing that this model satisfies interval (I), monotonicity (M) and mean independence (MI) properties, which can be summarized jointly as the IMMI assumption. Using these properties, a modified minimum distance (MMD) estimator is presented to obtain nonsharp bounds for the coefficients. While this approach is here suggested as a solution to the binary peer effects case, the same estimator can be easily applied to any parametric model with missing outcomes and interval regressors. In a set of Monte Carlo exercises, I show that the nonsharp bounds obtained through an interval estimator similar to Manski and Tamer (2002) provide results quite similar to the sharp bounds of the Horowitz and Manski (2006) approach, but at a much smaller cost in terms of computational time. The computational time of the Horowitz and Manski (2006) approach increases exponentially with the number of missing observations and can quickly become overwhelming with just 15 missing outcomes, but the nonsharp bounds proposed as an alternative with the IMMI assumption do not increase their computational time with additional missing outcomes and provide a good approximation for the sharp intervals (at least for the calibrated Monte Carlo exercises considered in this article). The Monte Carlo exercises also show that for the binary discrete choice model of peer effects there is not a significantly higher estimation accuracy for the case of nonclosed groups relative to the closed groups case.
The bounds of the interval estimators of peer effects in the specified exercises are still large. This is a case for future econometricians and applied economists to combine further realistic assumptions in order to obtain tighter bounds (Manski, 2003).
Availability of data and materials
The article does not use any source of data.
Notes
While panel data are not the same as peer effects, both cases are examples in which the observations are correlated among themselves through heterogeneity and endogeneity (Heckman, 1981, Honoré & Tamer, 2006). The heterogeneity comes from the random effect for the panel data and the contextual effect for the social interactions model. The endogeneity issue in these models comes from the dynamic effect of previous choices in the case of panel data and the effect of the endogenous peer choices in the social interactions model. Honoré and Tamer (2006) show that the dynamic discrete choice models are hard to identify; therefore, this should explain why the peer effects model is also harder to estimate as the group size grows larger, and therefore the observations become more correlated among themselves.
Abbreviations
I: Interval
IMMI: Interval, monotonicity, and mean independence
M: Monotonicity
MI: Mean independence
MLE: Maximum likelihood estimator
References
Advani, A., & Malde, B. (2018). Methods to identify linear network models: a review. Swiss Journal of Economics and Statistics, 154, 12.
Ammermueller, A., & Pischke, J. (2009). Peer effects in European primary schools: Evidence from the progress in international reading literacy study. Journal of Labor Economics, 27(3), 315–348.
Bailey, M., Johnston, D., Koenen, M., Kuchler, T., Russel, D., & Stroebel, J. (2021). Social networks shape beliefs and behavior: Evidence from social distancing during the COVID-19 pandemic. NBER WP 28234.
Blume, L., Brock, W., Durlauf, S., & Ioannides, Y. (2010). Identification of social interactions. Handbook of Social Economics, 18, 853–964.
Bramoullé, Y., Djebbari, H., & Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150, 41–55.
Brock, W., & Durlauf, S. (2002). A multinomial-choice model of neighborhood effects. American Economic Review, 92(2), 298–303.
Brock, W., & Durlauf, S. (2007). Identification of binary choice models with social interactions. Journal of Econometrics, 140(1), 52–75.
Bruhin, A., Goette, L., Haenni, S., & Jiang, L. (2020). Spillovers of prosocial motivation: Evidence from an intervention study on blood donors. Journal of Health Economics, 70, 102244.
Chernozhukov, V., Hong, H., & Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 75(5), 1243–1284.
Cipollone, P., & Rosolia, A. (2007). Social interactions in high school: Lessons from an earthquake. American Economic Review, 97(3), 948–965.
Hausman, J. (2001). Mismeasured variables in econometric analysis: Problems from the right and problems from the left. Journal of Economic Perspectives, 15(4), 57–67.
Hausman, J., Abrevaya, J., & Scott-Morton, F. (1998). Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics, 87(2), 239–269.
Heckman, J. (1981). The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. In: C. F. Manski and D. McFadden (Eds.) Structural analysis of discrete panel data with econometric applications (pp. 179–195).
Honoré, B., & Tamer, E. (2006). Bounds on parameters in panel dynamic discrete choice models. Econometrica, 74(3), 611–629.
Horowitz, J., & Manski, C. (2006). Identification and estimation of statistical functionals using incomplete data. Journal of Econometrics, 132(2), 445–459.
Imbens, G., & Manski, C. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6), 1845–1857.
Kooreman, P., & Soetevent, A. (2007). A discrete-choice model with social interactions: with an application to high school teen behavior. Journal of Applied Econometrics, 22, 599–624.
Krauth, B. (2006). Simulation-based estimation of peer effects. Journal of Econometrics, 133(1), 243–271.
Lalive, R., & Cattaneo, A. (2009). Social interactions and schooling decisions. Review of Economics and Statistics, 91(3), 457–477.
Madeira, C. (2018). Testing the rationality of expectations of qualitative outcomes. Journal of Applied Econometrics, 33(6), 837–852.
Manski, C. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3), 531–542.
Manski, C. (2000). Economic analysis of Social Interactions. Journal of Economic Perspectives, 14(3), 115–136.
Manski, C. (2003). Partial Identification of Probability Distributions. Springer Series in Statistics.
Manski, C., & Tamer, E. (2002). Inference on regressions with interval data on a regressor or outcome. Econometrica, 70(2), 519–546.
Roth, A. (2020). How the provision of childcare affects attitudes towards maternal employment. Swiss Journal of Economics and Statistics, 156, 17.
Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. Quarterly Journal of Economics, 116(2), 681–704.
Slotwinski, M., Stutzer, A., & Uhlig, R. (2019). Are asylum seekers more likely to work with more inclusive labor market access regulations? Swiss Journal of Economics and Statistics, 155, 17.
Sojourner, A. (2013). Identification of peer effects with missing peer data: Evidence from project STAR. Economic Journal, 123(569), 574–605.
Wooldridge, J. (2010). Econometric analysis of cross section and panel data. MIT Press.
Acknowledgements
The article benefited only from comments and suggestions of fellow academic researchers and seminar participants.
Funding
This study was funded by Fundação Calouste Gulbenkian and Fundação para a Ciência e Tecnologia.
Author information
Contributions
The author Carlos Madeira is the single author of this article and its contributions. The author read and approved the final manuscript.
Ethics declarations
Ethical approval
This article does not contain any studies with animals or humans performed by any of the authors.
Competing interests
The author Carlos Madeira declares that he has no conflict of interest or competing interests regarding this article.
Software and hardware used in the article
All the computational exercises in this work were performed with Stata MP6, version 15.1. The hardware used is an Intel Core i7-9750H CPU with 6 physical cores (12 virtual processors), a clock speed of 2.60 GHz and 24.0 GB of available physical RAM.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
I would like to express my enormous debt to Elie Tamer, Chuck Manski, Orazio Attanasio, plus seminar participants at Northwestern University and the Econometric Society World Congress. Financial support from Fundação Calouste Gulbenkian is gratefully acknowledged. Comments are welcome at cmadeira@bcentral.cl. All errors are my own.
Appendices
Appendix 1: Proofs
Proof of Propositions 1 and 2
Let v be any interval-valued variable with \(v\in [v_{0},v_{1} ]\) (Assumption I). Let \(E[y\mid x,v]\) be weakly increasing in v (monotonicity, Assumption M). The law of iterated expectations and the mean independence assumption (MI: \(E[y\mid x,v,v_{0},v_{1}]=E[y\mid x,v]\)) yield
where the first equality is given by the law of iterated expectations and the second one by Assumption MI.
Assumptions I and M imply that for all constants \(V_{0}\le V_{1}\),
and assumption that \(y\in [y_{L},y_{U}]\) and the law of total probability give when there are missing outcome data,
where
Hence,
To prove the lower bound on \(E[y\mid x,v=V]\) , take any \(V_{1}\le V\) . It follows from A3 and from Assumption M that
Hence, the lower bound holds. To prove sharpness, view the bound as a function of V. This function is weakly increasing in V, so Assumption M holds. The proof of the sharp upper bound uses analogous reasoning. Therefore, we have proved that under Assumption IMMI, we have:
In the absence of other information, these bounds are sharp. Propositions 1 and 2 are just special cases of this result.
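For reference, the chain of inequalities behind this proof can be written out explicitly. The displays below are a sketch reconstructed from the statements in the text, following the structure of the corresponding argument in Manski and Tamer (2002); the integral notation \(dP(v\mid x,v_{0},v_{1})\) is an assumption of this sketch, and the forms of \(E_{L}\) and \(E_{U}\) are those given for the linear case in Appendix 2. Iterated expectations and Assumption MI give

\[
E[y\mid x,v_{0},v_{1}]=\int E[y\mid x,v,v_{0},v_{1}]\,dP(v\mid x,v_{0},v_{1})=\int E[y\mid x,v]\,dP(v\mid x,v_{0},v_{1}).
\]

Assumptions I and M then imply, for all constants \(V_{0}\le V_{1}\),

\[
E[y\mid x,v=V_{0}]\le E[y\mid x,v_{0}=V_{0},v_{1}=V_{1}]\le E[y\mid x,v=V_{1}].
\]

With missing outcomes and \(y\in [y_{L},y_{U}]\), the law of total probability gives

\[
E_{L}[y\mid x,v_{0},v_{1}]\le E[y\mid x,v_{0},v_{1}]\le E_{U}[y\mid x,v_{0},v_{1}],
\]

where

\[
E_{L}[y\mid x,v_{0},v_{1}]=E[y\mid x,v_{0},v_{1},z=1]P(z=1\mid x,v_{0},v_{1})+y_{L}P(z=0\mid x,v_{0},v_{1}),
\]

\[
E_{U}[y\mid x,v_{0},v_{1}]=E[y\mid x,v_{0},v_{1},z=1]P(z=1\mid x,v_{0},v_{1})+y_{U}P(z=0\mid x,v_{0},v_{1}).
\]

Combining these steps with Assumption M for any \(V_{1}\le V\) (lower bound) and any \(V_{0}\ge V\) (upper bound) yields

\[
\sup _{(V_{0},V_{1}):V_{1}\le V}E_{L}[y\mid x,v_{0}=V_{0},v_{1}=V_{1}]\le E[y\mid x,v=V]\le \inf _{(V_{0},V_{1}):V_{0}\ge V}E_{U}[y\mid x,v_{0}=V_{0},v_{1}=V_{1}].
\]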
Proof for Proposition 3
Assumption IMMI gives us
For a parametric model, this inequality becomes
For the case of missing outcome data, \(E[y\mid x,v_{0},v_{1}]\) is not perfectly observed, but A.3.3 gives us \(E_{L}[y\mid x,v_{0},v_{1}]\le E[y\mid x,v_{0},v_{1}]\le E_{U}[y\mid x,v_{0},v_{1}]\) . A6 and A3 give us \(f(x,v_{0},\gamma )\le E_{U}[y\mid x,v_{0},v_{1}]\) and \(f(x,v_{1},\gamma )\ge E_{L}[y\mid x,v_{0},v_{1}]\) . Therefore, for a parametric model, inequalities A6 and A3 become:
It follows that c is equivalent to \(\gamma\) if and only if \(f(x,v_{0} ,c)\le E_{U}[y\mid x,v_{0},v_{1}]\) \({\displaystyle \bigcup }\) \(f(x,v_{1},c)\ge E_{L}[y\mid x,v_{0},v_{1}]\) , a.e. \((x,v_{0},v_{1})\) . \(\square\)
Proof for Lemma 1
This corollary allows us to characterize the identification region for the case of the monotone index form \(f(x,v,\gamma )=F(x\beta +\delta v)\) in the case of missing outcome data. The identification region of \(\gamma\) in our Proposition 3 is given by \(C^{*}\equiv \{c\in C:P(V(c))=0\}\) , where \(V(c)\equiv [\) \((x,v_{0},v_{1}):f(x,v_{0},c)\le E_{U}[y\mid x,v_{0},v_{1}]\) \({\displaystyle \bigcup }\) \(f(x,v_{1},c)\ge E_{L}[y\mid x,v_{0},v_{1}]\) \(\ ]\), as proved previously.
Let f have the monotone-index form. Then:

(a)
\(C^{*}\) is nonempty and convex.

(b)
Assume that there exists no proper linear subspace of \(R^{k+1}\) having probability one under P(x, v). Assume that \(P(v_{0}=v_{1})>0\) and \(P(z=1\mid v_{0}=v_{1})=1\). Then, \(C^{*}=\gamma\) .
Proof
(a) The set \(C^{*}\) is nonempty because \(\gamma \in C^{*}\). To prove convexity, observe that the condition \(P(V(c))=0\), identifying \(c=(b,d)\) as a member of \(C^{*}\), holds if and only if
where \(s_{U}(x,v_{0},v_{1})\equiv F^{-1}(E_{U}[y\mid x,v_{0},v_{1}])\) and \(s_{L}(x,v_{0},v_{1})\equiv F^{-1}(E_{L}[y\mid x,v_{0},v_{1}])\) . Let \(c^{\prime }\) and \(c^{\prime \prime }\) be distinct elements of \(C^{*}\). Then,
Now consider \(c_{\alpha }\equiv \alpha c^{\prime }+(1-\alpha )c^{\prime \prime }\) , where \(\alpha \in (0,1)\). It follows from the above that
Hence, \(c_{\alpha }\in C^{*}\) .
(b) Consider the subpopulation with \((v_{0}=v_{1})\). By assumption \(P(v_{0}=v_{1})>0\) and \(P(z=1\mid v_{0}=v_{1})=1\). Hence, \(c\in C^{*}\) must satisfy the equality \(F(xb+dv)=E_{U}[y\mid x,v_{0},v_{1}]=E_{L}[y\mid x,v_{0},v_{1}]=E[y\mid x,v_{0},v_{1}]\) or equivalently \(xb+dv=F^{-1}(E[y\mid x,v_{0},v_{1}])=s(x,v_{0},v_{1})\), a.e. \((v_{0}=v_{1})\) . The support condition on P(x, v) implies that \((\beta ,\delta )\) is the only parameter value that satisfies the equality almost everywhere \((v_{0}=v_{1})\). Hence, \(\gamma\) is identified. \(\square\)
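For part (a), the inequalities behind the convexity argument can be written out as follows. This is a sketch reconstructed from the definitions of \(s_{U}\) and \(s_{L}\) in the text, under the assumption that F is strictly increasing so that \(F^{-1}\) preserves the inequalities of Proposition 3. A value \(c=(b,d)\) satisfies those inequalities if and only if

\[
xb+dv_{0}\le s_{U}(x,v_{0},v_{1})\quad \text{and}\quad xb+dv_{1}\ge s_{L}(x,v_{0},v_{1}),\quad \text{a.e. }(x,v_{0},v_{1}).
\]

If \(c^{\prime }=(b^{\prime },d^{\prime })\) and \(c^{\prime \prime }=(b^{\prime \prime },d^{\prime \prime })\) both satisfy these two inequalities, then for \(c_{\alpha }=(b_{\alpha },d_{\alpha })\) with \(\alpha \in (0,1)\),

\[
xb_{\alpha }+d_{\alpha }v_{0}=\alpha (xb^{\prime }+d^{\prime }v_{0})+(1-\alpha )(xb^{\prime \prime }+d^{\prime \prime }v_{0})\le s_{U}(x,v_{0},v_{1}),
\]

\[
xb_{\alpha }+d_{\alpha }v_{1}=\alpha (xb^{\prime }+d^{\prime }v_{1})+(1-\alpha )(xb^{\prime \prime }+d^{\prime \prime }v_{1})\ge s_{L}(x,v_{0},v_{1}),
\]

because both conditions are linear in the parameters.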
Result (b) is equivalent to saying that we are able to point-identify the parameters of the social interactions models if there is at least one group with no missing data. This is obviously a very strong condition to rely on in practice. Even if there are one or more groups with no missing data, we would need the sample size represented by these groups with no missing data to increase to infinity in order to avoid sampling imprecision in the estimation of the parameters.
Proof for Proposition 4
Let the estimator for the identification region be given by
where
with \(E_{L}^{N}[y\mid x_{i},v_{0i},v_{1i}]\) and \(E_{U}^{N}[y\mid x_{i} ,v_{0i},v_{1i}]\) being consistent estimates of \(E_{L}[y\mid x_{i} ,v_{0i},v_{1i}]\) and \(E_{U}[y\mid x_{i},v_{0i},v_{1i}]\) .
Proof
Manski and Tamer (2002) provide a proof that \(H_{N}(\gamma )\) is a consistent estimator for the identification region \(H(\gamma )\), which remains valid in this case with \(E_{U}^{N}[y\mid x_{i},v_{0i},v_{1i}]\) and \(E_{L} ^{N}[y\mid x_{i},v_{0i},v_{1i}]\) in the place of \(\eta _{N}(x_{i},v_{0i} ,v_{1i})=E^{N}[y\mid x_{i},v_{0i},v_{1i}]\), and therefore the proof is omitted here.
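A minimal Python sketch of a criterion of this type is given below for the logit case. It is an assumption-laden illustration, not the estimator as implemented in the paper: it penalizes squared violations of the two inequalities of Proposition 3, \(f(x,v_{0},c)\le E_{U}\) and \(f(x,v_{1},c)\ge E_{L}\), in the spirit of the modified minimum distance criterion of Manski and Tamer (2002). The names `mmd_criterion` and `estimated_region`, the grid search and the tolerance `tol` are all hypothetical choices.

```python
import math


def logistic(t):
    """Logit link F(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def mmd_criterion(c, data):
    """Average squared violation of the bounds at parameter c = (k, b, d, J).

    data rows: (x, y_g, v0, v1, e_low, e_up), where [v0, v1] is the
    interval for the peer average and [e_low, e_up] are estimates of
    E_L and E_U. Monotonicity in v presumes J >= 0.
    """
    k, b, d, J = c
    total = 0.0
    for x, y_g, v0, v1, e_low, e_up in data:
        f0 = logistic(k + b * x + d * y_g + J * v0)
        f1 = logistic(k + b * x + d * y_g + J * v1)
        total += max(f0 - e_up, 0.0) ** 2 + max(e_low - f1, 0.0) ** 2
    return total / len(data)


def estimated_region(grid, data, tol=1e-8):
    """Grid points whose criterion is within tol of the minimum."""
    values = [(mmd_criterion(c, data), c) for c in grid]
    q_min = min(q for q, _ in values)
    return [c for q, c in values if q <= q_min + tol]


data = [(0.0, 0.0, 0.2, 0.6, 0.4, 0.7)]
grid = [(0.0, 0.0, 0.0, 0.0), (5.0, 0.0, 0.0, 0.0)]
region = estimated_region(grid, data)
```

The criterion is evaluated once per candidate parameter, independently of how many outcomes are missing, which is consistent with the stable computational times reported for the Manski–Tamer type of estimator.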
Appendix 2: Monte Carlo simulations for the linear model
Interval estimators for the linear case
For the linear social interactions case, I will assume an identified model in which the peer group is nonclosed:
\[
y_{i}=k+bX_{i}+dY_{g(i)}+J({\bar{y}}_{i,g}+w_{i})+\epsilon _{i},
\]
with all the regressors (\(X_{i}\), \(Y_{g(i)}\), \({\bar{y}}_{i,g}+w_{i}\)) and the unobserved error term \(\epsilon _{i}\) being bounded. \({\bar{y}}_{i,g}\) represents the average outcomes among the peer group in the sample, while \(w_{i}\) represents the average outcomes of other peers of individual i but which are not peers of the other members of group g(i). I assume that both \(\bar{y}_{i,g}\) and \(w_{i}\) are observed (for instance, the individuals could self-report the average outcomes of their other peers which are not common peers in the group g(i)). Since all the terms \(W_{i}\equiv ({\bar{y}} _{i,g}+w_{i},X_{i},Y_{g(i)},\epsilon _{i})\) are bounded, that is \(W_{i} \in \left[ v_{L},v_{U}\right]\) with \(v_{L},v_{U}\) being multivariate vectors of size \(2+K+Q\), the outcomes are bounded in an interval as well: \(y\in [y_{L},y_{U}]\).
Again assume y is observed when \(z=1\) and not observed when \(z=0\). For simplicity, I assume that \(X_{i}\), \(Y_{g(i)}\) and the outside peer effect of the individual \(w_{i}\) are always observed, but the missing information on some outcomes \(y_{i}\) implies that \({\bar{y}}_{i,g}\) is not point-identified. Again, I denote \(V_{x}^{g}=(E(y\mid g),w,Y_{g},x)\), \(W_{x}^{g}=(E_{L}(y\mid g),E_{U}(y\mid g),w,Y_{g},x)\), \(W_{0}^{g}=(E_{L}(y\mid g),w,Y_{g},x)\), and \(W_{1}^{g}=(E_{U}(y\mid g),w,Y_{g},x)\). This is similar to the previous definition, which used \(p_{g(i)},p_{g(i)}^{L},p_{g(i)}^{U}\) instead of \(E(y\mid g),E_{L}(y\mid g),E_{U}(y\mid g)\). I also assume the standard location assumption, \(E[\epsilon _{i}\mid ({\bar{y}}_{i,g}+w_{i}),Y_{g},x]=0\).
This linear social interactions model complies with the IMMI assumptions, just like the discrete choice model presented previously. In particular, the linear social interactions model satisfies: i) the interval assumption (I), because \(y\in [y_{L},y_{U}]\) and \({\bar{y}}_{i,g}\in [y_{L},y_{U}]\) ; ii) the weak monotonicity assumption (M), since \(E[y\mid W_{x^{\prime }}^{g\prime }]\) is weakly increasing in \({\bar{y}}_{i,g}\) due to J being a constant; iii) the mean independence assumption (MI), since \(E[y\mid E(y\mid g),W_{x} ^{g}]=E[y\mid V_{x}^{g}]=k+bX_{i}+dY_{g(i)}+J({\bar{y}}_{i,g}+w_{i})\).
Assumption (I) and the law of total probability give us Proposition 2.B: \(E_{L}[y\mid W_{x}^{g}]\le E[y\mid W_{x}^{g}]\le E_{U}[y\mid W_{x}^{g}]\), where \(E_{L}[y\mid W_{x}^{g}]=E[y\mid W_{x}^{g},z=1]P(z=1\mid W_{x}^{g})+y_{L}P(z=0\mid W_{x}^{g})\) and \(E_{U}[y\mid W_{x}^{g}]=E[y\mid W_{x}^{g},z=1]P(z=1\mid W_{x}^{g})+y_{U}P(z=0\mid W_{x}^{g})\). This is similar to Proposition 2, which applied \(y_{L}=0\) and \(y_{U}=1\).
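The sample analog of these worst-case bounds is straightforward to compute. The sketch below does so within a single cell of the conditioning variables (for example, one group); the function name `worst_case_mean_bounds` and the convention of coding missing outcomes as `None` are assumptions of this illustration.

```python
def worst_case_mean_bounds(y, z, y_low, y_up):
    """Sample analog of E_L and E_U within one conditioning cell.

    y: outcomes (None where missing); z: 1 if observed, 0 if missing.
    y_low, y_up: logical bounds of the outcome (0 and 1 in the binary case).
    """
    n = len(y)
    observed = [yi for yi, zi in zip(y, z) if zi == 1]
    if not observed:
        # Nothing observed: only the logical bounds remain.
        return float(y_low), float(y_up)
    p_obs = len(observed) / n              # estimate of P(z = 1 | cell)
    mean_obs = sum(observed) / len(observed)  # estimate of E[y | cell, z = 1]
    lower = mean_obs * p_obs + y_low * (1.0 - p_obs)
    upper = mean_obs * p_obs + y_up * (1.0 - p_obs)
    return lower, upper


# Binary case: one missing outcome out of four.
lo, hi = worst_case_mean_bounds([1, 0, 1, None], [1, 1, 1, 0], 0, 1)
```

The width of the interval equals \((y_{U}-y_{L})\) times the share of missing outcomes, so the bounds collapse to a point when the cell has no missing data.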
Then, by assumptions IMMI we get Proposition 4.B: Let \(G(W_{i})=k+bX_{i} +dY_{g(i)}+J({\bar{y}}_{i,g}+w_{i})\). A suggested estimator for the identification region \(H(\theta )\) would be
where \(E_{L}^{N}[y\mid W_{x,i}^{g}]\) and \(E_{U}^{N}[y\mid W_{x,i}^{g}]\) are consistent estimators of \(E_{L}[y\mid W_{x,i}^{g}]\) and \(E_{U}[y\mid W_{x,i}^{g}]\) , respectively.
Monte Carlo exercises
The linear peer effects model is simulated as follows. For each simulation s, the model is given by \(y_{i}=k+bX_{i}+dY_{g(i)}+J({\bar{y}}_{i,g}+w_{i} )+\epsilon _{i}\). Again, the simulations consider that the coefficients \(\left\{ k,b,d,J\right\}\) are constants for all simulations and that all the groups have the same size, that is, \(n_{g(i)}=n_{g}\). Each simulated observation is obtained with the specified set of coefficients: \(k=1.5\), \(b=0.5\), \(d=0.3\), \(J=0.2\). The variables \(X_{i}(s)\), \(Y_{g(i)}(s)\), \(w_{i}(s)\) and \(\varepsilon _{i}(s)\) are simulated as independent pseudo-standard normal numbers, with respective supports \(\left[ 0,4\right]\), \(\left[ 0,1\right]\), \(\left[ 1.25,4.125\right]\) and \(\left[ -0.5,0.5\right]\). The reason why \(w_{i}(s)\) is expressed between 1.25 and 4.125 is to match the same support as the value of \(\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j} (s)=1)/n_{g}\). For the variable \(p_{i,g}(s)\), I implement two alternatives:

(i)
nonclosed groups with individual i as part of his own peer group and with the outside peer group of the same size as the group g, \(p_{i,g} (s)=\dfrac{n_{g}w_{i}(s)+\sum \limits _{j=1,j\in g(i)}^{n}1(y_{j}(s)=1)}{n_{g}+n_{g}}\);

(ii)
nonclosed groups with individual i excluded from their own peer group and with the outside peer group of the same size as the group g, \(p_{i,g}(s)=\dfrac{n_{g}w_{i}(s)+\sum \limits _{j=1,j\in g(i),j\ne i} ^{n}1(y_{j}(s)=1)}{n_{g}+n_{g}-1}\).
The OLS peer effects models do not consider closed groups because of the well-known identification problem of including endogenous effects in linear models with closed groups (Manski 1993, Bramoullé et al., 2009), and therefore the peer effects \(w_{i}(s)\) that are specific to each individual i are required for the identification. For the OLS model, I will consider the cases of 5 and 7 missing values, with the possible outcomes being taken from a grid of \(V=5\) values: 1.25, 2.0, 2.7, 3.4 and 4.125. Specifying \(V=5\) in the linear case is an approximation, since in fact the outcome y is continuous and would require an infinite number of possible values for each outcome. Therefore, the linear case presents a lower bound for the computational demands of applying the Horowitz–Manski estimator.
Table 7 shows the performance of the linear endogenous peer effects model with nonclosed groups (which is required for the identification). For simplicity, I only present the results with individuals excluded from their own peer effects, since otherwise there could be a significant bias in the estimation due to the correlation between \(y_{i}(s)\) and the unobserved idiosyncratic error \(\varepsilon _{i}(s)\). The results show that there is a very rapid convergence of the OLS estimates for all the coefficients even for sample sizes as small as 50 groups and a small group size of just 5 members. Therefore, in the case of nonclosed groups, the convergence of the OLS estimator is much faster than for the logit model (model M3). Table 8 shows that for the linear model, the Horowitz–Manski type of estimator has an average performance time between 6.4 and 12.6 seconds for 5 missing observations, but this grows to an average time between 102 and 242 seconds with just 7 missing observations. However, the Manski–Tamer type of estimator keeps a similar time performance whether with 5 or 7 missing observations, with an average time between 0.3 and 1.7 seconds.
Finally, Table 9 shows the performance of the Horowitz–Manski and Manski–Tamer estimators for the linear peer effects model. In this case, both interval estimator approaches coincide perfectly, although perhaps this would not be the case with other calibrations or with a higher number of missing outcomes. It is possible that with a larger number of missing values, the interval estimates of the Manski–Tamer approach would be much worse than the sharp bounds of the Horowitz–Manski approach, although the Horowitz–Manski approach would certainly increase enormously its computational time due to the large number of possible missing datasets given by \(MD=V^{N_{z=0}}\). In general, the interval estimates contain the true parameter value for all the coefficients, including the endogenous peer coefficient J. The intervals are somewhat wider when the missing observations increase from 5 to 7, as expected. But the estimated intervals of the linear model fall substantially and become negligible for sample sizes of 200 groups or more. Therefore, the convergence of the intervals is much faster for the linear peer effects than in the discrete choice case. The Monte Carlo exercises show that, even in the case without any missing data, there are significant accuracy problems for estimating linear peer effects models that include the individuals as part of their own peer group, since this creates a problem of an endogenous regressor being correlated with the unobservable error term.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Madeira, C. Partial identification of nonlinear peer effects models with missing data. Swiss J Economics Statistics 158, 15 (2022). https://doi.org/10.1186/s41937-022-00093-5
DOI: https://doi.org/10.1186/s41937-022-00093-5
Keywords
 Social interactions
 Binary choice
 Partial identification
JEL Classification
 C25
 C31
 Z13