Partial identification of nonlinear peer effects models with missing data

This paper examines inference on social interactions models in the presence of missing data on outcomes. In these models, missing data on outcomes imply an incomplete data problem on both the endogenous variable and the regressors. However, getting a sharp estimate of the partially identified coefficients is computationally difficult. Using a monotonicity property of the peer effects and a mean independence condition of individual decisions on the missing data, I show partial identification results for the binary choice peer effect model. A Monte Carlo exercise then summarizes the computational time and the accuracy performance of the interval estimators under some calibrations.

This work analyzes inference on nonlinear peer effects models in the presence of missing data. There are many situations (for example, drug use, teenage risk profiles, sexual behavior) where respondents might not be willing to reveal their personal experience, creating problems of missing data in the study of social interactions in these settings. Most social interaction studies use the average outcome of each group as an explanatory variable; therefore, missing outcome data imply that we face both a problem of missing outcome values and an undetermined regressor, aggravating the identification problem. It is, therefore, important to extend the robustness of the social interaction estimators to scenarios of missing data.
In the linear case, Manski (1993, 2000) showed that it is difficult to distinguish between the effects of endogenous social interactions and the impact of measures of exogenous group quality. Several works analyze partial and point identification of the linear peer effects model with missing data on outcomes (Advani & Malde, 2018; Sojourner, 2013; Ammermueller & Pischke, 2009). Sojourner (2013) shows that if individuals are randomly assigned to each group, then it is possible to point-identify the true coefficient for the peer effects variable. Ammermueller and Pischke (2009) show that missing data on peers create measurement error for the group variables and, using an analysis similar to Hausman (2001), find upper and lower bounds for the true peer effect coefficient of the linear model. The authors then apply an instrument for the peer effects variable to obtain point identification. However, several economic decisions such as discrete choices require nonlinear models (Blume et al., 2010). Nonlinear settings for peer effects include smoking behavior (Krauth, 2006), high school truancy, cell phone ownership (Kooreman & Soetevent, 2007) or college life (Sacerdote, 2001). Brock and Durlauf (2007) present a very general model of peer effects in a discrete choice setting, showing that it is possible to identify asymptotically both exogenous and endogenous peer effects under the assumption of random group assignment and no missing data.
Page 2 of 18 Madeira Swiss Journal of Economics and Statistics (2022) 158:15
I extend the identification results of Brock and Durlauf (2002, 2007) to the case of missing outcomes. Using the incomplete data approach proposed by Horowitz and Manski (2006), it is possible to get sharp bounds for the coefficients of this model with missing data, but this method can be time-consuming for larger peer groups. Therefore, I propose an estimator that obtains non-sharp bounds for this model, based on the interval regressors approach of Manski and Tamer (2002), and I show that this approach extends easily to a case with both interval regressors and missing outcomes. If a discrete choice model satisfies three properties, namely interval values (I), mean independence (MI) and monotonicity (M), then it is possible to obtain non-sharp bounds for the true coefficients of the model. The interval values (I) assumption is trivially satisfied by discrete choice models with peer effects, since the average of the discrete choices in a peer group is bounded between 0 and 1. The mean independence (MI) assumption is also quite natural in the peer effects model, since it implies that the width of the identification interval does not matter once one conditions on the true value of the average outcome. This assumption is plausible if the agents know the true values of the average choices in their peer groups, even if the econometrician only observes the group with some missing data. The third assumption, monotonicity (M), implies that the average outcome of each agent is increasing in the average group outcome. This assumption is trivially satisfied in the parametric discrete choice models, and it can also be consistent with many semi-parametric or nonparametric models. A minimum distance estimator is proposed. I also propose a bootstrap method to estimate confidence intervals for the true coefficients. A similar estimator can be easily applied to any parametric model with missing outcomes and interval regressors.
I then show a set of Monte Carlo exercises with fully observed information to characterize the accuracy of the peer effects estimators even when the identification assumptions are satisfied. The Monte Carlo exercises include a wide range of group sizes and sample sizes for both the logit and the linear case. The simulations include estimators for the cases of closed peer groups (groups in which all members are peers of each other) and non-closed groups (with each individual having peers from outside the group). Furthermore, I consider both the case in which the individual is part of their own peer group and the case in which the individual is excluded from it. The linear case is only shown for non-closed groups, which are required for identification, as shown in Bramoullé, Djebbari and Fortin (2009).
I then apply the Manski-Tamer and Horowitz-Manski estimators to the logit peer effects models in the presence of missing outcomes. The results show that the Manski-Tamer estimator can be hundreds of times faster than the Horowitz-Manski estimator even with just a few missing values, such as 10 missing outcomes. The computation time of the Horowitz-Manski estimator could be much larger with a few additional missing observations. To keep the exposition clear, this work focuses on the case in which missing outcomes also imply a missing or interval-valued regressor for the peer effect. The approach could also be generalized to cases with missing control variables for the peer group members, which would likewise imply missing or interval regressors. Other missing regressors would merely imply more combinations of possible datasets for the missing values under the Horowitz and Manski (2006) approach and additional interval regressors under the Manski and Tamer (2002) approach suggested in this article.
This article is organized as follows. Section 2 shows how the interval regressors approach of Manski and Tamer (2002) can be easily extended to a case that also has missing outcomes. Section 3 explains the calibration of the Monte Carlo exercises. Section 4 then summarizes the Monte Carlo results in the absence of missing data. The section starts by showing that the exogenous coefficients (the constant, the exogenous variable affecting individual behavior and the contextual effects group variable) converge quickly to the true parameter values, whether the model has endogenous peer effects or not. The same simulations show that the endogenous peer effect coefficient converges much more slowly to its true value, presenting a high bias and standard deviation, even without any missing data. Section 5 shows the Monte Carlo exercises with missing data, analyzing the performance of the Horowitz-Manski and Manski-Tamer approaches. The results show that the Horowitz and Manski (2006) approach requires considerable computational time. The section also summarizes the estimated interval results for all the coefficients, including both the endogenous peer effects parameter and the exogenous coefficients. Finally, Section 6 summarizes the main results and an appendix shows the proofs of the main propositions.
2 Identification of discrete choice models with peer effects

A parametric discrete choice model
Let $y_i \in \{0, 1\}$ represent the outcome of individual $i$, $X_i \in \mathbb{R}^K$ the individual exogenous variables, $g = 1, \ldots, G$ the groups, and $Y_g \in \mathbb{R}^Q$ the set of exogenous variables for each group. I represent average group behavior as $p_{i,g} = E[y \mid g(i)]$, and $n_{g(i)}$ is the number of people in agent $i$'s group. Brock and Durlauf (2002) presented a parametric multinomial model of choice in the presence of social interactions, giving conditions for identification in the presence of fully observed data. Individual choice is determined by latent utility, $V_i = h_i + J p_{i,g(i)} - \epsilon_i$. The term $\epsilon_i$ represents an idiosyncratic term (such as an individual taste factor) unobserved by the econometrician, with a known monotonic parametric distribution, $F_\epsilon(\cdot)$. In this specification, $h_i$ represents the components of utility affected only by exogenous variables, $h_i = k + bX_i + dY_{g(i)}$. The observable group variables $Y_{g(i)}$ for the contextual peer effects, correlated group effects or neighborhood variables (Manski, 1993) can include the mean values of the individual variables of the other group members. One can further specify $Y_{g(i)} = \frac{1}{n_{g(i)}} \sum_{j \in g(i)} X_j$ in the case of individuals that are part of their own peer group or, alternatively, $Y_{g(i)} = \frac{1}{n_{g(i)} - 1} \sum_{j \in g(i), j \neq i} X_j$ in the case in which individual $i$ is excluded from their own peer effect. In this model, the probability of choosing a positive outcome is given by:

$$P(y_i = 1 \mid X_i, Y_{g(i)}, p_{i,g}) = F_\epsilon\left(k + bX_i + dY_{g(i)} + J p_{i,g}\right). \quad (1)$$

It is possible that several values of $p_{i,g}$ might solve expression (1) due to multiple equilibria corresponding to self-consistent behaviors in the population (Brock & Durlauf, 2002). This discrete choice model is quite parsimonious and includes all the main features of peer effects models. The term $Y_g$ is usually interpreted as "contextual group effects" (Manski, 1993), meaning the gains each member of the group has due to exogenous characteristics of the group. For example, students of a certain school could be doing well because the school has good facilities and teachers. The term $p_{i,g}$ represents the "endogenous group effect", since it captures the feedback effect that group performance has on each one of its members. In this case, students of a certain school may be more likely to apply for college because the other students are also applying. Brock and Durlauf (2002) show that this model is point-identified under two conditions: (i) $X_i$, $Y_{g(i)}$, $p_{i,g}$ are not collinear and $Y_g$ has unbounded support; (ii) the $\epsilon_i$ are independent and identically distributed across individuals and are independent of $X_i$ and $Y_{g(i)}$.
The independence assumption of $\epsilon_i$ can be relaxed when the group $g(i)$ of each individual is not entirely closed (for example, your neighbors have neighbors that are not your neighbors). In this case, the "peers of your peers" provide extra variation that can be used for identification (Bramoullé et al. 2009). For now, this article will keep the assumption that the peer groups are closed and therefore $j \in g(i)$ implies that $i \in g(j)$.
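The self-consistency condition behind expression (1) can be illustrated with a small fixed-point iteration. This is only a sketch under simplifying assumptions that are not in the text: all members of the group share the same exogenous index $h$, $F_\epsilon$ is taken to be the logistic CDF (the distribution used later in the simulations), and the function names are hypothetical.

```python
import math


def logistic_cdf(x):
    """Logistic CDF, assumed here as the error distribution F."""
    return 1.0 / (1.0 + math.exp(-x))


def self_consistent_p(h, J, p0=0.5, tol=1e-12, max_iter=10_000):
    """Iterate p <- F(h + J * p) until the group average is self-consistent.

    Assumes a homogeneous group (every member has the same index h),
    which is a simplification made only for this illustration.
    """
    p = p0
    for _ in range(max_iter):
        p_new = logistic_cdf(h + J * p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("fixed-point iteration did not converge")


# Example call with h = k = -2 and J = 0.5 (X and Y set to zero).
p_star = self_consistent_p(h=-2.0, J=0.5)
```

With a logistic $F_\epsilon$, the slope of the map $p \mapsto F_\epsilon(h + Jp)$ is at most $J/4$, so for $0 \le J < 4$ the map is a contraction and this homogeneous example has a single equilibrium. Larger values of $J$ can produce the multiple equilibria mentioned above, in which case the result depends on the starting value `p0`.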

Partial identification in the case of missing outcome information
If all the data are observed, one can estimate the parameters $\theta = (k, b, d, J)$ by using a maximum likelihood estimator (MLE):

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \left[ y_i \ln F_\epsilon(h_i + J p_{i,g}) + (1 - y_i) \ln\left(1 - F_\epsilon(h_i + J p_{i,g})\right) \right]. \quad (2)$$

Now assume $z_i \in \{0, 1\}$ indicates whether $y_i$ is unobserved ($z_i = 0$) or observed ($z_i = 1$). For simplicity, I assume that $X_i$ and $Y_{g(i)}$ are always observed, but the missing information on some outcomes $y_i$ implies that $p_{i,g}$ is not point-identified, although $p_{i,g}$ can be bounded within a sharp interval. The number of missing observations in the sample is given by $N_{z=0} = \sum_{i=1}^{N} 1(z_i = 0)$. Let $H(\theta)$ be the identified set of $\theta$. Elements of this set can be identified by repeatedly plugging in feasible values for the missing data, $a \in \{0, 1\}^{N_{z=0}}$, and computing the parameters of interest. We can therefore study the set of values consistent with the observed data and the assumed model given any feasible distribution of the missing data. For a specific combination of the missing data values, $y_{z=0} = a$, one can estimate the coefficients $\hat{\theta}(a)$. It is therefore possible to obtain $H(\theta)$ as $\hat{H}(\theta) \equiv \{\hat{\theta}(a), \text{ for all feasible } a\}$. For finite samples, it is possible to obtain confidence intervals for the true coefficient parameters $\theta$ by using the bootstrap procedure of Imbens and Manski (2004). Another alternative is to get confidence intervals for the identified set, $H(\theta)$, by using a subsampling procedure described in Chernozhukov et al. (2007).
This plug-in strategy is fairly general and easy to implement. It can essentially work on any model that is proven to be identified and solvable. The difficulty of this approach, however, is that it is computationally demanding. Notice that if $\Pr(z = 0) > 0$, then the set of alternative values increases exponentially with $N$. Therefore, as $N$ grows large this approach may require a very large number of alternative computations of $\hat{\theta}(a)$ to obtain a good estimate of $H(\theta)$.
Here, I show that less computationally intensive strategies are possible. Note that the Brock and Durlauf model is monotonic in the group outcome, $p_{i,g}$. Also, the regressor $p_{i,g}$ of each individual can be bounded inside an interval. Assume for simplicity that we keep the group intervals $p_{i,g} \in [p^L_{i,g}, p^U_{i,g}]$ fixed, but allow the value of each missing individual outcome $y_i \mid z_i = 0$ to vary over $\{0, 1\}$. This approach gives us non-sharp bounds for the coefficients $\theta$, since we are not taking into account that changing the individual $y_i \mid z_i = 0$ also has an effect on the interval of $p_{i,g}$. However, together with the monotonicity property of the peer effects model, it is possible to specify a convenient estimator for these non-sharp bounds. For this reason, I generalize the approach of Manski and Tamer (2002) to the case of both missing outcomes and interval regressors.

Using monotonicity and mean independence assumptions
In the discrete choice case, outcomes $y$ are bounded between 0 and 1. This guarantees the interval property (I) of the regressor $p_{i,g}$, so we have $p_{i,g} \in [0, 1]$. It is easy to specify sharp bounds for the group average, $p_{i,g} = E[y \mid g(i)]$. Let us define $p^L_{g(i)} = E^L[y \mid g(i)] = E[y \mid g, z = 1]P(z = 1 \mid g)$ and $p^U_{g(i)} = E^U[y \mid g(i)] = E[y \mid g, z = 1]P(z = 1 \mid g) + P(z = 0 \mid g)$. Then, the law of total probability gives us sharp bounds for the value of the group average, $p_{i,g}$, as expressed in Proposition 1: $p_{i,g} \in [p^L_{g(i)}, p^U_{g(i)}]$. I will show that under certain conditions it is possible to obtain a partial identification interval. Note that the model defined in expression (1) is mean independent of the missing data properties $z_j$. This guarantees the following mean independence (MI) property: $E[y_i \mid X_i, Y_{g(i)}, p_{i,g}, z] = E[y_i \mid X_i, Y_{g(i)}, p_{i,g}] = F_\epsilon(h_i + J p_{i,g})$, which follows from expression (1) together with assumption (ii), which specifies the $\epsilon_i$ as independent and identically distributed across individuals and independent of $X_i$ and $Y_{g(i)}$.
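The worst-case bounds on the group average can be computed directly from a group's outcome vector. The sketch below is illustrative only: the function name and the convention of coding a missing outcome as `None` are assumptions of this example, not part of the paper.

```python
def group_average_bounds(outcomes):
    """Return (p_low, p_high) for one group's average binary outcome.

    `outcomes` holds 0, 1 or None (None marks a missing outcome; this
    encoding is a convention chosen for the example).
    The lower bound imputes 0 for every missing value and the upper bound
    imputes 1, which matches E[y | g, z=1] P(z=1 | g) and
    E[y | g, z=1] P(z=1 | g) + P(z=0 | g) in sample form.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("group must contain at least one member")
    observed_ones = sum(1 for y in outcomes if y == 1)
    n_missing = sum(1 for y in outcomes if y is None)
    p_low = observed_ones / n
    p_high = (observed_ones + n_missing) / n
    return p_low, p_high


# A group of five with two missing outcomes.
bounds = group_average_bounds([1, 0, None, 1, None])
```

The width of the interval equals the share of missing outcomes in the group, so a group with no missing data has `p_low == p_high`.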
The MI assumption is not testable, but seems realistic under many scenarios. If individuals actually observe average group behavior and act using this knowledge, then individual outcomes are not affected by missing data. For example, teenagers may know how many smokers or drug users exist in their group, even if the researcher does not. However, the MI assumption could be invalid if individuals react more or less strongly to the choices of their unreported peers.
Expression (1) also has the group endogenous effect, $J$, specified as a constant parameter. Since $J$ is constant and $F_\epsilon(\cdot)$ is monotonic, this guarantees the monotonicity (M) property of $E[y \mid V^g_x]$ in all its arguments. Therefore, the IMMI property (interval, monotonicity, and mean independence) described in Manski and Tamer (2002) holds for this model.
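As a hedged illustration of how the interval and monotonicity properties combine, suppose that $J \geq 0$ and that $F_\epsilon$ is increasing (the sign restriction is an assumption of this sketch; the inequalities reverse for $J < 0$). Then the bounds on the group average translate into bounds on the conditional choice probability. The numerical case uses the logit form and the coefficient values of the simulation section, with hypothetical values $X_i = Y_{g(i)} = 0$ and $p_{i,g} \in [0.4, 0.8]$.

```latex
% Monotonicity in p combined with p^L <= p <= p^U (assuming J >= 0):
F_\epsilon\!\left(h_i + J\,p^{L}_{g(i)}\right)
  \;\le\; F_\epsilon\!\left(h_i + J\,p_{i,g}\right)
  \;\le\; F_\epsilon\!\left(h_i + J\,p^{U}_{g(i)}\right)

% Logit example with k = -2, J = 0.5, X_i = Y_{g(i)} = 0, p in [0.4, 0.8]:
\frac{1}{1 + e^{1.8}} \approx 0.142
  \;\le\; P(y_i = 1 \mid \cdot)
  \;\le\; \frac{1}{1 + e^{1.6}} \approx 0.168
```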
By applying the law of total probability, we get sharp bounds for $E[y \mid W^g_x]$.
Proposition 2 ...
Proposition 3 shows that it is possible to achieve exact identification of our parameters if and only if some groups in the population have no missing data at all.
... and $V^{c,L}_i = h^c_i + c_4 p^L_{g(i)}$. Let ... Then, $\theta$ is identified relative to $c$ if and only if $P[V(c)] > 0$.
Now, I characterize the identification region and present a minimum distance estimator for the parameters' identification set in the presence of missing outcome data. Lemma 1 forms the basis for the estimator. It shows that there is a parametric solution $\theta$ to the problem that fits within the bounds of the data moments, and therefore the solution set is non-empty. It then characterizes the solution as a convex interval with a given expression that can be estimated. Convexity of the solution region matters because it implies that the problem is well behaved, and many optimization methods for econometric estimators only work with convex regions. Convexity implies that, for instance, if $\theta^L$ and $\theta^U$ (with $\theta^U \geq \theta^L$) are solutions, then any other parameter given by $\theta^* = \theta^L (1 - \alpha) + \alpha \theta^U$, with $\alpha \in [0, 1]$, is also a solution. This is convenient for optimization methods because it implies that there is a single convex identified interval, with $\theta^* \in [\theta^L, \theta^U]$. If the identified sets were not convex, then empirical researchers could find several unconnected regions with gaps in between. It would even be difficult to determine whether the entire identified set had been found by the researcher, since there could be more identified regions in other areas.

Lemma 1 The identification region for θ is non-empty, convex and equivalent to
Proposition 4 gives the basic finding on identification of parametric regression models. It suggests that the identification region can be found by a minimum distance estimator $H_N(\theta)$, which uses the empirical analogs of the bounds in order to find the identified set for $\theta$. Note that it is relevant for the minimum distance optimizer to search for the parameter values that fit within the intervals given by these bounds. In general, the researcher cannot just obtain two estimators by replacing the missing values with zeros or replacing the missing values with ones, because that could imply establishing that the correlation between $p^U_{g(i)}$ and the other variables ($X_i$ and $Y_{g(i)}$) is either very low or very high in order to explain a large number of zeros and ones. The minimum distance estimator must therefore search over the parameter values until the region that fits the empirical analogs is found.

Proposition 4 A suggested estimator for the identification region
respectively. It is possible to take into account finite-sample error in the estimates of these intervals by using the bootstrap technique described by Imbens and Manski (2004). A similar identification strategy is possible for the semi-parametric peer effects model developed in Brock and Durlauf (2007), although such a discontinuous estimator does not have a convenient asymptotic distribution and therefore does not allow one to obtain a finite-sample confidence interval by using the bootstrap method of Imbens and Manski (2004).

Horowitz and Manski's estimator for functionals of incomplete data
Let us define a parametric estimator $\hat{\theta}$ as a function of the data $(y_i, w_i)$, with $w_i$ being the vector of control variables that are completely observed. Again assume $y_{i,z=1}$ is observed and $y_{i,z=0}$ is not observed. The estimator obtained from the true dataset uses the observed outcomes $y_{i,z=1}$ together with the true values of the unobserved outcomes $y_{i,z=0}$. The econometrician does not observe the values of $y_{i,z=0}$, but knows that each value of $y$ belongs to a finite set $\Upsilon$ with $V$ elements. One possible estimator, $\hat{\theta}_c$, can be obtained from one of the possible ways of imputing the missing dataset, with $v_{i,z=0} \in \Upsilon$ being a specified value that replaces $y_{i,z=0}$ for each missing observation $i$. Since there are $V$ possibilities for each $y_{i,z=0}$, there are $MD = V^{N_{z=0}}$ possible datasets that could be validly imputed, with $N_{z=0} = \sum_{i=1}^{N} 1(z_i = 0)$ denoting the number of missing observations. Then, if one computes the estimator $\hat{\theta}_c$ across all the possible imputations for $y_{i,z=0}$, the econometrician will find that $\hat{\theta} \in \{\hat{\theta}_1, \ldots, \hat{\theta}_c, \ldots, \hat{\theta}_{MD}\}$; therefore, a sharp interval for $\hat{\theta}$ can be obtained as:

$$\left[ \inf_{c} \hat{\theta}_c, \; \sup_{c} \hat{\theta}_c \right].$$

This sharp interval, given by the infimum and supremum of the estimates across all the possible realizations of the dataset, can be easily applied to any parametric estimator (Horowitz & Manski 2006), such as the parametric discrete choice model or the linear model of peer effects presented before. Note that the number of possible estimates $MD$ increases rapidly with just a few missing observations. For instance, in the discrete choice setting ($V = 2$, since $y$ can be 0 or 1), the number of possible datasets would reach $MD = 2^{15} = 32768$ with just 15 missing observations.
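The enumeration over all imputed datasets can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function names are hypothetical, missing outcomes are coded as `None`, and a simple OLS slope stands in for the parametric estimator. In the peer effects model the group average would also have to be recomputed inside the estimator for each imputation.

```python
from itertools import product


def ols_slope(x, y):
    """Slope of a one-regressor least squares fit (stand-in estimator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def imputation_bounds(x, y, estimator, values=(0, 1)):
    """Evaluate `estimator` on every imputation of the missing outcomes.

    Returns (lowest estimate, highest estimate, number of datasets).
    The number of datasets is len(values) ** (number of missing outcomes),
    so this is only feasible for a handful of missing values.
    """
    missing = [i for i, yi in enumerate(y) if yi is None]
    estimates = []
    for fill in product(values, repeat=len(missing)):
        y_full = list(y)
        for i, v in zip(missing, fill):
            y_full[i] = v
        estimates.append(estimator(x, y_full))
    return min(estimates), max(estimates), len(estimates)


# Four observations, two of them with missing outcomes: 2**2 = 4 datasets.
low, high, n_datasets = imputation_bounds([0, 1, 2, 3], [0, None, 1, None], ols_slope)
```

Each extra missing binary outcome doubles the number of calls to the estimator, which is the source of the computational burden discussed above.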
In the appendix, I show that similar econometric approaches can be applied to linear social interaction models that have bounded outcomes, assuming that identification is obtained through the observation of non-closed groups.

The simulated models
The discrete choice model of social interactions for simulation $s = 1, \ldots, S$ is obtained as follows. The discrete peer effects model of each simulation $s$ is specified to be a logit, $F_\epsilon(x) = \frac{\exp(x)}{1 + \exp(x)}$. For simplicity, the simulations consider that the coefficients $k, b, d, J$ are constant across all simulations and that all the groups have the same size, that is, $n_{g(i)} = n_g$. Each simulated observation is obtained with the specified set of coefficients: $k = -2$, $b = 1$, $d = 0.5$, $J = 0.5$. The observable control variables $X_i(s)$ and $Y_{g(i)}(s)$ are simulated as independent pseudo-standard normal numbers, with $X_i(s)$ having different values for each individual $i$ in each simulation $s$ and $Y_{g(i)}(s)$ having different values for each group $g$ in each simulation $s$. For each simulation $s$, each observation is then obtained from pseudo-uniform numbers, with $\varepsilon_i(s)$ being pseudo-standard logistic random numbers with mean 0 and standard deviation $\pi/\sqrt{3}$. The exercises consider an alternative with closed groups, with $p_{i,g}(s)$ given entirely by the endogenous decisions of the members of each group $g = 1, \ldots, G$, and an alternative with non-closed groups, with each individual $i$ reporting a value $w_i(s)$ for the peer effect of the members outside the group. The exercise considers $w_i(s)$ as given by a pseudo-uniform number independent across $i$ and $s$. Furthermore, the peer effects of the observed group members are considered in two versions. The first considers the individual as part of their own peer group, $\sum_{j \in g(i)} 1(y_j(s) = 1)/n_g$, while the second considers that the individual is not part of their own peer effect, $\sum_{j \in g(i), j \neq i} 1(y_j(s) = 1)/(n_g - 1)$. The reason for these two alternatives is that considering the individual as part of their own peer effect introduces an obvious problem, since $p_{i,g}(s)$ includes $y_i(s)$, which is a function of the unobserved idiosyncratic error $\varepsilon_i(s)$. It is therefore likely that estimators that consider individuals as part of their own peer effect present a bias due to the control variable being correlated with the unobserved error (Wooldridge 2010). The variable $p_{i,g}(s)$ is implemented in four alternatives:
(i) Closed groups with individual $i$ as part of their own peer group, $p_{i,g}(s) = \frac{1}{n_g} \sum_{j \in g(i)} 1(y_j(s) = 1)$;
(ii) Closed groups with individuals excluded from their own peer group, $p_{i,g}(s) = \frac{1}{n_g - 1} \sum_{j \in g(i), j \neq i} 1(y_j(s) = 1)$;
(iii) Non-closed groups with individual $i$ as part of their own peer group and with the outside peer group of the same size as the group $g$;
(iv) Non-closed groups with individual $i$ excluded from their own peer group and with the outside peer group of the same size as the group $g$.
The Monte Carlo exercises consider several combinations of group size, with $n_g = 5, 10, 25$ members, and several numbers of groups, with $G = 50, 100, 200, 500, 1000, 2500$. The total sample size in terms of individuals is given by $N = G \times n_g$. Since some exercises take a long time (in particular, the Horowitz-Manski estimator takes longer than the others as the number of missing values grows), all the Monte Carlo exercises are done with just 50 simulations, $s = 1, \ldots, S$, with $S = 50$.
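One way to obtain standard logistic errors from pseudo-uniform numbers is the inverse-CDF transform, sketched below. The exact procedure used in the paper's Stata code is not shown in the text, so the transform, the threshold rule $y = 1$ when latent utility is non-negative, and the function names are all assumptions of this example. The default coefficients are the calibration values stated above.

```python
import math
import random


def logistic_draw(rng):
    """Standard logistic draw via the inverse CDF of a uniform number."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break the log
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_outcome(x, y_group, p, eps, k=-2.0, b=1.0, d=0.5, J=0.5):
    """Binary choice: 1 when k + b*x + d*y_group + J*p - eps is non-negative.

    The threshold rule is an assumption consistent with the latent utility
    V = h + J*p - eps; `p` is taken as given rather than solved for.
    """
    latent = k + b * x + d * y_group + J * p - eps
    return 1 if latent >= 0 else 0


rng = random.Random(2022)
errors = [logistic_draw(rng) for _ in range(10_000)]
mean_err = sum(errors) / len(errors)
std_err = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / len(errors))
# std_err should be close to math.pi / math.sqrt(3), roughly 1.81
```

In a closed group the average `p` depends on the other members' simulated choices, so a full simulation would need to solve for the group outcomes jointly; this sketch only covers the error draw and the individual decision rule.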
To summarize the results from the Monte Carlo simulations, I denote $\theta$ as the vector with the true value of the parameters, $\theta = (k, b, d, J)$, while $\hat{\theta}_s$ denotes the estimate obtained in each simulation. The average estimate across all the simulations is obtained as $\bar{\theta} = \sum_s \hat{\theta}_s / S$. The mean bias is therefore computed as $\bar{\theta} - \theta$, while the standard deviation (STD) measures the dispersion of $\hat{\theta}_s$ around $\bar{\theta}$ across simulations. The mean absolute deviation (MAD) can be a better measure of the small sample performance of the estimators than the STD, especially because some estimators may have a considerable bias and the bias is not part of the STD.
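The three summary statistics can be computed as in the sketch below. Two conventions here are assumptions, since the exact formulas are not recoverable from the text: the STD divides by $S$ rather than $S - 1$, and the MAD is measured around the true parameter value so that it also captures bias. The function name is hypothetical.

```python
import math


def summarize(estimates, truth):
    """Return (mean bias, STD, MAD) of simulation estimates for one parameter.

    Assumed conventions: STD uses the divisor S and is centered on the
    average estimate; MAD is centered on the true value.
    """
    S = len(estimates)
    mean_est = sum(estimates) / S
    bias = mean_est - truth
    std = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / S)
    mad = sum(abs(e - truth) for e in estimates) / S
    return bias, std, mad


# Three hypothetical estimates of J when the true value is 0.5.
bias, std, mad = summarize([0.4, 0.6, 0.8], truth=0.5)
```

Under these conventions a biased but tightly clustered estimator shows a small STD and a large MAD, which is the distinction the paragraph above draws.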
All the Monte Carlo exercises were performed on a notebook computer with an Intel Core i7-9750H 2.60 GHz processor (6 physical cores and 12 logical processors) and 24.0 GB of RAM. The code was implemented with a Stata 15.1 MP-6 software license. All the code is publicly available in the Mendeley Data repository: https://data.mendeley.com/datasets/zsbxdmhtj9/1.

Calibrating the missing observations
The missing observations are specified in terms of the number of missing outcomes ($y_i(s)$) in each simulation $s$. I create independent pseudo-uniform numbers $zu_i(s)$, and then for each simulation $s$ specify as missing the observations with the $m$ lowest values of $zu_i(s)$. For simplicity, all the other variables are observed (for instance, a variable $X$ for family education or house type could be observed from administrative data), except for the endogenous variable $y_i(s)$. I choose this option instead of a probability, because the Horowitz-Manski estimator would require $MD = V^{N_{z=0}}$ possible datasets, with $V$ being the number of possible values of $y_i(s)$ and $N_{z=0}$ being the number of missing observations. This implies that even a small number of missing observations such as 15 would reach $MD = 2^{15} = 32768$ possible datasets and a very large computational time. For this reason, I prefer to specify the number of missing values, rather than a probability of missing outcomes, which would result in a random number of missing outcomes for each simulation. In the case of the logit model, I will show Monte Carlo exercises with 5 and 10 missing values.
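The rule of marking the $m$ observations with the lowest uniform draws as missing can be sketched as follows. The function names and the use of `None` to code a missing outcome are assumptions of this example.

```python
import random


def pick_missing(n_obs, m, rng):
    """Indices of the m observations with the lowest uniform draws."""
    zu = [rng.random() for _ in range(n_obs)]
    order = sorted(range(n_obs), key=lambda i: zu[i])
    return set(order[:m])


def mask_outcomes(y, missing):
    """Replace the outcomes at the missing indices with None."""
    return [None if i in missing else yi for i, yi in enumerate(y)]


rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(250)]   # 50 groups of 5 members
missing = pick_missing(len(y), m=5, rng=rng)
y_masked = mask_outcomes(y, missing)
n_datasets = 2 ** len(missing)                # imputations to enumerate
```

Fixing the count `m` keeps the number of imputed datasets, `2 ** m`, identical across simulations, whereas a missing probability would make it random.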

Monte Carlo exercises without missing data
This section starts by presenting the results of the Monte Carlo exercises without missing data. Table 1 summarizes the mean bias, standard deviation (STD) and mean absolute deviation (MAD) of the estimated coefficients, excluding the endogenous effect $J$. I compare the logit model with only contextual group effects (that is, assuming $J = 0$) with the logit endogenous peer effects model with closed groups (as suggested in Brock and Durlauf (2002)), although with the individuals excluded from their own peer effect. The contextual-effects-only model can be seen as a more traditional model, since there is no endogenous control variable and no correlation among individuals apart from the observable group effect $Y_{g(i)}(s)$. The results show that the logit with only contextual effects converges quite quickly to the truth and the estimator presents accurate values even with just 5 members per group and 50 groups (therefore a total sample of 250 observations). Moreover, the logit model with both endogenous and contextual effects also converges somewhat quickly toward the true values of these parameters. The same pattern appears with the logit model with non-closed groups, which is shown in Table 2. Table 3 shows how important it is, for estimation of the logit endogenous peer effects coefficient ($J$), to exclude the individuals from their own peer group $g(i)$, and whether it is helpful or not to include peers from outside the group (non-closed groups, which are essential for identification of the linear model). Models 1 and 2 show the case of closed groups, with individuals excluded from and included in their own peer group, respectively. Models 3 and 4 show the case of non-closed groups, with individuals excluded from and included in their own peer group, respectively. The results show that it is very important to exclude individuals from their own peer effect in order to estimate $J$, because models M2 and M4 present large values for the mean bias and mean absolute deviation (MAD). These values fall slowly as the number of group members increases (a larger group reduces the weight $1/n_g$ of the individual in their own group, besides increasing the sample size) and as the number of groups increases (which increases the sample size). This shows it is not advisable in practice for empirical researchers to include individuals as part of their own peer group, even if the model is identified in theory. Both M1 and M3 present accurate estimates, since both models exclude individuals from their own peer group. However, M3 also includes peer effects from outside the group, through the term $n_g w_i(s)$ that enters $p_{i,g}(s)$ alongside $\sum_{j \in g(i), j \neq i} 1(y_j(s) = 1)$. The Monte Carlo exercise reveals that including peer effects outside of the group (model M3) can increase the mean bias, standard deviation (STD) and mean absolute deviation (MAD) for small sample sizes, such as just 50 groups. However, the model with non-closed groups (M3) can present a lower bias for larger sample sizes, although with a larger standard deviation. It is only for large sample sizes (group size of 25 members and 500 groups or more, which implies a sample size of at least 12,500 observations) that the non-closed groups model M3 presents a lower mean absolute deviation than the closed group model M1. This makes sense, since the additional control variable (the outside peer effects $w_i(s)$) represents an additional source of identification, but it also increases the dispersion in individual and group outcomes.

Time performance of the interval estimators
This section summarizes the Monte Carlo results of the Horowitz-Manski and Manski-Tamer types of estimators. Table 4 compares the average computational time of each estimator. Note that the 5 missing observations correspond to a very small probability of missing outcomes, ranging from just 0.01% in the large sample cases to a maximum of 2% for the smallest samples. In the case of 7 and 10 missing observations, the corresponding probability of missing outcomes ranges from 0.06% to 2.8% and 0.02% to 4%, respectively. These are very low probabilities of missing data, since it is quite common to find survey datasets with more than 4% of missing data. For the case of just 5 missing observations, the Horowitz-Manski estimator for the logit model takes between 0.7 and 30.6 seconds on average across all simulations, while the Manski-Tamer type of estimator takes between 0.6 and 19.6 seconds, which can be 50% faster in some cases. For 10 missing observations, the time performance difference between the two estimators grows much larger, with the Horowitz-Manski type of estimator taking between 23 and 966 seconds, while the Manski-Tamer estimator keeps about the same computational time as with just 5 missing observations, with an average time between 0.5 and 19.7 seconds. The conclusion is that the number of combinations required to compute the Horowitz-Manski type of estimator increases exponentially with the number of missing observations ($MD = V^{N_{z=0}}$), while for the Manski-Tamer estimator the calculation time remains similar even as the number of missing outcomes increases.

Intervals of the interval estimators
Now, I summarize the mean intervals across all simulations of the Horowitz-Manski and the Manski-Tamer estimators around the true parameter values. Table 5 shows the mean intervals for the case of the logit model with 5 missing values. For the parameters k, b and d, the Horowitz-Manski type of interval estimator almost always contains the true parameter value in its average interval, although the intervals can be large in small samples such as 50 groups. However, for the endogenous peer effects coefficient J, the Horowitz-Manski type of estimator often gives a biased interval that does not contain the true parameter value, as shown for the simulations with group sizes of 10 and 25 for samples with 500 groups or more. The Manski-Tamer estimator always has a larger interval than the Horowitz-Manski estimator, especially for small samples such as 50 groups, but this difference becomes quite small for a number of groups of 100 or more. The bounds of the Horowitz-Manski and the Manski-Tamer estimators tend to be reasonably narrow for samples with 1000 or 2500 groups, although with a significant bias for the J parameter. It is problematic that in a few cases the bounds of the Horowitz-Manski and Manski-Tamer estimators do not include the true parameter value for the endogenous peer effect parameter J. This happens only for large peer groups (a group size of 10 or 25 members) and only for a large number of groups (500 groups or more). It is not easy to explain why this inconsistency of the interval estimators occurs, but the previous literature points to three factors that complicate the estimation of discrete choice models, particularly those with correlated observations. One factor is that all nonlinear models (including the logit model) have a certain degree of bias in finite samples, and this appears in the Monte Carlo exercises (Wooldridge, 2010).
A second factor is that this small sample inconsistency of the discrete choice model is further exacerbated in settings with panel data (Heckman, 1981, Honoré & Tamer, 2006). A third factor is that the literature shows that misclassification of dependent variables in a discrete-response model causes inconsistent coefficient estimates (Hausman et al., 1998). This is very close to the setting of this paper, since the interval estimators work by trying several possible options for the missing outcomes and the endogenous group averages, which is in effect working with many samples that are misclassified and only a single sample that represents the true outcomes.
Table 3 Bias, standard deviation and mean absolute deviation of the estimates for the endogenous effects coefficient (J) of the logit endogenous social interactions model. Model 1: Closed groups, with individuals excluded from their own peer group. Model 2: Closed groups, with individuals as part of their own peer group. Model 3: Non-closed groups, with individuals excluded from their own peer group. Model 4: Non-closed groups, with individuals as part of their own peer group. 50 Monte Carlo simulations. Columns: group size; no. of groups; bias, $\bar{\theta} - \theta$ with $\bar{\theta} = \sum_s \hat{\theta}_s / S$; STD.
It also happens sometimes for the other parameters k, b and d that the lower bound $\theta_{\min}$ excludes the true parameter value, but the estimated interval is always very close to the true value and only fails to contain the true value by a small amount of 0.01 or less. Therefore, the estimated intervals of both the Horowitz-Manski and Manski-Tamer estimators appear to be valid.
The pattern is similar with 10 missing observations, as summarized in Table 6. The estimated intervals of the Horowitz-Manski and Manski-Tamer approaches tend to contain the true parameter value for the parameters k, b and d, and the intervals, while large with small samples such as 50 groups, tend to narrow quickly as the sample sizes grow. The Manski-Tamer approach provides very similar bounds, except for small sample sizes such as 50 groups with a group size of 5 members. All the estimated intervals are wider than in the case of 5 missing observations, as expected. For the J parameter of endogenous social interactions, the intervals can be quite wide in small samples with just 50 and 100 groups, even for groups with 25 members. The estimated intervals also do not contain the true J parameter for samples with 1000 and 2500 groups, although the width of the intervals falls with the sample size. In general, all the estimated intervals are wider with 10 missing values (Table 6) than with just 5 missing values (Table 5), as expected, with bigger differences for small samples such as 50 and 100 groups.

Conclusions and possible extensions
This paper examines partial inference of peer effects models in the presence of missing outcome data, with a special focus on the binary choice case. Most peer effects models use the average outcome of each group as an explanatory variable; therefore, missing outcome data imply that we face both a problem of missing outcome values and an undetermined regressor. Having information on the bounds of the outcome variable can, however, help us get partial identification bounds for the parameters (Manski & Tamer, 2002, Horowitz & Manski, 2006). I use this information to obtain identification of a family of parametric binary choice models with peer effects (Brock & Durlauf, 2002, Blume et al., 2010), although a similar approach can be suggested for the linear peer effects model for the case in which identification can be obtained through non-closed peer groups. Other extensions of these results can easily be made by including a more general multinomial setting or semi-parametric discrete choice peer effect models (Blume et al., 2010).
For the case of bounded variables, sharp bounds can be obtained for all group variables and outcomes by plugging in all possible combinations of values of the missing variables (Horowitz & Manski, 2006). This method, however, is computationally difficult to implement, since the number of potential combinations increases exponentially with the number of missing observations and therefore quickly becomes a heavy computational exercise even for datasets of moderate size. An attractive alternative can be developed by noticing that this model has interval (I), monotonicity (M) and mean independence (MI) properties, which can be summarized jointly as the IMMI assumption. Using these properties, a modified minimum distance (MMD) estimator is presented to obtain non-sharp bounds for the coefficients.
While this approach is here suggested as a solution to the binary peer effects case, the same estimator can be easily applied to any parametric model with missing outcomes and interval regressors. In a set of Monte Carlo exercises, I show that the non-sharp bounds obtained through an interval estimator similar to Manski and Tamer (2002) provide results quite similar to the sharp bounds of the Horowitz and Manski (2006) approach, but at a much smaller cost in terms of computational time. The computational time of the Horowitz and Manski (2006) approach increases exponentially with the number of missing observations and can quickly become overwhelming with just 15 missing outcomes, but the non-sharp bounds proposed as an alternative with the IMMI assumption do not increase their computational time with additional missing outcomes and provide a good approximation for the sharp intervals (at least for the calibrated Monte Carlo exercises considered in this article). The Monte Carlo exercises also show that for the binary discrete choice model of peer effects there is not a significantly higher estimation accuracy for the case of non-closed groups relative to the closed groups case.
The bounds of the interval estimators of peer effects in the specified exercises are still large. This is a case for future econometricians and applied economists to combine further realistic assumptions in order to obtain tighter bounds (Manski, 2003).

Proof of Propositions 1 and 2
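Table 5 Minimum and maximum bounds around the true coefficients of the Horowitz-Manski (HM) and Manski-Tamer (MT) estimators of the logit endogenous social interactions model with non-closed groups and individuals excluded from their own peer group. 50 Monte Carlo simulations, 5 missing values in each simulation. Columns: group size; no. of groups; HM; MT.
Let v be any interval-valued variable with $v \in [v_0, v_1]$ (Assumption I). Let $E[y \mid x, v]$ be weakly increasing in v (monotonicity, Assumption M). The law of iterated expectations and the mean independence assumption (MI: $E[y \mid x, v, v_0, v_1] = E[y \mid x, v]$) imply
$$E[y \mid x, v_0, v_1] = \int E[y \mid x, v, v_0, v_1]\, dP(v \mid x, v_0, v_1) = \int E[y \mid x, v]\, dP(v \mid x, v_0, v_1),$$
where the first equality is given by the law of iterated expectations and the second one by Assumption MI. Assumptions I and M imply that for all constants $V_0 \le V_1$,
$$E[y \mid x, v = V_0] \le \int E[y \mid x, v]\, dP(v \mid x, v_0 = V_0, v_1 = V_1) \le E[y \mid x, v = V_1],$$
and the assumption that $y \in [y_L, y_U]$ and the law of total probability give, when there are missing outcome data,
$$E_L[y \mid x, v_0, v_1] \le E[y \mid x, v_0, v_1] \le E_U[y \mid x, v_0, v_1],$$
where
$$E_L[y \mid x, v_0, v_1] = E[y \mid x, v_0, v_1, z = 1]\, P(z = 1 \mid x, v_0, v_1) + y_L\, P(z = 0 \mid x, v_0, v_1),$$
$$E_U[y \mid x, v_0, v_1] = E[y \mid x, v_0, v_1, z = 1]\, P(z = 1 \mid x, v_0, v_1) + y_U\, P(z = 0 \mid x, v_0, v_1).$$
Hence,
$$E_L[y \mid x, v_0 = V_0, v_1 = V_1] \le E[y \mid x, v = V_1] \quad \text{and} \quad E[y \mid x, v = V_0] \le E_U[y \mid x, v_0 = V_0, v_1 = V_1].$$
To prove the lower bound on $E[y \mid x, v = V]$, take any $V_1 \le V$. It follows from A3 and from Assumption M that
$$E_L[y \mid x, v_0 = V_0, v_1 = V_1] \le E[y \mid x, v = V_1] \le E[y \mid x, v = V].$$
Hence, the lower bound holds. To prove sharpness, view the bound as a function of V. This function is weakly increasing in V, so Assumption M holds. The proof of the sharp upper bound uses analogous reasoning. Therefore, we have proved that under Assumption IMMI, we have:
$$\sup_{(V_0, V_1):\, V_1 \le V} E_L[y \mid x, v_0 = V_0, v_1 = V_1] \le E[y \mid x, v = V] \le \inf_{(V_0, V_1):\, V_0 \ge V} E_U[y \mid x, v_0 = V_0, v_1 = V_1].$$
In the absence of other information, these bounds are sharp. Propositions 1 and 2 are just special cases of this result.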
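The role of the bounded-outcome assumption $y \in [y_L, y_U]$ and the law of total probability can be illustrated with a sample analogue. The sketch below replaces every missing outcome by $y_L$ for the lower bound and by $y_U$ for the upper bound. The function name `worst_case_mean_bounds` and the toy data are hypothetical, and the code illustrates only the worst-case step, not the full proof.

```python
def worst_case_mean_bounds(y, y_low=0.0, y_high=1.0):
    """Worst-case bounds on a mean when some outcomes are missing.

    y is a list of outcomes, with None marking a missing value.
    The lower bound sets every missing outcome to y_low and the
    upper bound sets every missing outcome to y_high.
    """
    observed = [v for v in y if v is not None]
    n = len(y)
    n_missing = n - len(observed)
    lower = (sum(observed) + y_low * n_missing) / n
    upper = (sum(observed) + y_high * n_missing) / n
    return lower, upper

# Toy group (illustrative only): five members, two binary outcomes missing.
lo, hi = worst_case_mean_bounds([1, 0, 1, None, None])
print(lo, hi)  # 0.4 0.8
```

The width of the interval equals the share of missing outcomes times $(y_U - y_L)$, so the bounds collapse to a point when nothing is missing.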

Proof for Proposition 3
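Assumption IMMI gives us
$$E[y \mid x, v = v_0] \le E[y \mid x, v_0, v_1] \le E[y \mid x, v = v_1].$$
For a parametric model, this inequality becomes
$$f(x, v_0, \gamma) \le E[y \mid x, v_0, v_1] \le f(x, v_1, \gamma).$$
For the case of missing outcome data, $E[y \mid x, v_0, v_1]$ is not perfectly observed, but A.3.3 gives us
$$E_L[y \mid x, v_0, v_1] \le E[y \mid x, v_0, v_1] \le E_U[y \mid x, v_0, v_1],$$
where $E_L$ and $E_U$ denote the lower and upper bounds obtained by setting the missing outcomes to $y_L$ and $y_U$, respectively. Therefore, for a parametric model, inequalities A6 and A3 become:
$$f(x, v_0, \gamma) \le E_U[y \mid x, v_0, v_1], \qquad E_L[y \mid x, v_0, v_1] \le f(x, v_1, \gamma).$$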

Proof for Lemma 1
This corollary allows us to characterize the identification region for the case of the monotone-index form $f(x, v, \gamma) = F(x\beta + \delta v)$ in the case of missing outcome data. The identification region of $\gamma$ in our Proposition 3 is given by $C^*$, as proved previously.
Let f have the monotone-index form. Then: (a) $C^*$ is non-empty and convex.
Result (b) is equivalent to saying that we are able to point-identify the parameters of the social interactions model if there is at least one group with no missing data. This is obviously a very strong condition to rely on in practice. Even if there are one or more groups with no missing data, we would need the sample size represented by these groups to increase to infinity in order to avoid sampling imprecision in the estimation of the parameters.

Proof for Proposition 4
Let the estimator for the identification region be given by where
Proof. Manski and Tamer (2002) provide a proof that $H_N(\gamma)$ is a consistent estimator for the identification region $H(\gamma)$, which remains valid in this case with , and therefore the proof is omitted here.

Interval estimators for the linear case
For the linear social interactions case, I will assume an identified model in which the peer group is non-closed:
$$y_i = k + bX_i + dY_{g(i)} + J(\bar{y}_{i,g} + w_i) + \epsilon_i, \tag{B.1}$$
with all the regressors ($X_i$, $Y_{g(i)}$, $\bar{y}_{i,g} + w_i$) and the unobserved error term $\epsilon_i$ being bounded. $\bar{y}_{i,g}$ represents the average outcome among the peer group in the sample, while $w_i$ represents the average outcome of other peers of individual i who are not peers of the other members of group g(i). I assume that both $\bar{y}_{i,g}$ and $w_i$ are observed (for instance, the individuals could self-report the average outcomes of their other peers who are not common peers in the group g(i)). Since all the terms $W_i \equiv (\bar{y}_{i,g} + w_i, X_i, Y_{g(i)}, \epsilon_i)$ are bounded, that is, $W_i$ takes values in a bounded subset of $\mathbb{R}^{2+K+Q}$, the outcomes are bounded in an interval as well: $y \in [y_L, y_U]$.
Again assume y is observed when z = 1 and not observed when z = 0 . For simplicity, I assume that X i , Y g(i) and the outside peer effect of the individual w i are always observed, but the missing information on some outcomes y i implies that ȳ i,g is not point-identified. Again, , . This is similar to the previous definition, which used p g(i) , p L g(i) , p U g(i) instead of E(y | g), E L (y | g), E U (y | g) . I also assume the standard location assumption, This linear social interactions model complies with the IMMI assumptions, just like the previously exposed discrete choice model. In particular, the linear social interactions model satisfies: i) the interval assumption (I), because y ∈ [y L , y U ] and ȳ i,g ∈ [y L , y U ] ; ii) the weak monotonicity assumption (M), since E[y | W g′ x ′ ] is weakly increasing in ȳ i,g due to J being a constant; iii) the mean independence assumption (MI), since Assumption I) and the law of total probability give us Proposition 2.B: where . This is similar to Proposition 2, which applied y L = 0 and y U = 1.
Then, by assumptions IMMI we get Proposition 4.B: Let G(W i ) = k + bX i + dY g(i) + J (ȳ i,g + w i

Monte Carlo exercises
The linear peer effects model is simulated as follows. For each simulation s, the model is given by $y_i = k + bX_i + dY_{g(i)} + J(\bar{y}_{i,g} + w_i) + \varepsilon_i$. Again, the simulations consider that the coefficients k, b, d, J are constants for all simulations and that all the groups have the same size, that is, $n_{g(i)} = n_g$. Each simulated observation is obtained with the specified set of coefficients: k = 1.5, b = 0.5, d = 0.3, J = 0.2. The variables $X_i(s)$, $Y_{g(i)}(s)$, $w_i(s)$ and $\varepsilon_i(s)$ are simulated as independent random draws. The OLS peer effects models do not consider closed groups because of the well-known identification problem of including endogenous effects in linear models with closed groups (Manski, 1993, Bramoullé et al., 2009); therefore, the peer effects $w_i(s)$ that are specific to each individual i are required for identification. For the OLS model, I will consider the cases of 5 and 7 missing values, with the V possible outcomes taken from a grid of 5 values: 1.25, 2.0, 2.7, 3.4 and 4.125. Specifying V = 5 in the linear case is an approximation, since in fact the outcome y is continuous and would require an infinite number of possible values for each outcome. Therefore, the linear case presents a lower bound for the computational demands of applying the Horowitz-Manski estimator. Table 7 shows the performance of the linear endogenous peer effects model with non-closed groups (which is required for the identification). For simplicity, I only present the results with individuals excluded from their own peer effects, since otherwise there could be a significant bias in the estimation due to the correlation between $y_i(s)$ and the unobserved idiosyncratic error $\varepsilon_i(s)$. The results show that there is a very rapid convergence of the OLS estimates for all the coefficients, even for sample sizes as small as 50 groups and a small group size of just 5 members.
Therefore, in the case of non-closed groups, the convergence of the OLS estimator is much faster than for the logit model (model M3). Table 8 shows that for the linear model, the Horowitz-Manski type of estimator has an average performance time between 6.4 and 12.6 seconds for 5 missing observations, but this grows to an average time between 102 and 242 seconds with just 7 missing observations. However, the Manski-Tamer type of estimator keeps a similar time performance whether with 5 or 7 missing observations, with an average time between 0.3 and 1.7 seconds.
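The brute-force logic behind the Horowitz-Manski type of bounds can be sketched as follows: every missing outcome is filled with each value of the grid, the model is re-estimated for every completed dataset, and the minimum and maximum of each coefficient are recorded. This is a simplified illustration and not the estimator used in the paper. It assumes a single regressor that does not depend on the missing outcomes, whereas in the peer effects model the group average also changes with each completion. The function names and the simulated data are hypothetical.

```python
import itertools
import random

def ols_simple(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def brute_force_bounds(x, y, missing_idx, grid):
    """Refit OLS for every completion of the missing outcomes and
    return the coefficient-wise minimum and maximum."""
    y_fill = list(y)
    lo = [float("inf")] * 2
    hi = [float("-inf")] * 2
    n_fits = 0
    for combo in itertools.product(grid, repeat=len(missing_idx)):
        for i, value in zip(missing_idx, combo):
            y_fill[i] = value
        coefs = ols_simple(x, y_fill)
        lo = [min(a, c) for a, c in zip(lo, coefs)]
        hi = [max(a, c) for a, c in zip(hi, coefs)]
        n_fits += 1
    return lo, hi, n_fits

# Illustrative data only: 40 observations, 3 of them treated as missing.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(40)]
y = [1.5 + 0.5 * a + rng.gauss(0.0, 0.1) for a in x]
missing_idx = [0, 1, 2]
grid = [1.25, 2.0, 2.7, 3.4, 4.125]  # the 5-value grid quoted in the text
lo, hi, n_fits = brute_force_bounds(x, y, missing_idx, grid)
print(n_fits)  # 125 fits, that is 5 ** 3
```

With the grid of 5 values, 5 missing outcomes require 5^5 = 3,125 fits and 7 missing outcomes require 5^7 = 78,125 fits, which is the source of the jump in computational time for the Horowitz-Manski type of estimator.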
Finally, Table 9 shows the performance of the Horowitz-Manski and Manski-Tamer estimators for the linear peer effects model. In this case, both interval estimator approaches coincide perfectly, although perhaps this would not be the case with other calibrations or with a higher number of missing outcomes. It is possible that with a larger number of missing values, the interval estimates of the Manski-Tamer approach would be much worse than the sharp bounds of the Horowitz-Manski approach, although the computational time of the Horowitz-Manski approach would certainly increase enormously due to the large number of possible missing datasets given by $MD = V^{N_{z=0}}$. In general, the interval estimates contain the true parameter value for all the coefficients, including the endogenous peer coefficient J. The intervals are somewhat wider when the missing observations increase from 5 to 7, as expected. But the estimated intervals of the linear model fall substantially and become