Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis

Nikolay Doudchenko        Guido W. Imbens
arXiv, v2 [stat.AP], 20 Sep 2017. Current version September 2017.

Abstract

In a seminal paper Abadie, Diamond, and Hainmueller [2010] (ADH), see also Abadie and Gardeazabal [2003] and Abadie et al. [2014], develop the synthetic control procedure for estimating the effect of a treatment, in the presence of a single treated unit and a number of control units, with pre-treatment outcomes observed for all units. The method constructs a set of weights such that selected covariates and pre-treatment outcomes of the treated unit are approximately matched by a weighted average of control units (the synthetic control). The weights are restricted to be non-negative and to sum to one, which is important because it allows the procedure to obtain unique weights even when the number of lagged outcomes is modest relative to the number of control units, a common setting in applications. In the current paper we propose a generalization that allows the weights to be negative, and their sum to differ from one, and that allows for a permanent additive difference between the treated unit and the controls, similar to difference-in-differences procedures. The weights directly minimize the distance between the lagged outcomes for the treated and the control units, using regularization methods to deal with a potentially large number of possible control units.

Keywords: comparative study, synthetic control, difference-in-differences, matching, balancing, regularized regression, elastic net, best subset selection

We are grateful for comments by seminar participants at the California Econometrics Conference and by Tim Squires. Author affiliations: Graduate School of Business, Stanford University; and Professor of Economics, Graduate School of Business, Stanford University, SIEPR, and NBER.

1 Introduction

We consider the problem of estimating the causal effect of an intervention in a panel data setting, where we observe the outcome of interest for a number of treated units (possibly only a single one) and a number of control units, for a number of periods prior to the receipt of the treatment and for a number of periods after the receipt of the treatment. Two aspects of the problem make this different from standard analyses of causal effects using matching approaches (see Imbens and Wooldridge [2009] for a recent review). First, the key variables on which we try to match treated and control units are pre-treatment outcomes rather than qualitatively different characteristics. Second, in social science applications the setting is often one where the number of control units, as well as the number of pre-treatment periods for which we observe outcomes, are modest and of similar magnitude. In fact, a substantial number of applications has only a single, or very few, control units, so that estimators motivated by consistency arguments that rely on a large number of control units can have poor properties.

Many of the modern methods researchers have used in this setting can be divided into three broad groups. First, difference-in-differences (DID) methods (e.g., Ashenfelter and Card [1985], Card [1990], Meyer et al. [1995], Abadie [2005], Bertrand et al. [2004]), where the difference in average pre-treatment outcomes between treated and control units is subtracted from the difference in average post-treatment outcomes between treated and control units (in its simplest form, see the display following this list), with generalizations to multiple factor structures in Xu [2015] and Gobillon and Magnac [2013]. Second, matching methods, where, for each treated unit, one or more matches are found among the controls, based on both pre-treatment outcomes and other covariates (e.g., Abadie and Imbens [2006], Diamond and Sekhon [2013], Rubin [2006], Heckman et al. [1997, 1998]). Third, synthetic control (SC) methods (Abadie and Gardeazabal [2003], Abadie et al. [2010, 2014], Hainmueller [2012]), where for each treated unit a synthetic control is constructed as a weighted average of control units such that the weighted average matches pre-treatment outcomes and covariates for the treated unit.
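In its simplest form, writing \bar{Y}^{obs} for the simple average of observed outcomes over the indicated group of units and set of periods, the DID comparison referred to in the first group amounts to

\hat{\tau}^{did} = \left( \bar{Y}^{obs}_{treated,\,post} - \bar{Y}^{obs}_{control,\,post} \right) - \left( \bar{Y}^{obs}_{treated,\,pre} - \bar{Y}^{obs}_{control,\,pre} \right),

where the group-and-period labels are used here only for exposition and anticipate the panel notation introduced in Section 2.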
In this paper we develop new methods for this setting. We make two specific contributions. First, we develop a framework that nests many of the existing approaches. In this framework we characterize the estimated counterfactual outcome for the treated unit as a linear combination of outcomes for the control units. This framework allows researchers to contrast the critical assumptions underlying the previously proposed methods. Substantive differences between applications, and differences in the data configurations, may make some methods more appropriate in some cases than in others. For example, a key difference between DID on the one hand, and matching and SC approaches on the other hand, is that the DID approach allows for a non-zero intercept in this linear representation, corresponding to a permanent additive difference between the treated and control units. Such an additive difference is often found to be important in empirical work. Second, DID methods restrict the weights on the control units to be equal, whereas matching and SC methods allow variation in weights to capture the notion that some control units make better matches for the treated unit than others. Furthermore, many of the current methods, including DID, matching (other than kernel matching with higher order kernels) and SC, impose non-negativity of the weights. All restrict the weights to sum to one.

In a second contribution we propose a new estimator that relaxes a number of the restrictions specific to, or common among, the DID, matching, and SC methods. Generalizing DID methods, we allow the weights to vary. Generalizing SC and matching methods, we allow for permanent additive effects. Generalizing DID, SC and nearest neighbor matching methods, we do not require the weights to be non-negative and do not restrict the weights to sum to one. Our proposed method can accommodate cases with many or few controls, and with many or few pre-treatment periods. In the latter case there is a need for regularization or shrinkage, although standard L1 (lasso) type shrinkage towards zero is not necessarily appropriate in general, and in particular not if we wish to impose a restriction on the sum of the weights. Specifically, we recommend an approximate balancing method with an elastic net penalty term for the weights. We illustrate the proposed methods using three data sets used previously in this literature.

2 Notation

We consider a panel data setting in which there are N + 1 cross-sectional units observed in time periods t = 1, ..., T. There is a subset of treated units, possibly containing only a single unit. For ease of exposition we focus on the case with a single treated unit, unit 0. From period T_0 + 1 onwards, where 1 \leq T_0 < T, this unit receives the treatment of interest.
Using the potential outcome or Rubin Causal Model set-up (Rubin [1974], Holland [1986], Imbens and Rubin [2015]), there is for the treated unit, in each of the periods t = T_0 + 1 through t = T, a pair of potential outcomes Y_{0,t}(0) and Y_{0,t}(1), corresponding to the outcome given the control treatment and the active treatment respectively. The causal effects for this unit for each time period are

\tau_{0,t} = Y_{0,t}(1) - Y_{0,t}(0), \qquad \text{for } t = T_0 + 1, \ldots, T.

Units i = 1, ..., N are control units which do not receive the treatment in any of the time periods. For these units there is a control outcome Y_{i,t}(0), but not necessarily a treated potential outcome. In many examples conceptualizing a treated outcome for the control units can be difficult (for example, in one of the canonical synthetic control applications, to the German re-unification, it is difficult to imagine the treated, re-unified-with-East-Germany, state for countries other than Germany), and we do not need to do so. The treatment received is denoted by W_{i,t}, satisfying

W_{i,t} = \begin{cases} 1 & \text{if } i = 0 \text{ and } t \in \{T_0 + 1, \ldots, T\}, \\ 0 & \text{otherwise}. \end{cases}

We are interested in the treatment effects for the unit that receives the treatment, during the periods this unit receives the treatment, that is, \tau_{0,t} for t = T_0 + 1, ..., T.

The researcher observes, for unit i in period t, the treatment W_{i,t} and the realized outcome,

Y^{obs}_{i,t} = Y_{i,t}(W_{i,t}) = \begin{cases} Y_{i,t}(0) & \text{if } W_{i,t} = 0, \\ Y_{i,t}(1) & \text{if } W_{i,t} = 1. \end{cases}

The researcher may also observe M time-invariant individual-level characteristics X_{i,1}, ..., X_{i,M} for all units. In the following discussion we denote by X_i the M x 1 column vector (X_{i,1}, ..., X_{i,M})', for i = 0, ..., N. This vector may also include some of the lagged outcomes Y^{obs}_{i,t} in periods t \leq T_0. We denote by X_c the N x M matrix with (i,m)-th entry equal to X_{i,m}, for i = 1, ..., N and m = 1, ..., M, excluding the treated unit, and denote by X_t the M-vector with m-th entry equal to X_{0,m}; finally, X = (X_t, X_c). Similarly, for the outcomes, Y^{obs}_i denotes the T x 1 vector (Y^{obs}_{i,T}, ..., Y^{obs}_{i,1})'. In addition, Y^{obs}_{c,pre} denotes the N x T_0 matrix with (i,t)-th entry equal to Y^{obs}_{i,T_0-t+1}, again excluding the treated unit; Y^{obs}_{t,pre} denotes the T_0-vector with t-th entry equal to Y^{obs}_{0,T_0-t+1}; and similarly Y^{obs}_{c,post} and Y^{obs}_{t,post} collect the post-treatment period outcomes. The elements of the three matrices Y^{obs}_{c,post}, Y^{obs}_{t,pre}, and Y^{obs}_{c,pre} consist of observations of the control outcome Y_{i,t}(0), and Y^{obs}_{t,post} consists of observations of the treated outcome Y_{0,t}(1). Combining these matrices we have

Y^{obs} = \begin{pmatrix} Y^{obs}_{t,post} & Y^{obs}_{c,post} \\ Y^{obs}_{t,pre} & Y^{obs}_{c,pre} \end{pmatrix} = \begin{pmatrix} Y_{t,post}(1) & Y_{c,post}(0) \\ Y_{t,pre}(0) & Y_{c,pre}(0) \end{pmatrix}, \qquad \text{and} \qquad X = \begin{pmatrix} X_t & X_c \end{pmatrix}.

The causal effect of interest depends on the pair of matrices Y_{t,post}(1) and Y_{t,post}(0). The former is observed, but the latter is not. Putting aside for the moment the presence of covariates, the question is how to use the three different sets of control outcomes, Y_{c,post}(0), Y_{t,pre}(0), and Y_{c,pre}(0), and specifically how to model their joint relation with the unobserved Y_{t,post}(0), in order to impute the latter:

Y(0) = \begin{pmatrix} ? & Y_{c,post}(0) \\ Y_{t,pre}(0) & Y_{c,pre}(0) \end{pmatrix}.
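To make this data layout concrete, the sketch below partitions an observed outcome panel into the four blocks appearing in the display above. It is only an illustration of the notation, not part of the paper's procedures; the array layout (rows are periods 1 through T in chronological order, column 0 holds the treated unit) and the function name partition_panel are assumptions made here for exposition.

import numpy as np

def partition_panel(Y_obs, T0):
    """Split an outcome panel into treated/control, pre/post blocks.

    Y_obs : array of shape (T, N + 1); rows are periods 1, ..., T in
            chronological order, and column 0 holds the treated unit.
    T0    : number of pre-treatment periods.
    """
    Y_t_pre  = Y_obs[:T0, 0]    # treated unit, pre-treatment periods: Y_t,pre(0)
    Y_c_pre  = Y_obs[:T0, 1:]   # control units, pre-treatment periods: Y_c,pre(0)
    Y_t_post = Y_obs[T0:, 0]    # treated unit, post-treatment periods: Y_t,post(1)
    Y_c_post = Y_obs[T0:, 1:]   # control units, post-treatment periods: Y_c,post(0)
    return Y_t_pre, Y_c_pre, Y_t_post, Y_c_post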
One approach is to model the relationship between Y_{t,pre}(0) and Y_{c,pre}(0), and assume that this relation is the same as that between Y_{t,post}(0) and Y_{c,post}(0). This is where the current setting is fundamentally different from that where the pre-treatment variables are fixed characteristics rather than pre-treatment outcomes: modelling the relation between covariates for the treated unit and the control units would not necessarily translate into a prediction for the post-treatment outcome for the treated unit given post-treatment outcomes for the control units. An alternative approach is to model the relationship between Y_{c,post}(0) and Y_{c,pre}(0), and assume that this relation is the same as that between Y_{t,post}(0) and Y_{t,pre}(0).

To put the problem, as well as the estimators that we discuss in this paper, in context, it is useful to bear in mind the relative magnitude of the different dimensions, the number of control units N and the number of pre-treatment periods T_0. Part of the motivation to pursue one particular identification strategy, rather than a different one, may be the relative magnitude of the different components of Y^{obs}, and the corresponding ability, or lack thereof, to precisely estimate their relationship. Put differently, depending on these relative magnitudes there may be a need for regularization in the estimation strategy, and a more compelling case to impose restrictions that are typically viewed as approximations at best.

Sometimes we have few pre-treatment time periods but relatively many control units, N \gg T_0, e.g.,

Y(0) = \begin{pmatrix}
? & Y_{1,3}(0) & Y_{2,3}(0) & Y_{3,3}(0) & Y_{4,3}(0) & \cdots & Y_{N,3}(0) \\
Y_{0,2}(0) & Y_{1,2}(0) & Y_{2,2}(0) & Y_{3,2}(0) & Y_{4,2}(0) & \cdots & Y_{N,2}(0) \\
Y_{0,1}(0) & Y_{1,1}(0) & Y_{2,1}(0) & Y_{3,1}(0) & Y_{4,1}(0) & \cdots & Y_{N,1}(0)
\end{pmatrix}.

In this case it is difficult to estimate precisely the dependence structure between Y_{t,pre}(0) and Y_{c,pre}(0), relative to the dependence between Y_{c,post}(0) and Y_{c,pre}(0). In this case simple matching methods are attractive. Matching methods suggest looking for one or more controls that are each similar to the treated unit. With T_0 small, there are few dimensions where the units need to be similar, and with N large, we have a large reservoir of controls to draw from.
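As a concrete illustration of this matching idea (a minimal sketch for exposition, not a procedure proposed in the paper; the function name is hypothetical), a one-nearest-neighbor match selects the control unit whose pre-treatment outcome path is closest to that of the treated unit and uses that unit's post-treatment outcomes as the imputed counterfactual.

import numpy as np

def nearest_neighbor_counterfactual(Y_t_pre, Y_c_pre, Y_c_post):
    """Impute Y_t,post(0) by one-nearest-neighbor matching on pre-treatment outcomes.

    Y_t_pre  : (T0,)   pre-treatment outcomes of the treated unit.
    Y_c_pre  : (T0, N) pre-treatment outcomes of the N control units.
    Y_c_post : (T1, N) post-treatment outcomes of the N control units.
    """
    # Euclidean distance between the treated pre-treatment path and each control's path.
    distances = np.linalg.norm(Y_c_pre - Y_t_pre[:, None], axis=0)
    best = int(np.argmin(distances))   # index of the closest control unit
    return Y_c_post[:, best]           # its post-treatment path, used as the imputation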
As a result combinations of cross-section approaches as in the traditional DID literature (e.g. Ashenfelter and Card [1985], Card [1990], Card and Krueger [1994], Meyer et al. [1995], Angrist and Krueger [2000], Bertrand et al. [2004], Imai and Kim [2016], Angrist and Pischke [2008], Athey and Imbens [2006]), and time-series approach as in Brodersen et al. [2015] and von Brzeski et al. [2015] may be useful, but some type of regularization may be called for. 3 Four Leading Applications To frame the discussion of the estimators discussed in Sections 4 and 5, let us briefly review four influential applications from the DID and SC literatures. In particular we wish to give a sense of the relative magnitudes of the control sample size N and the number of pre-intervention periods T 0, to make the point that primarily relying on consistency under large N or large T 0 may not be credible. 3.1 The Mariel Boatlift Study One of the classic applications of DID methods is the Mariel Boatlift study by Card [1990]. Card studies the effect of the influx of low-skilled labor into the Miami labor market on wages using data on labor markets in other metropolitan areas for comparison. Recently this study has been revisited using synthetic control methdos in Peri and Yasenov [2015]. The Peri and Yasenov [2015] study uses a single treated unit, N = 44 potential control units, T 0 = 7 pre-treatment 6 periods and T 1 = 6 post-treatment periods. 3.2 The New-Jersey Pennsylvania Minimum Wage Study In the seminal Card and Krueger [1994] study, the focus is on the causal effect of a change in the minimum wage in New Jersey. Card and Krueger use data from fast food restaurants in New Jersey and Pennsylvania. They use information on N = 78 control (Pennsylvania) units, 321 treated (new Jersey) units, one pre-treatment period, T 0 = 1, and one post treatment period, T 1 = The California Smoking Legislation Study In the seminal study on SC methods, Abadie et al. [2010] focus on estimating the effect of antismoking legislation in California. It uses smoking per capita as the outcome and uses a single treated unit (California) and N = 29 states without such anti-smoking measures as the set of potential controls. Abadie et al. [2010] use information on T 0 = 17 pre-program years and data on T 1 = 13 post-program years. 3.4 The German Re-Unification Study In another classic SC application, Abadie et al. [2014] study the effect on per capita Gross Domestic Product in West-Germany of the re-unification with East Germany. They use a single treated unit (West-Germany), N = 16 countries as potential controls and use T 0 = 30 years of data prior to re-unification and T 1 = 14 years of data post re-unification. 4 A Class of Estimators In this section we focus on the setting without covariates. The goals is to impute the unobserved control outcomes for the treated unit, Y t,post (0), on the basis of three sets of control outcomes, the pre-treatment period outcomes for both treated and control units, and the post-treatment period outcomes for the control units, Y c,post (0), Y t,pre (0), and Y c,pre (0). We then use these imputed values to estimate the causal effect τ 0,t of the receipt of the treatment on the outcome for unit 0 in time periods t = T 0 +1,...,T 0 +T 1. 7 4.1 A Common Structure Let us focus on the causal effect for unit 0 and for period T for the moment, τ 0,T = Y 0,T (1) Y 0,T (0). 
Because this unit receives the active treatment during these periods, it follows that Y^{obs}_{0,T} = Y_{0,T}(1), and therefore the causal effect is equal to \tau_{0,T} = Y^{obs}_{0,T} - Y_{0,T}(0), with only Y_{0,T}(0) unobserved. The first observation we make is that many of the estimators in the literature share the following linear structure for the imputation of the unobserved Y_{0,T}(0):

\hat{Y}_{0,T}(0) = \mu + \sum_{i=1}^{N} \omega_i Y^{obs}_{i,T}.    (4.1)

In other words, the imputed control outcome for the treated unit is a linear combination of the outcomes for the control units, with intercept \mu and weight \omega_i for control unit i.(1) The various methods differ in the way the parameters of this linear combination, the intercept \mu and the weights \omega, are chosen as a function of the outcomes Y^{obs}_{c,post}, Y^{obs}_{t,pre}, and Y^{obs}_{c,pre} (but typically not involving Y^{obs}_{t,post}). One obvious way to choose the parameters \mu and \omega, given the characterization in (4.1), is to estimate them by least squares:

(\hat{\mu}^{ols}, \hat{\omega}^{ols}) = \arg\min_{\mu, \omega} \sum_{s=1}^{T_0} \Big( Y^{obs}_{0,s} - \mu - \sum_{i=1}^{N} \omega_i Y^{obs}_{i,s} \Big)^2.    (4.2)

This regression involves T_0 observations and N + 1 predictors (the N potential control units and an intercept). This approach may be attractive in settings where the number of pre-treatment outcomes T_0 is large relative to the number of control units N, but would be less so in cases where they are of similar magnitude. As illustrated by the examples in Section 3, in practice T_0 and N are often of a magnitude such that simply estimating this regression by least squares is not likely to have good properties. In its basic form it may not even be feasible if the number of control units is larger than the number of pre-treatment periods. Even if the number of pre-treatment periods is large enough to make this approach formally feasible, the resulting estimator may suffer from a lack of precision. This leads to a need for some regularization of, or restrictions on, the weights \omega.

(1) One exception is the Changes-In-Changes (CIC) method, a nonlinear generalization of the linear DID method, developed in Athey and Imbens [2006]. Another exception is Brodersen et al. [2015], which develops a Bayesian method that allows for time-varying coefficients in the regression.

4.2 Four Restrictions on the Intercept and Weights

Here we focus on the representation (4.1) of \hat{Y}_{0,T}(0) as a linear combination of outcomes for the control units. We discuss four constraints on the parameters, both the intercept \mu and the weights \omega, that have been considered in the literature. In general none of these restrictions are likely to hold in practice, and we propose relaxing all four of them. However, relaxing all of them can create problems with statistical precision, leading to a need for statistical regularization. The four constraints we consider are:

\mu = 0,    (NO-INTERCEPT)

\sum_{i=1}^{N} \omega_i = 1,    (ADDING-UP)

\omega_i \geq 0 \text{ for all } i,    (NON-NEGATIVITY)

\omega_i = \bar{\omega} \text{ for all } i.    (CONSTANT-WEIGHTS)
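As an illustration of the least squares choice in (4.2), and of how a regularization penalty can enter when T_0 and N are of similar magnitude, the sketch below regresses the treated unit's pre-treatment outcomes on the control units' pre-treatment outcomes and applies the fitted intercept and weights to the control units' post-treatment outcomes. This is only a sketch: it uses off-the-shelf scikit-learn estimators, the function name impute_counterfactual is hypothetical, the elastic net variant shown is plain penalized regression rather than the approximate balancing estimator proposed in the paper, and none of the four restrictions above are imposed.

import numpy as np
from sklearn.linear_model import LinearRegression, ElasticNet

def impute_counterfactual(Y_t_pre, Y_c_pre, Y_c_post, alpha=None, l1_ratio=0.5):
    """Impute Y_0,t(0) in the post-period as mu + sum_i omega_i * Y_i,t, cf. (4.1).

    Y_t_pre  : (T0,)   pre-treatment outcomes of the treated unit.
    Y_c_pre  : (T0, N) pre-treatment outcomes of the control units.
    Y_c_post : (T1, N) post-treatment outcomes of the control units.
    alpha    : None for ordinary least squares as in (4.2); otherwise the
               strength of an elastic net penalty on the weights.
    """
    if alpha is None:
        model = LinearRegression()                           # unrestricted (mu, omega)
    else:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)   # penalized weights
    model.fit(Y_c_pre, Y_t_pre)          # T0 observations, N predictors plus an intercept
    mu, omega = model.intercept_, model.coef_
    return mu, omega, model.predict(Y_c_post)   # imputed Y_0,t(0) for t = T0+1, ..., T

# Usage: post-period effect estimates, cf. tau_0,t = Y_0,t(1) - Y_0,t(0).
# mu, omega, Y_hat_post = impute_counterfactual(Y_t_pre, Y_c_pre, Y_c_post, alpha=0.1)
# tau_hat = Y_t_post - Y_hat_post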