Optimal Matching for Observational Studies

Paul R. Rosenbaum*

Journal of the American Statistical Association, Vol. 84, No. 408 (December 1989), Theory and Methods. © 1989 American Statistical Association.

* Paul R. Rosenbaum is Associate Professor, Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA.

Matching is a common method of adjustment in observational studies. Currently, matched samples are constructed using greedy heuristics (or stepwise procedures) that produce, in general, suboptimal matchings. With respect to a particular criterion, a matched sample is suboptimal if it could be improved by changing the controls assigned to specific treated units, that is, if it could be improved with the data at hand. Here, optimal matched samples are obtained using network flow theory. In addition to providing optimal matched-pair samples, this approach yields optimal constructions for several statistical matching problems that have not been studied previously, including the construction of matched samples with multiple controls, with a variable number of controls, and the construction of balanced matched samples that combine features of pair matching and frequency matching. Computational efficiency is discussed. Extensive use is made of ideas from two essentially disjoint literatures, namely statistical matching in observational studies and graph algorithms for matching. The article contains brief reviews of both topics.

KEY WORDS: Graph algorithms; Network flow; Propensity score; Statistical computing.

1. INTRODUCTION

1.1 Two Literatures on Matching

There are two essentially disjoint literatures on matching. The first is the statistical literature on the construction of matched samples for observational studies. The second is the literature in discrete mathematics, computer science, and operations research on matching in graphs and networks. This article uses ideas from the second literature as they relate to problems in the first.

The article is organized as follows. Section 1.2 reviews certain statistical aspects of matching in observational studies. Section 1.3 discusses a tangible example that illustrates the difference between an optimal matching and a matching constructed by the greedy heuristics that are currently used by statisticians. The key point is that two or more treated units may have the same control as their best match, and conventional heuristics resolve this bottleneck in an arbitrary way, typically yielding a suboptimal match, that is, a matched sample that could be improved with the data at hand. Greedy and optimal matching are compared in Section 1.4. Relevant network flow theory is briefly reviewed in Section 2, with extensive references. Network flow methods are used to solve a series of statistical matching problems in Section 3, including matching with multiple controls, matching with a variable number of controls, and balanced matching. Computational considerations are discussed in Section 4.

1.2 Constructing Matched Samples in Observational Studies: A Short Review

An observational study is an attempt to estimate the effects of a treatment when, for ethical or practical reasons, it is not possible to randomly assign units to treatment or control; see Cochran (1965) for a review of issues that arise in such studies. The central problem in observational studies is that treated and control units may not be comparable prior to treatment, so differences in outcomes in treated and control groups may or may not indicate effects actually caused by the treatment. This problem has two aspects: The treated and control groups may be seen to differ prior to treatment with respect to various recorded measurements, or they may be suspected to differ in ways that have not been recorded. Observed pretreatment differences are controlled by adjustments, for example by matched sampling, the method discussed here. Even after adjustments have been made for recorded pretreatment differences, there is always a concern that some important differences were not recorded, so no adjustments could be made. See Rosenbaum (1987a,b) and the references given there for discussion of methods for addressing unobserved pretreatment differences.

Pretreatment measurements are available for N treated units, numbered n = 1, ..., N, and a reservoir of M potential controls, numbered m = 1, ..., M. Often M is much larger than N, but this is not essential, and it is assumed only that M ≥ N. Each unit has a vector of pretreatment measurements, say xn for the nth treated unit and wm for the mth potential control. A matched pair is an ordered pair (n, m) with 1 ≤ n ≤ N and 1 ≤ m ≤ M, indicating that the nth treated unit is matched with the mth potential control. A complete matched-pair sample is a set Z of N disjoint matched pairs, that is, N matched pairs in which each treated unit appears once and each control appears either once or not at all. An incomplete matched-pair sample is a set of fewer than N disjoint matched pairs; however, there are strong reasons for avoiding incomplete matched-pair samples (Rosenbaum and Rubin 1985a), and little attention will be given to them here.
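To fix the notation in code, a matched-pair sample can be stored simply as a list of (n, m) pairs. The short check below is a sketch of my own, not the article's, of the two defining conditions of a complete matched-pair sample: each treated unit appears exactly once and no control is reused.

```python
def is_complete_pair_sample(pairs, N):
    """pairs: list of (n, m) with n a treated unit in 1..N and m a potential control."""
    treated = [n for n, m in pairs]
    controls = [m for n, m in pairs]
    return (sorted(treated) == list(range(1, N + 1))   # each treated unit appears exactly once
            and len(set(controls)) == len(controls))   # each control appears at most once

print(is_complete_pair_sample([(1, 4), (2, 7), (3, 2)], N=3))  # True
print(is_complete_pair_sample([(1, 4), (2, 4), (3, 2)], N=3))  # False: control 4 is reused
```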
There are two notions of a good complete matched-pair sample. The first involves closely matched individual pairs, and the second involves balanced treated and control groups. A pair (n, m) is closely matched if xn is in some sense close to wm, for instance, close in terms of some distance, δ(xn, wm). When there is only a single pretreatment measurement or covariate (i.e., when xn and wm are scalars), the distance typically studied is the absolute difference in their values (e.g., Rubin 1973). When there are several covariates, various distances have been used, including the Euclidean distance based on standardized coordinates and the Mahalanobis distance (Carpenter 1977; Cochran and Rubin 1973, sec. 6; Rubin 1980; Smith, Kark, Cassel, and Spears 1977). Another possibility involves replacing coordinates by their ranks. Alternatively, one might try to weight different coordinates by some measure of importance. (Although the distance must be nonnegative, it need not be a true distance; it need not satisfy the triangle inequality.) The total distance between matched pairs, Σ(n,m)∈Z δ(xn, wm), is one measure of the quality of the matched sample.

When there are more than a few covariates, genuinely close matched pairs will be rare. This motivates the second notion of a good matched-pair sample, namely covariate balance. There is covariate balance if within the matched sample the distributions of xn and wm are similar for matched units, that is, for units with (n, m) ∈ Z. For instance, the vector difference d in covariate means in the matched treated and control groups is one of many measures of covariate imbalance. Note that d may be small (i.e., close to the 0 vector) even if some individual matched pairs exhibit large differences, xn − wm, because the differences in different pairs may cancel. Balance is therefore a weaker condition than close matching within each pair, and since it is weaker it can often be attained when close matching within pairs is not possible.

One way to obtain balanced matched samples is by matching on the propensity score, as discussed by Rosenbaum and Rubin (1983). Under a stochastic model for the assignment of units to treatment or control, the propensity score is the conditional probability of being assigned to treatment, given the observed covariates. In practice, the propensity score is estimated from the data using a model such as a logit model. As the propensity score is a scalar, it is often easy to obtain close matches on it, and theoretical arguments show that the resulting matched sample will tend to balance all of the covariates used to construct the propensity score; that is, the dimensionality of xn and wm ceases to be a major problem. The balance obtained in this way is stochastic, that is, in expectation and with probability 1 as N tends to ∞, but in any given matched sample some imbalances will remain. An empirical investigation (Rosenbaum and Rubin 1985b) compared the performance of three greedy matching methods as applied to a data set. The best of these three picked the closest match in terms of the Mahalanobis metric from a restricted subset or caliper of potential controls who were close to the treated unit on the propensity score. Optimal matching within propensity score calipers is discussed in Sections 3.4 and 4.2; arguably, it is the method of choice.
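The caliper idea in the preceding paragraph can be made concrete with a brief sketch. Nothing below comes from the article: the covariate matrix X, the 0/1 treatment indicator z, the scikit-learn logit fit, and the caliper width of 0.1 are illustrative assumptions. The function builds the kind of treated-by-control distance matrix that either a greedy or an optimal matcher could then consume: Mahalanobis distance within a propensity-score caliper, infinite outside it.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.linear_model import LogisticRegression

def caliper_distance_matrix(X, z, caliper=0.1):
    """X: (units x covariates) NumPy array with several covariates; z: 0/1 treatment indicator.
    Returns delta[i, j] for treated i and control j: the Mahalanobis distance when their
    estimated propensity scores differ by at most `caliper`, and infinity otherwise."""
    e = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]  # estimated propensity scores (logit model)
    VI = np.linalg.inv(np.cov(X, rowvar=False))                # inverse covariance for the Mahalanobis metric
    treated, control = np.flatnonzero(z == 1), np.flatnonzero(z == 0)
    delta = np.full((treated.size, control.size), np.inf)
    for i, t in enumerate(treated):
        for j, c in enumerate(control):
            if abs(e[t] - e[c]) <= caliper:
                delta[i, j] = mahalanobis(X[t], X[c], VI)
    return delta
```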
1.3 A Motivating Example: Optimal Matching Versus a Greedy Heuristic

In this section, a greedy match is contrasted with an optimal match obtained by one of the methods discussed in later sections. The matching algorithms in the statistical literature are essentially greedy algorithms; they do not generally find a matched sample that minimizes the total distance between matched pairs. The details of the optimal method and the results obtained by other methods are deferred to later sections. The example selected was large enough to be interesting, but small enough to permit direct examination using a single table of distances. In particular, the example exhibits the bottleneck problem that optimal matching methods resolve in the best possible way.

The example uses the now-familiar data on 26 U.S. light water nuclear power plants, as collected by W. E. Mooz and as reported by Cox and Snell (1981). (Excluded are the six partial turnkey plants, whose costs may contain hidden subsidies.) Seven of the plants were constructed on a site at which a light water reactor had previously existed; they are the treated units. Each such unit will be matched with two controls from among the remaining 19 plants. A comparison of the costs of the treated and control plants might be the basis for thinking about the advantages or disadvantages of building a new plant at an existing site. (Of course, such an analysis would involve analytical issues beyond the construction of the matched sample discussed here; e.g., see Rosenbaum 1988a,b.)

Table 1 is a matrix of distances between treated and control power plants, with the 7 treated plants as the columns and the 19 potential controls as the rows. For easy identification, the plant numbers are those of Cox and Snell (1981), so plant 3 is the first treated plant, plant 1 is the first potential control, and the distance between these two plants is 28. The distance between two plants is defined in terms of two covariates: the date the construction permit was issued and the capacity of the power plant. These two covariates were replaced by their ranks (1, ..., 26), with average ranks used in case of ties. The distance between two plants is the sum of the two absolute differences in their ranks on the two covariates. A distance of 0 indicates two plants had identical tied values for both covariates, whereas the maximum possible difference is (26 − 1) + (26 − 1) = 50. The actual differences range from 0 to 40. (I have had some unpleasant experiences using standard deviations to scale covariates in multivariate matching, and I am inclined to think that either ranks or some more resistant measure of spread should routinely be used instead.)

Table 1. Distances Between Treated and Control Power Plants
[Matrix of rank-based distances, with the 7 treated plants as columns and the 19 potential control plants as rows; the individual entries are not reproduced here. NOTE: An optimal match is indicated by a box. *Plants constructed in the northeastern part of the United States.]
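The rank-based distance just described is simple to compute. The sketch below is illustrative rather than the article's code (the Cox and Snell covariate values are not reproduced here): it replaces each covariate by its rank over all 26 plants, with average ranks for ties, and sums the two absolute rank differences.

```python
import numpy as np
from scipy.stats import rankdata

def rank_distance_matrix(covariates, treated_idx, control_idx):
    """covariates: (26 x 2) array of permit-date and capacity values;
    treated_idx, control_idx: row indices of the treated and potential control plants.
    Returns a (treated x control) matrix of summed absolute rank differences."""
    ranks = np.column_stack([rankdata(covariates[:, k])       # ranks 1..26, average ranks for ties
                             for k in range(covariates.shape[1])])
    delta = np.zeros((len(treated_idx), len(control_idx)))
    for i, t in enumerate(treated_idx):
        for j, c in enumerate(control_idx):
            delta[i, j] = np.abs(ranks[t] - ranks[c]).sum()   # sum of the two absolute rank differences
    return delta
```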
A greedy algorithm divides a problem such as matching into N separate decisions, makes those decisions sequentially without revision or reconsideration, and makes each decision the best among the choices then available. (Stepwise regression by forward selection is a familiar example of a greedy algorithm.) The greedy algorithm starts with a match of minimum distance, in this case one of the zero distances, removing that row from further consideration. For instance, it might match treated plant 3 with potential control 2, removing plant 2 from further consideration. The process repeats on the reduced array. For instance, plant 5 might be matched with plant 4. As soon as a treated plant has two matched controls, the corresponding column is also deleted.

Table 2 contrasts the performance of the greedy and optimal matching procedures. Each step in Table 2 is the addition of one control to the matched sample, so at step k there is a partial match consisting of k controls, with at most two controls matched to each treated unit. The total distance within these k pairs is used to evaluate the match at step k. The greedy algorithm performs perfectly for the first 11 steps; no partial match with k controls is better than greedy's choice for k ≤ 11. At step 12, greedy misses a small opportunity: It adds match (n, m) = (22, 26) at a cost of 12 units of distance, whereas a new match could have been added at a cost of 11 by removing match (20, 15), thereby freeing control 15, and adding matches (20, 21) and (22, 15), for a total cost of 11. At step 13, greedy misses a somewhat larger opportunity, and it is now behind by a cost of 6 units of distance. At step 14, with two controls per treated unit, greedy misses another small opportunity, and has paid a total price that is (79 − 71)/71 = 11% higher than necessary.

Table 2. Comparison of Greedy and Optimal Match Construction

Step  Greedy action   Total distance   Optimal action                      Total distance
 1    Add (3, 2)            0          Add (3, 2)                                0
 2    Add (5, 4)            0          Add (5, 4)                                0
 3    Add (18, 13)          0          Add (18, 13)                              0
 4    Add (20, 14)          0          Add (20, 14)                              0
 5    Add (18, 8)           2          Add (18, 8)                               2
 6    Add (20, 15)          4          Add (20, 15)                              4
 7    Add (9, 7)            8          Add (9, 7)                                8
 8    Add (24, 23)         12          Add (24, 23)                             12
 9    Add (22, 17)         17          Add (22, 17)                             17
10    Add (9, 10)          22          Add (9, 10)                              22
11    Add (24, 25)         30          Add (24, 25)                             30
12    Add (22, 26)         42          Delete (20, 15); Add (20, 21);
                                       Add (22, 15)                             41
13    Add (5, 21)          58          Delete (9, 7); Add (9, 16);
                                       Add (5, 7)                               52
14    Add (3, 19)          79          Delete (22, 15); Delete (20, 21);
                                       Add (22, 26); Add (20, 15);
                                       Add (3, 21)                              71
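The greedy procedure of the two preceding paragraphs amounts to repeated row-and-column deletion on the distance matrix. The sketch below is an illustrative implementation, not the article's; the first-occurrence tie-breaking of argmin is one instance of the arbitrary bottleneck resolution noted in Section 1.1.

```python
import numpy as np

def greedy_match(delta, controls_per_treated=2):
    """delta: (treated x control) distance matrix. Returns (treated, control) index pairs
    in the order chosen, using the deletion rule described in the text."""
    assert delta.shape[1] >= controls_per_treated * delta.shape[0]
    work = delta.astype(float)                                 # working copy that receives the "deletions"
    counts = np.zeros(delta.shape[0], dtype=int)
    pairs = []
    for _ in range(controls_per_treated * delta.shape[0]):
        t, c = np.unravel_index(np.argmin(work), work.shape)   # smallest remaining distance; ties broken arbitrarily
        pairs.append((int(t), int(c)))
        work[:, c] = np.inf                                    # this control is used up
        counts[t] += 1
        if counts[t] == controls_per_treated:
            work[t, :] = np.inf                                # this treated unit has all of its controls
    return pairs
```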
1.4 Comparing Greedy and Optimal Matching

Why prefer the optimal match? There are several reasons. First, there is the obvious point that the optimal match is always as good as and often better than the greedy match. In the example, the loss due to greedy was 11%; not a disaster, but worth avoiding.

The second point, though distinct, is closely related to the first. Although a greedy algorithm (like forward stepwise regression) may provide a tolerable answer, it rarely comes with a guarantee that the answer is in fact tolerable. In particular, greedy matching can be arbitrarily poor compared to optimal matching. To see this, consider a case with N = M = 2, and the following 2 × 2 distance matrix:

                 Treated
                  1    2
   Control   1    0    ε
             2    ε    ∞

For 0 < ε < ∞, greedy grabs the (1, 1) match at a cost of 0, can never reconsider, and is forced to pay a cost of ∞ for the (2, 2) match. Of course, the optimal match is (1, 2) and (2, 1) with a cost of 2ε. There is no simple way to be sure you are not paying an intolerably high price using greedy. In short, even if the goal is a tolerable rather than an optimal match, greedy comes with no guarantee that it will find a tolerable match when it exists. Korte and Hausmann (1978) evaluated greedy heuristics for maximum similarity and minimum distance matching; surprisingly, these cases turn out to be quite different.

Consider a second, larger example that permits a fairly complete evaluation. Suppose there are N treated units having covariate values 2, 4, ..., 2N and an equal number of potential controls having covariate values 1 − ε, 3 − ε, ..., 2N − 1 − ε, where ε > 0 is vanishingly small. Suppose that the absolute difference in the covariate values is used as the measure of distance. Then, 2 is slightly closer to 3 − ε than to 1 − ε, and so forth. Greedy pairs 2 with 3 − ε, 4 with 5 − ε, and so on, and is finally forced to pair 2N with the only unmatched control, namely 1 − ε. If the covariate were age, this would mean matching the oldest treated unit to the youngest control. Since ε is vanishingly small, the total absolute difference within the N pairs is A_G ≈ (N − 1) + (2N − 1) = 3N − 2. In contrast, the optimal procedure pairs 2 with 1 − ε, 4 with 3 − ε, and so on, for a total distance of A_O ≈ N. The percent increase in distance due to using greedy rather than optimal matching is 100(A_G − A_O)/A_O ≈ 100{2 − (2/N)} → 200% as N → ∞, so greedy can be quite poor in large problems as well as small ones. Note, however, that if ε is negative and vanishingly small, then greedy yields the optimal match. In other words, greedy's performance relative to optimal matching is sensitive to small changes in the covariate values.

The two previous examples concern pair matching when N = M, so every control is matched. As a result, the marginal distribution of the covariate among the M controls is unchanged by matching. As a final example, consider pair matching with M = N + 1, so one control is to be left unmatched. For simplicity, consider the same situation as in the previous paragraph but with one additional control having covariate value 3N, so this additional control is in effect an outlier in the covariate. For vanishingly small ε > 0, greedy pairs 2 with 3 − ε, and so on, finally pairing 2N with 3N since 2N is closer to 3N than to 1 − ε. So greedy unnecessarily uses the outlier, yielding a total absolute distance of approximately (N − 1) + N = 2N − 1. The optimal match is unchanged from the previous paragraph, with a total distance of N. Consider the treated-minus-control difference in covariate means for the N matched pairs. For the greedy match the difference in means is −(2N − 1)/N, whereas for the optimal match it is 1, so greedy matching increased the absolute value of the difference in covariate means.
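The arithmetic of the 2, 4, ..., 2N example can be checked numerically. The sketch below is not the article's code: the article obtains optimal matches by network flow (Section 2), and scipy's assignment solver is used here only as a convenient stand-in for optimal complete pair matching; N = 100 and ε = 10⁻⁶ are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

N, eps = 100, 1e-6
treated = np.arange(2, 2 * N + 1, 2)          # covariate values 2, 4, ..., 2N
controls = np.arange(1, 2 * N, 2) - eps       # covariate values 1 - eps, 3 - eps, ..., 2N - 1 - eps
delta = np.abs(treated[:, None] - controls[None, :])

# Greedy in the sense described in Section 1.3, here with one control per treated unit:
# repeatedly take the smallest remaining distance, retiring the chosen treated unit and control.
work = delta.copy()
greedy_total = 0.0
for _ in range(N):
    i, j = np.unravel_index(np.argmin(work), work.shape)
    greedy_total += delta[i, j]
    work[i, :] = np.inf
    work[:, j] = np.inf

# Optimal complete pair matching, solved here as an assignment problem.
rows, cols = linear_sum_assignment(delta)
optimal_total = delta[rows, cols].sum()

print(round(greedy_total, 2), round(optimal_total, 2))                  # roughly 3N - 2 versus roughly N
print(round(100 * (greedy_total - optimal_total) / optimal_total, 1))   # near 200% for large N
```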
2. NETWORK FLOW THEORY: A SHORT REVIEW

2.1 Graphs, Networks, Flows, Maximum Flows, and Minimum Cost Flows

One version of optimal matching is a standard problem that is known to be equivalent to finding a flow of minimum cost in a network. Section 2.1 reviews various definitions, and Section 2.2 discusses the matching problem. Minimum cost flows have been discussed in many standard references, including Ford and Fulkerson (1962, sec. 3), Lawler (1976, sec. 4), Carré (1979, sec. 6), and Papadimitriou and Steiglitz (1982, sec. 7).

A (directed) graph is a set of vertices V and a set E of (directed) edges consisting of ordered pairs e = (v1, v2) of distinct vertices, so E is a subset of V × V. In the discussion here, V is not empty and contains finitely many elements. One draws a picture of a graph by drawing a point for each vertex v ∈ V and, for each edge e = (v1, v2) ∈ E, an arrow from v1 to v2. An edge e = (v1, v2) is said to be from v1 to v2, or to leave v1 and enter v2. Let i(v) and o(v) be the sets of all edges entering and leaving vertex v, respectively. Here, i is for in and o is for out.

For us, a network is a graph with two distinguished vertices, a source s ∈ V and a sink t ∈ V, with i(s) = ∅ and o(t) = ∅. The structures associated w…
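The equivalence stated at the start of this section can be illustrated with a small network of the kind just defined: a source s, a sink t, one vertex per treated unit and per potential control, and edge costs given by the distances. The sketch below is my own construction, not the article's: networkx and the toy 2 × 2 distances (chosen in the spirit of the example in Section 1.4) are assumptions made only for illustration.

```python
import networkx as nx

# Toy integer distances delta[(n, m)] between treated unit n and potential control m.
delta = {(1, 1): 0, (1, 2): 3, (2, 1): 1, (2, 2): 9}
N, M = 2, 2

G = nx.DiGraph()
G.add_node("s", demand=-N)                                # the source supplies N units of flow
G.add_node("t", demand=N)                                 # the sink absorbs N units of flow
for n in range(1, N + 1):
    G.add_edge("s", ("T", n), capacity=1, weight=0)       # each treated unit is matched exactly once
for m in range(1, M + 1):
    G.add_edge(("C", m), "t", capacity=1, weight=0)       # each control is used at most once
for (n, m), d in delta.items():
    G.add_edge(("T", n), ("C", m), capacity=1, weight=d)  # cost of pairing treated n with control m

flow = nx.min_cost_flow(G)                                # a flow of N units with minimum total cost
pairs = [(n, m) for n in range(1, N + 1) for m in range(1, M + 1)
         if flow[("T", n)].get(("C", m), 0) == 1]
print(pairs)                                              # [(1, 2), (2, 1)]: the optimal pairing here
print(sum(delta[p] for p in pairs))                       # total distance 4; greedy would pay 0 + 9 = 9
```

Raising the capacity on the source-to-treated edges is one natural way to allow more than one control per treated unit; the article's own constructions for multiple controls, variable numbers of controls, and balanced matching are the subject of Section 3.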