Propensity Score Analysis With Missing Data

Published on Jan 1, 2016 in Psychological Methods
DOI: 10.1037/met0000076
Heining Cham (Fordham University), Estimated H-index: 15
Stephen G. West (ASU: Arizona State University), Estimated H-index: 68
Propensity score analysis is a method that equates treatment and control groups on a comprehensive set of measured confounders in observational (nonrandomized) studies. A successful propensity score analysis reduces bias in the estimate of the average treatment effect in a nonrandomized study, making the estimate more comparable with that obtained from a randomized experiment. This article reviews and discusses an important practical issue in propensity score analysis, in which the baseline covariates (potential confounders) and the outcome have missing values (are incompletely observed). We review the statistical theory of propensity score analysis and estimation methods for propensity scores with incompletely observed covariates. Traditional logistic regression and modern machine learning methods (e.g., random forests, generalized boosted modeling) are reviewed as estimation methods for incompletely observed covariates. Balance diagnostics and equating methods for incompletely observed covariates are briefly described. The propensity score estimation methods for incompletely observed covariates are illustrated and compared using an empirical example. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
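To make the abstract's workflow concrete, the following is a minimal sketch of one simple way to estimate propensity scores when a covariate is incompletely observed: fill missing values with the observed mean, add a missingness indicator, fit a logistic regression, and form inverse probability weights. It is an illustration under stated assumptions, not the procedure used in the article: the data are simulated, and the helper names (`add_missing_indicator`, `fit_logistic`, `ate_weights`, `weighted_mean_diff`) are hypothetical. The missing-indicator approach is only one of several options for incomplete covariates.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def add_missing_indicator(x):
    # Fill missing values (None) with the observed mean and flag them with 1.
    observed = [v for v in x if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in x]
    flag = [1 if v is None else 0 for v in x]
    return filled, flag


def fit_logistic(rows, z, lr=0.5, n_iter=300):
    # Plain gradient-ascent logistic regression; beta[0] is the intercept.
    beta = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, zi in zip(rows, z):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            resid = zi - sigmoid(eta)
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(rows, beta):
    # Estimated probability of treatment for each row.
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            for row in rows]


def ate_weights(z, e):
    # Inverse probability weights targeting the average treatment effect.
    return [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]


def weighted_mean_diff(x, z, w):
    # Weighted treated-minus-control mean difference on one covariate.
    def wmean(group):
        num = sum(wi * xi for xi, zi, wi in zip(x, z, w) if zi == group)
        den = sum(wi for zi, wi in zip(z, w) if zi == group)
        return num / den
    return wmean(1) - wmean(0)


# Simulated data (hypothetical): one confounder, about 20% of values missing.
rng = random.Random(0)
n = 300
x_full = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [1 if rng.random() < sigmoid(0.8 * xi) else 0 for xi in x_full]
x_obs = [None if rng.random() < 0.2 else xi for xi in x_full]

x_filled, r = add_missing_indicator(x_obs)
rows = [[xi, ri] for xi, ri in zip(x_filled, r)]
beta = fit_logistic(rows, z)
e_hat = propensity_scores(rows, beta)
w = ate_weights(z, e_hat)

# Simple balance check: compare the covariate gap before and after weighting.
print("unweighted difference:", weighted_mean_diff(x_filled, z, [1.0] * n))
print("weighted difference:", weighted_mean_diff(x_filled, z, w))
```

The final two lines act as a rough balance diagnostic on the filled covariate. In practice, standardized differences on every covariate and on the missingness indicators would usually be inspected, and multiple imputation or tree-based estimators could replace the simple mean fill used here.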
  • References (73)
  • Citations (15)
#1 Felix Thoemmes (Cornell University), H-Index: 19
#2 Karthika Mohan (UCLA: University of California, Los Angeles), H-Index: 8
Rubin’s classic missingness mechanisms are central to handling missing data and minimizing biases that can arise due to missingness. However, the formulaic expressions that posit certain independencies among missing and observed data are difficult to grasp. As a result, applied researchers often rely on informal translations of these assumptions. We present a graphical representation of missing data mechanism, formalized in Mohan, Pearl, and Tian (2013). We show that graphical models provide a t...
23 Citations
#1 Heining Cham (Fordham University), H-Index: 15
#2 Jan N. Hughes (A&M: Texas A&M University), H-Index: 41
Last. Myung Hee Im (A&M: Texas A&M University), H-Index: 8
This study investigated the effect of grade retention in elementary school on students' motivation for educational attainment in grade 9. We equated retained and promoted students on 67 covariates assessed in grade 1 through propensity score weighting. Retained students (31.55%, n retained = 177) and continuously promoted students (68.45%, n promoted = 384) were compared on the bifactor model of motivation for educational attainment (Cham, Hughes, West & Im, 2014). This model consists o...
17 Citations
#1 Stephen G. West (ASU: Arizona State University), H-Index: 68
#2 Heining Cham (Fordham University), H-Index: 15
Last. Matthias Weiler (FU: Free University of Berlin), H-Index: 3
A propensity score is the probability that a participant is assigned to the treatment group based on a set of baseline covariates. Propensity scores provide an excellent basis for equating treatment groups on a large set of covariates when randomization is not possible. This article provides a nontechnical introduction to propensity scores for clinical researchers. If all important covariates are measured, then methods that equate on propensity scores can achieve balance on a large set of covari...
44 Citations
#1 Brad J. Sagarin (NIU: Northern Illinois University), H-Index: 21
#2 Stephen G. West (ASU: Arizona State University), H-Index: 68
Last. Edward J. Hansen (NIU: Northern Illinois University), H-Index: 2
Treatment noncompliance in randomized experiments threatens the validity of causal inference and the interpretability of treatment effects. This article provides a nontechnical review of 7 approaches: 3 traditional and 4 newer statistical analysis strategies. Traditional approaches include (a) intention-to-treat analysis (which estimates the effects of treatment assignment irrespective of treatment received), (b) as-treated analysis (which reassigns participants to groups reflecting the treatm...
16 Citations
Randomized longitudinal designs are commonly used in psychological and medical studies to investigate the treatment effect of an intervention or an experimental drug. Traditional linear mixed-effects models for randomized longitudinal designs are limited to maximum-likelihood methods that assume data are missing at random (MAR). In practice, because longitudinal data are often likely to be missing not at random (MNAR), the traditional mixed-effects model might lead to biased estimates of treatme...
12 Citations
Apr 2, 2014 in AISTATS (International Conference on Artificial Intelligence and Statistics)
#1 Karthika Mohan (UCLA: University of California, Los Angeles), H-Index: 8
#2 Judea Pearl (UCLA: University of California, Los Angeles), H-Index: 84
Graphical models that depict the process by which data are lost are helpful in recovering information from missing data. We address the question of whether any such model can be submitted to a statistical test given that the data available are corrupted by missingness. We present sufficient conditions for testability in missing data applications and note the impediments for testability when data are contaminated by missing entries. Our results strengthen the available tests for MCAR and MAR and fur...
16 Citations
#1 Lisa Doove (Katholieke Universiteit Leuven), H-Index: 5
#2 S. van Buuren (TNO: Netherlands Organisation for Applied Scientific Research), H-Index: 12
Last. Elise Dusseldorp (Katholieke Universiteit Leuven), H-Index: 24
Standard approaches to implement multiple imputation do not automatically incorporate nonlinear relations like interaction effects. This leads to biased parameter estimates when interactions are present in a dataset. With the aim of providing an imputation method which preserves interactions in the data automatically, the use of recursive partitioning as imputation method is examined. Three recursive partitioning techniques are implemented in the multiple imputation by chained equations framewor...
57 Citations
#1 Alexander Hapfelmeier (TUM: Technische Universität München), H-Index: 27
#2 Torsten Hothorn (LMU: Ludwig Maximilian University of Munich), H-Index: 51
Last. Carolin Strobl (UZH: University of Zurich), H-Index: 20
Random forests are widely used in many research fields for prediction and interpretation purposes. Their popularity is rooted in several appealing characteristics, such as their ability to deal with high dimensional data, complex interactions and correlations between variables. Another important feature is that random forests provide variable importance measures that can be used to identify the most important predictor variables. Though there are alternatives like complete case analysis and impu...
68 Citations
#1 Greg Ridgeway, H-Index: 29
#2 Dan McCaffrey, H-Index: 1
Last. Beth Ann Griffin, H-Index: 1
The Toolkit for Weighting and Analysis of Nonequivalent Groups, twang, contains a set of functions and procedures to support causal modeling of observational data through the estimation and evaluation of propensity scores and associated weights. This package was developed in 2004. After extensive use, it received a major update in 2012. This tutorial provides an introduction to twang and demonstrates its use through illustrative examples. The foundation to the methods supported by twang is the p...
132 Citations
Dec 5, 2013 in NeurIPS (Neural Information Processing Systems)
#1 Karthika Mohan (UCLA: University of California, Los Angeles), H-Index: 8
#2 Judea Pearl (UCLA: University of California, Los Angeles), H-Index: 84
Last. Jin Tian (Iowa State University), H-Index: 16
We address the problem of recoverability i.e. deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called 'Missingness Graphs' to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we derive conditions that the graph should satisfy to ensure recoverability and devise algorit...
60 Citations
Cited By (15)
#1 Donna L. Coffman (TU: Temple University), H-Index: 20
#2 Jiangxiu Zhou (GSK: GlaxoSmithKline)
Last. Xizhen Cai (Williams College)
Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful ap...
#1 Francesco Zaccardi (University of Leicester), H-Index: 27
#2 Melanie J. Davies (University of Leicester), H-Index: 73
Last. Kamlesh Khunti (University of Leicester), H-Index: 69
The last decade has witnessed an exponential growth in the opportunities to collect and link health-related data from multiple resources, including primary care, administrative, and device data. The availability of these "real-world," "big data" has fuelled also an intense methodological research into methods to handle them and extract actionable information. In medicine, the evidence generated from "real-world data" (RWD), which are not purposely collected to answer biomedical questions, is com...
#1 Sonali Parbhoo, H-Index: 6
#2 Mario Wieser, H-Index: 4
Last. Volker Roth, H-Index: 34
Estimating the effects of an intervention from high-dimensional observational data is a challenging problem due to the existence of confounding. The task is often further complicated in healthcare applications where a set of observations may be entirely missing for certain patients at test time, thereby prohibiting accurate inference. In this paper, we address this issue using an approach based on the information bottleneck to reason about the effects of interventions. To this end, we first trai...
1 Citation
#1 Sonali Parbhoo (Harvard University)
Last. Volker Roth, H-Index: 34
Estimating the causal effects of an intervention from high-dimensional observational data is difficult due to the presence of confounding. The task is often complicated by the fact that we may have a systematic missingness in our data at test time. Our approach uses the information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information. Based on the sufficiently reduced covariate, we transfer the relevant information to cases wher...
#1 Shigenori Masaki (Memorial Hospital of South Bend), H-Index: 1
#2 Takashi Kawamoto (Memorial Hospital of South Bend), H-Index: 1
Background The long-term outcomes of artificial nutrition in older people with dysphagia remain uncertain. Enteral nutrition via percutaneous endoscopic gastrostomy (PEG) is one of the major methods of artificial nutrition. Enteral feeding is indicated for patients with a functional gastrointestinal tract. However, total parenteral nutrition (TPN) is often inappropriately chosen for artificial nutrition in Japan, even in patients with a functional gastrointestinal tract, as PEG has recently been...
2 Citations
Propensity score analysis is a statistical method that balances pre-existing differences across treatment conditions, achieving a condition similar to randomization and thus allowing the estimation of causal effects in non-randomized experimental designs. The four stages in propensity score analysis are (1) propensity score estimation, (2) equating or balancing procedures, (3) balance checking, and (4) outcome analysis. Each stage is explained followed by a step-by-step tutorial of applying pr...
1 Citation
#1 Yicun Wang (NU: Nanjing University), H-Index: 3
Last. Nirong Bao (NU: Nanjing University)
Background: Morbid obesity is an important risk factor for arthroplasty and also closely associated with worse postoperative outcomes. Bariatric surgery is effective in losing weight and decreasing comorbidities associated with obesity. However, no study had demonstrated the influence of bariatric surgery on the outcome of arthroplasty in a large population. Methods: We used 2006-2014 discharge records from the Nationwide Inpatient Sample (NIS), and identified study population and inpatie...
2 Citations
#1 Annemarieke Blankestein (Radboud University Nijmegen)
Last. Robert Didden (Radboud University Nijmegen), H-Index: 38
Background: An adaptation of multisystemic therapy (MST) was piloted to find out whether it would yield better outcomes than standard MST in families where the adolescent not only shows antisocial or delinquent behaviour, but also has an intellectual disability. Method: To establish the comparative effectiveness of MST‐ID (n = 55) versus standard MST (n = 73), treatment outcomes were compared at the end of treatment and at 6‐month follow‐up. Pre‐treatment differences were control...
#1 Jessika Golle (University of Tübingen), H-Index: 6
#2 Norman Rose (University of Tübingen), H-Index: 10
Last. Benjamin Nagengast (University of Tübingen), H-Index: 34
According to the social-investment principle, entering new environments is associated with new social roles that influence people’s behaviors. In this study, we examined whether young adults’ personality development is differentially related to their choice of either an academic or a vocational pathway (i.e., entering an academic-track school or beginning vocational training). The personality constructs of interest were Big Five personality traits and vocational-interest orientations. We used a ...
1 Citation
#1 Ferdynand Hebal (NU: Northwestern University)
#2 Yue Yung Hu (NU: Northwestern University), H-Index: 14
Last. Mehul V. Raval (NU: Northwestern University), H-Index: 3
Clinical registries provide a valuable opportunity to study specific diagnoses or conditions with a broader scope than possible using individual center-based series and with more clinical detail than typically available in administrative data sources. These registries amass structured data with uniform definitions, thus facilitating reliable adoption and consistent use across contributing sites. By compiling granular data from a multitude of geographically diverse sites, clinical regist...