A Simple and Effective Model-Based Variable Importance Measure.

Published on Jan 1, 2018 in arXiv: Machine Learning
Brandon M. Greenwell (estimated H-index: 4)
Bradley C. Boehmke (estimated H-index: 4)
Andrew J. McCarthy (estimated H-index: 1)
In the era of "big data", it is becoming increasingly challenging not only to build state-of-the-art predictive models, but also to understand what is really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, such as random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, such as naive Bayes classifiers and support vector machines, do not, and model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.