Do we need hundreds of classifiers to solve real world classification problems?

Published on Jan 1, 2014 in Journal of Machine Learning Research 4.09
· DOI: 10.1117/1.JRS.11.015020
Manuel Fernández-Delgado (University of Santiago de Compostela), estimated H-index: 14
E. Cernadas (University of Santiago de Compostela), estimated H-index: 13
+ 1 author
Dinani Amorim, estimated H-index: 2
Abstract
We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other real problems of our own, in order to reach significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, its difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
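The two headline figures in the abstract (the percentage of the maximum accuracy, and the share of data sets on which that percentage exceeds 90%) can be illustrated with a small sketch. This is one plausible reading of those figures, not the authors' code: the function names `percent_of_max` and `summarize` are hypothetical, and the accuracies are toy values invented for illustration, not results from the paper.

```python
from statistics import mean

# Hypothetical toy accuracies (%) of three classifiers on three data sets.
# These numbers are made up for illustration; they are NOT from the paper.
accuracies = {
    "rf": [90.0, 80.0, 70.0],
    "svm": [85.0, 80.0, 75.0],
    "knn": [72.0, 60.0, 75.0],
}


def percent_of_max(accuracies):
    """For each data set, express every accuracy as a percentage of the
    best accuracy any classifier reached on that data set."""
    n_sets = len(next(iter(accuracies.values())))
    best = [max(acc[i] for acc in accuracies.values()) for i in range(n_sets)]
    return {
        name: [100.0 * a / b for a, b in zip(acc, best)]
        for name, acc in accuracies.items()
    }


def summarize(accuracies, threshold=90.0):
    """Return, per classifier, the mean percentage of the maximum accuracy
    and the percentage of data sets where it exceeds `threshold`."""
    pma = percent_of_max(accuracies)
    return {
        name: (mean(v), 100.0 * sum(x > threshold for x in v) / len(v))
        for name, v in pma.items()
    }


if __name__ == "__main__":
    for name, (avg, share) in summarize(accuracies).items():
        print(f"{name}: {avg:.1f}% of max accuracy, "
              f"above 90% on {share:.1f}% of data sets")
```

Normalizing by the per-data-set maximum puts easy and hard data sets on a common scale, which is presumably why the abstract reports percentages of the maximum rather than raw accuracies.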
  • References (110)
  • Citations (975)