Gaël Varoquaux
French Institute for Research in Computer Science and Automation
Machine learning, Cognition, Pattern recognition, Mathematics, Computer science
185 Publications
34 H-index
20k Citations
Publications 202
#1 Russell A. Poldrack (Stanford University) H-Index: 97
#2 Grace Huckins (Stanford University) H-Index: 1
Last. Gaël Varoquaux (IRIA: French Institute for Research in Computer Science and Automation) H-Index: 34
view all 3 authors...
Importance: Great interest exists in identifying methods to predict neuropsychiatric disease states and treatment outcomes from high-dimensional data, including neuroimaging and genomics data. The goal of this review is to highlight several potential problems that can arise in studies that aim to establish prediction. Observations: A number of neuroimaging studies have claimed to establish prediction while establishing only correlation, which is an inappropriate use of the statistical meaning of p...
14 Citations
#1 Gabriel Brat (BIDMC: Beth Israel Deaconess Medical Center) H-Index: 11
#2 Griffin M. Weber (Harvard University) H-Index: 23
view all 84 authors...
INTRODUCTION: The Coronavirus Disease 2019 (COVID-19) epidemic has caused extreme strains on health systems, public health infrastructure, and economies of many countries. A growing literature has identified key laboratory and clinical markers of pulmonary, cardiac, immune, coagulation, hepatic, and renal dysfunction that are associated with adverse outcomes. Our goal is to consolidate and leverage the largely untapped resource of clinical data from electronic health records of hospital systems ...
Statistical models usually require vector representations of categorical variables, using for instance one-hot encoding. This strategy breaks down when the number of categories grows, as it creates high-dimensional feature vectors. Additionally, for string entries, one-hot encoding does not capture information in their representation. Here, we seek low-dimensional encoding of high-cardinality string categorical variables. Ideally, these should be: scalable to many categories; interpretable to en...
#2 Russell A. Poldrack (Stanford University) H-Index: 97
Last. Gaël Varoquaux H-Index: 34
view all 8 authors...
Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms. The variety of human neuroscience concepts and terminology poses a fundamental challenge to relating brain imaging results across the scientific literature. Existing meta-analysis methods perform statistical tests on sets of publications associated with a particular concept. Thus, large-scale meta-analyses only tackle single terms that occur frequently. We propose a new ...
2 Citations
#1 Kamalaker Dadi H-Index: 2
#2 Gaël Varoquaux H-Index: 34
Last. Arthur Mensch (ENS Paris: École Normale Supérieure) H-Index: 5
view all 7 authors...
Population imaging markedly increased the size of functional-imaging datasets, shedding new light on the neural basis of inter-individual differences. Analyzing these large data entails new scalability challenges, computational and statistical. For this reason, brain images are typically summarized in a few signals, for instance reducing voxel-level measures with brain atlases or functional modes. A good choice of the corresponding brain networks is important, as most data analyses start from th...
#1 Jérôme Dockès (Université Paris-Saclay)
#2 Russell A. Poldrack (Stanford University) H-Index: 97
Last. Gaël Varoquaux (Université Paris-Saclay) H-Index: 1
view all 8 authors...
Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms. The variety of human neuroscience concepts and terminology poses a fundamental challenge to relating brain imaging results across the scientific literature. Existing meta-analysis methods perform statistical tests on sets of publications associated with a particular concept. Thus, large-scale meta-analyses only tackle single terms that occur frequently. We propose a new ...
#2 Nicolas Prost (École Polytechnique) H-Index: 1
Last. Gaël Varoquaux H-Index: 34
view all 5 authors...
We consider building predictors when the data have missing values. We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data and we show that, in the presence of missing values, the optimal predictor may not be linear. In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and the various missing-value indicators. Due to its intrinsic complexity, we study a simple approximatio...
#1 Xavier Bouthillier H-Index: 6
Last. Gaël Varoquaux (IRIA: French Institute for Research in Computer Science and Automation) H-Index: 34
view all 2 authors...
How do machine-learning researchers run their empirical validation? In the context of a push for improved reproducibility and benchmarking, this question is important to develop new tools for model comparison. This document summarizes a simple survey about experimental procedures, sent to authors of published papers at two leading conferences, NeurIPS 2019 and ICLR 2020. It gives a simple picture of how hyper-parameters are set, how many baselines and datasets are included, or how seeds are used...
Dec 8, 2019 in NeurIPS (Neural Information Processing Systems)
#1 Meyer Scetbon (École normale supérieure de Cachan) H-Index: 1
Last. Gaël Varoquaux (IRIA: French Institute for Research in Computer Science and Automation) H-Index: 34
view all 2 authors...
Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods lead to many appealing properties. Indeed, state-of-the-art approaches use the L^2 distance between kernel-based distribution representatives to derive their test statistics. Here, we show that L^p distances (with p ≥ 1) between these distribution representatives give metrics on the space of distributions that are well-behaved to detect differences between distributions as they...