seagull: lasso, group lasso and sparse-group lasso regularisation for linear regression models via proximal gradient descent

Published on Feb 14, 2020 in bioRxiv
DOI: 10.1101/2020.02.13.947473
Jan Klosa (Leibniz Association), estimated H-index: 1
Noah Simon (UW: University of Washington), estimated H-index: 14
+ 2 authors
Dörte Wittenburg (Leibniz Association), estimated H-index: 7
Statistical analyses of biological problems in the life sciences often lead to high-dimensional linear models. Penalisation approaches are often the methods of choice for solving the corresponding system of equations. They are especially useful in the case of multicollinearity, which arises when the number of explanatory variables exceeds the number of observations or for some biological reason. In these approaches, the goodness of fit of the model is penalised by a suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realised by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, the package produces complete regularisation paths.
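The ingredients named in the abstract (proximal operator, backtracking line search, warm starts along a penalty grid) can be illustrated with a minimal sketch. This is not the package's own code or interface. It assumes the common sparse-group lasso objective (1/(2n))·||y − Xb||² + λ·(α·||b||₁ + (1 − α)·Σ_g √p_g·||b_g||₂), where α = 1 gives the lasso and α = 0 the group lasso. All function names, the √p_g group weights and the default settings are illustrative assumptions.

```python
import math

def soft_threshold(z, t):
    """Lasso proximal operator for one coefficient: shrink z towards 0 by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def prox_sgl(v, groups, t, lam, alpha):
    """Sparse-group lasso prox: soft-threshold each entry, then shrink each group.

    Assumed group weight: sqrt(group size). `groups` is a list of index lists.
    """
    out = [0.0] * len(v)
    for idx in groups:
        s = [soft_threshold(v[j], t * lam * alpha) for j in idx]
        norm = math.sqrt(sum(x * x for x in s))
        thresh = t * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0.0 else 0.0
        for j, x in zip(idx, s):
            out[j] = scale * x
    return out

def residuals(X, y, b):
    return [sum(x * bj for x, bj in zip(row, b)) - yi for row, yi in zip(X, y)]

def loss(X, y, b):
    """Smooth part of the objective: (1 / (2n)) * ||y - Xb||^2."""
    return sum(r * r for r in residuals(X, y, b)) / (2.0 * len(y))

def gradient(X, y, b):
    r = residuals(X, y, b)
    return [sum(row[j] * ri for row, ri in zip(X, r)) / len(y)
            for j in range(len(b))]

def fit(X, y, groups, lam, alpha, b0, step=1.0, shrink=0.5,
        max_iter=1000, tol=1e-10):
    """Proximal gradient descent with backtracking line search, started at b0."""
    b = list(b0)
    for _ in range(max_iter):
        g, f_b, t = gradient(X, y, b), loss(X, y, b), step
        while True:
            v = [bj - t * gj for bj, gj in zip(b, g)]
            cand = prox_sgl(v, groups, t, lam, alpha)
            d = [cj - bj for cj, bj in zip(cand, b)]
            # Accept t once the quadratic upper bound holds at the candidate.
            bound = (f_b + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2.0 * t))
            if loss(X, y, cand) <= bound + 1e-12:
                break
            t *= shrink
        b = cand
        if max(abs(dj) for dj in d) < tol:
            break
    return b

def regularisation_path(X, y, groups, lambdas, alpha):
    """Solve along a decreasing penalty grid, warm-starting each fit."""
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = fit(X, y, groups, lam, alpha, b)  # warm start from previous solution
        path.append((lam, b))
    return path

# Toy data with orthogonal columns (X^T X / n is the identity).
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [3.0, 1.0, 3.0, 1.0]
path_result = regularisation_path(X, y, [[0], [1]], [0.5, 1.5, 2.0], alpha=1.0)
```

On this toy design X^T y / n equals (2, 1), so with alpha = 1 each solution on the path is simply that vector soft-thresholded by the penalty, which makes the sketch easy to check by hand. Traversing the grid from the largest penalty downwards is what makes warm starts useful: each solution is typically close to the previous one, so few iterations are needed per grid point.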