Peter J. Rousseeuw

Katholieke Universiteit Leuven

Publications: 252

H-index: 52

Citations: 41.4k

Publications (newest first)

#1 Pieter Segaert (Katholieke Universiteit Leuven), H-Index: 3

#2 Marta B. Lopes (IST: Instituto Superior Técnico), H-Index: 12

Last: Peter J. Rousseeuw (Katholieke Universiteit Leuven), H-Index: 52

(5 authors in total)

Correct classification of breast cancer subtypes is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermo...

#1 Bart De Ketelaere, H-Index: 21

#2 Mia Hubert, H-Index: 33

Last: Iwein Vranckx

Modern industrial machines can generate gigabytes of data in seconds, frequently pushing the boundaries of available computing power. Together with the time criticality of industrial processing this presents a challenging problem for any data analytics procedure. We focus on the deterministic minimum covariance determinant method (DetMCD), which detects outliers by fitting a robust covariance matrix. We construct a much faster version of DetMCD by replacing its initial estimators by two new meth...

#1 Jakob Raymaekers (Katholieke Universiteit Leuven), H-Index: 2

#2 Peter J. Rousseeuw (Katholieke Universiteit Leuven), H-Index: 52

The well-known spatial sign covariance matrix (SSCM) carries out a radial transform which moves all data points to a sphere, followed by computing the classical covariance matrix of the transformed data. Its popularity stems from its robustness to outliers, fast computation, and applications to correlation and principal component analysis. In this paper we study more general radial functions. It is shown that the eigenvectors of the generalized SSCM are still consistent and the ranks of the eige...
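The classical SSCM described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `spatial_sign_covariance` is hypothetical, and the coordinatewise median is used as the center purely for simplicity (the spatial median is a common alternative). The generalized radial functions studied in the paper are not implemented here.

```python
import numpy as np

def spatial_sign_covariance(X, center=None):
    """Sketch of the spatial sign covariance matrix (SSCM).

    Assumption: the center defaults to the coordinatewise median,
    chosen here only to keep the example short.
    """
    X = np.asarray(X, dtype=float)
    if center is None:
        center = np.median(X, axis=0)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1)
    # Radial transform: move every nonzero point onto the unit sphere.
    signs = np.zeros_like(Z)
    nonzero = norms > 0
    signs[nonzero] = Z[nonzero] / norms[nonzero, None]
    # Classical (uncentered) covariance of the transformed points.
    return signs.T @ signs / len(X)
```

Because every transformed point has norm one, a single far-away observation contributes no more to the result than any other point, which is where the robustness comes from.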

#1 Kris Boudt (Vrije Universiteit Brussel), H-Index: 14

#2 Peter J. Rousseeuw (Katholieke Universiteit Leuven), H-Index: 52

Last: Tim Verdonck (Katholieke Universiteit Leuven), H-Index: 10

(4 authors in total)

The minimum covariance determinant (MCD) approach estimates the location and scatter matrix using the subset of given size with lowest sample covariance determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size. We propose the minimum regularized covariance determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure...
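The convex combination mentioned in this abstract can be written out as follows. The symbols are illustrative assumptions, not taken from the paper: H denotes the subset, S(H) its sample covariance matrix, T the target matrix, and ρ the regularization weight. The paper's exact definition may include additional standardization or scaling factors.

```latex
\[
  K(H) \;=\; \rho\, T \;+\; (1 - \rho)\, S(H), \qquad 0 \le \rho \le 1 .
\]
```

If T is positive definite and ρ > 0, then K(H) is positive definite even when S(H) is singular, which is why such an estimator can still be computed when the dimension exceeds the subset size.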

#1 Peter J. Rousseeuw (Katholieke Universiteit Leuven), H-Index: 52

#2 Domenico Perrotta, H-Index: 8

Last: Mia Hubert (Katholieke Universiteit Leuven), H-Index: 33

(4 authors in total)

Time series often contain outliers and level shifts or structural changes. These unexpected events are of the utmost importance in fraud detection, as they may pinpoint suspicious transactions. The presence of such unusual events can easily mislead conventional time series analysis and yield erroneous conclusions. A unified framework is provided for detecting outliers and level shifts in short time series that may have a seasonal pattern. The approach combines ideas from the FastLTS algorithm fo...

#1 Jakob Raymaekers, H-Index: 2

#2 Peter J. Rousseeuw, H-Index: 52

Last: Iwein Vranckx

(3 authors in total)

This is an invited comment on the discussion paper "The power of monitoring: how to make the most of a contaminated multivariate sample" by A. Cerioli, M. Riani, A. Atkinson and A. Corbellini that will appear in the journal Statistical Methods & Applications.

#1 Erich Schubert (Technical University of Dortmund), H-Index: 20

#2 Peter J. Rousseeuw, H-Index: 1

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm PAM, partitioning around medoids, also known as k-medoids. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but this does not hold for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-...
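The medoid definition in this abstract translates directly into code. The sketch below assumes a precomputed dissimilarity matrix `D` and a list of member indices; the function name `medoid_index` is hypothetical, and this shows only how one medoid is found, not the PAM algorithm or the speedups discussed in the paper.

```python
import numpy as np

def medoid_index(D, members):
    """Return the member whose total dissimilarity to the other
    members of the cluster is smallest (ties go to the first one).

    D       : (n, n) array of pairwise dissimilarities (assumed given)
    members : indices of the objects in the cluster
    """
    members = np.asarray(members)
    # Restrict the dissimilarity matrix to the cluster.
    sub = D[np.ix_(members, members)]
    # Sum each row: total dissimilarity of one member to all others.
    return int(members[np.argmin(sub.sum(axis=1))])
```

Only dissimilarities are needed, so the same function works for any data type for which a dissimilarity can be computed, unlike the mean.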

#1 Ana Helena Tavares, H-Index: 2

#2 Jakob Raymaekers, H-Index: 2

Last: Vera Afreixo, H-Index: 10

(5 authors in total)

In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the 'trend'), and a sparse vector of detrended data captures the peak structure. A simulation study demonstrates the effectiveness of the...

MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers.

#1 Mia Hubert, H-Index: 33

#2 Peter J. Rousseeuw, H-Index: 52

Last: Wannes Van den Bossche, H-Index: 3

Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, i.e. rows that deviate from the majority of the rows in the data (for instance, they might belong to a different population)...

#1 Mia Hubert (Katholieke Universiteit Leuven), H-Index: 33

#2 Michiel Debruyne (Dexia), H-Index: 6

Last: Peter J. Rousseeuw (Katholieke Universiteit Leuven), H-Index: 52

(3 authors in total)

The Minimum Covariance Determinant (MCD) method is a highly robust estimator of multivariate location and scatter, for which a fast algorithm is available. Since estimating the covariance matrix is the cornerstone of many multivariate statistical methods, the MCD is an important building block when developing robust multivariate techniques. It also serves as a convenient and efficient tool for outlier detection. The MCD estimator is reviewed, along with its main properties such as affine equivar...
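To make the MCD idea concrete, here is a brute-force sketch that tries every subset of size h and keeps the one whose sample covariance matrix has the smallest determinant. It is feasible only for tiny data sets and is not the fast algorithm the abstract refers to; the function name `mcd_brute_force` is hypothetical, and consistency or reweighting corrections are omitted.

```python
from itertools import combinations
import numpy as np

def mcd_brute_force(X, h):
    """Exhaustive raw MCD for small n (illustration only).

    Assumes X has shape (n, p) with p >= 2 and h > p.
    Returns the location, scatter and indices of the best h-subset.
    """
    X = np.asarray(X, dtype=float)
    best_det, best_subset = np.inf, None
    for subset in combinations(range(len(X)), h):
        idx = list(subset)
        # Determinant of the sample covariance of this h-subset.
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, idx
    location = X[best_subset].mean(axis=0)
    scatter = np.cov(X[best_subset], rowvar=False)
    return location, scatter, best_subset
```

Points that are left out of the selected subset and lie far from `location` in the metric of `scatter` are natural outlier candidates.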
