Denoising large-scale biological data using network filters

Published on Mar 14, 2020 in bioRxiv
· DOI: 10.1101/2020.03.12.989244
Andrew J Kavran (CU: University of Colorado Boulder), Aaron Clauset (CU: University of Colorado Boulder; estimated H-index: 33)
Large-scale biological data sets, e.g., transcriptomic, proteomic, or ecological, are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Here we describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or "filtered" to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 58% compared to using unfiltered data. These results indicate the broad potential utility of network-based filters to applications in systems biology.
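The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes one simple form of network filter, in which each measurement is replaced by the mean of its own value and the values of its network neighbors, plus a modular variant that ignores edges crossing module boundaries. The function names, the toy network, and the module labels are hypothetical and chosen for illustration; this is not the authors' implementation, and it covers only the correlated (smoothing) case, not anti-correlated measurements.

```python
import numpy as np


def smooth_filter(adj, x):
    """Replace each node's value with the mean of itself and its neighbors.

    adj: symmetric 0/1 adjacency matrix (n x n), no self-loops (assumed).
    x:   vector of n noisy measurements, one per node.
    """
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    degree = adj.sum(axis=1)
    # Sum of own value and neighbor values, divided by (1 + degree).
    return (x + adj @ x) / (1.0 + degree)


def modular_smooth_filter(adj, x, modules):
    """Apply the smoothing filter within each module separately.

    modules: one label per node; edges between different labels are ignored.
    Here every module uses the same filter type, which is a simplification.
    """
    adj = np.asarray(adj, dtype=float)
    modules = np.asarray(modules)
    same_module = modules[:, None] == modules[None, :]
    return smooth_filter(adj * same_module, x)


# Hypothetical toy network: a path 0-1-2-3 with two modules, {0, 1} and {2, 3}.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
x = np.array([1.0, 3.0, 10.0, 12.0])
modules = np.array([0, 0, 1, 1])

print(smooth_filter(adj, x))
print(modular_smooth_filter(adj, x, modules))
```

In this toy case the single global filter blends values across the edge joining the two modules, while the modular variant averages only within each module. That mirrors the abstract's distinction between applying one filter to the whole system and decomposing the system into modules first.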
  • References (37)
  • Citations (0)
📖 Papers frequently viewed together
#1 Aaron McKenna (Dartmouth College) H-Index: 1
#2 James A. Gagnon (UofU: University of Utah) H-Index: 17
ABSTRACT Every animal grows from a single fertilized egg into an intricate network of cell types and organ systems. This process is captured in a lineage tree: a diagram of every cell's ancestry back to the founding zygote. Biologists have long sought to trace this cell lineage tree in individual organisms and have developed a variety of technologies to map the progeny of specific cells. However, there are billions to trillions of cells in complex organisms, and conventional approaches can only ...
6 Citations
#1 Amir Ghasemian (CU: University of Colorado Boulder) H-Index: 3
#2 Homa Hosseinmardi (SC: University of Southern California) H-Index: 10
Last. Aaron Clauset (CU: University of Colorado Boulder) H-Index: 33
view all 3 authors...
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. Here, we present a b...
28 Citations
#1 Ievgenia Pastushenko (ULB: Université libre de Bruxelles) H-Index: 8
#2 Cédric Blanpain (ULB: Université libre de Bruxelles) H-Index: 65
Epithelial–mesenchymal transition (EMT) is a process in which epithelial cells acquire mesenchymal features. In cancer, EMT is associated with tumor initiation, invasion, metastasis, and resistance to therapy. Recently, it has been demonstrated that EMT is not a binary process, but occurs through distinct cellular states. Here, we review the recent studies that demonstrate the existence of these different EMT states in cancer and the mechanisms regulating their functions. We discuss the differen...
53 Citations
#1 Leto Peel (UCL: Université catholique de Louvain) H-Index: 11
#2 Jean-Charles Delvenne (UCL: Université catholique de Louvain) H-Index: 22
Last. Renaud Lambiotte (University of Oxford) H-Index: 41
view all 3 authors...
Assortative mixing in networks is the tendency for nodes with the same attributes, or metadata, to link to each other. It is a property often found in social networks, manifesting as a higher tendency of links occurring between people of the same age, race, or political belief. Quantifying the level of assortativity or disassortativity (the preference of linking to nodes with different attributes) can shed light on the organization of complex networks. It is common practice to measure the level ...
10 Citations
#1 Leto Peel (UCL: Université catholique de Louvain) H-Index: 11
#2 Daniel B. Larremore (SFI: Santa Fe Institute) H-Index: 14
Last. Aaron Clauset (CU: University of Colorado Boulder) H-Index: 33
view all 3 authors...
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities becaus...
126 Citations
#1 Avrum Spira (BU: Boston University) H-Index: 42
#2 Matthew B. Yurgelun (Harvard University) H-Index: 18
Last. Scott M. Lippman (UCSD: University of California, San Diego) H-Index: 96
view all 20 authors...
Cancer development is a complex process driven by inherited and acquired molecular and cellular alterations. Prevention is the holy grail of cancer elimination, but making this a reality will take a fundamental rethinking and deep understanding of premalignant biology. In this Perspective, we propose a national concerted effort to create a Precancer Atlas (PCA), integrating multi-omics and immunity – basic tenets of the neoplastic process. The biology of neoplasia caused by germline mutations ha...
39 Citations
Lineage analyses of multicellular organisms provide key insights into developmental mechanisms and how these developmental trajectories go awry in diverse diseases. This Review discusses the features, technical challenges and latest opportunities of an evolving range of sophisticated genetic techniques for tracking cell lineages in organisms. These strategies include methods for prospective tracking using engineered genetic constructs, as well as retrospective tracking based on naturally occurri...
82 Citations
#1 Jan Daniel Rudolph (TAU: Tel Aviv University) H-Index: 2
#2 Marjo de Graauw (LEI: Leiden University) H-Index: 14
Last. Roded Sharan (TAU: Tel Aviv University) H-Index: 57
view all 5 authors...
Summary Phosphoproteomic experiments typically identify sites within a protein that are differentially phosphorylated between two or more cell states. However, the interpretation of these data is hampered by the lack of methods that can translate site-specific information into global maps of active proteins and signaling networks, especially as the phosphoproteome is often undersampled. Here, we describe PHOTON, a method for interpreting phosphorylation data within their signaling context, as ca...
17 Citations
Nov 13, 2016 in HiPC (IEEE International Conference on High Performance Computing, Data, and Analytics)
#1 Maksudul Alam (VT: Virginia Tech) H-Index: 5
#2 Maleq Khan (VT: Virginia Tech) H-Index: 15
Last. Madhav V. Marathe (VT: Virginia Tech) H-Index: 44
view all 4 authors...
Many real-world systems and networks are modeled and analyzed using various random graph models. These models must incorporate relevant properties such as degree distribution and clustering coefficient. Many models, such as the Chung-Lu (CL), stochastic Kronecker, stochastic block model (SBM), and block two-level Erdős-Rényi (BTER) models, have been devised to capture those properties. However, the generative algorithms for these models are mostly sequential and take a prohibitively long time to ge...
5 Citations
#1 Mark Newman (UM: University of Michigan) H-Index: 91
#2 Aaron Clauset (CU: University of Colorado Boulder) H-Index: 33
Analysis of network structure is usually based on knowledge of connections alone, ignoring additional information such as gender or age of individuals in social networks. Here the authors devise an approach that incorporates such metadata and uses it to improve the detection of network communities.
142 Citations
Cited By: 0