Cloud-Assisted Read Alignment and Privacy

Published on Jan 1, 2017 · DOI: 10.1007/978-3-319-60816-7_27
Maria Fernandes (University of Luxembourg), Jeremie Decouchant (University of Luxembourg), + 1 author, Paulo Veríssimo (University of Luxembourg)
Thanks to rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, for example by relying on the cloud to perform CPU-intensive operations. However, the scientific community has raised the alarm over the privacy-related attacks that can be mounted on genomic data. In this paper, we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. Finally, we argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and data protection.