Cloud-Assisted Read Alignment and Privacy

Published on Jan 1, 2017
· DOI: 10.1007/978-3-319-60816-7_27
Maria Fernandes1
Estimated H-index: 1
(University of Luxembourg),
Jérémie Decouchant3
Estimated H-index: 3
(University of Luxembourg)
+ 1 Authors Paulo Veríssimo39
Estimated H-index: 39
(University of Luxembourg)
Thanks to the rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g., by relying on the cloud to perform CPU-intensive operations. However, the scientific community raised an alarm due to the possible privacy-related attacks that can be executed on genomic data. In this paper we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. We finally argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and protection of data.
  • References (26)
Published on Aug 23, 2016 in PLOS Medicine
Yan Guo13
Estimated H-index: 13
(Vandy: Vanderbilt University),
Shaneda Warren Andersen7
Estimated H-index: 7
(Vandy: Vanderbilt University)
+ 108 Authors Jenny Chang-Claude89
Estimated H-index: 89
(DKFZ: German Cancer Research Center)
BACKGROUND: Observational epidemiological studies have shown that high body mass index (BMI) is associated with a reduced risk of breast cancer in premenopausal women but an increased risk in postmenopausal women. It is unclear whether this association is mediated through shared genetic or environmental factors. METHODS: We applied Mendelian randomization to evaluate the association between BMI and risk of breast cancer occurrence using data from two large breast cancer consortia. We created a w...
Published on Jun 20, 2016
Mina Namazi1
Estimated H-index: 1
(University of Vigo),
Juan Ramón Troncoso-Pastoriza9
Estimated H-index: 9
(University of Vigo),
Fernando Pérez-González23
Estimated H-index: 23
(University of Vigo)
The field of genomic research has considerably grown in the recent years due to the unprecedented advances brought about by Next Generation Sequencing (NGS) and the need and increasing widespread use of outsourced processing. But this rapid increase also poses severe privacy risks due to the inherently sensitive nature of genomic information. In this work, we address privacy-preserving genetic susceptibility tests outsourced to an untrustworthy party, enhancing previous approaches in terms of co...
Published on Jan 12, 2016 in PLOS Medicine
Effy Vayena21
Estimated H-index: 21
(Harvard University),
Urs Gasser16
Estimated H-index: 16
(Harvard University)
With the prospect of genomic data becoming ever more easily available, Effy Vayena and Urs Gasser discuss how we could balance making the most of its benefits with reducing its risks to privacy.
Published on Oct 12, 2015 in WPES (Workshop on Privacy in the Electronic Society)
Vinicius V. Cogo4
Estimated H-index: 4
(University of Lisbon),
Alysson Neves Bessani21
Estimated H-index: 21
(University of Lisbon)
+ 1 Authors Paulo Veríssimo39
Estimated H-index: 39
(University of Luxembourg)
Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data nowadays. Novel privacy-enhancing technologies are required to address the known disclosure threats to personal sensitive genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments coming directly from an input stream, using as reference a knowledge database of known privacy-sensitive nucleic ...
Published on Oct 1, 2015 in European Journal of Human Genetics 3.65
Edward S. Dove15
Estimated H-index: 15
(McGill University),
Yann Joly20
Estimated H-index: 20
(McGill University)
+ 31 Authors Jane Kaye31
Estimated H-index: 31
The biggest challenge in twenty-first century data-intensive genomic science, is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides...
Published on Aug 31, 2015 in VLDB (Very Large Data Bases)
Alysson Neves Bessani21
Estimated H-index: 21
(University of Lisbon),
Jörgen Brandt3
Estimated H-index: 3
(Humboldt University of Berlin)
+ 14 Authors Mahmoud Ismail3
Estimated H-index: 3
(KTH: Royal Institute of Technology)
Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for ...
Published on Aug 1, 2015 in Journal of Biomedical Informatics 2.95
Mete Akgün10
Estimated H-index: 10
A. Osman Bayrak1
Estimated H-index: 1
+ 1 Authors M. Şamil Sağıroğlu1
Estimated H-index: 1
We categorized pre-existing problems and corresponding solutions. We make our classifications in more understandable and convenient way. We have also included open privacy problems. Recently, the rapid advance in genome sequencing technology has led to production of huge amount of sensitive genomic data. However, a serious privacy challenge is confronted with increasing number of genetic tests as genomic data is the ultimate source of identity for humans. Lately, privacy threats an...
Published on Oct 1, 2013 in Journal of Biomedical Informatics 2.95
Aisling O’Driscoll6
Estimated H-index: 6
(CIT: Cork Institute of Technology),
Jurate Daugelaite2
Estimated H-index: 2
(CIT: Cork Institute of Technology),
Roy D. Sleator36
Estimated H-index: 36
(CIT: Cork Institute of Technology)
Ever improving next generation sequencing technologies has led to an unprecedented proliferation of sequence data. Biology is now one of the fastest growing fields of big data science. Cloud computing and big data technologies can be used to deal with biology's big data sets. The Apache Hadoop project, which provides distributed and parallelised data processing are presented. Challenges associated with cloud computing and big data technologies in biology are discuss...
Published on Jan 18, 2013 in Science 41.04
Melissa Gymrek21
Estimated H-index: 21
Amy L. McGuire40
Estimated H-index: 40
(BCM: Baylor College of Medicine)
+ 2 Authors Yaniv Erlich23
Estimated H-index: 23
(MIT: Massachusetts Institute of Technology)
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, public...
Published on Jan 1, 2012 in NDSS (Network and Distributed System Security Symposium)
Yangyi Chen5
Estimated H-index: 5
(IU: Indiana University Bloomington),
Bo Peng2
Estimated H-index: 2
(IU: Indiana University Bloomington)
+ 1 Authors Haixu Tang38
Estimated H-index: 38
(IU: Indiana University)
An operation preceding most human DNA analyses is read mapping, which aligns millions of short sequences (called reads) to a reference genome. This step involves an enormous amount of computation (evaluating edit distances for millions upon billions of sequence pairs) and thus needs to be outsourced to low-cost commercial clouds. This asks for scalable techniques to protect sensitive DNA information, a demand that cannot be met by any existing techniques (e.g., homomorphic encryption, secure mul...