
Convolutional neural networks for classification of alignments of non-coding RNA sequences

Published on Jul 7, 2018 in ISMB (Intelligent Systems for Molecular Biology) · DOI: 10.1093/bioinformatics/bty228
Genta Aoki (Keio University)
Yasubumi Sakakibara (Keio University)
Abstract
Motivation: The convolutional neural network (CNN) has been applied to the classification of DNA sequences, with the additional purpose of motif discovery. Training CNNs on distributed representations of the four nucleotides has successfully yielded position weight matrices, derived from the learned kernels, that correspond to sequence motifs such as protein-binding sites.

Results: We propose a novel application of CNNs to the classification of pairwise sequence alignments for accurate sequence clustering, and we show the benefits of feeding pairwise alignments to a CNN for clustering non-coding RNA (ncRNA) sequences and for motif discovery. Classifying a pairwise alignment of two sequences as positive or negative corresponds to deciding whether the two input sequences belong to the same cluster. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and, further, with mapping profiles of next-generation sequencing reads, CNNs trained to classify alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed existing clustering methods for ncRNA sequences. Several interesting sequence and secondary-structure motifs were identified, including motifs known for the snoRNA family and motifs specific to the microRNA and tRNA families.
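The central idea of the abstract, that a classifier over pairwise alignments induces a clustering, can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' model: `encode_alignment`, `conv1d_relu`, `classify` and `cluster` are hypothetical helpers; a one-hot encoding stands in for the learned distributed representation of nucleotides; secondary structure is reduced to a per-column paired/unpaired flag taken from a dot-bracket string; read-mapping profiles are omitted; sequences are assumed to use U rather than T; and the weights are random rather than trained.

```python
import math
import random

# Hypothetical channel layout (an assumption, not the paper's encoding):
# one-hot symbol of sequence 1 (A, C, G, U, gap), one-hot symbol of
# sequence 2, then a "paired" flag per sequence read from a dot-bracket
# string written over the alignment columns.
SYMBOLS = "ACGU-"
N_CHANNELS = 2 * len(SYMBOLS) + 2


def encode_alignment(aln1, aln2, struct1=None, struct2=None):
    """Turn a gapped pairwise alignment into one feature vector per column."""
    n = len(aln1)
    struct1 = struct1 if struct1 is not None else "." * n
    struct2 = struct2 if struct2 is not None else "." * n
    if not (len(aln2) == len(struct1) == len(struct2) == n):
        raise ValueError("alignment rows and structures must have equal length")
    columns = []
    for a, b, s1, s2 in zip(aln1.upper(), aln2.upper(), struct1, struct2):
        vec = [1.0 if a == s else 0.0 for s in SYMBOLS]
        vec += [1.0 if b == s else 0.0 for s in SYMBOLS]
        vec += [1.0 if s1 in "()" else 0.0, 1.0 if s2 in "()" else 0.0]
        columns.append(vec)
    return columns


def conv1d_relu(columns, kernels, biases):
    """'Valid' 1-D convolution along the alignment, then ReLU; one map per kernel."""
    maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        fmap = []
        for start in range(len(columns) - width + 1):
            total = bias
            for offset, weights in enumerate(kernel):
                total += sum(w * x for w, x in zip(weights, columns[start + offset]))
            fmap.append(max(0.0, total))
        maps.append(fmap)
    return maps


def classify(columns, kernels, biases, out_weights, out_bias):
    """Global max-pool each feature map, then a logistic unit: P(same family)."""
    if len(columns) < max(len(k) for k in kernels):
        raise ValueError("alignment is shorter than the widest kernel")
    pooled = [max(fmap) for fmap in conv1d_relu(columns, kernels, biases)]
    z = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def cluster(n, positive_pairs):
    """Link sequence pairs classified positive; return connected components."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in positive_pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage with untrained, randomly initialised weights (illustration only).
rng = random.Random(0)
WIDTH, N_KERNELS = 3, 4
kernels = [[[rng.uniform(-1, 1) for _ in range(N_CHANNELS)] for _ in range(WIDTH)]
           for _ in range(N_KERNELS)]
biases = [0.0] * N_KERNELS
out_weights = [rng.uniform(-1, 1) for _ in range(N_KERNELS)]

cols = encode_alignment("GGCA-UCC", "GGCAAUCC", "(((.-)))", "((....))")
score = classify(cols, kernels, biases, out_weights, 0.0)
print(len(cols), len(cols[0]), score)
print(cluster(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```

With trained weights, pairs scoring above a threshold (for example 0.5) would be passed to `cluster`. Linking positive pairs and taking connected components is only one simple way to turn pairwise decisions into clusters; the clustering step used in the paper may differ.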