Processing Oxford Nanopore Long Reads Using Amazon Web Services

V.V. Shapovalova
S.P. Radko
K.G. Ptitsyn
G.S. Krasnov
K.V. Nakhod
O.S. Konash
M.A. Vinogradina
E.A. Ponomarenko
D.S. Druzhilovskiy
A.V. Lisitsa

Abstract

Genomes and transcriptomes are studied using sequencers that read the nucleotide sequence of genomic DNA, RNA, or complementary DNA (cDNA). The analysis consists of an experimental part (obtaining primary data) and bioinformatic processing of those data. The bioinformatic processing is run with different sets of input parameters, and selecting the optimal parameter values typically requires significant computing power. This article describes a protocol for processing transcriptome data on virtual machines provided by the cloud platform Amazon Web Services (AWS), using as an example the recently emerged long-read DNA and RNA sequencing technology from Oxford Nanopore Technologies. The result is a virtual machine, with instructions for its use, that allows a wide range of molecular biologists to process Oxford Nanopore sequencing results independently.

Article Details

How to Cite
Shapovalova, V., Radko, S., Ptitsyn, K., Krasnov, G., Nakhod, K., Konash, O., Vinogradina, M., Ponomarenko, E., Druzhilovskiy, D., & Lisitsa, A. (2020). Processing Oxford Nanopore Long Reads Using Amazon Web Services. Biomedical Chemistry: Research and Methods, 3(4), e00131. https://doi.org/10.18097/BMCRM00131
Section
PROTOCOLS OF EXPERIMENTS, USEFUL MODELS, PROGRAMS AND SERVICES
