De novo sequencing of proteins and peptides: algorithms, applications, perspectives

  • K.V. Vyatkina, Saint Petersburg National Research Academic University of the Russian Academy of Sciences, 8/3 Khlopina st., St Petersburg 194021, Russia; Saint Petersburg State University, 7-9 Universitetskaya nab., St Petersburg 199034, Russia
Keywords: mass spectrometry; de novo sequencing; proteins; peptides; amino acid sequence

Abstract

Determining the primary structure of proteins and peptides is an important step in studying their properties, and mass spectrometry is now commonly used for this purpose. Mass spectrometric measurements can be interpreted either by database search or by de novo sequencing. De novo methods are attractive because they apply to unknown proteins as well as to proteins that cannot be analyzed with genomic or transcriptomic methods. In this paper we briefly review the existing approaches to de novo sequencing of proteins and peptides and the problems they can solve, and outline directions and prospects for their further development.

Published: 2018-04-12
Section: Reviews