Figure 1 Diagram of the experimental workflow for co-alignment of samples of proteolytic peptides of human blood plasma with and without spike-in synthetic peptides. The spike-in peptides were aligned with the natural peptides by the RTs and m/z of their MS1 features in order to identify proteins from the high-quality MS2 spectra of the spike-in synthetic peptides. Key stages: (A) sample preparation; (B) LC-MS/MS analysis on a high-resolution mass spectrometer; (C) sample alignment using the Progenesis software; (D) protein identification search in the Mascot software, mapping of MS2 synthetic peptide identifications to MS1 features of the aligned samples, and identification of human blood plasma proteins corresponding to the spike-in peptides in a sample free of these peptides.
Figure 2 An example of MS2 spectra corresponding to the same MS1 features of samples of mixtures of proteolytic peptides of human blood plasma (PP) co-aligned with samples of the same mixtures with spike-in synthetic peptides (SP). (A) An identified (for SP) MS2 spectrum of peptide TVESITDIR and an unidentified (for PP) MS2 spectrum corresponding to an MS1 feature with m/z 517.28. (B) An identified (for SP) MS2 spectrum of peptide NFSLFDLTTLIHPR and an unidentified (for PP) MS2 spectrum corresponding to an MS1 feature with m/z 837.45.
Figure 3 SRM identification of the protein Coagulation factor X (P00742) in human blood plasma. Ion chromatograms of the native and stable isotope-labeled synthetic (heavy) peptide ETYDFDIAVLR, as displayed in the Skyline software, are shown.
Identification of Human Blood Plasma Proteins Using Spike-In Peptides in Shotgun Proteomics
Institute of Biomedical Chemistry, 10 Pogodinskaya str., Moscow, 119121 Russia;
Key words: mass spectrometry; protein identification; shotgun proteomics; spike-in peptides
Abbreviations: AMT, accurate mass and time tag; FA, formic acid; HCD, higher energy collisional dissociation; NaDC, sodium deoxycholate; RTs, retention times; SIS, stable isotope-labelled synthetic; SWATH-MS, sequential window acquisition of all theoretical fragment-ion spectra mass spectrometry
Mass spectrometry is the main technique used in proteomics for analysis of biological samples. Peptides obtained by proteolytic cleavage of a protein or a mixture of proteins are analyzed using the bottom-up approach. In the classical version of bottom-up proteomics, protein identification is carried out using peptide mass fingerprinting (PMF) [2, 3] and MS/MS techniques, based on the mass spectra of proteolytic peptides (MS1) or their fragments (MS2), respectively. PMF is mostly used for identifying purified proteins or simple protein mixtures.
MS/MS is used for the analysis of complex protein mixtures to identify as many proteins as possible in a sample (from several hundred to several thousand in a single experiment), an approach known as shotgun proteomics. In this case, tandem mass spectra (MS2 spectra) obtained after multidimensional chromatographic separation of a mixture of proteolytic peptides and scans of precursor ions (MS1) and their fragments (MS2) are analyzed. Nevertheless, a significant part of the proteome in biological materials, including proteins present at low concentrations, remains inaccessible to MS/MS analysis. As a consequence, over 70% of all recorded MS2 spectra remain unidentified or contain false identifications. Such large-scale losses of information can be avoided by processing MS1 spectra only. This approach is much more sensitive but suffers from a high rate of false positive identifications [9, 11]. Nevertheless, over the past ten years, different research groups have developed mass spectrometry methods with and without MS/MS in parallel.
Methods based on the AMT (Accurate Mass and Time) tag strategy provide another example; unlike PMF, they can analyze more complex protein mixtures and identify proteins from MS1 spectra and retention times (RTs) of peptides, bypassing the fragmentation stage [15, 16]. However, the effectiveness of AMT methods is limited by: (1) the dependence on the MS2 data used to build the AMT tag databases for protein searches, (2) the labour intensity of creating AMT tag databases, and (3) the dependence of chromatography data (in this case RTs) on the conditions of a particular experiment, which complicates the use of AMT tag databases across laboratories. Models for predicting theoretical RTs have been proposed at different times for normalizing experimental RTs, but none of them has proved universally suitable for generating error-free RTs or for adapting to different experimental conditions [20, 21].
Another strategy, SWATH-MS (Sequential Window Acquisition of all Theoretical fragment-ion spectra Mass Spectrometry), combines the high throughput of shotgun proteomics with the accuracy and reproducibility of quantitative SRM analysis [12, 13]. SWATH-MS relies on cyclic sequential scanning of fragment ions (MS2) of all precursors in the range between 400 and 1200 m/z, allowing large datasets to be processed with high accuracy in a dynamic range of up to four orders of magnitude [12, 13]. The key drawbacks of SWATH-MS include interference between fragment ions and the need to create specialized spectral libraries, with the number of identified proteins depending on the quality of these libraries [13, 14].
The general idea of the study is based on the observation that peptide identifications can be achieved using MS/MS spectra of the same feature obtained in other LC-MS runs, after alignment of all runs in the Progenesis software. Thus, the goal of this work was to identify target proteins in human blood plasma using 19 synthetic proteotypic peptides to guarantee the recording of their MS2 spectra. Mass spectrometry was performed by LC-MS/MS, and the high-resolution MS data were processed using the Mascot and Progenesis LC-MS software. Identification of human blood plasma proteins was achieved by assigning the tandem mass spectra of the spike-in peptides to the corresponding aligned chromatographic peaks of proteolytic peptides. Analysis of MS1 and MS2 data allowed the identification of 19 proteins in human blood plasma, corresponding to the 19 spike-in synthetic peptides. SRM verification of the identifications with SIS standards confirmed the presence of 17 of these proteins in the plasma.
MATERIALS AND METHODS
Formic acid (FA), thiourea and HPLC grade water were purchased from «Acros Organics» (USA). Acetonitrile (ACN) was from «Merck» (Germany); ammonium bicarbonate was from «Pierce» (USA); modified porcine trypsin (sequencing grade) was from «Promega» (USA).
Internal Standard Production
The required SIS peptides were obtained by solid-phase peptide synthesis on an Overture synthesizer (Protein Technologies, USA) according to the published method. Isotope-labelled leucine (13C6,15N), arginine (13C6,15N4) or lysine (13C6,15N2) was incorporated into the peptide standards instead of unlabeled leucine, arginine or lysine, respectively. Concentrations of the synthesized peptides were measured by amino acid analysis with fluorescence detection of the amino acids released by acidic hydrolysis of the peptides.
Venous blood samples were collected from 3 volunteers in disposable plastic tubes (10 mL) containing K3-EDTA as an anticoagulant, followed by centrifugation in an Armed CH80-2S centrifuge (Germany) at 3000 rpm for 10 min to separate the plasma; the sedimented blood cells were stored at –80°C. The plasma supernatant was filtered through 0.22 µm cellulose-acetate filters («Whatman», USA), aliquoted and stored at –80°C until the proteomic analysis. The protein concentration was determined using the Micro BCA protein assay («Thermo Scientific», USA).
Protein digestion was performed according to the protocol described in detail by Zgoda et al.
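The mass shift introduced by each labelled residue follows directly from the isotope composition given above. The short sketch below is an illustration only, not part of the published protocol; the function names are hypothetical, and it assumes a single labelled residue per peptide.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
DELTA_13C = 13.0033548 - 12.0          # 13C minus 12C
DELTA_15N = 15.0001089 - 14.0030740    # 15N minus 14N

# Numbers of 13C and 15N atoms per labelled residue, as stated in the text.
LABELS = {
    "L": (6, 1),  # leucine 13C6,15N
    "R": (6, 4),  # arginine 13C6,15N4
    "K": (6, 2),  # lysine 13C6,15N2
}


def label_mass_shift(residue: str) -> float:
    """Mass difference (Da) between the heavy and the light residue."""
    n_c, n_n = LABELS[residue]
    return n_c * DELTA_13C + n_n * DELTA_15N


def precursor_mz_shift(residue: str, charge: int) -> float:
    """m/z difference between heavy and light precursor (one label assumed)."""
    return label_mass_shift(residue) / charge


for res in "LRK":
    print(res, round(label_mass_shift(res), 4), round(precursor_mz_shift(res, 2), 4))
```

For a doubly charged precursor, a heavy arginine therefore moves the precursor by roughly 5 m/z units, which is what separates the native and SIS traces in an SRM method.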
Samples were analyzed using the UltiMate 3000 nano-flow HPLC system («Thermo Scientific») connected to an Orbitrap Q-Exactive HF mass spectrometer («Thermo Scientific»). Peptide separation was carried out on a Zorbax 300SB-C18 column (150 mm × 75 µm, 3.5 µm particle size; «Agilent Technologies», USA) in a linear gradient from 3% to 35% of mobile phase B (80% ACN, 0.1% FA) over 70 min, then from 35% to 99% of mobile phase B over 5 min at 0.3 µl/min, followed by washing the column at 99% of mobile phase B for 10 min and post-analytical equilibration at 3% of mobile phase B for 5 min.
Mass spectra were acquired in the positive ionization mode with a resolution of 70000 (at m/z 400) for MS scans and 15000 (at m/z 400) for MS/MS scans. Each survey MS scan was followed by MS/MS acquisition of the ten most abundant precursors. Higher-energy collisional dissociation (HCD) at 28 CE was used to generate fragment ions. The signal threshold for precursor ions was set to 100000, and ions were isolated within a 2 m/z window. Tandem mass spectra were acquired within a range from the fixed first mass (m/z = 130) to a charge-dependent last mass. Singly charged ions and ions with an undefined charge state were excluded from MS/MS triggering. Dynamic exclusion for 10 s was applied if a precursor ion was targeted at least 3 times.
For SRM analysis, chromatographic separation of peptides was carried out using the UPLC Agilent 1290 system («Agilent Technologies») composed of a micro-flow pump and an autosampler. Ten microliters of sample were loaded onto an Eclipse Plus SBC-18 column (2.1 × 100 mm, 1.8 µm, 100 Å; «Agilent Technologies»). Separation was performed in a linear gradient of mobile phase A (0.1% (v/v) FA) and mobile phase B (80% (v/v) HPLC grade ACN/water with 0.1% (v/v) FA). Peptides were loaded at starting conditions of 3% B, which was increased to 32% B over 50 min and then from 32% to 53% B over 3 min, followed by washing the column at 90% B for 5 min. The column was equilibrated at the initial gradient conditions for 5 min before the next sample run. Mass spectra were acquired on a G6495 triple quadrupole mass spectrometer («Agilent Technologies») equipped with the Jet Stream ionization source. The following parameters were used for the Agilent Jet Stream ionization source: drying gas temperature, 280°C; nebulizer pressure, 18 psi; drying gas flow, 14 l/min; capillary voltage, 3000 V.
Raw data files were processed in the Progenesis LC-MS software (version 4.1; «Nonlinear Dynamics Ltd.», UK). Protein identification was performed using the Mascot software (version 2.4.1). The following search parameters were used: database, UniProt KB (version 2012_11) limited to Homo sapiens taxonomy; cleavage enzyme, trypsin; precursor ion tolerance, ±10 ppm; fragment ion tolerance, ±0.05 Da; missed cleavages, one allowed; fixed modifications, cysteine carbamidomethylation; variable modifications, oxidation of methionine. The criteria for positive identification were a score > 13, a significance threshold of p < 0.05 and FDR < 1%. For the SRM data analysis and quantification of the human blood plasma proteins, the Skyline software (version 3.7) was used.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD007580 and DOI 10.6019/PXD007580.
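The precursor selection rules described above (top ten by abundance, intensity threshold, exclusion of singly charged and charge-undefined ions) can be illustrated with a minimal sketch. This is not the instrument firmware logic: the `Precursor` class, the function name and the m/z matching tolerance are hypothetical, and dynamic exclusion is simplified to a list of currently excluded m/z values supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Precursor:
    mz: float
    charge: Optional[int]  # None stands for an undefined charge state
    intensity: float


def select_precursors(
    survey: Sequence[Precursor],
    top_n: int = 10,               # top-ten method, from the text
    min_intensity: float = 1e5,    # signal threshold, from the text
    excluded_mz: Sequence[float] = (),
    mz_tol: float = 0.01,          # hypothetical matching tolerance
):
    """Return up to top_n precursors eligible for MS/MS, most intense first."""
    eligible = [
        p for p in survey
        if p.charge is not None and p.charge >= 2      # drop 1+ and undefined
        and p.intensity >= min_intensity               # drop weak signals
        and not any(abs(p.mz - e) <= mz_tol for e in excluded_mz)
    ]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return eligible[:top_n]


survey = [
    Precursor(517.28, 2, 5e6),
    Precursor(400.20, 1, 9e6),     # singly charged: skipped
    Precursor(837.45, 2, 2e5),
    Precursor(650.30, None, 8e6),  # undefined charge: skipped
    Precursor(700.40, 3, 5e4),     # below threshold: skipped
]
print([p.mz for p in select_precursors(survey)])
```

The sketch makes the limitation discussed in this paper concrete: a low-abundance plasma peptide that falls below the threshold, or outside the ten most intense ions of a scan, never triggers an MS2 spectrum.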
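To give a sense of scale for the ±10 ppm precursor tolerance, the sketch below converts it into an absolute m/z window. It is a plain arithmetic illustration, not Mascot code; the function names are hypothetical.

```python
def ppm_window(mz: float, ppm: float = 10.0):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed: float, theoretical: float, ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within the ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


low, high = ppm_window(517.28)
print(round(low, 4), round(high, 4))
```

At m/z 517.28 the window is about ±0.005 m/z units, far narrower than the ±0.05 Da allowed for fragment ions.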
RESULTS AND DISCUSSION
Figure 1 shows the schematic workflow of the designed experiment. In our study, we used 19 tryptic peptides corresponding to 19 proteins (Table 1). According to neXtProt records, most of the chosen peptides and the corresponding proteins have been identified in human plasma or serum [PMID: 27457493, 29101746].
At the first step of the workflow, the 19 peptides were spiked into digested plasma samples. All samples were analyzed by LC-MS/MS in triplicate as described in the “Materials and Methods” section. The raw files were processed in Progenesis LC-MS, which was used for sequential alignment of the runs of human blood plasma proteolytic peptide mixtures (PP) with the runs of the same samples containing spike-in synthetic peptides (SP). The MS/MS spectra of the recorded features were exported in the ‘*.mgf’ format and submitted to the Mascot search engine for protein identification. After co-alignment of the MS1 features found in PP and SP, the analysis of PP yielded proteolytic peptides (and thus proteins) identical with or corresponding to the spike-in peptides, whose MS1 features were associated with identified MS/MS spectra.
In total, 126 proteins were identified in the PP runs and 184 proteins in the SP runs; 117 proteins were common to both. Among the 184 proteins identified in SP, we found all 19 spike-in peptides, and all of them were assigned to the corresponding features (peptides) of the PP runs (Table 1).
As can be seen from Table 1, only one peptide from the list (TGIVSGFGR) was identified in the sample without the synthetic peptides (PP). This may indicate that the ions of the remaining 18 proteolytic peptides either (1) were not selected for HCD fragmentation or (2) gave MS2 spectra of insufficient quality to identify the peptide. Figure 2 shows that, for the same MS1 features (peptides) of co-aligned PP and SP runs, the identified (SP) spectra of peptides TVESITDIR (m/z 517.28) and NFSLFDLTTLIHPR (m/z 837.45) had high-intensity peaks, whereas the unidentified (PP) MS2 spectra had low-intensity peaks.
Therefore, identification of human blood plasma proteins (peptides) using the corresponding synthetic peptides was based on accurate alignment of MS1 features by RT and m/z and on the MS2 identifications of the spike-in peptides. As an example, the results of co-alignment of MS1 features in PP and SP runs, for ion intensity map regions with matching RTs and m/z, are shown in Figure 1 and in the Supplementary Materials.
Verification of protein identifications was carried out using a triple quadrupole mass spectrometer in the SRM mode. SRM is characterized by high sensitivity, reproducibility of measurements and a linear dynamic range of five to six orders of magnitude for quantitative analysis of proteins. This targeted approach requires development of a method and synthesis of isotopically labelled peptide standards specific to each protein. For validation, we synthesized SIS peptides with the same sequences as the peptides used for identification of the proteins in the LC-MS/MS experiments (see Table 1).
The detection and quantification of proteins by SRM were carried out in the same plasma samples, to which the SIS peptides were added. The data obtained were analyzed using criteria previously determined for the SRM method: first, complete coincidence of the chromatographic profiles of the endogenous peptide and the isotope-labeled standard; second, the presence of signals from all three transitions; third, a difference of no more than 25% between the relative intensities of the transitions for the endogenous peptide and the isotope-labeled peptide.
The examples given in the Supplementary Materials and in Figure 2 show SRM profiles of peptides that passed quality control and met the identification criteria. Figure 3 shows the traces for a positive identification of the protein Coagulation factor X (P00742), in which the natural and SIS peptides match closely.
As a result of quality control of the SRM traces, we detected signals for 17 of the 19 target proteins in the plasma samples (see Table 1). Quantitative estimates for the 17 detected proteins were obtained by SRM in a dynamic range of approximately 2.5 orders of magnitude, from 10 µg/ml (Coagulation factor X, P00742) to 50 ng/ml (Cytidine and dCMP deaminase domain-containing protein 1, Q9BWV3), using SIS peptides added to the plasma samples. Thus, we obtained evidence of the presence of 17 of the 19 proteins identified in the LC-MS/MS experiments.
In summary, LC-MS/MS shotgun analysis of blood plasma proteolytic peptide mixtures spiked with synthetic peptides yields MS2 spectra of greater intensity for the most abundant ions of the spike-in peptides relative to the proteolytic peptides. This allows the reliable identification of human blood plasma proteins.
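The transfer of identifications between co-aligned runs can be sketched as a simple matching of MS1 features by m/z and aligned RT. This is only a conceptual illustration: Progenesis LC-MS performs the alignment and feature matching internally, and the `Feature` class, the function name, the tolerances and the RT values used here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feature:
    mz: float
    rt_min: float                  # retention time after alignment, minutes
    peptide: Optional[str] = None  # sequence from an identified MS2 spectrum


def transfer_identifications(
    sp_features: List[Feature],
    pp_features: List[Feature],
    ppm_tol: float = 10.0,     # hypothetical m/z tolerance
    rt_tol_min: float = 0.5,   # hypothetical RT tolerance after alignment
) -> List[Tuple[Feature, str]]:
    """Assign SP peptide identifications to unidentified PP features."""
    transferred = []
    for pp in pp_features:
        if pp.peptide is not None:
            continue  # already identified from its own MS2 spectrum
        best = None
        for sp in sp_features:
            if sp.peptide is None:
                continue
            ppm_err = abs(pp.mz - sp.mz) / sp.mz * 1e6
            rt_err = abs(pp.rt_min - sp.rt_min)
            if ppm_err <= ppm_tol and rt_err <= rt_tol_min:
                if best is None or ppm_err < best[0]:
                    best = (ppm_err, sp.peptide)
        if best is not None:
            transferred.append((pp, best[1]))
    return transferred


# m/z values are from Figure 2; RT values are invented for the example.
sp = [Feature(517.280, 31.2, "TVESITDIR"), Feature(837.454, 64.8, "NFSLFDLTTLIHPR")]
pp = [Feature(517.281, 31.3), Feature(837.452, 64.7), Feature(600.300, 40.0)]
for feature, peptide in transfer_identifications(sp, pp):
    print(feature.mz, peptide)
```

The third PP feature has no SP counterpart within the tolerances and therefore stays unassigned, which mirrors the requirement that both RT and m/z agree before an identification is carried over.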
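The m/z values quoted for the two example peptides can be reproduced from their sequences. The sketch below uses standard monoisotopic residue masses and assumes doubly protonated, unmodified precursors; the charge state is an assumption, since it is not stated in the text, and the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276


def peptide_mz(sequence: str, charge: int = 2) -> float:
    """Monoisotopic m/z of an unmodified peptide at the given charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


for seq in ("TVESITDIR", "NFSLFDLTTLIHPR"):
    print(seq, round(peptide_mz(seq), 2))
```

Under the 2+ assumption the calculated values agree with the m/z of the MS1 features reported in Figure 2 (517.28 and 837.45).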
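The three acceptance criteria can be expressed as a short check. This is a sketch under stated assumptions rather than the procedure actually used in Skyline: co-elution is approximated by comparing peak apex RTs within a hypothetical tolerance, and the 25% limit is interpreted as a relative difference between normalized transition intensities.

```python
from typing import List, Sequence


def relative_intensities(areas: Sequence[float]) -> List[float]:
    """Normalize transition peak areas so that they sum to 1."""
    total = sum(areas)
    return [a / total for a in areas]


def passes_srm_criteria(
    light_areas: Sequence[float],
    heavy_areas: Sequence[float],
    light_rt: float,
    heavy_rt: float,
    rt_tol_min: float = 0.05,    # hypothetical co-elution tolerance
    max_rel_diff: float = 0.25,  # 25% limit from the text
) -> bool:
    # Criterion 2: signals from all three transitions must be present.
    if len(light_areas) != 3 or len(heavy_areas) != 3:
        return False
    if min(light_areas) <= 0 or min(heavy_areas) <= 0:
        return False
    # Criterion 1 (approximated): endogenous and SIS peaks co-elute.
    if abs(light_rt - heavy_rt) > rt_tol_min:
        return False
    # Criterion 3: relative intensities agree within 25%.
    light_rel = relative_intensities(light_areas)
    heavy_rel = relative_intensities(heavy_areas)
    return all(
        abs(l - h) / h <= max_rel_diff for l, h in zip(light_rel, heavy_rel)
    )


# Peak areas and RTs below are invented for the example.
print(passes_srm_criteria([100, 50, 25], [1000, 520, 240], 12.30, 12.31))
```

A peptide with a missing transition or a distorted transition ratio fails the check, which usually points to interference or to a signal below the detection limit.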
COMPLIANCE WITH ETHICAL STANDARDS
All blood donors gave informed consent to participate in this study. Human-related procedures were performed according to the guidelines of the local ethical committee (Institute of Biomedical Problems of RAS).
We acknowledge the IBMC «Human Proteome» Core Facility for assistance with the generation of mass-spectrometry data.
This study was supported by the Program of the Presidium of the Russian Academy of Sciences («Proteomic and Metabolomic Profile of Healthy Human»).
Supplementary materials are available at http://dx.doi.org/10.18097/BMCRM00093