a total of 116 tumor samples (Haibe-Kains et al., 2012), as well as immunohistochemistry (IHC) in an independent tumor collection of 78 patients (described in STAR Methods),
all confirmed a statistically significantly increased level of
ERBB2 in ER−/HER2+ versus ER+/HER2+ tumors (Figure 5). This observation supports the notion that proteotypes may reveal a more finely graded classification than conventional subtyping provides.
Analysis of Global Correlation between Proteins and Transcripts
To assess how well our protein-level data correlate with transcript-level data globally, we compared our comprehensive SWATH-MS dataset against the five microarray datasets of 883 patients mentioned above (Haibe-Kains et al., 2012; see Data S4 for details). We performed 475,755 individual comparisons of the overlap between differentially abundant proteins (FDR-adj. p < 0.05) and their cognate transcripts (with the same trend) for 2,782 matching transcript-protein pairs, comparing patient groups that differed in subtype, ER status, HER2 status, tumor grade, and lymph node status (Data S4). Overall, 6% of protein-level observations and 7%–15% of transcript-level observations (depending on the set of patients) exhibited statistically significant changes (Data S4B). Of these, 13%–28% of differentially abundant proteins also showed a statistically significant change in the same direction at the transcript level. Conversely, 9%–18% of significantly regulated transcripts showed a significant change in the same direction at the protein level. The global correlation coefficients for fold changes between transcripts and proteins ranged from R = 0.17 to R = 0.29, depending on the dataset (Figure 6A). In contrast, the correlation for the three key proteins from the decision tree was very high, with correlation coefficients from R = 0.67 to R = 0.81 (Figure 6B). A decision tree constructed from the five independent transcriptomics datasets using transcript data for 1,036 genes resulted in a tree with three nodes and a similar structure (Figure S3). Taken together, although protein and transcript levels correlated highly for the key proteins INPP4B, CDK1, and ERBB2, the correlation and overlap of differentially expressed proteins and transcripts on a global scale were rather low, indicating the importance of protein-level measurements for studying breast cancer biology.
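The comparison above combines three computations: FDR adjustment of p-values, the same-direction overlap between significant proteins and transcripts, and the correlation of fold changes. The following is a minimal sketch of these steps, not the authors' pipeline. It assumes Benjamini–Hochberg as the FDR procedure and Pearson's R as the correlation coefficient (the text names neither explicitly), and the function names and the dictionary-of-log2-fold-changes input format are hypothetical.

```python
from math import sqrt


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending raw p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_direction(log2fc, pvals, alpha=0.05):
    """Map each feature to +1 (up), -1 (down) or 0 (not significant after FDR)."""
    names = list(log2fc)
    adj = benjamini_hochberg([pvals[n] for n in names])
    return {
        n: (1 if log2fc[n] > 0 else -1) if q < alpha else 0
        for n, q in zip(names, adj)
    }


def same_direction_overlap(reference_calls, other_calls):
    """Fraction of significant features in reference_calls that are also
    significant, with the same sign, in other_calls (missing = not significant)."""
    significant = [g for g, d in reference_calls.items() if d != 0]
    if not significant:
        return 0.0
    hits = sum(1 for g in significant if other_calls.get(g, 0) == reference_calls[g])
    return hits / len(significant)


def pearson_r(x, y):
    """Pearson correlation of paired fold changes (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Under these assumptions, `same_direction_overlap(protein_calls, transcript_calls)` corresponds to the protein-centric overlap, and swapping the arguments gives the transcript-centric one. `pearson_r` would be applied to the log2 fold changes of the matched transcript-protein pairs for one group comparison.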
High-Throughput Proteotyping by SWATH-MS as a Next-Generation Approach for Cancer Classification
The currently used classification of breast cancer tissues relies primarily on semiquantitative IHC, which is based on a pathologist's manual evaluation of antibody-stained tissue sections. Transcript-level approaches have been used for expression profiling of breast-cancer-associated genes and for classification; however, as confirmed by our data, gene expression does not generally reflect protein levels. Protein-level quantification, although technically more difficult, is therefore expected to provide the most relevant information. In this study, we employed a recently established massively parallel targeted proteomics technique, SWATH-MS, to classify human breast cancer tissues. The technique generally requires no more than 1 or 2 μg of total peptide sample and is capable of analyzing tissue samples obtained by needle biopsy (Guo et al., 2015). Moreover, it offers good quantitative accuracy with high specificity owing to targeted MS/MS data extraction (Gillet et al., 2012), low cost per run, and relatively high sample throughput, enabling the analysis of 10–24 samples per day. The proteotypes established here largely recapitulated the five conventional subtypes, confirming the general applicability of proteotyping for identifying cancer subtypes. The inconsistencies between the proteotype-based and conventional classifications might reflect further breast cancer subtypes (Prat et al., 2015), which could, for example, arise from additional genetic mutations. This is well illustrated by the TP53 mutation status in our 96 tumor samples: whereas 50% of tumors of the more aggressive subtypes (triple-negative, HER2-enriched, and luminal B HER2+) carried TP53 mutations, the less aggressive luminal B and luminal A subtypes included only 12.5% and 0.0% TP53-mutated tumors, respectively (Data S1B). Properly classifying such additional mutational heterogeneity could help improve the diagnosis and treatment of breast cancer.
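To put the stated throughput in perspective, here is a back-of-the-envelope estimate for the 96-tumor cohort. It assumes one SWATH-MS injection per sample and ignores technical replicates, blanks, and QC runs, which is an assumption, since the actual acquisition schedule is not given here. With N the number of samples and r the daily throughput, the instrument time is

$$
t = \frac{N}{r}, \qquad \frac{96}{24\ \text{samples/day}} = 4\ \text{days}, \qquad \frac{96}{10\ \text{samples/day}} = 9.6\ \text{days},
$$

so a cohort of this size would fit into roughly 4 to 10 days of instrument time under these assumptions.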
Advantages of SWATH-MS to Classify Breast Cancer Tumors
Several studies used proteomics approaches to classify breast cancer tissues (Lam et al., 2014), applying a range of methods, from surface-enhanced laser desorption-ionization time-of-flight (SELDI-TOF) MS (Bouchal et al., 2013; Brozkova et al., 2008) and stable isotope labelling with amino acids in cell culture (SILAC)-liquid chromatography-tandem mass spectrometry (LC-MS/MS) (Waldemarson et al., 2016) in breast cancer tumor samples to MS1-based, label-free quantification of secreted proteins in a cell-line panel (Pavlou et al., 2013). These studies confirm the utility of protein expression profiling for the identification of novel molecular markers to classify breast cancer. We previously analyzed the tumor samples of the 96 patients described in the