

    (B) RNA-sequencing derived transcription levels (FPKM = Fragments Per Kilobase of transcript per Million mapped reads) of APOBEC family members with documented deaminase activity on DNA and a preference to induce mutations at the TCN context were examined in clones from color-coded cell lines for which RNA-sequencing data were generated (Table S2). Only clones with sufficient data to accurately derive point estimates of expression for the examined genes were considered (STAR Methods). Expression was standardized relative to TATA-binding protein (TBP). Top panel: Expression of APOBEC genes in clones from the four indicated cell lines. Horizontal bars indicate the median expression level. Bottom panels: Expression of APOBEC genes was compared to the total burden of SBS2 and SBS13 mutations acquired genome-wide in vitro, in daughter and granddaughter clones from the indicated cell lines. Robust regression was applied to derive the best estimates for the slopes of the indicated signatures (black lines), 95% confidence intervals (gray shading) and the indicated P values. All P values were above the Bonferroni threshold corresponding to significance at the 0.05 level, p = 0.002 (0.05/23, where 23 is the number of successful tests), so none reached significance. In some cases, too few data points were generated for a statistical comparison (p = NA).
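The quantities named in panel (B) can be illustrated with a minimal sketch. It uses the standard FPKM definition and the 0.05/23 Bonferroni arithmetic stated in the legend. Treating "standardized relative to TBP" as a simple ratio of FPKM values is an assumption made here for illustration; the exact procedure is described in the STAR Methods. All function names and input numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def relative_to_reference(gene_fpkm, reference_fpkm):
    """Express a gene's FPKM relative to a reference gene such as TBP.

    Assumption: a plain ratio; the paper's exact standardization may differ.
    """
    return gene_fpkm / reference_fpkm


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    gene = fpkm(fragments=500, transcript_length_bp=2000,
                total_mapped_fragments=25_000_000)
    tbp = fpkm(fragments=250, transcript_length_bp=2000,
               total_mapped_fragments=25_000_000)
    print(gene, relative_to_reference(gene, tbp))
    # Threshold quoted in the legend: 0.05 / 23, about 0.002
    print(round(bonferroni_threshold(0.05, 23), 3))
```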
    (C) Each panel represents enrichment of genome-wide C>T and C>G mutations in the indicated clones, at SBS2 and SBS13-specific sequence contexts (TCN, TCA) and at motifs associated with APOBEC3A- or APOBEC3B-induced mutagenesis (YTCN/YTCA and RTCN/RTCA, respectively). N is any base, R is any purine and Y is any pyrimidine base. A and B are parent clones; the others are daughter and granddaughter clones from the related lineages.
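As a rough illustration of the motif notation in panel (C), the sketch below expands the codes N, R and Y into regular expressions and tests whether a sequence context matches a motif. It assumes contexts are already written on the strand carrying the mutated cytosine; the enrichment statistic itself is not reproduced here, and the helper names are hypothetical.

```python
import re

# Degenerate-base codes used in the legend: N = any base, R = purine, Y = pyrimidine.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]",
}


def motif_to_regex(motif):
    """Translate a motif such as 'YTCA' into a compiled regular expression."""
    return re.compile("".join(CODES[base] for base in motif))


def matches(motif, context):
    """Return True if the whole context string fits the motif."""
    return motif_to_regex(motif).fullmatch(context) is not None


if __name__ == "__main__":
    # A pyrimidine before TCA fits the APOBEC3A-associated motif YTCA,
    # while a purine before TCA fits the APOBEC3B-associated motif RTCA.
    for context in ("CTCA", "GTCA", "ATCG"):
        hits = [m for m in ("YTCN", "YTCA", "RTCN", "RTCA") if matches(m, context)]
        print(context, hits)
```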
    Figure S5. Significant Relationships between Somatic Retrotransposition and Mutational Signatures in Cell Lines and Primary Cancers, Related to Figures 3 and 4
    (A and B) The upper plots in both panels show the dependence of the observed numbers of mutations assigned to the indicated signatures (dots), and fitted values (lines) estimated using the GLMM Poisson regression model (STAR Methods), on the L1 insertion rate in cell line clones (panel A) and primary cancer samples (panel B). P values which fall below the Bonferroni thresholds corresponding to significance at the 0.05, 0.01, and 0.001 levels are indicated as *, ** and ***, respectively. The bottom plots show the estimated effects of cell line (panel A) or primary cancer (panel B) types on the slope of the regression line, in ranked order, against the normal quantiles. For each tumor type, the fitted value is accompanied by a 95% confidence interval. See Table S5 for cell line and primary cancer samples considered in analyses.
    Figure S6. Signatures of False-Positive Somatic Mutations Are Present in DNA Prepared from Single Cells, Related to Figure 6
    (A) Top two panels: bars represent the percentage of base substitutions attributed to color-coded signatures in complete (rather than filtered, see Figure 6A) mutational catalogs from whole-genome sequenced stock cell lines from the denoted cancer classes (abbreviations in Table S2) and their single cells. The bottom panel represents the color-coded fractions of minor alleles at examined heterozygous SNP loci, in the indicated single cells, which were (i) lost due to WGA-associated locus dropouts, (ii) lost due to WGA-associated allele dropouts or (iii) fell below the detection threshold for identification of base substitutions due to WGA-associated imbalanced amplification.
    (B) Spectra of mutations identified genome-wide in two exemplar stock cell lines (top panels) and in their corresponding single cells (bottom panels), genome-wide or within haploid regions at the indicated variant allele fractions (VAF). Each panel is displayed according to the 96-substitution classification on the horizontal axis, defined by the six color-coded substitution types and the sequence context immediately 5′ and 3′ to the mutated base. The order of the sequence contexts follows the standard alphabetical representation (see Figure 6B). The total number of base substitutions is indicated at the top of each panel. C>T variants at NCG contexts and T>C mutations at ATN contexts in stock cell lines largely represent germline variation, because normal DNAs from the same individuals are not available for most cancer cell lines.
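The 96-substitution classification mentioned in panel (B) can be sketched as follows: six pyrimidine-referenced substitution types combined with four possible 5′ bases and four possible 3′ bases give 6 × 4 × 4 = 96 classes. The label format `T[C>T]A` and the function names are conventions assumed here for illustration, and the enumeration order is simply alphabetical rather than a reproduction of the figure's axis.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def channels():
    """List all 96 classes: substitution type, then 5' base, then 3' base."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify(five, ref, alt, three):
    """Assign a single-base substitution to one of the 96 classes.

    Purine reference bases are converted to the pyrimidine strand by
    reverse-complementing the trinucleotide, which swaps the flanking bases.
    """
    if ref in "AG":
        five, ref, alt, three = (
            three.translate(COMPLEMENT),
            ref.translate(COMPLEMENT),
            alt.translate(COMPLEMENT),
            five.translate(COMPLEMENT),
        )
    return f"{five}[{ref}>{alt}]{three}"


if __name__ == "__main__":
    print(len(channels()))
    # A G>A change in the context 5'-TGA-3' is a C>T change in 5'-TCA-3'
    # on the opposite strand.
    print(classify("T", "G", "A", "A"))
```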
    Figure S7. Variant Allele Fraction Distribution Plots for Cell Line Clones, Related to Figures 3–5
    (A and B) Distribution plots showing frequencies of the variant allele fractions (VAFs) of mutations that remain after the filtering steps (STAR Methods) in the indicated clones analyzed by whole-exome (panel A) or whole-genome sequencing (panel B). VAF peaks often deviate from 50%, the value expected for clonal heterozygous somatic mutations in a diploid genome, because cancer cell lines are often polyploid and heterozygous copy number changes across the genome can further modulate the distribution of the VAF. Bimodal distributions and subclonal peaks can arise from the mixed effects of mutations being acquired on different copy number states of the genome and/or subclonally. A minor proportion of mutations present in 100% of the reads in some clones can reflect loss of heterozygosity at the loci of the newly acquired mutations or residual germline variants, mainly in parent clones that were compared against the unmatched normal human genome (STAR Methods).
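The reasoning about VAF peaks can be made concrete with a simplified calculation. This sketch assumes a pure population of cells (no contaminating normal DNA), no sequencing noise, and a hypothetical helper `expected_vaf`. Under those assumptions the expected VAF is the number of chromosome copies carrying the mutation divided by the total copy number at the locus, scaled by the fraction of cells that carry it.

```python
def expected_vaf(mutated_copies, total_copies, cell_fraction=1.0):
    """Expected variant allele fraction under idealized assumptions.

    mutated_copies: chromosome copies carrying the mutation in a mutant cell
    total_copies:   total copy number at the locus
    cell_fraction:  fraction of cells carrying the mutation (1.0 = clonal)
    """
    if not 0 < mutated_copies <= total_copies:
        raise ValueError("mutated_copies must be between 1 and total_copies")
    if not 0 < cell_fraction <= 1:
        raise ValueError("cell_fraction must be in (0, 1]")
    return cell_fraction * mutated_copies / total_copies


if __name__ == "__main__":
    print(expected_vaf(1, 2))        # clonal heterozygous, diploid locus
    print(expected_vaf(1, 4))        # one mutated copy at a tetraploid locus
    print(expected_vaf(2, 2))        # all copies mutated, e.g. after loss of heterozygosity
    print(expected_vaf(1, 2, 0.5))   # subclonal, present in half of the cells
```

A mixture of loci at different copy number states, or a mixture of clonal and subclonal mutations, would therefore produce several peaks rather than a single one at 50%.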