Journal Club

Omics Data is the Future of Medicine

Bailey Andrew

Feb 1, 2023

Cancer, from an omics perspective

Many ‘omics’: genomics, transcriptomics, proteomics, etc…

Cells divide and accrue mutations

Things can go wrong at the omics level

Why should you care?

  • Many diseases are caused by an omics issue
  • When so, other data is only a proxy¹
  • Heterogeneous tumors? Single-cell omics
  • Symptoms may be caused by omics issues
    • Proteomics enables better design of treatments?

¹ Environmental factors can also be a cause, but our ability to affect them is limited

Now that you care…

Hands-on Learning Experience™

https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-8559/

Dataset Details - Where?

To build a living biobank, we established a biopsy pipeline, collecting samples from patients diagnosed with epithelial ovarian cancer treated at the Christie Hospital.

Nelson et al. (2020)

(Christie Hospital is in Manchester)

Dataset Details - Who?

Between May 2016 and June 2019, we collected 312 samples from patients with chemo-naïve and relapsed disease, either as solid biopsies or as ascites (Fig. 1a)

Ten patients had HGSOC while two had mucinous ovarian carcinoma. Longitudinal biopsies were collected from three patients.

Nelson et al. (2020)

Dataset Details - How?

The primer contains:

  • an Illumina TruSeq Read 1 (read 1 sequencing primer)

  • 16 nt 10x Barcode

  • 12 nt unique molecular identifier (UMI)

  • 30 nt poly(dT) sequence

Barcoded, full-length cDNA is amplified via PCR to generate sufficient mass for library construction.

Nelson et al. (2020)
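The read layout quoted above can be illustrated with a short R sketch. It assumes that Read 1 starts directly with the 16 nt barcode (the TruSeq Read 1 primer site is where sequencing begins, so it is not part of the read), followed by the 12 nt UMI and the poly(dT) stretch. The read, the gene name and the helper `split.read1` are made up for illustration. The second half shows why the UMI matters. PCR duplicates share a barcode, UMI and gene, so counting distinct UMIs rather than reads undoes the amplification.

```r
# Hypothetical helper: split a 10x 3' Read 1 into cell barcode and UMI,
# assuming a 16 nt barcode, then a 12 nt UMI, then the poly(dT) stretch
split.read1 <- function(read1, barcode.length=16, umi.length=12) {
    list(
        barcode=substr(read1, 1, barcode.length),
        umi=substr(read1, barcode.length + 1, barcode.length + umi.length)
    )
}

# Made-up read: barcode + UMI + 30 nt poly(T)
read1 <- paste0("AAACCCACAGTTAGGG", "ACGTACGTACGT", strrep("T", 30))
parts <- split.read1(read1)
print(parts$barcode)
print(parts$umi)

# Three made-up reads from one cell and one gene; the first two are PCR duplicates
reads <- data.frame(
    barcode=rep("AAACCCACAGTTAGGG", 3),
    umi=c("ACGTACGTACGT", "ACGTACGTACGT", "TTTTGGGGCCCC"),
    gene=rep("GENE.A", 3)
)

# Count distinct UMIs per (barcode, gene) instead of raw reads
umi.counts <- aggregate(umi ~ barcode + gene, data=unique(reads), FUN=length)
print(umi.counts)
```

Three reads collapse to a count of 2 for this cell and gene. This is the kind of value stored in each entry of the count matrix loaded next.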

Load the Data

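# Directory holding the downloaded E-MTAB-8559 files
file.path <- './localdata/E-MTAB-8559-quantification-raw-files/'

# readMM returns a sparse triplet matrix; convert it to compressed column form
raw.counts <- as(Matrix::readMM(
    paste(
        file.path,
        'E-MTAB-8559.aggregated_filtered_counts.mtx',
        sep=''
    )
), 'CsparseMatrix')

# Genes x cells
dim(raw.counts)
[1] 23284 19880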
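The `.mtx` file is in Matrix Market coordinate format, which stores only the non-zero entries as (row, column, value) triplets. This suits single-cell counts, which are mostly zeros. The toy round trip below writes a made-up 3 x 2 matrix to a temporary file, reads it back with `readMM`, and converts it the same way as above. The matrix values are illustrative only.

```r
library(Matrix)

# Toy 3 genes x 2 cells count matrix with two non-zero entries
toy <- sparseMatrix(i=c(1, 3), j=c(1, 2), x=c(5, 2), dims=c(3, 2))

# Round-trip through a temporary Matrix Market file
mtx.file <- tempfile(fileext=".mtx")
writeMM(toy, mtx.file)
triplet <- readMM(mtx.file)            # coordinate (triplet) representation
csc <- as(triplet, "CsparseMatrix")    # compressed sparse column, as used above

print(class(csc))
print(dim(csc))
print(as.matrix(csc))
```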

Load the Data

row.info <- read.table(
    paste(
        file.path,
        'E-MTAB-8559.aggregated_filtered_counts.mtx_rows',
        sep=''
    ),
    header=FALSE,
    col.names=c("Ensembl.ID", "Redundant")
)

# Drop the duplicate field; keep the Ensembl gene IDs and use them as row names
row.info <- row.info['Ensembl.ID']
rownames(raw.counts) <- row.info$Ensembl.ID
head(row.info)
       Ensembl.ID
1 ENSG00000000003
2 ENSG00000000419
3 ENSG00000000457
4 ENSG00000000460
5 ENSG00000000938
6 ENSG00000000971

Load the Data

col.info <- read.table(
    paste(
        file.path,
        'E-MTAB-8559.aggregated_filtered_counts.mtx_cols',
        sep=''
    ),
    header=FALSE,
    col.names=c('Cell.ID')
)
colnames(raw.counts) <- col.info$Cell.ID
head(col.info)
                        Cell.ID
1 SAMEA6492740-AAACCCACAGTTAGGG
2 SAMEA6492740-AAACCCACATGTGTCA
3 SAMEA6492740-AAACCCAGTCGCATGC
4 SAMEA6492740-AAACCCAGTCTTTCAT
5 SAMEA6492740-AAACCCATCCGTGTCT
6 SAMEA6492740-AAACCCATCCTCTCTT

Load the Metadata

# Load in the experimental design matrix
exp.design.table <- read.table(
    './localdata/ExpDesign-E-MTAB-8559.tsv',
    header=TRUE,
    sep='\t'
)

# We can see we have four patients in our dataset
print(unique(exp.design.table$Sample.Characteristic.individual))
[1] "38b"  "59"   "74-1" "79"  

# For reproducibility
set.seed(0)

# Convenient data container
library(SingleCellExperiment)

# Libraries for scRNA analysis
library(scran)
library(scater)

# Library for cluster analysis
library(bluster)

# Store as a SingleCellExperiment object for convenience
ovarian.sce <- SingleCellExperiment(
    assays=list(counts=raw.counts)
)

# Add patient metadata
# (assumes the rows of exp.design.table follow the same cell order as raw.counts)
ovarian.sce$patient <- exp.design.table$Sample.Characteristic.individual.

# Take a peek at the internal data structure
ovarian.sce
class: SingleCellExperiment 
dim: 23284 19880 
metadata(0):
assays(1): counts
rownames(23284): ENSG00000000003 ENSG00000000419 ... ENSG00000289701
  ENSG00000289716
rowData names(0):
colnames(19880): SAMEA6492740-AAACCCACAGTTAGGG
  SAMEA6492740-AAACCCACATGTGTCA ... SAMEA6492743-TTTGTTGGTCCTGGTG
  SAMEA6492743-TTTGTTGTCAGATTGC
colData names(1): patient
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

Why So Much Preprocessing?

ELI5: Wet-lab work is hard, so raw counts carry technical noise (e.g. cells are sequenced to different depths)

# Compute size factors and add log-normalized counts as the "logcounts" assay
ovarian.sce <- scuttle::logNormCounts(ovarian.sce)
ovarian.sce
class: SingleCellExperiment 
dim: 23284 19880 
metadata(0):
assays(2): counts logcounts
rownames(23284): ENSG00000000003 ENSG00000000419 ... ENSG00000289701
  ENSG00000289716
rowData names(0):
colnames(19880): SAMEA6492740-AAACCCACAGTTAGGG
  SAMEA6492740-AAACCCACATGTGTCA ... SAMEA6492743-TTTGTTGGTCCTGGTG
  SAMEA6492743-TTTGTTGTCAGATTGC
colData names(2): patient sizeFactor
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
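As far as we understand its defaults, `scuttle::logNormCounts` divides each cell's counts by a library-size factor and then takes log2(x + 1). The factors are centred so that they average 1 across cells. The sketch below reproduces that arithmetic in base R on a made-up matrix of 3 genes by 2 cells. It illustrates the idea and is not scuttle's implementation.

```r
# Made-up counts: 3 genes (rows) x 2 cells (columns), filled column by column
counts <- matrix(
    c(10, 0, 10,
       2, 2,  4),
    nrow=3,
    dimnames=list(paste0("gene", 1:3), c("cell1", "cell2"))
)

# Library size factors, centred so their mean is 1
lib.sizes <- colSums(counts)
size.factors <- lib.sizes / mean(lib.sizes)

# Divide each column by its size factor, then apply log2(x + 1)
logcounts <- log2(sweep(counts, 2, size.factors, "/") + 1)
print(round(logcounts, 3))
```

Here cell1 has 20 counts and cell2 has 8, so the size factors are 20/14 and 8/14. As a result, gene3 ends up at log2(7 + 1) = 3 in both cells, even though its raw counts differ (10 vs 4).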

# PCA on the log-counts of all genes (subset.row=NULL), keeping 14 components
ovarian.sce <- scran::fixedPCA(
    ovarian.sce,
    rank=14,
    subset.row=NULL,
    assay.type="logcounts"
)
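`fixedPCA` summarises each cell's log-expression profile by its first `rank` principal components. A rough base-R analogue with `prcomp` is sketched below. It runs on random toy data rather than the ovarian dataset and centres genes without scaling them. scran may compute the decomposition with a different (possibly approximate) SVD algorithm, so treat this as a sketch of the idea only.

```r
set.seed(0)

# Toy log-expression: 50 genes (rows) x 20 cells (columns)
toy.logcounts <- matrix(rnorm(50 * 20), nrow=50)

# prcomp expects observations (cells) in rows, so transpose;
# centre each gene but do not scale, and keep the first 5 PCs
pca <- prcomp(t(toy.logcounts), center=TRUE, scale.=FALSE, rank.=5)
pcs <- pca$x    # cells x PCs

print(dim(pcs))

# Fraction of the total per-gene variance captured by the retained PCs
total.var <- sum(apply(toy.logcounts, 1, var))
var.explained <- sum(pca$sdev[1:5]^2) / total.var
print(var.explained)
```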


# 2-D t-SNE embedding computed from the PCA coordinates, for visualization
ovarian.sce <- scater::runTSNE(
    ovarian.sce,
    dimred="PCA"
)

plotReducedDim(ovarian.sce, dimred="TSNE", colour_by="patient")

Identifying Stromal Cells

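# k-means with 5 centers on the 2-D t-SNE coordinates; a quick heuristic, since
# t-SNE distances between clusters are not very meaningful (clustering on PCs is more standard)
ovarian.sce$tumor.stromal.clusters <- scran::clusterCells(
    ovarian.sce,
    use.dimred="TSNE",
    BLUSPARAM=bluster::KmeansParam(centers=5)
)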
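`KmeansParam(centers=5)` asks bluster to run k-means with five centres on the chosen coordinates, which here are the two t-SNE dimensions. Base R's `kmeans` shows the same idea on made-up 2-D points that form two well-separated groups. The blobs and the `nstart` value are illustrative choices, not settings from the analysis above.

```r
set.seed(0)

# Two made-up blobs of 50 points each, centred at (0, 0) and (10, 10)
coords <- rbind(
    matrix(rnorm(100, mean=0), ncol=2),
    matrix(rnorm(100, mean=10), ncol=2)
)

# k-means with two centres; nstart restarts guard against a poor initialisation
fit <- kmeans(coords, centers=2, nstart=10)
clusters <- factor(fit$cluster)

print(table(clusters))
```

k-means labels are arbitrary integers and can change with the random seed. That is why the labels below are assigned by inspecting the plot.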


# Assign labels by eye, since the separation is obvious in the t-SNE plot:
# cluster 3 is stromal, every other cluster is a tumor cluster
ovarian.sce$tumor.or.stromal <- "Stromal"
ovarian.sce$tumor.or.stromal[ovarian.sce$tumor.stromal.clusters==1] <- "Tumor.1"
ovarian.sce$tumor.or.stromal[ovarian.sce$tumor.stromal.clusters==2] <- "Tumor.2"
ovarian.sce$tumor.or.stromal[ovarian.sce$tumor.stromal.clusters==4] <- "Tumor.3"
ovarian.sce$tumor.or.stromal[ovarian.sce$tumor.stromal.clusters==5] <- "Tumor.4"
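The by-eye assignment can also be written as a named lookup vector. This keeps the whole cluster-to-label mapping in one place and yields `NA` for an unexpected cluster id, rather than silently defaulting to "Stromal". The sketch uses a made-up factor standing in for `tumor.stromal.clusters`; the mapping mirrors the one above.

```r
# Made-up cluster ids standing in for tumor.stromal.clusters
clusters <- factor(c(1, 3, 2, 5, 4, 3))

# Named lookup: cluster id -> label (cluster 3 is the stromal cluster)
cluster.labels <- c(
    "1"="Tumor.1", "2"="Tumor.2", "3"="Stromal", "4"="Tumor.3", "5"="Tumor.4"
)
labels <- unname(cluster.labels[as.character(clusters)])
print(labels)
```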

plotReducedDim(
    ovarian.sce,
    "TSNE",
    colour_by="tumor.stromal.clusters",
    text_by="tumor.or.stromal",
    shape_by="patient"
)

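# Score every gene as a potential marker for each cluster
marker.info <- scran::scoreMarkers(
    ovarian.sce,
    ovarian.sce$tumor.stromal.clusters
)
cluster <- "3" # the stromal cluster

# Take the gene with the highest mean AUC for the stromal cluster
diff.exp.gene <- rownames(ovarian.sce)[
    order(marker.info[[cluster]]$mean.AUC, decreasing=TRUE)[[1]]
]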
print(diff.exp.gene)
[1] "ENSG00000106366"
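Among its effect sizes, `scoreMarkers` reports an AUC for each pairwise comparison between the chosen cluster and every other cluster, and `mean.AUC` averages these. An AUC is the probability that a random cell from the cluster has higher expression than a random cell from the other group, with ties counted as one half. The base-R sketch below uses made-up values. The helper `auc` is hypothetical and ignores scran's blocking and weighting options.

```r
# Hypothetical helper: AUC between two groups,
# P(x.in > x.out) + 0.5 * P(x.in == x.out) over all pairs
auc <- function(x.in, x.out) {
    diffs <- outer(x.in, x.out, "-")   # every in-cluster vs out-of-cluster pair
    mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Made-up expression of one gene in the cluster of interest and in two other clusters
in.cluster <- c(5, 6, 7)
other.clusters <- list(c(1, 2, 6), c(0, 0, 1))

# One AUC per pairwise comparison, then their mean
pairwise.auc <- sapply(other.clusters, function(x.out) auc(in.cluster, x.out))
mean.auc <- mean(pairwise.auc)
print(pairwise.auc)
print(mean.auc)
```

Against the first cluster, 7 of the 9 pairs are wins and 1 is a tie, giving 7.5/9 ≈ 0.83. Against the second cluster every pair is a win, giving 1. A gene that scores highly against every other cluster, like SERPINE1 here, is a good marker candidate.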
plotReducedDim(
    ovarian.sce,
    "TSNE",
    colour_by=diff.exp.gene,
    shape_by="patient",
    text_by="tumor.or.stromal",
    text_colour="cyan"
) +
    ggtitle("Ovarian Cancer Cells by SERPINE1 (ENSG00000106366) Expression") +
    theme(
        panel.background = element_rect(fill = "darkblue"),
        plot.background = element_rect(fill = "cyan")
    )

SERPINE1

PAI-1 (plasminogen activator inhibitor-1), the protein encoded by SERPINE1, has been linked to cancer!

  • Form hypotheses to guide future experiments:
    • Can SERPINE1 be a diagnostic factor?
    • Is it a cause or symptom of cancer?
    • Is it important for tumor health?
    • Does it affect patient wellbeing or outcomes?
    • Can drugs be designed to target it?

Try It Yourself!

  • Pick an interesting dataset
    • Download it
    • Read the paper
  • Read the Bioconductor scRNA ebooks!!!
    • Perhaps the best educational resource I’ve used for anything
    • Practical and conceptual

  • Recreate their preprocessing steps
  • Pick a figure in the paper and recreate it
    • We recreated Figure 5a from Nelson et al. (2020).

The End

References

Nelson, Louisa, Anthony Tighe, Anya Golder, Samantha Littler, Bjorn Bakker, Daniela Moralli, Syed Murtuza Baker, et al. 2020. “A Living Biobank of Ovarian Cancer Ex Vivo Models Reveals Profound Mitotic Heterogeneity.” Nature Communications 11 (1): 822. https://doi.org/10.1038/s41467-020-14551-2.