Loading Raw Data

To load scRNA-seq raw data:

library(Scillus)
library(tidyverse)
library(Seurat)
library(magrittr)

scRNA <- load_scfile(m)

Scillus will create Seurat object for each sample and automatically call PercentageFeatureSet() function to calculate the mitochondrial gene percentage. The resulting scRNA is a list of multiple Seurat objects. Its length equals to number of rows of metadata m:

length(scRNA)

[1] 6

Plotting QC Measures

The QC Measures can be plotted by plot_qc(). The resulting plot can be further customized (axis.title, theme, etc.) using ggplot grammar.

plot_qc(scRNA, metrics = "percent.mt")
Mitochondria gene percentage in each sample

Mitochondria gene percentage in each sample

plot_qc(scRNA, metrics = "nFeature_RNA")
Number of genes detected in each sample

Number of genes detected in each sample

plot_qc(scRNA, metrics = "nCount_RNA")
Number of UMIs in each sample

Number of UMIs in each sample

There are 3 optional parameters for plot_qc(): plot_type, group_by, and pal_setup.

The default value of plot_type is "combined", meaning both box plot and violin plot are drawn. If only one of the two plots is preferred, it can be set as "box" or "violin".

plot_qc(scRNA, metrics = "percent.mt", plot_type = "box")
Mitochondria gene percentage in each sample (Box plot)

Mitochondria gene percentage in each sample (Box plot)

"density" is used to draw density plot. Notice that additional ggplot grammar (log transformation here) can be added.

plot_qc(scRNA, metrics = "nCount_RNA", plot_type = "density") + scale_x_log10()
Number of UMIs in each sample (Density plot, log10 transformed)

Number of UMIs in each sample (Density plot, log10 transformed)

The default value of group_by is "sample", which corresponds to the sample column in the metadata m. Since additional metadata are included in the loading process, QC measures can also be plotted by those additional factors, like "group" (which corresponds to group column in the metadata m).

plot_qc(scRNA, metrics = "percent.mt", group_by = "group")
Number of UMIs in each group

Number of UMIs in each group

The parameter pal_setup supports three types of inputs: RColorBrewer palette name, palette setup dataframe (See Initial Setup, the last section), and manually specified color vector. The default value is the palette "Set2".

plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = "Accent")
Mitochondrial read percentage in each group (RColorBrewer palette name as palette input)

Mitochondrial read percentage in each group (RColorBrewer palette name as palette input)

plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = pal)
Mitochondrial read percentage in each group (Configured dataframe as palette input)

Mitochondrial read percentage in each group (Configured dataframe as palette input)

plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = c("purple","yellow"))
Mitochondrial read percentage in each group (Manually specified colors as palette input)

Mitochondrial read percentage in each group (Manually specified colors as palette input)

Filtering and Integration

The function filter_scdata() is used for Seurat object subsetting. The grammar of subset parameter is the same as subset() function for Seurat object. A bar plot will be automatically drawn to show the number of cells before and after filtering.

scRNA_f <- filter_scdata(scRNA, subset = nFeature_RNA > 500 & percent.mt < 10)
Number of cells before and after filtering

Number of cells before and after filtering

The list of filtered Seurat objects scRNA_f will be further processed by standard Seurat pipeline:

scRNA_f %<>% 
        purrr::map(.f = NormalizeData) %>%
        purrr::map(.f = FindVariableFeatures) %>%
        purrr::map(.f = CellCycleScoring, 
                   s.features = cc.genes$s.genes, 
                   g2m.features = cc.genes$g2m.genes)

The list of Seurat objects scRNA_f can be merged to one single Seurat object scRNA_int for integrated analysis:

scRNA_int <- IntegrateData(anchorset = FindIntegrationAnchors(object.list = scRNA_f, dims = 1:30, k.filter = 50), dims = 1:30)
scRNA_int %<>%
        ScaleData(vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))

scRNA_int %<>%
        RunPCA(npcs = 50, verbose = TRUE)

scRNA_int %<>%
        RunUMAP(reduction = "pca", dims = 1:20, n.neighbors = 30) %>%
        FindNeighbors(reduction = "pca", dims = 1:20) %>%
        FindClusters(resolution = 0.3)

Factoring

Factoring Seurat object metadata by refactor_seurat() is an optional step, mainly for better plotting. The function takes the metadata m as argument and make the Seurat object metadata the same factor levels as those in m. If no metadata parameter is provided. All the character vectors in Seurat object metadata will be factored.

m %<>%
        mutate(group = factor(group, levels = c("Normal", "CTCL")))

scRNA_int %<>%
        refactor_seurat(metadata = m)