To load scRNA-seq raw data:
library(Scillus)
library(tidyverse)
library(Seurat)
library(magrittr)
scRNA <- load_scfile(m)
Scillus will create Seurat
object for each sample and automatically call PercentageFeatureSet()
function to calculate the mitochondrial gene percentage. The resulting scRNA
is a list of multiple Seurat objects. Its length equals to number of rows of metadata m
:
length(scRNA)
[1] 6
The QC Measures can be plotted by plot_qc()
. The resulting plot can be further customized (axis.title, theme, etc.) using ggplot grammar.
plot_qc(scRNA, metrics = "percent.mt")
plot_qc(scRNA, metrics = "nFeature_RNA")
plot_qc(scRNA, metrics = "nCount_RNA")
There are 3 optional parameters for plot_qc()
: plot_type
, group_by
, and pal_setup
.
The default value of plot_type
is "combined"
, meaning both box plot and violin plot are drawn. If only one of the two plots is preferred, it can be set as "box"
or "violin"
.
plot_qc(scRNA, metrics = "percent.mt", plot_type = "box")
"density"
is used to draw density plot. Notice that additional ggplot grammar (log transformation here) can be added.
plot_qc(scRNA, metrics = "nCount_RNA", plot_type = "density") + scale_x_log10()
The default value of group_by
is "sample"
, which corresponds to the sample
column in the metadata m
. Since additional metadata are included in the loading process, QC measures can also be plotted by those additional factors, like "group"
(which corresponds to group
column in the metadata m
).
plot_qc(scRNA, metrics = "percent.mt", group_by = "group")
The parameter pal_setup
supports three types of inputs: RColorBrewer
palette name, palette setup dataframe (See Initial Setup, the last section), and manually specified color vector. The default value is the palette "Set2"
.
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = "Accent")
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = pal)
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = c("purple","yellow"))
The function filter_scdata()
is used for Seurat object subsetting. The grammar of subset
parameter is the same as subset()
function for Seurat object. A bar plot will be automatically drawn to show the number of cells before and after filtering.
scRNA_f <- filter_scdata(scRNA, subset = nFeature_RNA > 500 & percent.mt < 10)
The list of filtered Seurat objects scRNA_f
will be further processed by standard Seurat pipeline:
scRNA_f %<>%
purrr::map(.f = NormalizeData) %>%
purrr::map(.f = FindVariableFeatures) %>%
purrr::map(.f = CellCycleScoring,
s.features = cc.genes$s.genes,
g2m.features = cc.genes$g2m.genes)
The list of Seurat objects scRNA_f
can be merged to one single Seurat object scRNA_int
for integrated analysis:
scRNA_int <- IntegrateData(anchorset = FindIntegrationAnchors(object.list = scRNA_f, dims = 1:30, k.filter = 50), dims = 1:30)
scRNA_int %<>%
ScaleData(vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))
scRNA_int %<>%
RunPCA(npcs = 50, verbose = TRUE)
scRNA_int %<>%
RunUMAP(reduction = "pca", dims = 1:20, n.neighbors = 30) %>%
FindNeighbors(reduction = "pca", dims = 1:20) %>%
FindClusters(resolution = 0.3)
Factoring Seurat object metadata by refactor_seurat()
is an optional step, mainly for better plotting. The function takes the metadata m
as argument and make the Seurat object metadata the same factor levels as those in m
. If no metadata
parameter is provided. All the character vectors in Seurat object metadata will be factored.
m %<>%
mutate(group = factor(group, levels = c("Normal", "CTCL")))
scRNA_int %<>%
refactor_seurat(metadata = m)