RNA-Seq reads need splicing-aware mappers because RNA comes from spliced transcripts where introns are removed, and some reads span exon-exon junctions. Regular mappers can’t handle these split reads, but splicing-aware tools (e.g., STAR, HISAT2) can align them correctly, ensuring accurate gene expression analysis and detection of splicing events.
What are the RPKM/FPKM and DESeq2/VST techniques?
Normalization techniques for bulk RNA-seq
RPKM/FPKM (Reads/Fragments Per Kilobase of transcript per Million mapped reads):
- Normalizes for gene length and sequencing depth
- RPKM (single-end reads), FPKM (paired-end reads)
TPM (Transcripts per million):
- Normalizes for gene length first, then sequencing depth
- Makes expression levels comparable across genes and samples
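A minimal sketch of the two formulas, to make the difference in the order of operations concrete. The function names, counts, and gene lengths are made up for illustration and do not come from any particular package.

```python
def rpkm(counts, lengths_bp):
    """RPKM/FPKM: scale counts per million mapped reads and per kilobase of gene."""
    total = sum(counts)  # library size (sequencing depth) of this sample
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: divide by gene length first, then rescale so the sample sums to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Hypothetical sample with three genes
counts = [100, 200, 700]       # reads (or fragments) per gene
lengths = [1000, 4000, 2000]   # gene lengths in base pairs

rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

Because TPM rescales after the length correction, TPM values sum to one million in every sample, which is not guaranteed for RPKM/FPKM.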
DESeq2/VST (Variance Stabilizing Transformation):
- DESeq2 normalizes count data and performs differential gene expression analysis using a negative binomial model
- VST is a transformation within DESeq2 that stabilizes variance across genes, making the data more suitable for visualization and clustering
What are the key metrics for QC (in bulk RNA-seq analysis)?
Read Quality: A measure of the accuracy and reliability of sequencing reads, often represented as a Phred score indicating the probability of an error in each base call.
Adapter Content: The presence of adapter sequences (used in library preparation) within the sequencing reads, which can interfere with downstream analysis if not removed.
Sequence Length Distribution: A summary of the lengths of the sequencing reads, used to check for consistency and identify potential trimming or sequencing issues.
GC Content: The proportion of guanine (G) and cytosine (C) bases in the sequences, often analyzed for biases that may affect sequencing coverage or downstream analysis.
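Two of these metrics are simple enough to compute by hand. The sketch below assumes FASTQ quality strings use the common Phred+33 ASCII encoding; the function names are illustrative only.

```python
def phred_to_error_prob(q):
    """A Phred score Q corresponds to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (assumes a +33 offset)."""
    return [ord(ch) - 33 for ch in quality_string]


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


scores = decode_phred33("II5!")          # hypothetical quality string
error_q30 = phred_to_error_prob(30)      # 1 expected error per 1000 base calls
gc = gc_content("GATTACA")               # 2 of 7 bases are G or C
```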
Behavioral module
What is the purpose and general idea of a Linear Mixed-Effects Model (LME)?
Purpose: Account for both fixed and random effects
* Fixed effects: consistent and systematic across all observations (e.g. treatment or condition)
* Random effects: variation between groups or individuals (e.g. batch effects, individual variability)
* LME makes it possible to control for confounding sources of variation (random effects) while estimating the impact of the variables of interest (fixed effects)
What is the issue with testing many genes and how can this be mitigated?
Multiple test correction
* Differential expression: many tests are performed (one per gene), so some genes will look significant by chance alone
* This needs to be taken into account, e.g. using Benjamini–Hochberg (BH) multiple testing correction
* BH adjusts each p-value based on its rank and the total number of tests
* It controls the False Discovery Rate (FDR): among all genes called significantly differentially expressed, the expected proportion that in reality comes from the null model (i.e. is not differentially expressed)
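The BH adjustment can be sketched in a few lines. This is an illustrative stand-alone version with made-up p-values, not the code of any specific package: each p-value is multiplied by (number of tests / rank), and adjusted values are kept monotone by taking a running minimum from the largest p-value down.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1)
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]   # hypothetical per-gene p-values
adjusted = benjamini_hochberg(p_values)
# Genes with adjusted p-value below the chosen FDR threshold are called significant
significant = [p <= 0.05 for p in adjusted]
```

In this toy example three raw p-values are below 0.05, but only the first gene remains significant at an FDR of 0.05 after adjustment.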
Applications of PCA in RNA-seq
What are average linkage and complete linkage methods for and what is the difference between them?
Both are methods used in hierarchical clustering to determine how clusters are formed, by measuring the distance between groups of data points.
Complete linkage uses maximal intercluster dissimilarity.
The largest of the pairwise dissimilarities is used.
Average linkage uses mean intercluster dissimilarity.
The average of the pairwise dissimilarities is used.
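A small sketch of the two definitions, assuming Euclidean distance as the dissimilarity measure and two made-up clusters of 2D points:

```python
from itertools import product
from math import dist


def pairwise_dissimilarities(cluster_a, cluster_b):
    """All distances between one point from cluster A and one from cluster B."""
    return [dist(a, b) for a, b in product(cluster_a, cluster_b)]


def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise dissimilarity between the two clusters."""
    return max(pairwise_dissimilarities(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise dissimilarity between the two clusters."""
    d = pairwise_dissimilarities(cluster_a, cluster_b)
    return sum(d) / len(d)


# Hypothetical clusters; the pairwise distances are 4, 6, 3 and 5
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

d_complete = complete_linkage(cluster_a, cluster_b)
d_average = average_linkage(cluster_a, cluster_b)
```

Complete linkage is driven by the two most distant points, so it tends to be more sensitive to outliers than average linkage.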
What is Enrichment Analysis for and what are the steps?
Enrichment analysis covers statistical techniques used to identify whether specific biological categories (e.g., pathways, gene sets, or functional annotations) are overrepresented or “enriched” in a given list of genes, compared to what would be expected by chance.
Steps:
1. Input Gene List:
A set of genes of interest (e.g., differentially expressed genes, genes from a specific cluster, or genes with mutations).
4. Statistical Testing:
Compares the overlap between the input gene list and annotated gene sets to assess overrepresentation. Methods include:
* Fisher's Exact Test or Hypergeometric Test: Determines whether the overlap is statistically significant.
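The hypergeometric test can be written out directly: it gives the probability of seeing at least the observed overlap if the input genes were drawn at random from the background. The function name and all numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_set belong to the annotated gene set."""
    total = comb(n_background, n_selected)
    max_overlap = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 in the pathway,
# 4 genes in the input list, 3 of which are in the pathway
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

This upper-tail probability is the same quantity a one-sided Fisher's exact test reports for the corresponding 2x2 table. When many gene sets are tested, the resulting p-values still need multiple testing correction.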
What procedure is commonly used to reduce the FDR?
Benjamini-Hochberg (BH)
What are the benefits with single cell-approaches compared to Bulk RNA-seq?
Applications and workflow of Single-Cell RNA Sequencing (scRNA-seq) preprocessing?
Workflow scRNA-seq
1. Cell dissociation and isolation (e.g., FACS, microfluidics)
Challenges in scRNA-seq
What is spatial transcriptomics and what are its applications?
Applications:
* Reveals spatial organization of tissues: map gene expression to brain anatomy
* Understanding cell-type diversity
* Interactions and cell-cell communication
What does single-cell ATAC-seq do and what insights can be gained from it?
Single-Cell ATAC-seq: Profiling Chromatin Accessibility
Purpose: Profiles chromatin accessibility at single-cell resolution to identify active regulatory regions (e.g., enhancers and promoters).
Insight: Reveals which regions of the genome are open and potentially regulating gene expression in specific cell types.
What does Single-Cell DNA Methylation Sequencing do and what insights can be gained from it?
Purpose: Profiles DNA methylation (an epigenetic modification) at single-cell resolution, using bisulfite sequencing.
Explain Multi-Omics at Single-Cell Resolution
What are the QC metrics in single-cell RNA-seq?
Importance of QC: Crucial due to variability in cell quality
Metrics:
* Total reads per cell
* Number of detected genes
* Mitochondrial gene content (a high fraction often indicates damaged or dying cells)
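The three metrics can be computed per cell from a table of gene counts. In this sketch the cell is a made-up dictionary, and mitochondrial genes are assumed to be recognizable by an "MT-" gene symbol prefix (the human naming convention).

```python
def cell_qc_metrics(gene_counts, mito_prefix="MT-"):
    """Compute basic QC metrics for one cell.

    gene_counts: dict mapping gene symbol -> count for this cell.
    mito_prefix: assumed prefix of mitochondrial gene symbols.
    """
    total = sum(gene_counts.values())
    n_genes = sum(1 for c in gene_counts.values() if c > 0)
    mito = sum(
        c for gene, c in gene_counts.items()
        if gene.upper().startswith(mito_prefix)
    )
    pct_mito = 100 * mito / total if total else 0.0
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}


# Hypothetical cell
cell = {"ACTB": 50, "GAPDH": 30, "MT-CO1": 15, "MT-ND1": 5, "SOX2": 0}
metrics = cell_qc_metrics(cell)
```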
What are doublets?
(Problem in single-cell RNA-seq)
* Reads originating from two cells are assigned to a single cell
* Doublets can skew results
* Can be computationally removed
What are the unique challenges of single-cell RNA-seq in the alignment and quantification process? How are these addressed?
* Filtering: low-quality cells and genes are removed (e.g., cells with low gene counts, genes not expressed in enough cells)
What are the steps in the single cell RNA-seq pipeline?
Normalization and Scaling in scRNA-seq (challenges and solutions)
Challenges:
* Zero Inflation: Excessive zero counts due to dropout events or technical issues.
* Variable Sequencing Depth: Uneven read counts between cells.
Solutions:
* Imputation: Fills in missing values using statistical models (e.g., negative binomial).
* Log-Normalization: Scales counts for sequencing depth and applies log transformation to stabilize variance.
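Log-normalization can be sketched as follows. The scale factor of 10,000 is a commonly used value but is an assumption here, and the two cells are made up: they have the same expression profile at a tenfold difference in sequencing depth.

```python
from math import log1p


def log_normalize(cell_counts, scale_factor=10_000):
    """Scale counts to a common depth, then apply log(1 + x).

    Assumes the cell has at least one count (total > 0).
    """
    total = sum(cell_counts)
    return [log1p(c / total * scale_factor) for c in cell_counts]


shallow_cell = [1, 0, 9]     # 10 counts in total
deep_cell = [10, 0, 90]      # same profile, 100 counts in total

norm_shallow = log_normalize(shallow_cell)
norm_deep = log_normalize(deep_cell)
```

After normalization the two cells have the same values, and zero counts stay at zero because log(1 + 0) = 0.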
How do we evaluate clusters in single-cell RNA-seq?
What is annotation?
Annotating clusters involves linking them to cell types using marker genes identified by differential expression analysis, either manually or with automated tools like Garnett.
Goal, methods and applications of pseudotime analysis
Goal: Arrange cells along a temporal trajectory based on their gene expression profiles, simulating a time order of cellular processes without actual time points.
Methods:
Clustering-Based Approach:
Group cells into clusters.
Connect clusters to form a trajectory, reflecting transitions between cell states.
Probabilistic Frameworks:
Calculate transition probabilities between cells or clusters.
Build trajectories by modeling the most likely paths cells follow.
Applications:
Study cell differentiation (e.g., stem cells becoming specialized).
Analyze developmental processes (e.g., organ formation).
Explore cell responses to stimuli (e.g., immune activation).
Summary: Pseudotime analysis reconstructs cellular transitions, revealing dynamic processes like differentiation or development from static single-cell data.