What is shotgun sequencing?
DNA is randomly fragmented, sequenced, then assembled computationally
What is average coverage (two definitions)?
Total sequenced nucleotides (sum of all read lengths) ÷ genome size
Average number of reads covering each genome position
What is position-specific coverage?
Number of reads covering a particular base
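A minimal sketch of both coverage ideas, assuming reads are given as hypothetical (start, length) pairs on a toy 10-base genome and that every read lies fully inside the genome. The function names are illustrative, not taken from any sequencing tool.

```python
def average_coverage(read_lengths, genome_size):
    """Definition 1: total sequenced nucleotides divided by genome size."""
    return sum(read_lengths) / genome_size


def per_position_coverage(reads, genome_size):
    """Position-specific coverage: number of reads overlapping each base.

    `reads` is a list of (start, length) tuples with 0-based start positions.
    """
    depth = [0] * genome_size
    for start, length in reads:
        for pos in range(start, min(start + length, genome_size)):
            depth[pos] += 1
    return depth


# Toy data (assumed for illustration only)
genome_size = 10
reads = [(0, 4), (2, 4), (5, 5)]

depth = per_position_coverage(reads, genome_size)
avg_from_lengths = average_coverage([length for _, length in reads], genome_size)
avg_from_depth = sum(depth) / genome_size  # Definition 2: mean per-base depth

print(depth)             # [1, 1, 2, 2, 1, 2, 1, 1, 1, 1]
print(avg_from_lengths)  # 1.3
print(avg_from_depth)    # 1.3
```

Both definitions give the same number here because every read base maps to exactly one genome position.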
What challenges affect genome assembly?
Sequencing errors → mismatches
Ploidy effects → diploid organisms have differing maternal/paternal alleles
Repeats (~50% of the human genome) → collapse into a single copy when reads are shorter than the repeat
How can repeat regions be resolved in sequencing?
Use longer reads to anchor repeats in unique context
“Walk in” from both sides of repeats until sequences overlap
What is N50?
The contig length such that contigs of that length or longer contain at least 50% of the total assembly length (NG50 uses the estimated genome size instead of the assembly length)
What does a higher N50 indicate?
Better assembly contiguity (longer contigs); it does not by itself indicate that the assembly is correct
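A short sketch of how N50 could be computed from a list of contig lengths, assuming the assembly-length-based definition (half of the summed contig lengths). The function name and the toy lengths are illustrative.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length;
    the length of the contig that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy assembly: total length 290, half is 145
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70, because 80 + 70 = 150 >= 145
```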
What are structural variants?
Large-scale changes in genome structure such as insertions, deletions, duplications, translocations, or inversions
What is a synonymous mutation?
Mutation that does not change the amino acid
What is a non-synonymous mutation?
Mutation that changes the amino acid and is more likely to affect phenotype
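A minimal sketch of classifying a single-codon change as synonymous or non-synonymous, assuming the standard genetic code and DNA codons on the coding strand. The table is built from the conventional TCAG ordering; `classify_codon_change` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Compare the amino acids encoded by the reference and mutated codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


print(classify_codon_change("CTT", "CTC"))  # ('synonymous', 'L', 'L')
print(classify_codon_change("GAG", "GTG"))  # ('non-synonymous', 'E', 'V')
```

In this simplified scheme a change to a stop codon is also reported as non-synonymous.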
What are non-coding mutations?
Mutations outside protein-coding sequence, e.g. in promoters, enhancers, or operators, that can affect gene regulation
What is a common method to detect and prioritise mutations?
Multiple sequence alignments (mutations at conserved positions are more likely to be functionally important)
What does Hardy-Weinberg equilibrium state?
Allele and genotype frequencies remain constant from generation to generation unless an evolutionary force acts on the population
What conditions are required for Hardy-Weinberg equilibrium?
No mutations
No migration
Random mating
Large population (no genetic drift)
No natural selection
What is the equation for allele frequencies?
p + q = 1 (p = frequency of the dominant allele, q = frequency of the recessive allele)
What is the equation for genotype frequencies?
p² + 2pq + q² = 1
What do the genotype terms represent?
p² = homozygous dominant
2pq = heterozygous
q² = homozygous recessive
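A small worked example of the two equations, assuming a single locus with two alleles and a population in Hardy-Weinberg equilibrium. The observed recessive-phenotype frequency of 0.04 is an invented figure for illustration.

```python
from math import sqrt


def genotype_frequencies(p):
    """Expected genotype frequencies for dominant-allele frequency p."""
    q = 1 - p  # from p + q = 1
    return {
        "homozygous dominant": p * p,   # p²
        "heterozygous": 2 * p * q,      # 2pq
        "homozygous recessive": q * q,  # q²
    }


# Assumed observation: 4% of individuals show the recessive phenotype (q²)
q_squared = 0.04
q = sqrt(q_squared)  # recessive allele frequency
p = 1 - q            # dominant allele frequency

freqs = genotype_frequencies(p)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2f}")
print(f"sum: {sum(freqs.values()):.2f}")
```

With q² = 0.04 this gives q = 0.2 and p = 0.8, so roughly 64% homozygous dominant, 32% heterozygous, and 4% homozygous recessive.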
What is the purpose of enrichment analysis?
To identify pathways/functions that are overrepresented in a gene list compared to background
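One common way to test for overrepresentation is a one-sided hypergeometric test. The sketch below is an illustration of that idea only, not the exact procedure of any particular tool, and the parameter names are assumptions. It is run once per pathway or term, so real analyses also correct for multiple testing.

```python
from math import comb


def enrichment_p_value(background_size, annotated_in_background,
                       list_size, annotated_in_list):
    """One-sided hypergeometric p-value for overrepresentation.

    Probability of seeing at least `annotated_in_list` annotated genes
    when drawing `list_size` genes at random from a background of
    `background_size` genes, of which `annotated_in_background` carry
    the annotation (e.g. belong to one pathway).
    """
    N, K = background_size, annotated_in_background
    n, k = list_size, annotated_in_list
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Toy numbers: 10 background genes, 4 in the pathway,
# gene list of 3 genes, 2 of them in the pathway
p_value = enrichment_p_value(10, 4, 3, 2)
print(round(p_value, 4))  # 0.3333
```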
What databases are used in enrichment analysis?
GO (Gene Ontology) and KEGG (curated pathways)
Name tools used for enrichment analysis
g:Profiler
DAVID
GSEA
ShinyGO
What is conserved synteny?
Preservation of gene order between species
What is whole-genome phylogenetics used for?
Comparing complete genomes to reveal phenotypic, metabolic, or evolutionary differences
What does incongruence in genomic history mean?
Different genes in the same genome may have different evolutionary histories (e.g. horizontal gene transfer)
What is a pangenome?
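The total set of genes found across all individuals or strains of a species: the core genome shared by all plus the accessory genes present in only some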
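A toy sketch of a pangenome expressed as set operations, assuming each strain is reduced to a set of gene identifiers. The strain and gene names are hypothetical. Here "core" means genes present in every strain and "accessory" means genes present in only some.

```python
# Hypothetical gene content per strain
strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}

gene_sets = list(strains.values())

pangenome = set().union(*gene_sets)   # every gene seen in any strain
core = set.intersection(*gene_sets)   # genes shared by all strains
accessory = pangenome - core          # genes missing from at least one strain

print(sorted(pangenome))  # ['g1', 'g2', 'g3', 'g4', 'g5']
print(sorted(core))       # ['g1']
print(sorted(accessory))  # ['g2', 'g3', 'g4', 'g5']
```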