2 - Genomic Data Analysis Flashcards

(30 cards)

1
Q

What is shotgun sequencing?

A

DNA is randomly fragmented, sequenced, then assembled computationally

2
Q

What is average coverage (two definitions)?

A

Total sequenced nucleotides ÷ genome size

Average number of reads covering each genome position
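
As a rough illustration of the first definition, the sketch below computes average coverage from a read count, a read length, and a genome size. The function name and the example numbers are assumptions chosen for illustration, and it assumes all reads have the same length.

```python
def average_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage = total sequenced nucleotides / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    total_nucleotides = num_reads * read_length  # assumes uniform read length
    return total_nucleotides / genome_size


# Hypothetical run: 1,000,000 reads of 150 bp over a 5 Mb genome.
print(average_coverage(1_000_000, 150, 5_000_000))  # 30.0
```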

3
Q

What is position-specific coverage?

A

Number of reads covering a particular base
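
A minimal sketch of how per-base coverage could be counted, assuming each aligned read is given as a hypothetical (start, end) interval with 0-based, half-open coordinates. Real pipelines read these positions from alignment files; the interval list here is made up for illustration.

```python
def per_base_coverage(reads, genome_size):
    """Return the number of reads covering each position of the genome.

    reads: list of (start, end) tuples, 0-based, end exclusive.
    """
    diff = [0] * (genome_size + 1)
    for start, end in reads:
        diff[start] += 1   # a read begins covering here
        diff[end] -= 1     # and stops covering here
    coverage, depth = [], 0
    for i in range(genome_size):
        depth += diff[i]
        coverage.append(depth)
    return coverage


reads = [(0, 4), (2, 6), (3, 8)]
cov = per_base_coverage(reads, 10)
print(cov)                  # [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
print(sum(cov) / len(cov))  # 1.3, the average coverage over all positions
```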

4
Q

What challenges affect genome assembly?

A

Sequencing errors → mismatches

Ploidy effects → diploid organisms have differing maternal/paternal alleles

Repeats (~50% of the human genome) → collapse in the assembly when reads are shorter than the repeat

5
Q

How can repeat regions be resolved in sequencing?

A

Use longer reads to anchor repeats in unique context

“Walk in” from both sides of repeats until sequences overlap

6
Q

What is N50?

A

The length of the shortest contig such that 50% of the total assembly length is contained in contigs of that length or longer
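
One way to compute N50 from a list of contig lengths is sketched below: sort contigs from longest to shortest and walk down the list until the running total reaches half of the summed contig length. The function name and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length  # shortest contig needed to reach 50% of the total
    raise ValueError("contig_lengths must not be empty")


# Total length is 300; the contigs of length 80 and 70 together reach 150.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```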

7
Q

What does a higher N50 indicate?

A

Better assembly continuity

8
Q

What are structural variants?

A

Large-scale changes in genome structure such as insertions, deletions, duplications, translocations, or inversions

9
Q

What is a synonymous mutation?

A

Mutation in a coding sequence that does not change the encoded amino acid

10
Q

What is a non-synonymous mutation?

A

Mutation that changes the encoded amino acid; more likely than a synonymous mutation to affect phenotype
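
The distinction between synonymous and non-synonymous changes can be sketched by translating the reference and mutated codons with the standard genetic code and comparing the results. The function name is an assumption for illustration; the sketch handles single DNA codons only and treats a change to or from a stop codon (`*`) as non-synonymous.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


print(classify_substitution("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_substitution("GAG", "GTG"))  # non-synonymous (Glu -> Val)
```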

11
Q

What are non-coding mutations?

A

Mutations outside protein-coding sequence, e.g. in promoters, enhancers, or operators, that can affect gene regulation

12
Q

What is a common method to detect and prioritise mutations?

A

Multiple sequence alignments

13
Q

What does Hardy-Weinberg equilibrium state?

A

Allele and genotype frequencies remain constant from generation to generation unless evolutionary forces act on the population

14
Q

What conditions are required for Hardy-Weinberg equilibrium?

A

No mutations
No migration
Random mating
Large population (no genetic drift)
No natural selection

15
Q

What is the equation for allele frequencies?

A

p + q = 1

16
Q

What is the equation for genotype frequencies?

A

p² + 2pq + q² = 1

17
Q

What do the genotype terms represent?

A

p² = frequency of the homozygous dominant genotype
2pq = frequency of the heterozygous genotype
q² = frequency of the homozygous recessive genotype
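
A small sketch tying the two equations together, assuming one gene with two alleles (labelled `A` and `a` here purely for illustration). It estimates p from observed genotype counts and then derives the expected genotype frequencies p², 2pq, and q². The function names and counts are hypothetical.

```python
def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Estimate p (frequency of allele A) from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    return (2 * n_AA + n_Aa) / total_alleles


def hardy_weinberg(p: float) -> dict:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # from p + q = 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


p = allele_frequency(n_AA=49, n_Aa=42, n_aa=9)
expected = hardy_weinberg(p)
print(p)                                             # 0.7
print({g: round(f, 2) for g, f in expected.items()})
# {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
```

The three expected frequencies always sum to 1, which is the genotype equation p² + 2pq + q² = 1.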

18
Q

What is the purpose of enrichment analysis?

A

To identify pathways/functions that are overrepresented in a gene list compared to background
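
Over-representation is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below is one such calculation in plain Python, not the method of any particular tool; the function name, parameter names, and example numbers are assumptions, and real analyses also correct for testing many pathways at once.

```python
from math import comb


def enrichment_p_value(background_size: int, pathway_size: int,
                       list_size: int, overlap: int) -> float:
    """P(seeing >= overlap pathway genes in the gene list by chance).

    background_size: number of genes in the background
    pathway_size:    background genes annotated to the pathway
    list_size:       genes in the list of interest
    overlap:         list genes annotated to the pathway
    """
    max_overlap = min(pathway_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 15 of 500 list genes fall in a 200-gene pathway,
# against a background of 20,000 genes (about 5 expected by chance).
print(enrichment_p_value(20_000, 200, 500, 15))
```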

19
Q

What databases are used in enrichment analysis?

A

GO (Gene Ontology) and KEGG (curated pathways)

20
Q

Name tools used for enrichment analysis

A

g:Profiler
DAVID
GSEA
ShinyGO

21
Q

What is conserved synteny?

A

Preservation of gene order between species

22
Q

What is whole-genome phylogenetics used for?

A

Comparing complete genomes to reveal phenotypic, metabolic, or evolutionary differences

23
Q

What does incongruence in genomic history mean?

A

Different genes in the same genome may have different evolutionary histories (e.g. horizontal gene transfer)

24
Q

What is a pangenome?

A

The total genetic content of a species: all genes found across its individuals or strains (core plus accessory genome)

25
Q

What is the core genome?

A

Genes shared by all individuals

26
Q

What is the accessory genome?

A

Genes present in some but not all individuals

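
The pangenome, core genome, and accessory genome map naturally onto set operations. The sketch below assumes gene content is already available as a mapping from a strain name to its gene names; the strain and gene names are hypothetical examples.

```python
def split_pangenome(genomes: dict):
    """Return (pangenome, core, accessory) gene sets.

    genomes: mapping of individual/strain name -> iterable of gene names.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    pangenome = set().union(*gene_sets)                          # any individual
    core = set.intersection(*gene_sets) if gene_sets else set()  # every individual
    accessory = pangenome - core                                 # some, not all
    return pangenome, core, accessory


strains = {
    "strain_A": ["gyrA", "recA", "blaTEM"],
    "strain_B": ["gyrA", "recA"],
    "strain_C": ["gyrA", "recA", "tetA"],
}
pangenome, core, accessory = split_pangenome(strains)
print(sorted(core))       # ['gyrA', 'recA']
print(sorted(accessory))  # ['blaTEM', 'tetA']
```
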
27
Q

Why is pangenome analysis important?

A

It helps study diversity, adaptation, and functional potential

28
Q

What is machine learning increasingly used for in genomics?

A

Variant effect prediction, gene function prediction, genome annotation, and pattern recognition in big data

29
Q

Why is machine learning useful in genomics?

A

It can detect patterns difficult to identify with classical statistical methods.

30
Q

What is a contig?

A

A contiguous stretch of DNA sequence assembled from multiple overlapping DNA fragments
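
To make the idea of overlapping fragments concrete, the toy sketch below joins two reads when the end of one exactly matches the start of the other. It is a simplified illustration, not how production assemblers work: it ignores sequencing errors, reverse complements, and repeats, and the function names and `min_overlap` threshold are assumptions.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Merge two reads into one contig, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# The reads share the 4-base overlap "GTAC".
print(merge_reads("ATGCGTAC", "GTACCTGA"))  # ATGCGTACCTGA
```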