What is a GWAS?
‘Association testing at many (but not all) markers across the entire genome’. SNP association with disease.
This is likely to detect only INDIRECT ASSOCIATION (tagging).
What is the issue with using GWAS?
The human genome contains roughly 1 SNP per 100-300 bp
The human genome has approximately 3,000 million (3 billion) bp
So approx. 10 million SNPs (assuming 1 SNP per 300 bp)
It is often not possible to genotype and analyse such a large number of SNPs due to several limiting factors
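The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the approximate figures quoted in this card; the function and constant names are illustrative.

```python
# Rough estimate of how many SNPs the human genome contains,
# using the approximate figures quoted in the card above.
GENOME_BP = 3_000_000_000  # approx. 3,000 million bp


def estimated_snps(bp_per_snp: int) -> int:
    """Number of SNPs if one SNP occurs every `bp_per_snp` bases."""
    return GENOME_BP // bp_per_snp


low = estimated_snps(300)   # conservative density: 1 SNP / 300 bp
high = estimated_snps(100)  # dense end of the range: 1 SNP / 100 bp
print(f"{low:,} to {high:,} SNPs")
```

With 1 SNP per 300 bp this gives the 10 million figure; at 1 SNP per 100 bp the same calculation gives 30 million.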
How should the issue of GWAS use be dealt with?
Use of Linkage Disequilibrium (LD)
Instead of genotyping all the 10M SNPs we can genotype tagSNPs in a haplotype block.
A tagging SNP is a representative SNP in a given region of the genome that is in high LD with all other SNPs in the region.
Genotyping chips with 0.5M-1M SNPs are sufficient for a good GWAS
What should be considered with regard to tagging?
Association does not necessarily mean causation
How are SNPs ‘close’ to each other correlated?
What does this mean?
Haplotypes (groups of alleles on the same chromosome that are inherited together from a single parent).
If a causal SNP at position 2 is correlated (say r² = 1) with a SNP at position 1, then you will also observe an association with the SNP at position 1, even though it is not causal.
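The correlation between two SNPs can be made concrete with a small sketch. It assumes phased haplotypes are available and that alleles are coded 0/1; the function name and the toy data are illustrative, not taken from any particular tool. It uses the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n           # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Toy data: allele 1 at SNP 1 always travels with allele 1 at SNP 2.
perfect = [(0, 0)] * 6 + [(1, 1)] * 4
# Toy data: all four haplotypes equally common, so no correlation.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(round(ld_r2(perfect), 6))
print(round(ld_r2(independent), 6))
```

In the first toy sample the two SNPs are perfectly correlated, so testing either one carries the same information; in the second, genotyping one SNP says nothing about the other.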
What are the main steps in designing and analyzing a large (GWAS) association study?
Sample collection
Data generation
Standard analyses for identifying associated loci.
Replication
Quality assurance (QA) and Quality control (QC) (carried out over multiple stages of GWAS)
Why should a GWAS study be replicated?
A replication will reduce worries about (1) and provide more independent evidence against the Null
A replication study should be performed by a different group using a different sample and genotyping method.
True or false
True
What problems were noted for a GWAS investigating Lupus?
Population Structure
Logistics
Politics
Too few controls
A genetic association study tests whether…
the presence of a specific genetic variant correlates with a trait of interest (e.g. presence/absence of disease)
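One simple way to test such a correlation for a single SNP is an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The sketch below is an illustration under that assumption, not the method of any specific study; the function name and counts are made up, and it assumes every row and column total is non-zero.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(r) for r in table)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total   # count expected under the null
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more common in cases.
chi2, p = allelic_chi2(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In a GWAS the same kind of test is repeated at every genotyped SNP, which is why the significance threshold has to be much stricter than 0.05.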
What does the solid red line on a Manhattan plot represent?
The P-value threshold that must be crossed for a SNP to be declared significant
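A common way to set such a threshold is a Bonferroni correction, which divides the desired error rate by the number of tests. Whether a given Manhattan plot uses exactly this rule is an assumption; the sketch below only shows where the conventional genome-wide value of 5e-8 comes from if roughly one million independent tests are assumed.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error near alpha."""
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Manhattan plots show -log10(P) on the y-axis, so the line is drawn here:
line_height = -math.log10(threshold)
print(f"threshold = {threshold:.0e}, line at -log10(P) = {line_height:.2f}")
```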
What is Quality assurance good practice for?
Generating quality data
What does Quality control do?
Why?
Removes bad data so that the dataset conforms to quality metrics
What is the most significant aspect of GWAS?
Quality assurance
What is important for statistical analysis?
Quality Control
Describe the quality control pipeline
Post-genotyping checks
Post-association checks
Describe the quality assurance pipeline
Pre-genotyping checks
Genotype calling
Why is individual missingness an issue in GWAS?
Indicates poor quality DNA.
Informative missingness: if DNA quality correlates with phenotype, missing data is not random with respect to case/control status.
Why is Gender Check an issue in GWAS?
Indicates data recording problem
Why is Relatedness an issue in GWAS?
Independence
Violates association testing assumptions
Why are Population outliers an issue in GWAS?
False positives
Why is inbreeding an issue in GWAS?
Sample contamination
Population effect
How can quality assurance overcome Individual Missingness?
Equal number of cases and controls plated together
How can quality control overcome Individual Missingness?
Plot call rate (1 - missingness) against the % of individuals removed
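The points for such a plot can be computed as in the sketch below. It assumes genotypes are held as one list per individual with None marking a missing call; the function names, the toy data, and the thresholds are all illustrative.

```python
def call_rates(genotypes):
    """Per-individual call rate, i.e. 1 - missingness.

    genotypes: one list per individual; None marks a missing genotype call.
    """
    return [sum(g is not None for g in row) / len(row) for row in genotypes]


def percent_removed(rates, min_call_rate):
    """Percentage of individuals that fall below the call-rate threshold."""
    removed = sum(r < min_call_rate for r in rates)
    return 100.0 * removed / len(rates)


# Toy data: 4 individuals genotyped at 4 SNPs.
genotypes = [
    [0, 1, 2, 1],
    [0, None, 2, 1],
    [None, None, 2, 1],
    [0, 1, None, None],
]
rates = call_rates(genotypes)

# Each (threshold, % removed) pair is one point on the plot.
for threshold in (0.5, 0.75, 1.0):
    print(threshold, percent_removed(rates, threshold))
```

Scanning the thresholds shows how many individuals each candidate cut-off would cost, which helps in choosing one that removes poor-quality samples without discarding too much of the study.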