What is a phylogenetic tree and what are its elements?
Explain the components of a tree (clade, length of branches, root, outgroup)
What is a rooted tree?
A tree with a root which is the most recent common ancestor (MRCA) of the sequences or species in the tree
- Root is the oldest part of the tree - all other nodes descend from it - gives directionality
- So in rooted trees we can talk about ancestors and descendants, or ancestral and derived character states
What is an outgroup?
A taxon which we know diverged before the MRCA of the group under consideration (the ingroup)
What is a clade?
A monophyletic group of sequences or organisms
- Groups which include all descendants of a single ancestor
- Some groups are not clades - termed ‘paraphyletic’ - e.g., reptiles (clade should include birds) and apes (clade should include humans)
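The reptile example can be checked mechanically. A minimal sketch, assuming a deliberately simplified toy topology written as nested tuples (the tree shape and the function names are illustrative assumptions, not from the notes):

```python
def leaves(tree):
    """Return the set of leaf names under a node (leaf = str, internal node = tuple)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every node in the tree; each one is a clade."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the leaf set of some node."""
    return frozenset(group) in set(clades(tree))


# Simplified toy topology (assumption for illustration only).
toy_tree = ("mammals", (("lizards", "snakes"), ("crocodiles", "birds")))
reptiles = {"lizards", "snakes", "crocodiles"}

print(is_monophyletic(toy_tree, reptiles))               # False: paraphyletic
print(is_monophyletic(toy_tree, reptiles | {"birds"}))   # True: a clade
```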
What are monophyletic and paraphyletic groups?
How can you do phylogenetic reconstruction and what is the exception?
What two separate functions are required for phylogenetics?
What types of data can we use for phylogenetics?
Discrete characters:
- Independent variables whose possible values are collections of mutually exclusive character states
- e.g., nucleotide position n in DNA region x
- Can be qualitative or quantitative
Distances or similarities:
- Complex multi-variable dataset of differences among taxa combined into a composite measure, expressed as a single value
- An n x n matrix of pairwise values is made (n = number of taxa)
- More computationally efficient
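As an illustration of turning discrete characters into distances, here is a minimal sketch that builds an n x n matrix of p-distances (proportion of differing sites). The toy sequences and function names are assumptions, and gaps or ambiguous bases are not handled:

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Return taxon names and the n x n matrix of pairwise p-distances."""
    names = list(alignment)
    matrix = [[p_distance(alignment[i], alignment[j]) for j in names] for i in names]
    return names, matrix


# Toy alignment (made up for illustration).
toy_alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACGGAT",
}

names, matrix = distance_matrix(toy_alignment)
for name, row in zip(names, matrix):
    print(name, row)
```

Each whole sequence comparison collapses to a single number, which is why the distance approach is cheaper but discards site-level information.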
What assumptions are made for discrete data?
What are the two main phylogenetic methods?
Optimality criteria methods: application of a definition of the preferred tree - compares each tree to the optimality criterion, ranks them and finds the best tree
- Maximum parsimony (MP)
- Maximum likelihood
- Bayesian (MCMC sampling)
Algorithmic: set of instructions for how to go about making a phylogenetic tree - rolls together how to make trees and the definition of a preferred tree
- Clustering algorithms
- UPGMA
- Neighbour-joining (NJ)
How do you rank trees with the two phylogenetic methods?
Criterion-based:
- Rank in order of fit to the criteria and choose best
- Can quantitatively say how much better one topology is compared to an alternative
Algorithmic:
- Many trees may be equally likely, but a simple algorithmic method (like NJ) will still give just one tree as the result
- So it is hard to say how much better one topology is than another
- Can use resampling statistics (e.g., bootstrapping) to indicate to what extent the topology is supported by the data - but this does not overcome the weakness
What are the two different types of algorithm for finding the best tree with optimality criterion methods?
Exact algorithms:
- ‘Guarantee’ to find the optimal tree by using an exhaustive search to find the best score
- Or use a branch and bound search - eliminates parts of the search that contain only suboptimal solutions
- But if too many trees - will take a very long time and lots of computational power
Heuristic algorithms:
- Approximate or quick and dirty methods to attempt to find optimal tree for method of choice - but cannot guarantee
- Often use ‘hill-climbing’ methods - but local peak may not be the highest mountain (optimal tree) - so is repeated many times
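To see why exhaustive searches become impractical, it helps to count the candidate topologies. A minimal sketch, using the standard result that n taxa have (2n-5)!! unrooted bifurcating trees (the function name is an assumption):

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Ten taxa already give about two million topologies, which is why heuristic searches are the norm for larger datasets.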
What is homoplasy?
Multiple mutational hits on the same site results in alleles identical in state, but not by descent
- Causes the observed divergence rate to decline over time rather than follow a linear ‘molecular clock’
What model allows for the possibility of homoplasy?
Substitution model:
- Can be used to try and allow for the effect of homoplasy and estimate the true evolutionary distance from the observed distance
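One simple substitution model is Jukes-Cantor (JC69), chosen here only as an example since the notes do not name a specific model. A minimal sketch of its correction from observed p-distance to estimated distance (the function name is an assumption):

```python
import math


def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion of differing sites.

    Uses the JC69 formula d = -(3/4) * ln(1 - 4p/3), which assumes equal base
    frequencies and equal rates for all substitution types.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge, reflecting the hidden multiple hits.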
How does UPGMA cluster analysis work and what assumption does it make?
The most similar pair of taxa is found - i.e. the two taxa separated by the smallest genetic distance
- Makes assumption of ultrametricity - i.e. that, for any three taxa, two of the three pairwise distances are equal and at least as large as the third - which amounts to assuming that rates of evolution do not vary among taxa
- But this assumption is nearly ALWAYS violated
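The clustering steps can be sketched as follows. This is a minimal illustration, not a production implementation; the input format, function name and toy distances are assumptions, and the toy distances are chosen to be ultrametric so that UPGMA behaves well:

```python
from itertools import combinations


def upgma(distances):
    """Cluster taxa by UPGMA.

    distances: dict {(name_a, name_b): distance} covering every pair of taxa.
    Returns a list of (cluster, height) merges, with clusters as nested tuples.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {name: 1 for pair in distances for name in pair}  # cluster -> leaf count
    merges = []
    while len(sizes) > 1:
        # Join the two current clusters separated by the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # node sits halfway (equal-rates assumption)
        new = (a, b)
        # Distance to the new cluster is the size-weighted mean of the old distances.
        for other in sizes:
            if other not in (a, b):
                d[frozenset((new, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((new, height))
    return merges


# Toy ultrametric distances (made up for illustration).
toy_distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
for cluster, height in upgma(toy_distances):
    print(cluster, height)
```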
How does neighbour joining cluster algorithm work?
Is like cluster analysis but allows for variation in evolutionary rates along different branches
- Doesn’t require assumption of ultrametricity
- It can accommodate ‘multiple hits’ at the same nucleotide site, if the distances are first corrected with a substitution model
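The key difference from UPGMA is which pair gets joined. A minimal sketch of the first NJ step, using the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) (the function name and the toy matrix are assumptions):

```python
def nj_pair_to_join(names, d):
    """Return the pair of taxa with the lowest Q value, and that Q value."""
    n = len(names)
    row_sums = [sum(row) for row in d]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q penalises pairs that are close to each other only because
            # both are close to everything else.
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Toy distance matrix (made up for illustration).
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pair_to_join(names, d))  # (('a', 'b'), -50)
```

Here the smallest raw distance is between d and e (3), which UPGMA would join first, but NJ joins a and b because Q accounts for how far each taxon is from all the others.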
What is bootstrapping?
Resampling of data to get a feel for how well the topology is supported by the data
- i.e. data are resampled many times and the tree is reconstructed each time - asking if the same tree would have been generated if some of the sequence information had not been obtained, or if not all the taxa had been sampled
- Bootstrap values show the percentage of resampled replicates that include a clade observed in the original tree - 100% indicates that every replicate includes that clade
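A minimal sketch of the resampling step, assuming the common approach of drawing alignment columns with replacement. `build_clades` is a hypothetical user-supplied function (not a real library call) that builds a tree from an alignment and returns its clades as a set of frozensets:

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_clades, n_replicates=100, seed=0):
    """Percentage of replicates that recover each clade of the original tree.

    build_clades: hypothetical function, alignment -> set of frozensets of taxa.
    """
    rng = random.Random(seed)
    original = build_clades(alignment)
    counts = {clade: 0 for clade in original}
    for _ in range(n_replicates):
        replicate_clades = build_clades(bootstrap_alignment(alignment, rng))
        for clade in original:
            if clade in replicate_clades:
                counts[clade] += 1
    return {clade: 100 * count / n_replicates for clade, count in counts.items()}
```

Any tree-building method (UPGMA, NJ, parsimony and so on) could stand behind `build_clades`; the resampling logic is the same.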
What are the problems with bootstrapping?
Compare the algorithmic and optimality criterion based approaches
Algorithmic:
- Combines the definition of the ‘best tree’ and how to generate the tree topology
Optimality criterion:
- Separate definition of the best tree and generation of topologies
- Topologies can be quantitatively compared against the definition, and ranked based on a score as to how well they fit the definition
What are parsimony approaches?
Type of optimality criteria method
- Works directly with character data
- Makes assumption that the trees requiring the smallest number of character state changes that can explain the data are preferred
- So - trees needing fewer changes are more parsimonious than trees needing more complex sequences of changes
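Counting the minimum number of changes for one character on a given tree can be done with Fitch's algorithm, shown here as one common way to score parsimony (the notes do not specify an algorithm). A minimal sketch for a rooted binary tree written as nested 2-tuples, with made-up toy data:

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can share a state: no extra change needed at this node.
        return common, left_changes + right_changes
    # Children disagree: at least one change is required here.
    return left_set | right_set, left_changes + right_changes + 1


# Toy site (made up for illustration).
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch(tree_1, site)[1])  # 1 change
print(fitch(tree_2, site)[1])  # 2 changes, so tree_1 is preferred for this site
```

A full parsimony score sums these counts over all sites, and the tree with the lowest total is preferred.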
What are the positives/problems with parsimony?
How do the maximum likelihood (ML) and Bayesian methods work?
These methods specify a model of molecular evolution and build this into the best estimate of the tree
- ML looks for the tree that maximises likelihood of observing the data, given the tree and the model
- Bayesian - seeks the tree that maximises the posterior probability of the tree, given the data and the model
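The two criteria can be written side by side. In this sketch the symbols are assumptions introduced for illustration: D is the data (e.g., the alignment), T a tree and M the model of molecular evolution:

```latex
% Maximum likelihood: choose the tree that makes the data most probable
\hat{T}_{\mathrm{ML}} = \arg\max_{T} \; P(D \mid T, M)

% Bayesian: posterior probability of a tree, from Bayes' theorem
P(T \mid D, M) = \frac{P(D \mid T, M)\, P(T \mid M)}{P(D \mid M)}
```

The likelihood P(D | T, M) appears in both; the Bayesian approach additionally needs a prior P(T | M), and the denominator P(D | M) is usually not calculated directly, which is where MCMC sampling comes in.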
What are the advantages of ML and Bayesian methods over parsimony?