How often does DNA data double?
Every 18-24 months
How often does structure data double?
Every 6 years
What is the rough composition of the human genome?
About 3.2 Gbp: 2-5% coding, >50% repeated sequences, and ~35% of genes undergo alternative splicing
What is bioinformatics?
The application of computers to biological problems:
- Aiding the biologist in creating, storing, and analysing biological data (mainly sequences/structures)
- Presenting it in a way biologists can use
- Applying the analysis to make predictions
What is fragment assembly?
Searching sequence fragments for overlapping regions and joining them into one continuous sequence
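As a rough illustration of the overlap idea, the sketch below joins two fragments when the end of one matches the start of the other. The function names and the `min_overlap` threshold are hypothetical, chosen for this example, and it only handles exact matches between two fragments; real assemblers also deal with sequencing errors, reverse complements, and many fragments at once.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0  # no overlap of at least `min_overlap` residues


def merge_fragments(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping part of `right`.
    return left + right[size:]


if __name__ == "__main__":
    print(merge_fragments("ATGCGTAC", "GTACCTGA"))
```

Here the shared region is `GTAC`, so the two fragments are joined into a single 12-residue sequence.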
How do we conduct fragment assembly?
What is a moving window?
Take a window covering an odd number of residues (typically 7-21) and calculate the average of some property across it. Slide the window along the sequence one residue at a time, recalculating the average at each position.
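A minimal sketch of the moving window calculation, assuming the property has already been turned into one number per residue (for example a hydrophobicity score). The function name `window_averages` is hypothetical, and positions too close to either end to hold a full window are simply skipped, which is one of several possible choices.

```python
def window_averages(values, window=7):
    """Average `values` over a sliding window centred on each residue.

    Only positions where the whole window fits inside the sequence are reported.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averages = []
    for centre in range(half, len(values) - half):
        # The window spans `half` residues on each side of the centre residue.
        chunk = values[centre - half : centre + half + 1]
        averages.append(sum(chunk) / window)
    return averages


if __name__ == "__main__":
    toy_scores = [1, 2, 3, 4, 5]  # made-up per-residue values
    print(window_averages(toy_scores, window=3))
```

An odd window size is used so that each average has a single central residue to be assigned to.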
What information can we predict from a DNA sequence?
What is an algorithm?
A complete and precise set of steps that will solve a problem and achieve an identical result when given the same set of data to a defined level of accuracy
What does computer programming enable us to do?
What are machine learning methods?
A general class of computer software which learns from examples and is then able to make predictions
How do MLMs work?
What are examples of MLMs?
What is a database?
A structured collection of data with some tool enabling it to be queried.
What is a databank?
A collection of data (normally in a simple text file) without an associated query tool. It allows you to use whatever software you like to analyse the data.
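Because a databank is just text, a few lines of your own code can stand in for the missing query tool. The sketch below assumes the data are in FASTA-style format (a `>` header line followed by sequence lines), which is an assumption made for this example rather than something stated above; `parse_fasta` is a hypothetical helper name.

```python
def parse_fasta(text):
    """Parse FASTA-style text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The record name is the first word after the '>' marker.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines are concatenated onto the current record.
            records[name] += line
    return records


if __name__ == "__main__":
    example = ">seq1 demo entry\nATGC\nGGTA\n>seq2\nTTAA\n"
    for record_name, sequence in parse_fasta(example).items():
        print(record_name, len(sequence))
```

Once the records are in a dictionary, any analysis you like can be run over them, which is the flexibility a databank offers compared with a database's fixed query tool.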
What are the types of databanks?
What is a primary databank? Give examples.
Raw data deposition and curation. e.g. GenBank, PDB, UniProtKB
What is a secondary databank? Give examples.
Derived data, patterns, annotations. e.g. PROSITE, Pfam, CATH
What is a composite databank? Give examples.
Non-redundant sets of data derived from primary databases. e.g. OWL, NRDB
What is a gateway? Give examples.
Gateways give access to data. e.g. NCBI, Expasy, EBI
What is a gene ontology? Give examples.
A controlled vocabulary for describing gene and gene product attributes. e.g. molecular function, biological process, cellular component.
In what ways can you search databases?
What are the different sequence alignment methods?
What does annotation include?