Bioinformatics Flashcards by Kay Cee

Interdisciplinary field that combines biology, computer science, statistics, mathematics, and engineering to analyze and interpret biological data, particularly data from large datasets like genomes or protein sequences

Bioinformatics

It is a widely-used format for
representing nucleotide or protein sequences.

FASTA

It consists of a header line starting with ‘>’, followed by the sequence data on subsequent lines.

FASTA
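As a rough illustration of the layout described on the FASTA cards, the following Python sketch reads FASTA-formatted text into a dictionary. The function name `parse_fasta` and both example records are made up for illustration; real files are usually handled with an established library rather than hand-written code.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]        # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may span several lines
    return records


# Hypothetical two-record example (identifiers and sequences are invented).
example = """>seq1 example nucleotide record
ATGGCCATTG
TAATGGGCC
>seq2 example protein record
MKTAYIAKQR
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Each header line starts a new record, and every following line is appended to that record until the next '>' appears.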

In sequence alignment, a ________ represents a position where one sequence has an insertion or deletion relative to another sequence.

Gap

____________ are
introduced to optimize alignment and account for
evolutionary changes

Gap

It is the
sequence for which you are searching for similarities
or matches within a database

Query sequence

It’s the sequence you submit as the input to a search

Query sequence

It is the sequence(s) in a database against which the query sequence is compared during sequence alignment or similarity searches

Subject sequence

It is a branching diagram that depicts the evolutionary relationships among a set of organisms, genes, or species

Phylogenetic tree

It shows the inferred evolutionary history and relatedness based on genetic or sequence data

Phylogenetic tree

It is a unique numerical identifier assigned to each sequence entry in the NCBI (National Center for Biotechnology Information) databases.

GI number

It provides a
stable and unique way to refer to a specific sequence
entry.

GI number

It is a
unique identifier assigned to a sequence record in a
public sequence database (like GenBank, EMBL, or
DDBJ)

Accession number

Typically consist of letters
and numbers and are used to reference specific
sequence entries.

Accession number

Involves
identifying and labeling the features of a genome such as genes, regulatory sequences, and other
functional elements.

Genome annotation

This process helps in
understanding the biological significance of the DNA
sequence.

Genome annotation

In sequence alignment or similarity searches, it is a numerical value that quantifies the level
of similarity or quality of alignment between two
sequences.

Score
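To make the idea of a score concrete, here is a small Python sketch that scores two sequences that are already aligned, with '-' standing for a gap. The scoring values (match +1, mismatch -1, gap -2) and the function name `alignment_score` are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Score two already-aligned, equal-length strings; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap       # insertion or deletion in one sequence
        elif x == y:
            score += match     # identical residues
        else:
            score += mismatch  # substitution
    return score


# Six matches and one gap under the assumed scheme: 6 * 1 + (-2) = 4
print(alignment_score("GATTACA", "GA-TACA"))
```

Under this scheme an ungapped identical pair scores highest, and every gap or mismatch lowers the total.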

Higher scores generally indicate more significant similarity. (T or F)

TRUE

It is a statistical
measure that estimates the number of different
alignments with scores equivalent to or better than a
given score that would occur by chance in a database
search.

Expect value (E-value)

A ___________ indicates a more significant
match or similarity.

lower E-value
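One commonly quoted relationship between a score and its E-value, assuming the score is a normalized bit score S', is E = m × n × 2^(-S'), where m is the query length and n is the total length of the database. The Python sketch below applies that formula; the function name `expect_value` and the lengths used are hypothetical, and real search tools use adjusted (effective) lengths, so treat the numbers as illustrative only.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S') for a bit score S' (simplified, no length correction)."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 100-residue query against a 1,000,000-residue database.
for s in (20, 40, 60):
    print(s, expect_value(s, 100, 1_000_000))
```

Because the score sits in a negative exponent, a higher score gives a lower E-value, which matches the card: lower E-values indicate more significant matches.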

A field which uses computers to store and analyze
molecular biological information

BIOINFORMATICS

It is about finding and interpreting biological data
online

BIOINFORMATICS

It is a field in which biology, mathematics, statistics, computer
science, information technology, and other health sciences are
merged into a single discipline to process biological data

BIOINFORMATICS

It uses complex machines to read biological data at a much faster rate than before.

BIOINFORMATICS

There is a marriage between biology and informatics. (T or F)

TRUE

The science of collecting and analyzing complex biological data

BIOINFORMATICS

Allows the storage and management of large biological data sets

THE CREATION OF DATABASES

Data is being generated at a much greater pace than its analysis (e.g. Human Genome Project)

THE CREATION OF DATABASES

These are repositories, like a bank of biological information, designed to collect, archive, visualize, and organize biological data.

Databases

This is to enable scientists to have an intelligent data description, interpretation, or retrieval.

Databases

There is much data that has been generated especially since the completion of the

Human Genome Project

When was the Human Genome Project launched?

1990

Objective of human genome project

To sequence the entire human genome which consists of about 3.2 billion base pairs.

It was completed in 2003. Because of this, there is a large amount of data that has to be interpreted or analyzed.

Human Genome Project

Aside from the human genome, the genomes of many other organisms have been completely sequenced. This again produced an enormous amount of data that has to be understood, which is why databases were created. (T or F)

TRUE

PRINCIPAL COMPONENTS OF BIOINFORMATICS

● THE CREATION OF DATABASES
● THE DEVELOPMENT OF ALGORITHMS AND STATISTICS
● THE USE OF THESE TOOLS FOR THE ANALYSIS AND INTERPRETATION OF VARIOUS TYPES OF BIOLOGICAL DATA

Determine relationships among members of large data sets

THE DEVELOPMENT OF ALGORITHMS AND STATISTICS

Large data sets are organized so that relationships among their members can be determined. The procedure used to do this is called an

Algorithm

Algorithms are applied in ________

Statistics

including DNA, RNA and protein sequences, protein structures, gene expression profiles, and biochemical pathways

THE USE OF THESE TOOLS FOR THE ANALYSIS AND INTERPRETATION OF VARIOUS TYPES OF BIOLOGICAL DATA

Sciences that attempt to describe a living organism in terms of 'omics'

BRANCHES OF BIOINFORMATICS

● Genomics
● Transcriptomics
● Proteomics
● Microbiomics
● Metabolomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: Involves the description of sequences of the entire genome of an organism

Genomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: Study of all RNA molecules in a living organism

Transcriptomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: The description of the entire complement of proteins in a living organism.

Proteomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: They study the sequence, 3D structures, and other properties of proteins.

Proteomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: Deals with the entire set of proteins found in a living organism.

Proteomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: Pertains to microbes, viruses, fungi, parasites, bacteria.

Microbiomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: The genomes of these microorganisms are described within a specific environmental niche

Microbiomics

IDENTIFY THE BRANCH OF BIOINFORMATICS: Involves description of the chemical processes involving metabolites.

Metabolomics

DNA/RNA BIOINFORMATICS APPLICATIONS

● Retrieving DNA sequences from databases
● Computing nucleotide compositions
● Identifying restriction sites
● Designing polymerase chain-reaction (PCR) primers
● Identifying open reading frames (ORFs)
● Predicting elements of DNA/RNA secondary structure
● Finding repeats
● Computing the optimal alignment between two or more DNA sequences
● Finding polymorphic sites in genes (single nucleotide polymorphisms, SNPs)
● Assembling sequence fragments
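Two of the applications listed above, computing nucleotide composition and identifying restriction sites, are simple enough to sketch in a few lines of Python. The function names and the example sequence are invented for illustration; GAATTC is the recognition site of the restriction enzyme EcoRI.

```python
from collections import Counter
from typing import Dict, List


def nucleotide_composition(seq: str) -> Dict[str, int]:
    """Count how often each of A, C, G, T occurs."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping hits
    return positions


dna = "ATGGAATTCCGCGAATTCTAA"  # made-up sequence with two EcoRI sites
print(nucleotide_composition(dna))
print(round(gc_content(dna), 3))
print(find_sites(dna, "GAATTC"))
```

This only searches the strand as given; a fuller treatment would also consider the reverse complement and enzymes with ambiguous recognition sequences.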

Identifying open reading frames (ORFs): an open reading frame is a stretch of sequence that runs from a

start codon to a stop codon
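The start-to-stop definition of an ORF can be sketched as a simple scan in Python. This is a minimal version under stated assumptions: it looks only at the three forward reading frames, treats ATG as the only start codon, and uses TAA, TAG, and TGA as stop codons. The function name `find_orfs`, the `min_codons` parameter, and the example sequence are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs on the forward strand.

    Positions are 0-based, `end` is exclusive, and the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START_CODON:
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


example = "AAATGGCCTAAGG"  # made-up sequence: ATG GCC TAA starts at index 2
print(find_orfs(example))
```

A start codon with no in-frame stop codon after it is not reported, which follows the card's definition of an ORF as running from a start codon to a stop codon.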

WHY DO BIOINFORMATICS?

● It saves time when doing real experiments (e.g., designing primers).
● You might want to do a simulated experiment on a computer ('in silico') instead of in a real environment.

Bioinformatics is very convenient for a scientist because it serves to

Save him time when he wants to do a real experiment, because the experiment or research study can start with a simulation on a computer.

When you do simulated experiments in a computer, that is described as “in silico” so it is done in a computer rather than a real environment. For example, when you do PCR and you want to amplify a particular DNA fragment, you design primers using bioinformatic tools or software. (T or F)

TRUE

Once you have designed a primer, you can do your actual laboratory experiment, which we call the ____________

Wet lab

Where the primer would be optimized and eventually used in the amplification reaction.

Wet lab

APPLICATIONS OF BIOINFORMATICS

● Sequence alignment and analysis
● Mapping and analyzing DNA, RNA, Protein, Amino Acid, and Lipid sequences
● Creation and visualization of 3-D structure models for biological molecules of significance, e.g., proteins
● Genome annotation
● Genetic diseases
● Designer Medicine

APPLICATIONS IN VARIOUS FIELDS

● Microbial genome applications
● Molecular medicine
● Personalized medicine
● Gene therapy
● Drug development
● Antibiotic resistance
● Evolutionary studies
● Waste cleanup
● Biotechnology
● Climate change studies
● Alternative energy sources
● Crop improvement
● Forensic analysis
● Bio-weapon creation
● Insect resistance
● Improve nutritional quality
● Veterinary science

The earliest databases for DNA sequences and proteins were developed by three groups of scientists from different parts of the world:

● Nucleic Acids (International Nucleotide Sequence Database)
● Protein (Worldwide Protein Data Bank)

IDENTIFY THE DATABASE: DDBJ (DNA Data Bank of Japan)

Nucleic Acids (International Nucleotide Sequence Database)

IDENTIFY THE DATABASE: EMBL (European Molecular Biology Laboratory)

Nucleic Acids (International Nucleotide Sequence Database)

IDENTIFY THE DATABASE: GenBank (USA)

Nucleic Acids (International Nucleotide Sequence Database)

IDENTIFY THE DATABASE: PDBj (Japan)

Protein (Worldwide Protein Data Bank)

IDENTIFY THE DATABASE: RCSB PDB (USA)

Protein (Worldwide Protein Data Bank)

DNA Data Bank of Japan

DDBJ

Other databases

● Ensembl
● Human Metabolome Database (HMDB)
● Gene Expression Databases - Mostly Microarray data
● Phenotypic Databases
● RNA Databases
● Amino Acid/Protein Databases
● Protein-Protein and other Molecular interactions
● Signal Transduction Pathway Databases
● Metabolic Pathway and Protein Function Databases
● Bacterial DNA Databases

Database that provides data on the genome of characteristic organisms

Ensembl

Very useful particularly if you want to determine the boundary of exons and introns in a eukaryotic gene.

Ensembl

GENETIC ANALYSIS APPLICATION

● A disease may arise due to changes in the sequence of the gene being expressed
● Single Nucleotide Mutation: Sickle Cell Anemia

A consequence of a change that has occurred in the gene for hemoglobin, particularly the beta chain of hemoglobin.

Sickle cell anemia

Mutations occurred in some individuals such that A is substituted by U so that the codon became GUG which codes for Vaseline. (T or F)

FALSE (Valine NOT VASELINE)

In sickle cell anemia, a point mutation occurred in the codon GAG, which codes for

Glutamic acid
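The point mutation on these cards can be traced with a tiny Python sketch. The codon table below is deliberately a four-entry excerpt of the standard genetic code (RNA codons), and the helper name `point_mutate` is made up for illustration.

```python
# Excerpt of the standard genetic code: only the codons needed here.
CODON_TABLE = {
    "GAG": "Glu",  # glutamic acid
    "GAA": "Glu",
    "GUG": "Val",  # valine
    "GUA": "Val",
}


def point_mutate(codon: str, position: int, new_base: str) -> str:
    """Replace the base at a 0-based position within a codon."""
    return codon[:position] + new_base + codon[position + 1:]


normal = "GAG"
mutant = point_mutate(normal, 1, "U")  # the middle A is replaced by U

print(normal, "->", CODON_TABLE[normal])
print(mutant, "->", CODON_TABLE[mutant])
```

A single base change in the middle of the codon is enough to swap glutamic acid for valine, which is the substitution the cards describe.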

Genetic characteristic

Genotype

Physical characteristic

Phenotype

Recessive trait

Sickle-Cell Anemia

Review how to find the DNA sequence of a gene. OK?

OK

A way of rearranging sequences of DNA, RNA or protein to identify regions of similarity

SEQUENCE ALIGNMENT

Sequence alignment is made between

a known sequence (reference sequence) and unknown sequence (query sequence)

Reference sequence

Known sequence

Query sequence

Unknown sequence

TYPES OF SEQUENCE ALIGNMENT

● Pairwise
● Multiple

Compare two sequences

Pairwise

Compare more than two sequences

Multiple

Pairwise

○ EMBOSS WATER
○ BLAST

Multiple

○ MUSCLE
○ MAFFT
○ Clustal Omega

TYPES OF PAIRWISE SEQUENCE ALIGNMENT

● Global alignment
● Local alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Matching the residues (bases or amino acids) of two sequences across their entire length.

Global alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Matches the identical sequences

Global alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: The two sequences are treated as potentially equivalent

Global alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Comparing two genes with the same function (in human vs. mouse)

Global alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Comparing two proteins with similar functions

Global alignment
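The cards do not name an algorithm for global alignment. One classic dynamic-programming approach is Needleman-Wunsch, sketched below in Python under assumed scoring values (match +1, mismatch -1, gap -2). The function name and the example sequences are made up, and production tools use substitution matrices and affine gap penalties instead of these flat values.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # The first row and column represent aligning a prefix against gaps only.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

Because the traceback always runs from the last cell back to the first, every residue of both sequences appears in the result, which is what makes the alignment global.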

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Matching regions of two sequences that have more similarity with each other

Local alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: The two sequences may or may not be related

Local alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: To see whether a substring (a part) in one sequence aligns well with a substring (a part) in the other sequence

Local alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Searching for local similarities in large sequences (e.g., newly sequenced genomes)

Local alignment

IDENTIFY THE TYPE OF PAIRWISE SEQUENCE ALIGNMENT: Looking for conserved domains or motifs in two proteins

Local alignment
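For local alignment, a well-known dynamic-programming method is Smith-Waterman, the algorithm behind EMBOSS Water. The Python sketch below is a minimal version with assumed scoring values (match +2, mismatch -1, gap -2); the function name and the example sequences are hypothetical.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> Tuple[int, str, str]:
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can start anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share only the short region ACGT.
best, top, bottom = smith_waterman("TTTACGTGGG", "CCCACGTAAA")
print(best)
print(top)
print(bottom)
```

Unlike the global case, only the best-matching substrings are reported, so unrelated flanking regions do not drag the score down.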

The residues are colored so that you can easily see any variation among the sequences.

Clustal Omega

When you have a multiple sequence alignment, you can tell that all of the sequences are identical at a position by the presence of an __________

Asterisk

If there is a variation, there is no asterisk. (T or F)

TRUE
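The asterisk convention can be reproduced with a short Python sketch that marks every alignment column in which all sequences carry the same residue. This is a simplification: it emits only asterisks and blanks, and the function name and the three aligned sequences are invented for illustration.

```python
from typing import List


def conservation_line(aligned: List[str]) -> str:
    """Return '*' under fully conserved columns and ' ' elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        conserved = len(set(column)) == 1 and "-" not in column
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Hypothetical alignment of three short sequences; '-' is a gap.
msa = ["ATGCCGTA",
       "ATGACGTA",
       "ATG-CGTA"]

for row in msa:
    print(row)
print(conservation_line(msa))
```

The fourth column differs between the sequences, so it is the only position left without an asterisk.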

MULTIPLE ALIGNMENT TOOLS: Analysis of more than 2 sequences

● MUSCLE
● MAFFT
● Clustal Omega

MUSCLE

Multiple Sequence Comparison by Log Expectation

MAFFT

Multiple Alignment using Fast Fourier Transform

It is a multiple sequence alignment tool that arranges the sequences of DNA, RNA or protein to identify regions of similarity

MUSCLE (Multiple Sequence Comparison by Log Expectation)

Finds regions of local similarity between sequences just like MUSCLE and MAFFT

NCBI: Basic Local Alignment Search Tool (BLAST)

Compares the amino acid sequences of proteins or the nucleotide sequences of DNA.

NCBI: Basic Local Alignment Search Tool (BLAST)

Compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold

NCBI: Basic Local Alignment Search Tool (BLAST)

Can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families

NCBI: Basic Local Alignment Search Tool (BLAST)

Read the additional notes about NCBI: Basic Local Alignment Search Tool (BLAST). OK?

OK

Used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families

BLAST

You supply multiple sequences to be aligned to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences

MULTIPLE ALIGNMENT

Here you supply all the sequences to the tools that we used, such as MUSCLE.

MULTIPLE ALIGNMENT

It will align the sequences that you uploaded; it does not necessarily look for sequences in the database

MULTIPLE ALIGNMENT

Read and analyze the difference between multiple sequence alignment and BLAST, and the summary. OK?

OK

Bioinformatics Flashcards

(116 cards)