SEPTEMBER 2 Flashcards

(83 cards)

1
Q

describes the flow of genetic information within a biological system — how information stored in DNA is used to make proteins, which carry out most cellular functions.

A

CENTRAL DOGMA IN BIOLOGY
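
A minimal sketch of the DNA → RNA → protein flow described on this card. The codon table holds only the codons used in the example, and the function names `transcribe` and `translate` are illustrative choices, not part of the original notes.

```python
# Partial codon table: only the codons needed for the example below.
CODON_TABLE = {
    "AUG": "M", "GCC": "A", "AAA": "K", "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",  # "*" marks a stop codon
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCCAAATGGTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCCAAAUGGUAA
print(protein)  # MAKW
```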

2
Q

1967: developed an automated protein sequencer called the “Sequenator”

A

Pehr Victor Edman

2
Q

is a book, an encyclopedia of protein sequences; they publish it monthly because of new proteins arising

A

Atlas of Protein Sequence and Structure

2
Q

Structured way of looking into the information; how the information is structured depends on the biomolecule (DNA, RNA, proteins, whole organisms)

Working with tools that move the information into something that’s meaningful in an analysis – information to knowledge

A

BIOINFORMATICS

2
Q

1950: developed the “Edman Degradation Reaction”

A

Pehr Victor Edman

3
Q

Developed the Atlas of Protein Sequence and Structure at the National Biomedical Research Foundation where she was an Associate Director

1965

A

Margaret Dayhoff

3
Q

Bioinformatics started with _________; at the time, researchers thought these, rather than DNA, were the information-carrying material

A

proteins

3
Q

A database that aggregates collected, translated, and curated protein sequences from:

SwissProt
TrEMBL
PIR
PDB
GenBank
PRF
RefSeq
TPA

A

NCBI Protein Database

4
Q

1958: Nobel Prize for the Polypeptide Theory of protein sequence

He was looking into polypeptide structures

A

Frederick Sanger

4
Q

Once you know these, you can work out how vaccines are made and how medicines interact with cells – in the past, this connection was a black box

A

protein structures

4
Q

ATLAS OF PROTEIN SEQUENCE AND STRUCTURE (1967-68) Eventually became the

A

Protein Information Resource (PIR) Protein Sequence Database - COLLECTED

4
Q

AMOS BAIROCH major contributions to bioinformatics

A

1986 Swiss-Prot
1996 TrEMBL
2002 UniProtKB

5
Q

For translation of EMBL nucleotide sequences
Supplement to Swiss-Prot; initially consisted of computationally annotated sequence entries derived from the translation of all coding sequences (CDSs) found in INSDC databases – TRANSLATED

A

1996 TrEMBL

5
Q

Database where the 3D structures of proteins are stored

A

Protein Data Bank (PDB)

5
Q

is one of the earliest and most important biological databases developed to organize protein sequence information. It played a key role in bioinformatics and molecular biology.

One of the first bioinformatics projects in history.

Helped establish computational biology as a scientific field.

Provided the foundation for today’s sequence databases and bioinformatic tools.

Houses the protein sequences themselves

A

PIR (PROTEIN INFORMATION RESOURCE)

5
Q

Swiss-Prot (manual, reviewed) + TrEMBL (automatic, unreviewed)

Together, they provide:

“A complete, reliable, and accessible source of protein knowledge for biological and biomedical research.”

Serves as the central protein database used globally by scientists.

Integrates data from genomics, proteomics, and structural biology.

Enables protein identification, functional prediction, and comparative analysis.

A

2002 UniProtKB

Universal Protein Resource Knowledgebase

5
Q

NCBI Protein Database in NCBI:

Curated protein sequences - EMBL Europe

A

SwissProt

5
Q

NCBI Protein Database in NCBI:

Translated sequences - EMBL Europe

A

TrEMBL

5
Q

A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases – CURATED

A

1986 Swiss-Prot

5
Q

NCBI Protein Database in NCBI:

Submitted sequences - USA

A

PIR

5
Q

Why did they put these all together in NCBI?

A

Synergy that comes with information → better predictions, better functional annotations

We can now understand what big data means

With so many sequences now available, it is easier to work out the 3D structure of a protein from its sequence – Molecular aspect

5
Q

NCBI Protein Database in NCBI:

Three-dimensional data on proteins and nucleotides - PDB

A

PDB

6
Q

NCBI Protein Database in NCBI:

Annotated - NCBI-USA

A

RefSeq

6
Q

NCBI Protein Database in NCBI:

Amino acid sequence data and translations - Japan

A

PRF

6
One of the early techniques Sanger used to deduce protein sequences. It is an early protein sequencing technique based on partial hydrolysis of proteins to generate random overlapping peptide fragments. By analyzing these fragments and their overlaps, the complete amino acid sequence of a protein can be deduced.
SANGER’S RANDOM PERMEATION METHOD
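
The overlap idea on this card can be sketched in a few lines: repeatedly merge the two fragments that share the longest suffix/prefix overlap. This is a simplified greedy illustration, not Sanger's actual laboratory procedure. The peptide fragments are made up, the names `overlap` and `assemble` are hypothetical, and fragments fully contained inside another fragment are not handled.

```python
def overlap(a: str, b: str, min_len: int = 2) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(fragments: list[str]) -> list[str]:
    """Greedily merge the pair with the largest overlap until none remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; return the remaining pieces as they are
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Toy peptide fragments, given in scrambled order.
pieces = ["CTSICS", "GIVEQC", "ICSLYQ", "EQCCTS"]
print(assemble(pieces))  # ['GIVEQCCTSICSLYQ']
```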
6
DNA sequence databases were then created, populated by submissions from researchers working in laboratories. Give examples of DNA databases.
1979 Los Alamos DNA Sequence Database; 1982 GenBank, NCBI (National Center for Biotechnology Information); 1980 European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Data Library, for nucleotide sequence analysis; 1986 DNA Data Bank of Japan (DDBJ); together, the International Nucleotide Sequence Database Collaboration (INSDC)
6
NCBI Protein Database in NCBI: And translated sequences - NCBI-USA
GenBank
6
NCBI Protein Database in NCBI: Third Party Annotation sequences - NCBI-USA
TPA
6
UniProt is the world’s largest and most comprehensive protein database, providing detailed information about protein sequences, structures, and functions. Has information on proteins in different structures, proteomes, etc. Mature protein information system
UNIPROT
7
To look at information from different sources; store data; make sense of data across different databases. No PubMed in EMBL (resources are different). GenBank
NCBI
7
is a protein classification database that integrates data from many different protein signature databases to provide a comprehensive view of protein families and their functions. Like INSDC, it tries to link big databases together.
InterPro
8
These repositories are just part of the bigger picture of trying to disseminate information in lab settings (collaborations, etc.). Also has a set of tools. ENA (European Nucleotide Archive)
EMBL
9
in bioinformatics is the ability of systems and tools to efficiently manage and process the continuously increasing volume of biological data. The methods can also be done by others
SCALABILITY
10
How many GB is the human genome?
8 GB
11
How large is the gene expression data for experiments?
1 TB
12
Sequence errors estimated at between
0.37 and 35(!) errors per 1000 bases
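
To make the range on this card concrete, a small arithmetic sketch: the expected number of erroneous bases is the sequence length times the error rate. The 10,000-base length is an arbitrary example value, and `expected_errors` is a hypothetical helper name.

```python
def expected_errors(length_bases: int, errors_per_1000: float) -> float:
    """Expected number of erroneous bases for a given error rate per 1000 bases."""
    return length_bases * errors_per_1000 / 1000

length = 10_000  # example sequence length, chosen arbitrarily
low = expected_errors(length, 0.37)
high = expected_errors(length, 35)
print(f"{low:.1f} to {high:.1f} expected errors in {length} bases")
# 3.7 to 350.0 expected errors in 10000 bases
```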
13
Problems encountered in bioinformatics and sequencing
Recombination; contamination; annotation errors (misannotations propagate); errors not always corrected in a timely way; genes with varying, unrelated functions depending on context; functional annotation is often unsystematic
14
refers to a common problem in bioinformatics and molecular biology where a gene or protein’s name does not accurately represent its true function — often because the name was given before its full biological role was understood.
Name-function disconnect
14
What sequence do we want in bioinformatics?
INFORMATION → INTEGRATION → INSIGHT → KNOWLEDGE
14
INFORMATION SOURCES (DATABASES) Catalogs of genetic variation (SNPs, indels, structural variants)
dbSNP/dbVar
15
INFORMATION SOURCES (DATABASES) Repository of raw nucleotide sequences (DNA/RNA) submitted by researchers worldwide; anyone can submit. Annotated collection of publicly available DNA sequences, with information on the submitter, the submission, and the context of the submission
GenBank
15
INFORMATION SOURCES (DATABASES) Curated, non-redundant reference sequences for DNA, RNA, and proteins; only one entry each; validated
RefSeq
16
INFORMATION SOURCES (DATABASES) Human genes and genetic disorders with clinical relevance
OMIM
17
INFORMATION SOURCES (DATABASES) Literature database for biomedical research and background knowledge
PubMed
18
INFORMATION SOURCES (DATABASES) For chemical studies
PubChem
18
In GenBank, the table of annotated positions of genes
FEATURES TABLE
19
SOME COMMON ANALYSIS TOOLS Homology Searching
BLAST
20
SOME COMMON ANALYSIS TOOLS Sequence alignment
ClustalW
20
SOME COMMON ANALYSIS TOOLS Phylogenetics
PHYLIP
21
SOME COMMON ANALYSIS TOOLS Functional Patterns
HMMER
22
SOME COMMON ANALYSIS TOOLS Gene Prediction
GenScan
23
SOME COMMON ANALYSIS TOOLS Regulatory region analysis
MatInspector
24
SOME COMMON ANALYSIS TOOLS RNA structure
UNAFold
24
SOME COMMON ANALYSIS TOOLS JPred
Protein Structure
25
TOOLS FOR DATA RETRIEVAL AND EXPLORATION Hub summarizing information on gene structures, function, expression, orthology
Gene Databases
25
TOOLS FOR DATA RETRIEVAL AND EXPLORATION The central search engine linking across all NCBI databases
Entrez
26
TOOLS FOR DATA RETRIEVAL AND EXPLORATION Provides amino acid sequences, annotations, and functions
Protein database (RefSeq/GenPept)
27
CONVENTIONS OR GENERAL SYNTAX Accession
[ACCN]
28
CONVENTIONS OR GENERAL SYNTAX Affiliation
[AD]
29
CONVENTIONS OR GENERAL SYNTAX MeSH major topic — One of the major topics discussed in the article
[MAJR]
29
CONVENTIONS OR GENERAL SYNTAX Author name
[AU]
30
CONVENTIONS OR GENERAL SYNTAX All fields
[ALL]
30
CONVENTIONS OR GENERAL SYNTAX Unique author identifier, such as an ORCID ID
[AUID]
31
CONVENTIONS OR GENERAL SYNTAX Journal title, official abbreviation, or ISSN number — e.g. Journal of Biological Chemistry, J Biol Chem, 0021-9258
[JOUR]
31
CONVENTIONS OR GENERAL SYNTAX Issue of journal
[ISS]
31
CONVENTIONS OR GENERAL SYNTAX Gene name
[GENE]
31
CONVENTIONS OR GENERAL SYNTAX Language
[LA]
32
CONVENTIONS OR GENERAL SYNTAX Organism
[ORGN]
32
CONVENTIONS OR GENERAL SYNTAX Publication date — YYYY/MM/DD, YYYY/MM, or YYYY; insert a colon for date range, e.g., 2016:2018
[PDAT]
33
CONVENTIONS OR GENERAL SYNTAX PubMed ID
[PMID]
34
CONVENTIONS OR GENERAL SYNTAX Protein name (for sequence records)
[PROT]
35
CONVENTIONS OR GENERAL SYNTAX Substance name — Name of chemical discussed in article
[SUBS]
36
CONVENTIONS OR GENERAL SYNTAX Title word
[TITL]
36
CONVENTIONS OR GENERAL SYNTAX Secondary source ID — Names of secondary source databanks and/or accession numbers of sequences discussed in article
[SI]
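
The field tags on the preceding cards are appended to a search term in square brackets, and tagged terms can be combined with Boolean operators such as AND. Below is a small sketch that only assembles the query string; it does not contact NCBI, and the helper names `field` and `all_of` are hypothetical. Which tags are accepted depends on the database being searched.

```python
def field(term: str, tag: str) -> str:
    """Attach a search field tag to a term, e.g. BRCA1 + GENE gives BRCA1[GENE]."""
    return f"{term}[{tag}]"

def all_of(*clauses: str) -> str:
    """Join tagged terms so that every one of them must match."""
    return " AND ".join(clauses)

query = all_of(
    field("BRCA1", "GENE"),
    field("Homo sapiens", "ORGN"),
    field("2016:2018", "PDAT"),  # colon marks a date range
)
print(query)  # BRCA1[GENE] AND Homo sapiens[ORGN] AND 2016:2018[PDAT]
```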
37
ANALYSIS TOOLS Identify conserved domains and motifs in proteins
CDD (conserved domain database)
37
ANALYSIS TOOLS Finds sequence similarities; essential for gene identification, evolutionary studies, and annotation
BLAST (Basic Local Alignment Search Tool)
37
ANALYSIS TOOLS BLAST variants: versions of BLAST for comparing DNA/RNA and protein sequences
blastn/blastp/blastx
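
As a memory aid for the three variants: blastn compares a nucleotide query against nucleotide sequences, blastp compares protein against protein, and blastx translates a nucleotide query and compares it against protein sequences. The lookup below is only an illustrative sketch; `choose_blast` is a hypothetical helper, not part of BLAST itself.

```python
# (query type, database type) -> BLAST program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",  # query is translated before comparison
}

def choose_blast(query_type: str, database_type: str) -> str:
    """Return the BLAST variant for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no variant listed here for {query_type} query vs {database_type} database"
        ) from None

print(choose_blast("nucleotide", "protein"))  # blastx
```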
37
ANALYSIS TOOLS Design and validate PCR primers
Primer-BLAST
37
ANALYSIS TOOLS Predict open reading frames in a sequence
ORF Finder (between start and stop codons)
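
A rough sketch of the idea behind this card: scan each reading frame for a start codon (ATG) and report the stretch up to the next in-frame stop codon. This simplified version looks only at the three forward frames and is not the algorithm NCBI's ORF Finder uses; `find_orfs` and `min_codons` are names chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG...stop ORFs in the 3 forward frames.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs

print(find_orfs("CCATGGCCAAATGGTAACC"))
# [(2, 17, 'ATGGCCAAATGGTAA')]
```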
37
FUNCTIONAL AND PATHWAY TOOLS Repository and analysis of gene expression and functional genomics datasets
Gene Expression Omnibus (GEO)
37
FUNCTIONAL AND PATHWAY TOOLS Integration of genes and proteins into biological pathways
BioSystems/Pathway Links
37
FUNCTIONAL AND PATHWAY TOOLS Database linking genetic variation to clinical phenotypes; mutations in disease-related genes
ClinVar