describes the flow of genetic information within a biological system — how information stored in DNA is used to make proteins, which carry out most cellular functions.
CENTRAL DOGMA IN BIOLOGY
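A minimal sketch of the DNA → RNA → protein flow described above. The function names and the tiny five-codon table are assumptions made for illustration; a real genetic code table has 64 codons.

```python
# Illustrative subset of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
}


def transcribe(dna_coding_strand: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTGGAAATAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

The same translation step, applied to coding sequences stored in nucleotide databases, is how translated protein entries are produced.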
1967: developed an automated protein sequencer called the “Sequenator”
A book, an encyclopedia of protein sequences; new editions were issued as newly determined protein sequences were reported
Atlas of Protein Sequence and Structure
A structured way of looking at information; how the information is structured depends on the biomolecule (DNA, RNA, proteins, whole organisms)
Working with tools that turn that information into something meaningful for analysis: from information to knowledge
BIOINFORMATICS
1950: developed the “Edman degradation” reaction
Pehr Victor Edman
Developed the Atlas of Protein Sequence and Structure at the National Biomedical Research Foundation where she was an Associate Director
1965
Margaret Dayhoff
Bioinformatics started with _________; early researchers thought these were the hereditary (information-carrying) material
proteins
A collection of protein sequences that are collected, translated, and curated from:
SwissProt
TrEMBL
PIR
PDB
GenBank
PRF
RefSeq
TPA
NCBI Protein Database in NCBI:
1958: Nobel Prize in Chemistry for his work on the structure of proteins, especially insulin, which supported the polypeptide theory of protein sequence
He was studying polypeptide structures
Frederick Sanger
Knowing these lets you work out how vaccines are designed and how medicines interact with cells; in the past this connection was a black box
protein structures
ATLAS OF PROTEIN SEQUENCE AND STRUCTURE (1967-68) Eventually became the
Protein Information Resource Protein Sequence Database (PIR-PSD) - COLLECTED
AMOS BAIROCH major contributions to bioinformatics
1986 Swiss-Prot
1996 TrEMBL
2002 UniProtKB
NCBI Protein Database in NCBI:
For translation of EMBL nucleotide sequences
Supplement to Swiss-Prot; initially consisted of computationally annotated sequence entries derived from the translation of all coding sequences (CDSs) found in INSDC databases - TRANSLATED
1996 TrEMBL
Database where the 3D structures of proteins are stored
Protein Data Bank (PDB)
is one of the earliest and most important biological databases developed to organize protein sequence information. It played a key role in bioinformatics and molecular biology.
One of the first bioinformatics projects in history.
Helped establish computational biology as a scientific field.
Provided the foundation for today’s sequence databases and bioinformatic tools.
Houses the protein sequences themselves
PIR (PROTEIN INFORMATION RESOURCE)
Swiss-Prot (manual, reviewed) + TrEMBL (automatic, unreviewed)
Together, they provide:
“A complete, reliable, and accessible source of protein knowledge for biological and biomedical research.”
Serves as the central protein database used globally by scientists.
Integrates data from genomics, proteomics, and structural biology.
Enables protein identification, functional prediction, and comparative analysis.
2002 UniProtKB
Universal Protein Resource Knowledgebase
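A minimal sketch of how the Swiss-Prot / TrEMBL split shows up in practice. UniProtKB FASTA headers begin with `sp` for Swiss-Prot (reviewed) entries and `tr` for TrEMBL (unreviewed) entries. The function name is an assumption made for illustration, and the second header below is a made-up placeholder, not a real entry.

```python
def classify_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header of the form '>db|ACCESSION|ENTRY_NAME description'."""
    db_code, accession, rest = header.lstrip(">").split("|", 2)
    sections = {
        "sp": "Swiss-Prot (manual, reviewed)",
        "tr": "TrEMBL (automatic, unreviewed)",
    }
    return {
        "section": sections.get(db_code, "unknown"),
        "accession": accession,
        "entry_name": rest.split(" ", 1)[0],
    }


reviewed = classify_uniprot_header(
    ">sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1"
)
# Hypothetical accession and entry name, used only to show the 'tr' prefix.
unreviewed = classify_uniprot_header(
    ">tr|X0X0X0|X0X0X0_EXAMPLE Uncharacterized protein"
)
print(reviewed["section"], "/", unreviewed["section"])
```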
NCBI Protein Database in NCBI:
Curated protein sequences - EMBL Europe
SwissProt
NCBI Protein Database in NCBI:
Translated sequences - EMBL Europe
TrEMBL
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases – CURATED
1986 Swiss-Prot
NCBI Protein Database in NCBI:
Submitted sequences - USA
PIR
Why were these all brought together in NCBI?
Combining the information creates synergy → better predictions, better functional annotations
This is what "big data" means in practice
With so many sequences available, it is now easier to predict a protein's 3D structure from its sequence - Molecular aspect
NCBI Protein Database in NCBI:
Three-dimensional structure data on proteins and nucleic acids - PDB
PDB
NCBI Protein Database in NCBI:
Annotated - NCBI-USA
RefSeq
NCBI Protein Database in NCBI:
Amino acid sequence data and translations - Japan
PRF