Annotations

Introduction

The Annotations tab aggregates information from various bioinformatics data resources pertaining to all or parts of a structure. While these data and analyses are not directly part of the PDB entries, the information that they present can be useful in learning more about the protein(s) of interest.

What are Annotations?

Annotations are notes, comments, and classifications based on analyses that present different perspectives and information about the subject (in this case the biological molecules in the PDB entry). Some of these annotations are based on identifying and organizing conserved regions in polymer sequences, structural domains, locations of proteins in cells or in membranes, and protein functions.

Why look at Annotations?

Information from annotations can help us develop new hypotheses about the function and interactions of the molecule(s) of interest. It can also provide a foundation for creating new knowledge about the molecule(s) being studied.

Documentation

Several types of annotations are presented in PDB entries. Some of these annotations are based on analyses and classifications performed using PDB data. Others are integrated from a variety of external data sources.

The annotations documented here can be grouped into the following types:

  1. Gene product annotations (Gene Ontology or GO)
  2. Protein sequence - Protein Family annotations (Pfam)
  3. Protein structural domains - SCOPe, CATH, ECOD, SCOP2
  4. Membrane protein annotations - OPM, PDBTM, MemProtMD, mpstruc
  5. Sequence, Structure and Function annotations (for specific molecular classes) - e.g., Antibody (IMGT, SAbDab, and Thera-SAbDab), Antimicrobial Resistance (CARD)

All annotations listed here are presented at the level of the PDB entry, the polymer entity, or the polymer instance. For each annotation, the Chain IDs of the polymers link back to the Sequence View tab, where the residue ranges for the annotations are marked (where available).

Gene Product Annotation

The Gene Ontology resource provides a compilation of controlled vocabularies (ontologies) about the functions of gene products from a variety of organisms, ranging from bacteria to humans. Various research groups have used this ontology to annotate the functions of gene products in the PDB to describe the Molecular Function, Cellular Component, and Biological Process. Learn more about the Gene Ontology project and Annotations.

The Interface

The Gene Product (protein and/or RNA) or GO annotations table groups all three types of annotation into one table, where each row is dedicated to a specific polymer entity and all its instances within the structure (Figure A1).

Figure A1: Tabular representation of the Gene Product Annotations of T4 Lysozyme for PDB ID 102l.

Learning about the Structure
Examining the table in Figure A1 allows you to learn the following:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (GO Annotations).
  • A direct link to the Gene Ontology Resource is available in the top right corner (above the table).
  • The Molecular Functions column lists all the functions of the polymer entity (i.e., protein chain, Figure A1) such as catalytic and hydrolase activity.
  • The Biological processes assigned to the protein are catabolic process, cell wall organization, etc.
  • The Cellular Component lists the location(s) of the protein within the cell, such as intracellular, cytoplasmic, etc.
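The one-row-per-entity layout described above can be pictured with a minimal Python sketch. The record layout, the entity ID "1", and the function name are assumptions made for illustration; the terms echo the T4 lysozyme example, and this is not how the site itself builds the table.

```python
from collections import defaultdict

# The three GO aspects that become the three annotation columns.
ASPECTS = ("Molecular Function", "Biological Process", "Cellular Component")

# Hypothetical flat records: (polymer entity ID, GO aspect, term).
# The entity ID and this record layout are assumptions for illustration.
RECORDS = [
    ("1", "Molecular Function", "catalytic activity"),
    ("1", "Molecular Function", "hydrolase activity"),
    ("1", "Biological Process", "catabolic process"),
    ("1", "Biological Process", "cell wall organization"),
    ("1", "Cellular Component", "intracellular"),
]


def build_go_table(records):
    """Group flat GO records into one row per polymer entity."""
    table = defaultdict(lambda: {aspect: [] for aspect in ASPECTS})
    for entity_id, aspect, term in records:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        table[entity_id][aspect].append(term)
    return dict(table)


GO_TABLE = build_go_table(RECORDS)
```

Each key of `GO_TABLE` corresponds to one table row, and each aspect to one column of that row.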

Exploring other structures

  • Each of the hyperlinked words displayed in the Gene Product Annotation table can be used to launch a search for other structures in the archive that have the same GO annotation.
  • The same set of PDB structures can also be identified using three different browse options: Biological Process, Cellular Component, and Molecular Function. Learn more about these browse options by following the links included here.

Protein Sequence Annotations

Comparing protein sequences is the most common way to group and organize protein structures into families. Protein sequences also represent a convenient way to integrate information about sequence conservations, sites of modification, etc.

Protein Family Annotation

Protein Family (Pfam) classifies proteins using multiple sequence alignments and presents annotations for these families. Each Pfam entry is classified as one of the following six types:

  • Family - a protein region
  • Domain - a stable structural unit
  • Repeat - a short unit which may be unstable in isolation but forms a stable structure when multiple copies are present
  • Motif - a short unit outside globular domains
  • Coiled-Coil - predominantly contain coiled-coil motifs
  • Disordered - do not have a specific shape, may be intrinsically disordered

Learn more about Pfam here.

The Interface

Annotations from Pfam for protein entities in the PDB are presented as in Figure B1.

Figure B1: Tabular representation of the Pfam annotations for PDB ID 6pa1.

Learning about the Structure

Examining the table in Figure B1 allows you to learn the following:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (Pfam).
  • A direct link to Pfam is available in the top right corner (above the table).
  • Chains: The Chain IDs in this column indicate the polymers that were annotated with information from Pfam. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the Pfam domain annotations can be seen there.
  • Domain annotations for several chains are displayed in the table with descriptions and comments.

Exploring other structures

  • Each of the hyperlinked words displayed in the Accession and Identifier columns of the annotation table can be used to launch a search for other structures in the archive that have the same Pfam classification.

Protein Structural Domain Annotations

Domains are structurally and functionally stable regions of the protein that can fold and function independently from the rest of the protein. While some proteins are composed of a single domain, there are many proteins that have multiple domains, each with specific shapes, interactions, and functions.

Both shapes and functions of protein domains are conserved in nature and suggest evolutionary relationships. Several algorithms were developed to identify structural domains in the PDB and organize them into databases such as SCOP/SCOPe, CATH, and ECOD. Annotations from these databases are integrated to allow PDB users to learn about a protein’s structure, functions, and evolution.

SCOP/SCOPe

The Structural Classification of Proteins — extended (SCOPe) uses a combination of manual curation and rigorously validated automated methods to classify PDB structures based on structural features and similarities as well as homology and evolution. Learn more about SCOPe classification.

The Interface

Classification information from SCOPe is mapped to PDB structural domains (Figure C1).

Figure C1: Tabular representation of the SCOP/SCOPe classification of hemoglobin alpha and beta chains for PDB ID 4hhb.
  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (SCOPe).
  • A direct link to SCOP/SCOPe is available in the top right corner (above the table).
  • The Chains column lists the protein chains for the classified domains.
  • The SCOPe classification for each of the protein chains is listed in the Class, Fold, Superfamily, Family, Domain, and Species columns.
  • The classification presented here was based on SCOPe version 2.08.

Learning About the Structure

  • The structure-based classification of the hemoglobin alpha and beta chains indicates that the proteins have domains that are all alpha helical and contain a globin-like fold.
  • The species- and evolution-based classifications indicate that these proteins belong to the globin family and are of human origin.
  • To learn about the sequence ranges for the SCOPe domain see the SCOPe track (one of the 2 SCOP tracks) in the Sequence tab of the Structure Summary page.

Exploring other structures

  • Each of the hyperlinked words displayed in the SCOPe annotation table can be used to launch a search for other structures in the archive that have the same SCOPe classification.
  • The same set of PDB structures can also be identified using the SCOPe Browse options. Learn more about SCOPe browsing options.

CATH

The CATH database classifies protein domains based on evolutionary relationships using a combination of automated and manual procedures. The classification groups protein domains at four levels - Class, Architecture, Topology (fold family), and Homologous superfamily. Learn more about the CATH classification.
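The four-level hierarchy can be illustrated with a short Python sketch that splits a dotted CATH code into its levels and counts how many leading levels two codes share. It assumes the four-number dotted code form (for example "1.10.490.10"); the type and function names are hypothetical and are not part of any CATH or RCSB API.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One CATH classification, split into its four levels."""
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted code such as '1.10.490.10' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(part) for part in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Count how many levels two codes share, starting from Class."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count
```

Two domains that share all four levels belong to the same homologous superfamily; fewer shared levels indicate a more distant structural relationship.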

The Interface

Classification information from CATH is mapped to the PDB structural domains (Figure C2).

Figure C2: Tabular representation of the CATH classification of hemoglobin alpha and beta chains for PDB ID 4hhb.

Learning About the Structure

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (CATH).
  • A direct link to the CATH database is available in the top right corner (above the table).
  • The Chains column lists the protein chains for the classified domains. Clicking on the hyperlinked chain IDs will open the Sequence tab showing the polymer entity. The residue ranges for the CATH domain annotations can be seen there.
  • The CATH classification for each of the protein chains is listed in the Class, Architecture, Topology, and Homology columns.
  • The classification presented here is based on CATH version 4.2.0.

Exploring other structures

  • Each of the hyperlinked words displayed in the CATH annotation table can be used to launch a search for other structures in the archive that have the same CATH classification.
  • The same set of PDB structures can also be identified using the CATH Browse options. Learn more about CATH browsing options.

ECOD

Evolutionary Classification of protein Domains (ECOD) is a hierarchical classification of protein domains organized according to their evolutionary relationships. The domains are organized into the following five levels:

  1. (A) architecture
  2. (X) possible homology
  3. (H) homology
  4. (T) topology
  5. (F) family

Learn more about ECOD classifications.
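As a rough illustration, the five levels listed above can be modeled as an ordered enumeration so that a classification is always reported from the broadest level (A) to the most specific (F). The level letters and names come from the list above; the function name and the placeholder values in the usage note are hypothetical.

```python
from enum import IntEnum


class EcodLevel(IntEnum):
    """ECOD hierarchy levels, ordered from broadest to most specific."""
    A = 1  # architecture
    X = 2  # possible homology
    H = 3  # homology
    T = 4  # topology
    F = 5  # family


LEVEL_NAMES = {
    EcodLevel.A: "architecture",
    EcodLevel.X: "possible homology",
    EcodLevel.H: "homology",
    EcodLevel.T: "topology",
    EcodLevel.F: "family",
}


def ordered_levels(classification):
    """Return 'name: value' strings sorted from level A down to level F."""
    return [
        f"{LEVEL_NAMES[level]}: {classification[level]}"
        for level in sorted(classification)
    ]
```

For example, `ordered_levels({EcodLevel.F: "family name", EcodLevel.A: "architecture name"})` lists the architecture entry before the family entry, regardless of the input order.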

The Interface

Annotations from ECOD mapped to PDB are presented as in Figure C3.

Figure C3: Tabular representation of the ECOD annotations for PDB ID 6xzl.

Learning About the Structure

Examining the table in Figure C3 allows you to learn the following:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (ECOD).
  • A direct link to ECOD is available in the top right corner (above the table).
  • Chains: The Chain IDs in this column indicate that the polymers were annotated with information from ECOD. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the ECOD domain annotations can be seen there.
  • The Domain Identifier column provides a link to the page in ECOD with additional information and a graphic to view the domain within the context of the full protein.
  • Annotations of the protein domain include family, topology, homology, possible homology, and architecture.

Exploring other structures

  • Explore other structures in the archive that have the same ECOD classification using the ECOD Browse options. Learn more about ECOD browsing options.

SCOP2

The SCOP2 database classifies representative structures with unique protein domains and extends the classification to related entries using SIFTS. Learn more about SCOP2 classification.

The Interface

Classification information from SCOP2 is mapped to PDB structural domains (Figure C4).

Figure C4: Tabular representation of the SCOP2 classification of hemoglobin alpha and beta chains for PDB ID 4hhb.
  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (SCOP2).
  • A direct link to SCOP2 is available in the top right corner (above the table).
  • The Chains column lists the protein chains for the classified domains.
  • The SCOP2 classification for each of the protein chains is listed in the Family name, while identifiers for the Domain and Family are linked to the corresponding pages in the SCOP2 database.
  • The classification presented here was based on SCOP2B (dated 2022-02-25).

Learning About the Structure

Examining the table in Figure C4 allows you to learn the following:

  • The structure-based classification of the hemoglobin alpha and beta chains indicates that the proteins have domains that contain a globin-like fold.

Exploring other structures

Membrane Protein Annotations

Membrane proteins differ from soluble proteins because parts of their structure either exist within the interior of a membrane or are associated with its surface. Several approaches have been used to organize these proteins into groups to study their membrane association as well as their overall structure and functions. Information from a few of these classifications has been mapped to PDB structures, and these annotations are described here. Learn more about Membrane Proteins Resources in the PDB.

OPM

The Orientations of Proteins in Membranes (OPM) database classifies membrane proteins based on their transmembrane or membrane-associated domain. Learn more about OPM.

The Interface

The OPM classification was built using SCOP and TCDB but has some unique features. It has four levels of hierarchy (Figure D1):

  • Type classifies a protein as transmembrane, monotopic/peripheral, or membrane-active.
  • Class groups the proteins by secondary structure, either all-α, all-β, α+β, α/β, or nonregular.
  • Superfamily groups evolutionarily related proteins with superimposable 3D structures.
  • Family includes proteins with detectable sequence homology.
Figure D1: Tabular representation of OPM Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D1 allows you to learn the following about OmpK36:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (OPM).
  • A direct link to the OPM database is available in the top right corner (above the table).
  • Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the OPM domain annotations can be seen there.
  • An external link provides access to a page in the OPM resource where you can learn more about this protein and see a graphical representation of the membrane position relative to the protein structure.
  • The protein is a transmembrane, beta-barrel protein that is trimeric and has a general bacterial porin structure.

Exploring other structures

  • Each of the hyperlinked words displayed in the OPM annotation table can be used to launch a search for other structures in the archive that have the same OPM classification.
  • The same set of PDB structures can also be identified using the OPM Browse options. Learn more about OPM browsing options.

PDBTM

The Protein Data Bank of Transmembrane Proteins (PDBTM) classifies transmembrane proteins using the TMDET algorithm. Learn more about PDBTM.

The Interface

Information from PDBTM is used to identify this as a membrane protein (Figure D2).

Figure D2: Tabular representation of PDBTM Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D2 allows you to learn the following about OmpK36:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (PDBTM).
  • A direct link to the PDBTM database is available in the top right corner (above the table).
  • Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the PDBTM domain annotations can be seen there.
  • An external link provides access to a page in PDBTM and a graphical representation of the membrane position relative to the protein structure.

Exploring other structures

  • The hyperlink “Annotated as Membrane Protein by PDBTM” displayed in the PDBTM annotation table can be used to launch a search for other structures in the archive that have the same annotation.

MemProtMD

This is a database of intrinsic membrane protein structures identified in the Protein Data Bank and studied using molecular dynamics after insertion into simulated lipid bilayers. A coarse-grain self-assembly approach is used for the molecular dynamics simulations. Learn more about MemProtMD.

The Interface

Information from MemProtMD is used to identify this as a membrane protein (Figure D3).

Figure D3: Tabular representation of MemProtMD Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D3 allows you to learn the following about OmpK36:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (MemProtMD).
  • A direct link to the MemProtMD database is available in the top right corner (above the table).
  • Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity.
  • An external link provides access to a page in MemProtMD and a graphical representation of the membrane position around the protein structure. Simulations in the lipid membrane are also available.

Exploring other structures

  • The hyperlink “Annotated as Membrane Protein by MemProtMD” displayed in the MemProtMD annotation table can be used to launch a search for other structures in the archive that have the same annotation.

mpstruc

The membrane proteins of known 3D structure (mpstruc) is a manually curated database that organizes membrane proteins by secondary structure and interactions with the membrane (transmembrane or monotopic). Learn more about mpstruc.

The Interface

Information from mpstruc is used to identify this as a membrane protein (Figure D4).

Figure D4: Tabular representation of mpstruc annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

The three main groups in the mpstruc classification are:

  • Monotopic Membrane Proteins
  • Transmembrane Proteins: Beta-Barrel
  • Transmembrane Proteins: Alpha-helical

Learning About the Structure

Examining the table in Figure D4 allows you to learn the following about OmpK36:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (mpstruc).
  • A direct link to the mpstruc database is available in the top right corner (above the table).
  • Chains A, B, and C are all instances of the OmpK36 protein in this structure.
  • An external link provides access to the home page for mpstruc. You can expand the groups shown on that page by clicking on the “+++” signs to open up the entire classification. As an exercise, you can do a search for a PDB ID of interest on the mpstruc page to see its classification.
  • The mpstruc classification of the OmpK36 protein places it in the “Transmembrane Proteins: Beta-Barrel” group and the “Beta-Barrel Membrane Proteins: Porins and Relatives” subgroup.

Exploring other structures

  • Each of the hyperlinks displayed in the mpstruc annotation table can be used to launch a search for other structures in the archive that have the same mpstruc classification.
  • The same set of PDB structures can also be identified using the mpstruc Browse options. Learn more about the mpstruc browsing options.

Sequence, Structure and Function Annotations

There are several classes of molecules in the PDB that have specific structural compositions and functional roles in biology. Their sequences and structures may show too wide a range of variations to be meaningfully classified and studied by sequence or structure, so several projects have grouped these molecules by their functions. Examples of these classifications are included here.

Antibody Annotations

Although antibodies are products of the adaptive immune system, understanding of their structures and functions has enabled scientists to design molecules that represent the functionally important regions of antibodies and produce them for diagnostics, therapeutics, and research. Information from antibody databases has been mapped to PDB data, and the antibody annotations from these resources are presented in a tabular format with provenance and version information.

IMGT

The international ImMunoGeneTics information system (IMGT) is a resource that provides access to sequence, genome, and structure immunogenetics data, along with web-based interactive tools to explore them. Learn more about IMGT annotations.

The Interface

Antibody information integrated from IMGT is mapped on polymer entities in the structure (Figure E1).

Figure E1: Tabular representation of the Antibody Annotations of cetuximab from IMGT for PDB ID 6ayn.

Learning About the Structure

  • The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the IMGT resources.
  • A direct link to the IMGT databases is available in the top right corner (above the table).
  • Chains: This column lists the protein chains that were used to run the sequence matches against IMGT data. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. Residue ranges for the Immunoglobulin domain annotations can be seen there.
  • Various features such as protein name, source organism, domain names and descriptions listed in this table show that this structure has two copies of a Fab fragment. Each Fab fragment of the chimeric therapeutic antibody cetuximab has a heavy and a light chain.
  • The Description column provides a link to the page in IMGT with additional information about this chain in the context of the full protein.

SAbDab

The Structural Antibody Database (SAbDab) annotates all antibody structures in the PDB, including experimental details, antibody nomenclature (e.g. heavy-light chain pairings), curated affinity data, and sequence annotations.

The Interface

Antibody information integrated from SAbDab is mapped on to polymer entities in the structure (Figure E2).

Figure E2: Tabular representation of the Antibody Annotations from SAbDab for the antibody present in PDB ID 4jn2.

Learning About the Structure

  • The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the SAbDab resource.
  • A direct link to the SAbDab database is available in the top right corner (above the table).
  • Chains: This column lists the protein chains that were used to run the sequence matches. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity and various annotations.
  • Chain Subclass: This column lists the names of the polymer chain entities forming the antibody. Clicking on the hyperlinked names will open the entry specific page in the SAbDab database.
  • Chain Type: Where available, this column lists the name of the heavy or light chain type present in the antibody.
  • Antigen Name: This column lists the name of the antigen that this antibody binds.

Thera-SAbDab

The Therapeutic Structural Antibody Database (Thera-SAbDab) is a database of immunotherapeutic variable domain sequences and their representatives in SAbDab. It also includes close sequence matches (e.g., 95-98% and 99% sequence identity). Learn more about Thera-SAbDab annotations.

The Interface

Antibody information integrated from Thera-SAbDab is mapped on to polymer entities in the structure (Figure E3).

Figure E3: Tabular representation of the Antibody Annotations of Idarucizumab from Thera-SAbDab for PDB ID 4jn2.

Learning About the Structure

  • The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the Thera-SAbDab resource.
  • A direct link to the Thera-SAbDab database (search interface) is available in the top right corner (above the table).
  • Name: This structure has two copies of a Fab fragment of the therapeutic antibody Idarucizumab. Clicking on the hyperlinked name opens the therapeutic antibody specific page in Thera-SAbDab.
  • Target: This antibody targets the molecule Dabigatran.

Antimicrobial Resistance Annotations (from CARD)

The Comprehensive Antibiotic Resistance Database (CARD) provides curated reference sequences of antibiotic resistance genes, proteins, and their phenotypes, organized by the Antibiotic Resistance Ontology ("ARO"). PDB structures with proteins that perfectly or closely match the antibiotic resistance genes (>95% sequence identity and 80% sequence coverage) are identified and linked to CARD annotations. The matched protein’s gene name, ARO identifier, description, impacted drug classes, and resistance mechanism are listed on the Annotations page.
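The matching thresholds quoted above can be expressed as a small Python check. This is only a sketch: the function name is hypothetical, the inputs are assumed to be percentages computed elsewhere, and treating the 80% coverage figure as an inclusive minimum is an assumption, since the text does not say whether the bound is strict.

```python
MIN_IDENTITY_PCT = 95.0  # text: ">95% sequence identity" (strict bound)
MIN_COVERAGE_PCT = 80.0  # text: "80% sequence coverage" (assumed inclusive)


def is_card_match(identity_pct: float, coverage_pct: float) -> bool:
    """Return True if a protein passes the quoted CARD matching thresholds."""
    for name, value in (("identity", identity_pct), ("coverage", coverage_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be a percentage in [0, 100], got {value}")
    return identity_pct > MIN_IDENTITY_PCT and coverage_pct >= MIN_COVERAGE_PCT
```

Under these assumptions, a perfect match (`is_card_match(100.0, 100.0)`) passes, while a sequence at exactly 95% identity does not.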

The Interface

When the sequences of the PDB protein and the reference Antimicrobial Resistance Gene match exactly, information from CARD is mapped to the PDB entry in two different tables (Figure E4).

Figure E4: Tabular representation of gene and gene family annotations from the CARD for IMP-1 beta lactamase from Pseudomonas aeruginosa, PDB ID 1jje.

When the sequences of the PDB protein and the reference Antimicrobial Resistance Gene have <95% sequence identity, only the gene family annotations are included on the Annotations page (Figure E5).

Figure E5: Tabular representation of gene family annotations from the CARD for an aminoglycoside antibiotic inactivating protein AAC(6), from Escherichia coli, PDB ID 2bue.

Learning About the Structure

Examining the tables in Figures E4 and E5 allows you to learn the following about the determinants of antibiotic resistance present in these PDB entries:

  • The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (CARD).
  • A direct link to the CARD database is available in the top right corner (above the table, marked by the red-outlined oval).
  • Chains: This column lists the protein chains that were used to run the sequence matches against CARD data.
  • Accession: This column lists the gene or gene family ARO identifier, mapped to the proteins listed. The ARO identifier is linked to the CARD page that lists additional information about the gene or gene family.
  • The AMR Gene table lists its name and a description of the gene, including common organisms that are known to have this gene.
  • The AMR Gene family table lists the gene family name, drugs impacted by members of this family, and the antibiotic resistance mechanism.
  • Provenance Source (Version): The CARD classification presented here is based on version 3.2.6.

Exploring other structures

The tables in Figures E4 and E5 present options to launch a PDB query by example or a PDB Advanced Search using the gene/gene family name(s) (shown with black-outlined rectangles).



Please report any encountered broken links to info@rcsb.org
Last updated: 7/21/2023