Sequence

Introduction

What are sequences?

The specific order of amino acids (in a protein) or nucleotides (in DNA or RNA) in a polymer is referred to as its sequence. In the PDB archive, sequences are usually written in FASTA format, in which each amino acid or nucleotide is represented by its one-letter code.
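
To make the format concrete, here is a minimal Python sketch of reading FASTA text. It assumes simple records: a header line starting with ">" followed by one or more lines of one-letter codes. The `parse_fasta` helper and the example sequences are hypothetical and for illustration only. This is not an RCSB tool, and established libraries such as Biopython provide more complete parsers.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary.

    Each record starts with a '>' header line; the lines that follow
    (until the next '>') hold one-letter codes and are concatenated.
    Sequence lines appearing before the first header are ignored.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


# Hypothetical records for illustration only (not taken from any PDB entry).
example = """>protein_example
MKTAYIAKQR
QISFVKSHFS
>dna_example
ATGCGTAC
"""

seqs = parse_fasta(example)
print(seqs["protein_example"])   # MKTAYIAKQRQISFVKSHFS
print(len(seqs["dna_example"]))  # 8
```

Sequences split across several lines are joined into one string, so the one-letter codes can be indexed by position.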

Which sequences are displayed here?

The Sequence tab presents the protein and nucleic acid sequences of all polymer entities in the entry. Where available, a reference sequence from an external sequence database is displayed alongside the polymer entity sequence. The graphical display also marks structural and functional annotations integrated from the PDB, UniProt, and other bioinformatics resources.

Why explore sequences?

The sequence display provides a quick way to assess how the polymer sequence of a PDB entry relates to its reference sequence (e.g., the UniProt sequence). It can help answer questions such as:

  • Is the polymer in the PDB entry the complete protein, just one or two domains, or merely a short peptide taken from the protein?
  • Does the polymer in the PDB entry have any mutations?
  • If so, is a mutation located at an important functional site (e.g., a catalytic residue in an enzyme or a metal-binding site)?
  • Are any parts of the polymer poorly modeled or missing from the coordinates?
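
The mutation and coverage questions above come down to comparing two sequences position by position. The sketch below assumes the PDB and reference sequences have already been aligned to equal length, with "-" marking gaps; the alignment step itself is not shown. The `compare_to_reference` helper and the example sequences are hypothetical and illustrate the idea only, not how the Sequence tab computes its display.

```python
def compare_to_reference(pdb_aln: str, ref_aln: str) -> tuple[list[str], list[int]]:
    """Compare an aligned PDB polymer sequence with its aligned reference sequence.

    Both inputs are assumed to be pre-aligned strings of equal length that use
    '-' for gaps. Returns (substitutions, uncovered): substitutions are written
    as <reference residue><reference position><PDB residue> (e.g. 'Y5G'), and
    uncovered lists reference positions absent from the PDB sequence.
    Positions are 1-based and counted along the reference sequence.
    """
    if len(pdb_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    substitutions: list[str] = []
    uncovered: list[int] = []
    ref_pos = 0
    for pdb_res, ref_res in zip(pdb_aln, ref_aln):
        if ref_res == "-":
            continue  # extra PDB residue with no reference counterpart (e.g. a tag)
        ref_pos += 1
        if pdb_res == "-":
            uncovered.append(ref_pos)  # reference residue not in the PDB construct
        elif pdb_res != ref_res:
            substitutions.append(f"{ref_res}{ref_pos}{pdb_res}")
    return substitutions, uncovered


# Hypothetical aligned pair: the PDB construct lacks the first two
# reference residues and carries one substitution.
ref = "MKTAYIAKQR"
pdb = "--TAGIAKQR"
subs, missing = compare_to_reference(pdb, ref)
print(subs)     # ['Y5G']
print(missing)  # [1, 2]
```

This only reflects which residues are in the deposited sequence. Residues that are in the sequence but were not modeled in the coordinates would require the coordinate data, which this sketch does not use.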

Documentation

The Interface

The Sequence tab displays the PDB polymer and reference sequences along with a variety of annotations:

Figure 1: Sequence view of the cytochrome P450 protein in PDB entry 7lad. The position of the metal-binding amino acid is marked with red and blue circles. The hyperlink highlighted by the red oval opens a 3D view of the selected amino acid or sequence feature.
  • The first row in the display lists the sequence of the protein from the PDB entry.
  • The second row shows the reference sequence (e.g., from UniProt) in a purple rectangle.
  • Data derived directly from the PDB, or computed from PDB data, are marked with a blue line on the left of the display.
  • Annotations integrated from various bioinformatics resources are marked with an orange line on the left of the display.
  • The display is interactive, so you can zoom in and out to examine the sequences.
  • Clicking any residue or sequence feature highlights it in yellow for easier analysis.
  • The selected residue/feature can also be displayed in 3D by clicking on the “View Feature in 3D” hyperlink (marked with a red oval).
  • Learn more about the conventions used for displaying the sequences and annotations.

Learning about the Structure

The interactive sequence display provides a quick summary of the protein and nucleic acid polymers present in the structure. It integrates information from a variety of resources and maps it onto the structure in an easily accessible format. Exploring this page can tell you about the structure and function of the polymer: where the active and binding sites are located, where the hydrophilic and hydrophobic regions lie, where mutations (if any) are present, and much more.

Depending on the polymer being displayed and the amount of information available about it, the sequence view pages can be very long and complex. See an example here.

Exploring other structures

The Sequence Summary page mainly integrates information about various aspects of the polymer from different data resources. However, the row displaying the UniProt sequence also includes the UniProt ID as a hyperlink. Clicking it (see blue arrow) opens a page listing the polymer chains in the PDB that map to all or part of that UniProt sequence.

See also the Structure Summary page (UniProt mapped Resources) for another way to access this information.



Please report any encountered broken links to info@rcsb.org
Last updated: 11/19/2021