Group Summary Pages

Introduction

In the Protein Data Bank (PDB), a large number of proteins have many structural entries in the archive. This redundancy of biomolecular structures provides opportunities to explore the range of biological properties, interactions, and functions of these proteins. In most cases these structures provide snapshots of the protein in a variety of contexts, e.g., at different pH, temperature, or other environmental conditions, and in the presence of one or more ions, ligands, cofactors, substrates, and/or other biomolecules (proteins, peptides, nucleic acids, carbohydrates, lipids). Grouping structures in different ways allows users to appreciate the range of interactions and functions of a specific biological macromolecule. Several approaches have been used to organize and group polymer sequences, structures, and assemblies in the PDB archive to simplify and/or expedite analysis, and to examine ranges, patterns, and trends in structure-function relationships.

What is a Group Summary Page?

The Group Summary Pages (GSPs) provide overviews of key features, properties, sequence alignments, and annotations of any predetermined or custom group of structures. This is in contrast to the Structure Summary Pages (SSPs), which provide a quick overview of key details of a single structure in the PDB. Each Group Summary Page also provides access to the amino acid sequences of all members of the group, the ranges of amino acid sequence that are included in each member of the group, and structural and functional annotations of the sequence drawn from a variety of experimental and bioinformatics data resources.

Why use the Group Summary Page?

The Group Summary Pages provide easy access to groups of structures in the archive (organized by specific attributes and features) to explore patterns, trends, and ranges of conformational flexibility of all or parts of the proteins, and relationships that may not be obvious in structures coming from individual experiments. These pages also provide the opportunity to further refine group membership based on a variety of criteria including experimental features, source organism, protein domain classification, and protein function.

Documentation

Types of Group Summary Pages

Group summary pages are available for predetermined groups, and may be created for custom groups of structures and/or sequences in the PDB archive.

  • Predetermined groups in the PDB archive include groups of structures that share a specific identifier or property. For example:
    • PDB Deposit Group ID based groups of structures (e.g., G_1002057). Membership in these groups is determined by the authors submitting these structures and frequently represents a single protein target bound to a series of ligands, drugs, and/or drug candidates. Other criteria may also be used for deciding on group membership.
    • UniProt based groups including all polymer sequences in the PDB archive with the same UniProt Accession (e.g., P07550)
    • Sequence cluster based groups including all polymer sequences in the PDB archive that belong to a specific sequence cluster. These groups are computed following each weekly update of the archive and may change from week to week. Learn more about sequence clusters in the PDB.
  • Custom groups can be formed by combining specific search/browse criteria with relevant predetermined groups. For example:
    • Search results of any simple or complex query, or of a browse exploration of the archive, may be grouped by structures (using PDB Deposit Group IDs) or by sequences (using UniProt Accession or sequence clusters).
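
Each kind of group described above amounts to collecting members that share a key. The following is a minimal sketch of that idea, not how the archive computes its groups. The records, the field names (`deposit_group`, `uniprot`, `cluster_30`), and all identifiers are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical records; identifiers and field names are placeholders for
# illustration, not real archive data or the archive's schema.
RECORDS = [
    {"member": "AAAA_1", "deposit_group": "G_X", "uniprot": "U1", "cluster_30": 7},
    {"member": "BBBB_1", "deposit_group": "G_X", "uniprot": "U1", "cluster_30": 7},
    {"member": "CCCC_1", "deposit_group": None, "uniprot": "U2", "cluster_30": 7},
    {"member": "DDDD_2", "deposit_group": None, "uniprot": None, "cluster_30": 9},
]

def group_by(records, key):
    """Map each non-missing value of `key` to the members that share it."""
    groups = defaultdict(list)
    for rec in records:
        if rec[key] is not None:  # e.g. members without a UniProt mapping
            groups[rec[key]].append(rec["member"])
    return dict(groups)

by_deposit = group_by(RECORDS, "deposit_group")
by_uniprot = group_by(RECORDS, "uniprot")
by_cluster = group_by(RECORDS, "cluster_30")
```

In this toy data, the same member can belong to one group under each criterion, and a member with no value for a key (for example, no UniProt mapping) does not appear in any group of that type.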

Accessing Group Summary Pages

The group summary pages can be accessed in a few different ways:

  • By grouping Search/Browse Results
    • The results returned from a simple or composite search, or from a browse exploration, may either be listed completely or grouped by specific criteria. When results are grouped, either only group representatives are shown, or lists of groups are shown with links to Custom Group Summary pages. Learn more about options for viewing and grouping search results.
  • From Structure Summary pages
    • You can explore the members of a whole group using links placed on the Structure Summary page. Links to predetermined groups are displayed on the page and marked by a box that includes an icon with a stack of papers (indicating a group of structures). Clicking on these icons (see Figures 1 and 2) will open the corresponding Group Summary pages.
Figure 1: Click to view group of structures with a specific PDB Deposit Group ID, as marked in the header section of the Structure Summary page.
Figure 2: In the Macromolecules section of the Structure Summary page (e.g., PDB ID 5qif), click to view groups with a specific UniProt Accession for protein macromolecule entities, or follow links to pre-computed sequence clusters (e.g., at 30%, 50%, 70%, 90%, 95%, and 100%) that include the specific PDB entry.
  • From Non-redundant Protein Sequence Statistics tables
    • Search capabilities are available from the tabular view for statistics that summarize unique protein clusters in the PDB archive and growth in these clusters.
      • Select the sequence identity cluster of interest and the option to view annual or cumulative growth at the corresponding level of sequence identity.
      • Click on the statistics table at the bottom of the page to view the corresponding grouped polymer entities.
    • New statistics are added to summarize unique UniProtKB entries with known 3D structure from the PDB archive and growth in UniProtKB entries.
      • Select the year of interest to view annual or cumulative growth of structures of polymer entities that map to a known UniProt Accession.
      • Click on the statistics table at the bottom of the page to view the corresponding grouped polymer entities.

Exploring Group Summary Pages

Currently available Group Summary Pages are of the following types:

  • Structure Groups, which present summaries of the contents and properties of the structures along with a few functional annotations mapped to these structures.
  • Polymer Entity Groups, which present summaries of the contents and properties of the structures, a few functional annotations mapped to these polymers, and a variety of views of multiple sequence alignments of polymer sequences in the groups.

Structure Groups

Summary page grouped by PDB Deposit Group ID

This type of Group Summary Page summarizes information about structures that enter the PDB archive via the GroupDep system and are assigned PDB Deposit Group IDs. Typically, members of these groups are the same protein with different bound ligands.

The contents and navigation of the Structure Group Summary page are explained with an example: PDB Deposit Group ID G_1002151, a large-scale fragment screening against the SARS-CoV-2 main protease, one of the proteases essential for viral replication.

What is on the page?
  • The top of the page (Figure 3) displays information about the group, a description of what is included in the group (provided by the authors), a place to browse through the images of structures in the group (on the left), and links to explore members in the group.
Figure 3: Top portion of the Group Summary page based on a PDB Deposition Group ID showing grouping criteria, description of what is included in the group, and links to explore group members.
  • Scroll down on the page to see a variety of other information, including:
    • Experimental features of structures in this group are summarized by a histogram showing method of structure determination and where appropriate, resolution of structures in the group.
    • Properties and annotations about the structures in this group are summarized in a series of histograms showing the distribution of source organism, key domains, and functional annotations for members of the group.
    • Small molecules associated with members of the group are listed as two histograms (see Figure 4) - one showing all the small molecules in these structures (including solvent molecules, buffer components, crystallization agents etc.) and another showing only ligands of interest. Learn more about Ligands of Interest here.
Figure 4: Group Summary page bottom showing histograms of number of entries with various small molecules (including Ligands of Interest).
Navigating the page
  • By default, up to 10 rows of functional annotations and 20 rows of ligands are shown on the Group Summary page. Additional annotations/ligands are listed below the histogram. Clicking on the + sign adds rows to the histogram, and the - sign can be used to show fewer rows until the default limits are reached (see Figure 4).
  • The histograms displayed on the Group Summary page are interactive and can be used to refine membership of the group.
    • Clicking on a single blue bar in the histogram will refine the Group Summary Page. Members that do not match the criteria specified by this bar will be filtered out from this and all other histograms on the page, and shown in gray.
    • For example, clicking on the ligand of interest with chemical component ID 6SU in the above small molecule histogram limits the group to include only the structures that contain this ligand (Figure 5).
    • Clicking on a blue bar while holding the “shift” key will invoke search, showing all group members in the PDB archive that match the current condition. So clicking on the 6SU blue bar will yield the search result shown in Figure 6.
Figure 5: Updated histograms on the Group Summary Page. Hold the "shift" key and click on a blue bar to launch a search.
Figure 6: Results of the search launched by shift-clicking on a blue bar shown in Figure 5.
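
The refinement behavior described above can be thought of as filtering the member list and recomputing every histogram from the remaining members. The sketch below illustrates that idea only; it is not the page's implementation, and the member records and ligand IDs (`LIG1`, `LIG2`, `SOLV`) are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical group members; IDs and ligand names are placeholders,
# not data from a real deposit group.
MEMBERS = [
    {"id": "M1", "ligands": {"LIG1", "SOLV"}},
    {"id": "M2", "ligands": {"LIG2", "SOLV"}},
    {"id": "M3", "ligands": {"LIG1"}},
]

def ligand_histogram(members):
    """Count how many members contain each ligand (one bar per ligand)."""
    return Counter(lig for m in members for lig in m["ligands"])

def refine(members, ligand):
    """Keep only members containing `ligand`, as a click on its bar would."""
    return [m for m in members if ligand in m["ligands"]]

full = ligand_histogram(MEMBERS)
selected = refine(MEMBERS, "LIG1")
refined = ligand_histogram(selected)

# Portion of each bar belonging to filtered-out members (shown in gray):
# the full count minus the count among the remaining members.
grayed = {lig: full[lig] - refined[lig] for lig in full}
```

Here, selecting `LIG1` keeps members M1 and M3, so the `SOLV` bar drops from 2 to 1 and the `LIG2` bar is entirely grayed out.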

Polymer Entity Groups

This type of Group Summary Page summarizes information about polymer entities in the group and also presents a Group Sequence view for exploration of sequence alignments, positional features of the group members, and locations of ligand interactions, where appropriate.

Summary page grouped by UniProt Accession

A majority of PDB structures contain proteins whose sequences and annotations are archived in UniProtKB/SwissProt. Often, PDB structures include only partial sequences or stably folded domains, and may include modifications such as engineered mutations or sequence artifacts introduced to facilitate expression or crystallization. The UniProt Group Summary Page provides a comprehensive overview of the relationship between the parts of the protein sequence included in the PDB and the UniProt data. This can be useful in assessing the availability and extent of 3D structural coverage of the protein of interest and in identifying modifications in the sequence.
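
One way to quantify the extent of structural coverage mentioned above is to merge the reference-sequence ranges that individual polymer entities map to and divide the covered length by the full sequence length. The following is a minimal sketch under that assumption. The ranges and the sequence length are made-up values, and the function names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent 1-based inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def coverage(ranges, sequence_length):
    """Fraction of reference positions covered by at least one range."""
    covered = sum(end - start + 1 for start, end in merge_ranges(ranges))
    return covered / sequence_length

# Hypothetical mapped ranges for three entities on a 100-residue reference.
mapped = [(1, 30), (20, 45), (71, 80)]
print(merge_ranges(mapped))   # [(1, 45), (71, 80)]
print(coverage(mapped, 100))  # 0.55
```

Merging before summing avoids counting residues twice when several entities cover the same region, which is common for well-studied proteins.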

The contents and navigation of the UniProt Group Summary page are explained with an example: all polymer entities that map to UniProtKB accession P0DTD1, the Replicase polyprotein 1ab.

What is on the Group Summary Tab?
  • The top of the page (Figure 7) displays information about the Group - the UniProt Accession, protein name and description, options to browse through the images of structures in the group (on the left), and links to explore members in the group.
Figure 7: Top portion of the Group Summary page based on UniProt Accession showing grouping criteria, description of what is included in the group, and links to explore group members.
  • Scroll down on the page to see a variety of other information, including:
    • A histogram showing the release dates of structures with member polymer entities.
    • Experimental features of structures in this group are summarized by a histogram showing method of structure determination and where appropriate, resolution of structures in the group.
    • Properties and annotations about the structures in this group are summarized in a series of histograms showing the distribution of source organism, key domains, and functional annotations for members of the group.
Navigating the page
  • The histograms displayed on the Group Summary page are interactive and can be used to refine membership of the group.
  • Polymer entities with specific features can be selected by clicking on the corresponding histogram. All other members of the group are filtered out and shown in gray.
What is on the Group Sequence Tab?

The Group Sequence page shows 3 tabs with multiple sequence alignment of all members of the group, mapping of structural features from various resources on the UniProt sequence, and a mapping of the binding sites on the UniProt sequence.

  • The Alignments tab
    • This tab depicts the sequence information from the UniProtKB and 3D structures from the PDB (Figure 8).
    • Tracks grouped by the orange vertical bar describe regions or sites of interest in the sequence annotated by UniProt.
    • PDB entity tracks, color-coded by the blue vertical bar, show structurally determined regions of PDB sequences and how these regions map to the UniProt sequence. Note that polymer entities present in different PDB structures map to different regions of the UniProt polyprotein sequence.
Figure 8: Sequence Alignment of polymer entities in specific PDB entries, mapped to a specific UniProt sequence, and showing functional regions and annotations.
  • The Structural Features Tab
    • This tab summarizes positional annotations for structural features such as structural domains from CATH, SCOP, and Pfam (Figure 9).
    • Tracks grouped by the blue vertical bar present secondary structural features derived from the PDB entries, while those grouped by an orange vertical bar represent annotations from non-PDB data resources.
Figure 9: Annotations of the UniProt sequence based on PDB and other data resources.
  • The Binding Sites tab
    • This tab summarizes location-specific ligand binding on the protein sequence (Figure 10).
    • The Global Bindings track displays the aggregation of all protein-ligand binding sites for all group members.
Figure 10: Individual and Global ligand binding sites mapped to the UniProt sequence.
Summary page grouped by Sequence Identity Clusters

The redundancy in the PDB can help with exploring the functions of a protein in different contexts. Each week all protein sequences in the PDB archive are grouped at different levels of sequence identity (e.g., 100%, 95%, 90%, 70%, 50%, and 30%) to yield sequence clusters. Learn more about Sequence Clusters here. The Sequence Cluster Group Summary Page provides an overview of closely related sequences in the PDB. Exploring these groups of structures can inform users about the structure-function range of proteins in these groups.

The contents and navigation of the Sequence Cluster Group Summary page are explained with the example of Ephrin type-A receptor 2 (PDB ID 1mqb), selecting the 50% sequence cluster.

What is on the Sequence Cluster Group Summary Page?
  • The top of the page (Figure 11) displays information about the group criteria, list of group members, options to browse through the images of structures in the group (on the left), and links to explore members in the group.
Figure 11: Top portion of the Group Summary page based on sequence clusters showing grouping criteria, list of group members, and links to explore them.
  • Scroll down on the page to see a variety of other information, including:
    • A histogram showing the release dates of structures with member polymer entities.
    • Experimental features of structures in this group are summarized by a histogram showing method of structure determination and where appropriate, resolution of structures in the group.
    • Properties and annotations about the structures in this group are summarized in a series of histograms showing the distribution of source organism, key domains, and functional annotations for members of the group.
Navigating the page
  • The histograms displayed on the Group Summary page are interactive and can be used to refine membership of the group.
  • Polymer entities with specific features can be selected by clicking on the corresponding histogram. All other members of the group are filtered out and shown in gray.
What is on the Group Sequence Tab?

The Group Sequence page shows 3 tabs with a multiple sequence alignment of all members of the group, a mapping of structural features from various resources on the consensus sequence of the cluster, and a mapping of the binding sites on the consensus sequence.

  • The Alignments tab
    • This tab depicts sequence identity groups (Figure 12) and displays sequence alignment for group members using the interactive Protein Feature View tool.
Figure 12: Multiple Sequence Alignment of polymer entities mapped to the consensus sequence of the sequence cluster. A. denotes the consensus sequence; B. presents the conservation of amino acid sequence at any given position; and C. shows the occurrences of all amino acids in the aligned position and their relative frequencies.
  • The multiple sequence alignment is generated using Clustal Omega, a general purpose multiple sequence alignment program (Sievers and Higgins, 2018). The alignment view initially shows the full length of the protein sequence. When you zoom in sufficiently, you see the polymer composition of aligned regions. A dash (-) symbol represents a gap in the alignment.
  • The Consensus Sequence is shown in the first row of the alignment and shows the most frequent residues found at each position in the sequence alignment. It serves as a simplified representation for a set of sequences.
  • The Conservation row highlights highly conserved and less conserved amino acid positions based on the relative frequency of occurring residues. In protein families highly conserved residues are more likely to have a functional role. This track has a frequency-based coloring scheme going from dark to light blue, with darker blue color representing higher conservation.
  • Even in very close homologs, sequence substitutions can occur at any given alignment position. Point the mouse cursor to any position on the Consensus Sequence track. A tooltip will show the occurrences of all amino acids in the aligned position and their relative frequencies.
  • Note: Multiple sequence alignments are precomputed for full groups. When a group subset is displayed, the original full group alignment is filtered and only the subset sequences are included. If a gap was present in the original alignment it will remain present in the subgroup.
  • The Structural Features tab
    • This tab summarizes positional annotations for structural features on the consensus sequence (Figure 13), such as structural domains or secondary structure assignments.
    • Color gradient indicates the frequency with which a given feature occurs at a given position. More intense colors indicate a higher frequency. Hover over any track at any position to see how often this feature occurs.
Figure 13: Annotations derived from various sources on the consensus sequence of the cluster.
  • The Binding Sites tab
    • This tab collects positional features for residues that interact with ligands.
    • The Global Bindings track displays the aggregation of all protein-ligand binding sites for all group members. Each position in the alignment displays the number of times an aligned residue has been observed to interact with a ligand. The remaining tracks show the protein-ligand binding site frequencies for specific compounds. In these tracks, each position displays the ratio of the residue-ligand interactions observed at that position to the total number of interactions between the ligand and group members.
Figure 14: Individual and Global ligand binding sites mapped to the cluster consensus sequence.
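
The Consensus Sequence, Conservation, and binding-site tracks described above can be approximated with simple per-column counting. The sketch below is one reading of those descriptions, not the tool's actual implementation. Ignoring gaps when computing frequencies, breaking ties alphabetically, and dividing each ligand's per-position count by that ligand's total interaction count are all assumptions, and the alignment and contact data are made up.

```python
from collections import Counter

GAP = "-"

def column_frequencies(alignment):
    """Per-column relative frequency of each residue, ignoring gaps (assumption)."""
    columns = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != GAP)
        total = sum(counts.values())
        columns.append({r: n / total for r, n in counts.items()} if total else {})
    return columns

def consensus(alignment):
    """Most frequent residue per column; ties go to the alphabetically first."""
    out = []
    for freqs in column_frequencies(alignment):
        out.append(max(sorted(freqs), key=freqs.get) if freqs else GAP)
    return "".join(out)

def conservation(alignment):
    """Frequency of the most common residue per column (a simple proxy)."""
    return [max(f.values(), default=0.0) for f in column_frequencies(alignment)]

def binding_tracks(contacts, n_columns):
    """Build a global count track and per-ligand fraction tracks.

    `contacts` holds one (ligand_id, column_index) pair per observed
    residue-ligand interaction across all group members.
    """
    global_track = [0] * n_columns
    per_ligand = {}
    for ligand, col in contacts:
        global_track[col] += 1
        per_ligand.setdefault(ligand, [0] * n_columns)[col] += 1
    fractions = {
        lig: [n / sum(track) for n in track] for lig, track in per_ligand.items()
    }
    return global_track, fractions

# Hypothetical aligned sequences and contacts (0-based column indices).
ALIGNMENT = ["MKTA-", "MKSA-", "MRTAG"]
CONTACTS = [("L1", 1), ("L1", 1), ("L1", 3), ("L2", 3)]

cons = consensus(ALIGNMENT)
global_track, fractions = binding_tracks(CONTACTS, len(cons))
print(cons)          # MKTAG
print(global_track)  # [0, 2, 0, 2, 0]
```

Because frequencies are computed column by column on the existing alignment, filtering the alignment down to a subset of rows leaves the column positions, including gap columns, unchanged, which matches the note above about subgroups.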

Example to explore

Query the PDB archive for all polymer entities that map to UniProtKB Accession P0DTD1 (Replicase polyprotein 1ab in SARS-CoV-2). Change the Return type to Polymer Entities, group the results by 30% sequence identity clusters, and examine the resulting groups.

  • What do these groups represent?
  • What can you learn about these groups of polymer entities?
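
For readers who prefer a programmatic route, the same exploration could be expressed as a request body for the RCSB PDB Search API. The sketch below only builds and prints the JSON; it does not send it. The attribute path, the operator, and the `group_by` option names are written from memory of the public Search API documentation and should be treated as assumptions to verify there before use. The helper name `grouped_query` is hypothetical.

```python
import json

def grouped_query(uniprot_accession, identity_cutoff):
    """Build a Search API request body.

    All field names and values below are assumptions recalled from the
    public Search API documentation; verify them before sending a request.
    """
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": (
                    "rcsb_polymer_entity_container_identifiers"
                    ".reference_sequence_identifiers.database_accession"
                ),
                "operator": "exact_match",
                "value": uniprot_accession,
            },
        },
        # Return polymer entities rather than whole entries.
        "return_type": "polymer_entity",
        "request_options": {
            "group_by": {
                "aggregation_method": "sequence_identity",
                "similarity_cutoff": identity_cutoff,
            },
            # Ask for groups rather than one representative per group.
            "group_by_return_type": "groups",
        },
    }

payload = grouped_query("P0DTD1", 30)
print(json.dumps(payload, indent=2))
```

Building the body as a plain dictionary keeps the grouping choice (here a 30% identity cutoff) in one place, so it is easy to compare the groups obtained at other cutoffs.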

References

  • Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018;27(1):135-145. doi:10.1002/pro.3290


Please report any encountered broken links to info@rcsb.org
Last updated: 4/19/2022