Structure Search

Introduction

The function of a biological molecule follows from its form (or shape). As a result, molecules with similar shapes or structures often have similar functions. The number of structures in the Protein Data Bank (PDB) grows every year, and so do the size and complexity of the protein structures deposited. Many structures are assemblies of multiple proteins or of multiple copies of one protein. Together, these factors make finding structurally similar proteins and assemblies challenging.

What is Structure Search?

The Structure Search option lets you query the PDB archive using the 3D shape of a protein structure. This method, developed by RCSB PDB (Guzenko et al., 2020), treats a protein as a volume of space filled by atoms (i.e., a density distribution) rather than as a collection of atomic coordinates and chain connectivities. Each protein volume is decomposed using a mathematical tool known as 3D Zernike polynomials and is described as a vector of Zernike moments. This yields compact descriptors that are invariant to rotation and translation (Novotni and Klein, 2004). The search assesses global 3D shape similarity using BioZernike descriptors, which capture the overall volumetric shape of the protein. It is fast for both individual protein chains and assemblies.
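BioZernike descriptors are too involved to reproduce in a few lines. The sketch below therefore swaps in a much simpler stand-in to illustrate the core idea: turn a 3D shape into a fixed-length vector that does not change when the structure is rotated or translated, then compare shapes by comparing vectors. The descriptor here (a normalized histogram of pairwise point distances), its bin settings, and the random test "protein" are all assumptions made for illustration. This is not the RCSB PDB algorithm.

```python
import math
import random


# Hypothetical stand-in descriptor, NOT BioZernike: a normalized histogram of
# all pairwise point distances. Distances are unchanged by rotation and
# translation, so the histogram is unchanged too.
def distance_histogram(points, n_bins=10, max_dist=20.0):
    counts = [0] * n_bins
    n_pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            # Distances beyond max_dist are lumped into the last bin.
            b = min(int(d / max_dist * n_bins), n_bins - 1)
            counts[b] += 1
            n_pairs += 1
    return [c / n_pairs for c in counts]


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors (0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rotate_z(points, angle):
    """Rotate points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, shift):
    """Shift every point by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


# A made-up "protein": 200 random points in a 10 x 10 x 10 box.
random.seed(0)
protein = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]

moved = translate(rotate_z(protein, 1.0), (5.0, -3.0, 2.0))  # same shape, new pose
stretched = [(3.0 * x, y, z) for x, y, z in protein]          # different shape

ref = distance_histogram(protein)
d_same = descriptor_distance(ref, distance_histogram(moved))
d_diff = descriptor_distance(ref, distance_histogram(stretched))
print(f"same shape, moved: {d_same:.2e}; stretched shape: {d_diff:.3f}")
```

Because each shape is reduced to a short fixed-length vector, comparing two structures becomes a cheap vector distance. That property is what makes descriptor-based comparison practical across a large archive.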

Why run a Structure Search?

Finding and classifying structures in the PDB is fundamental to understanding functional and evolutionary relationships. Sequence-based searches can reveal conserved domains in proteins, but biology offers many examples of proteins with similar shapes (and functions) despite differences in sequence. Conversely, the same protein may adopt more than one conformation, such as the open and closed forms of an enzyme. Sequence-based searches cannot detect such shape similarities or tell such conformations apart, so structure search options are needed.

Moreover, some proteins are stabilized by, or function as part of, an assembly in which they interact with one or more copies of themselves or with other proteins. Structure search lets you identify similar assemblies, so you can explore the shape and interactions of the protein (or its complex).

Documentation

Several options can be combined to run a Structure Search. They are described below in three sections:

  • Query - the options for entering your query
  • Search - the types of search that can be run (strict or relaxed)
  • Results - the options for what you see on the results page

Query Options

There are two types of structure searches possible:

  1. search for polymer chains similar to a given chain
  2. search for assemblies similar to a given assembly

Both types of structure search can be launched from two different places on the website, as described here.

Query using the 'Advanced Search' panel

The structure search options are available in the “Advanced Search” panel. To access them, type a PDB ID into the box under Structure Similarity.

Once a PDB ID has been entered, additional options become available. Select the type of search to launch:

For an assembly structure search, select the Assembly ID from the pull-down menus and click the blue and green magnifying glass icon to launch the search.

For a protein chain structure search, select the chain ID of the protein of interest and click the blue and green magnifying glass icon to launch the search.
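Structure similarity queries can also be issued programmatically through the RCSB PDB Search API. The sketch below only builds the JSON request body and does not send it. The field names (`service: "structure"`, `entry_id`, `assembly_id`, `asym_id`) and the `strict_shape_match` / `relaxed_shape_match` operators reflect our reading of the Search API documentation, so check them against the current version before use. The helper name `build_structure_query` is hypothetical. Chain identifiers in the API may follow mmCIF `label_asym_id` conventions, which can differ from the author chain IDs shown in some views.

```python
import json


def build_structure_query(entry_id, *, assembly_id=None, asym_id=None,
                          mode="strict", return_type="assembly"):
    """Build a structure-similarity request body (hypothetical helper).

    Pass exactly one of assembly_id (assembly search) or asym_id (chain search).
    mode is "strict" or "relaxed"; return_type is e.g. "assembly" or
    "polymer_entity".
    """
    if (assembly_id is None) == (asym_id is None):
        raise ValueError("pass exactly one of assembly_id or asym_id")
    if mode not in ("strict", "relaxed"):
        raise ValueError("mode must be 'strict' or 'relaxed'")

    value = {"entry_id": entry_id.upper()}
    if assembly_id is not None:
        value["assembly_id"] = str(assembly_id)
    else:
        value["asym_id"] = asym_id

    return {
        "query": {
            "type": "terminal",
            "service": "structure",
            "parameters": {"value": value, "operator": f"{mode}_shape_match"},
        },
        "return_type": return_type,
    }


# Chain search: entities similar to chain A of 1mbn, relaxed matching.
chain_query = build_structure_query("1mbn", asym_id="A", mode="relaxed",
                                    return_type="polymer_entity")
# Assembly search: assemblies similar to assembly 1 of 6vxx, strict matching.
assembly_query = build_structure_query("6vxx", assembly_id=1)
print(json.dumps(chain_query, indent=2))
```

The helper takes the assembly-versus-chain choice and the matching mode as explicit arguments. These are the same two decisions the Advanced Search panel asks you to make.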

Query from the Structure Summary page

Each structure in the PDB has a dedicated Structure Summary page that displays information about the entities and assemblies of that entry.

To search for structures similar to one polymer entity in the structure, click the “Structure” link above the details listed for that macromolecule.

To search for assemblies similar to a specific assembly of the structure, click the “Find Similar Assemblies” link below the snapshot of that assembly.

Search Options

For any structure search, you can choose between two matching modes by selecting the corresponding radio button:

  • Strict: use this if you want to be sure your matches are relevant, at the risk of missing some more distant matches
  • Relaxed: use this if you want your matches to include as many similar structures as possible, at the risk of including some false positives

Note that you can select strict or relaxed for searches launched from the Advanced Search panel, whereas searches launched from the Structure Summary page always use the strict option.
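The difference between the two modes is a precision/recall trade-off. To make that concrete, the sketch below models each mode as a score cutoff applied to a made-up list of hits, each labeled as truly similar or not. The scores, labels, and cutoffs are invented for illustration. They do not describe how RCSB PDB implements strict and relaxed matching.

```python
# Made-up hits: (match score, whether the hit is truly similar to the query).
HITS = [(0.98, True), (0.95, True), (0.90, True), (0.72, True),
        (0.65, False), (0.55, True), (0.40, False), (0.30, False)]

STRICT_CUTOFF = 0.8   # hypothetical: keep only high-scoring hits
RELAXED_CUTOFF = 0.5  # hypothetical: keep more hits, including weaker ones


def precision_recall(hits, cutoff):
    """Precision and recall of the hits kept at score >= cutoff."""
    kept = [similar for score, similar in hits if score >= cutoff]
    n_similar = sum(1 for _, similar in hits if similar)
    true_pos = sum(kept)  # each True counts as 1
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / n_similar if n_similar else 1.0
    return precision, recall


strict = precision_recall(HITS, STRICT_CUTOFF)    # (1.0, 0.6)
relaxed = precision_recall(HITS, RELAXED_CUTOFF)  # (0.833..., 1.0)
print(f"strict:  precision={strict[0]:.2f} recall={strict[1]:.2f}")
print(f"relaxed: precision={relaxed[0]:.2f} recall={relaxed[1]:.2f}")
```

With these invented numbers, the strict cutoff returns only true matches but misses two of the five similar hits. The relaxed cutoff finds all five but lets in one false positive.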

Results

Depending on the options selected, structure search results list similar entities or assemblies.

For entity-based searches, each matched entity can be superposed on the query entity and viewed in 3D with the pairwise alignment tool by clicking the View button next to “Structure Match”.

For assembly-based searches, each matched assembly is assigned a Structure Match score: the probability, expressed as a percentage, that it matches the query structure. A score of 100 indicates a perfect match, while lower numbers indicate weaker similarity between the assemblies.
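Because the score is a probability, a natural post-processing step is to convert it to a percentage, sort the hits, and optionally drop weak ones. The limitations and examples in this document note that unrelated structures can appear at low scores. The sketch below uses made-up hit identifiers and probabilities, and its 20% cutoff is an arbitrary choice rather than an RCSB PDB recommendation.

```python
def rank_matches(matches, min_percent=0.0):
    """Sort matches by score (highest first) and drop those below min_percent.

    matches maps a (hypothetical) hit identifier to a match probability in
    [0, 1]; scores are returned as percentages rounded to one decimal place.
    """
    scored = [(hit_id, round(100.0 * p, 1)) for hit_id, p in matches.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(hit_id, pct) for hit_id, pct in scored if pct >= min_percent]


# Made-up identifiers and probabilities, for illustration only.
matches = {"hit_A": 0.97, "hit_B": 0.12, "hit_C": 0.55}
print(rank_matches(matches, min_percent=20.0))  # [('hit_A', 97.0), ('hit_C', 55.0)]
```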

Limitations of Structure Search

The structure search system has some limitations:

  • The method cannot report an RMSD. It produces only a globally optimal superposition of the volumes and knows nothing about which residues are paired in the alignment. Instead, it outputs a score indicating the likelihood that the match is relevant.
  • Highly symmetric assemblies often produce false positives (with lower scores), e.g. searching for a D3 point-group symmetric assembly will likely match a few unrelated D3 assemblies with lower scores.
  • Flexible NMR structures often go unmatched because of their long flexible tails.
  • Long protruding tails can prevent matches between otherwise globally similar shapes.
  • The matching is global, so local similarities are not found. For example:
    • when searching for chains, two chains that are similar only in a shared domain will usually not match;
    • when searching for assemblies, two assemblies that are similar in some subset of chains but not globally will usually not match.
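The first limitation follows from how RMSD is defined. RMSD is computed over pairs of corresponding atoms: RMSD = sqrt((1/N) × sum of the squared distances between the N paired atoms). The minimal sketch below assumes the two structures are already superposed and the atom pairing is already known. It shows that the calculation needs a one-to-one pairing, which a volume-based shape comparison does not produce. The fragments are made-up coordinates.

```python
import math


def rmsd(paired_a, paired_b):
    """Root-mean-square deviation over already-paired, already-superposed atoms.

    paired_a[i] and paired_b[i] are the (x, y, z) coordinates of the i-th atom
    pair, e.g. C-alpha atoms of residues matched by an alignment. Without such
    a pairing, RMSD is undefined.
    """
    if len(paired_a) != len(paired_b) or not paired_a:
        raise ValueError("RMSD needs a non-empty, one-to-one list of atom pairs")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(paired_a, paired_b))
    return math.sqrt(total / len(paired_a))


# Two made-up 3-atom fragments; every atom in the second is shifted 1.0 along z.
fragment_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(fragment_a, fragment_b))  # 1.0
```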

Examples

1. Search for entities similar to Myoglobin

  • Launch this search from the Advanced Search interface for PDB ID 1mbn, Chain ID A
  • Select the strict search radio button, Display results as Polymer Entities, and launch the search
  • The search results show many myoglobin entities, some hemoglobin entities, a few neuroglobin entities, and some other entities.

2. Search for entities that are conformationally similar to the open form of hexokinase

  • Use a structure of the enzyme hexokinase in an “open” conformation as a query. Launch this search from the Advanced Search interface for PDB ID 2yhx, Chain ID A
  • Select the strict search radio button, Display results as Polymer Entities, and launch the search
  • The search results show other hexokinases and related proteins. Note that the better matches are hexokinase entities in the open conformation, while matches toward the end of the result list include the same or related enzyme entities in the closed conformation.

3. Search for assemblies similar to the SARS-CoV-2 Spike protein trimer

  • The SARS-CoV-2 spike protein is composed of three polymer chains, each of which has a receptor-binding domain that can be in an open (or up) conformation for interacting with cellular receptors or a closed (or down) conformation. The Structure Search functionality can be used to identify spike structures that have a similar arrangement of these domains.
  • To find spike structures where all three receptor-binding domains are closed, launch the structure search from the Structure Summary page for the PDB ID 6vxx, Biological Assembly 1.
  • The search results show similar spike protein assemblies with closed conformations.

4. Search for assemblies similar to Insulin hexamers

  • Launch this search from the Structure Summary page for the PDB entry 1trz, Biological Assembly 3.

  • The search results show many other similar insulin assemblies, and some unrelated structures at ~12% Structure Match scores.
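You can also script the four examples as Search API request bodies along the lines sketched below. The JSON field names and operator values reflect our reading of the RCSB PDB Search API documentation and should be verified. We also assume that chain ID A as used on the website corresponds to the API's `asym_id` for these entries. All four use strict matching, following the settings above. The helper `strict_query` is hypothetical.

```python
import json


def strict_query(entry_id, target, return_type):
    """Strict shape-match request body (hypothetical helper).

    target is {"asym_id": ...} for a chain search or {"assembly_id": ...} for an
    assembly search.
    """
    return {
        "query": {
            "type": "terminal",
            "service": "structure",
            "parameters": {
                "value": {"entry_id": entry_id, **target},
                "operator": "strict_shape_match",
            },
        },
        "return_type": return_type,
    }


EXAMPLES = {
    "myoglobin entities": strict_query("1MBN", {"asym_id": "A"}, "polymer_entity"),
    "open hexokinase entities": strict_query("2YHX", {"asym_id": "A"}, "polymer_entity"),
    "spike trimer assemblies": strict_query("6VXX", {"assembly_id": "1"}, "assembly"),
    "insulin hexamer assemblies": strict_query("1TRZ", {"assembly_id": "3"}, "assembly"),
}

for name, body in EXAMPLES.items():
    print(name, json.dumps(body))
```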

References



Please report any encountered broken links to info@rcsb.org
Last updated: 10/15/2021