RCSB PDB Help

Basic Search


Video: Simple Searches for Experimental Structures and Computed Structure Models (CSMs)


Overview

What is Basic Search?

Located in the top bar of the website, Basic Search can be used to search for molecular structures (both experimental structures and computed structure models, or CSMs).

Why use Basic Search?

Use this search option to quickly launch a text-based search for a biomolecular structure, ligand, or assembly, or a sequence-based search, and find matches among PDB entries, CSMs, or both. Experimental structures are always searched; CSMs are included only when the “Include CSM” toggle switch is turned on. Searches may be launched with:

  • the name of a protein, gene, author, ligand, keyword, etc. that is present in the structure.
  • specific identifiers related to the structure of interest (e.g., PDB IDs), gene/protein sequences (i.e., GenBank and UniProt IDs), and ligands (e.g., chemical component or BIRD molecule IDs). Note: When the “Include CSM” toggle switch is ‘on’, CSM identifiers (e.g., AlphaFold IDs, Model Archive IDs) available from the literature or other public data resources may also be used here.
  • a polymer sequence that has >25 building blocks (e.g., amino acids, nucleotides) and is submitted in FASTA format
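The sequence requirement in the last item can be checked before pasting a sequence into the search box. The following is a minimal sketch, assuming a single-record FASTA string; the helper names `parse_fasta` and `long_enough` and the example sequence are hypothetical, and the threshold of 25 is taken from the text above.

```python
def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = ""
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].strip()
        lines = lines[1:]
    # Join the remaining lines into one upper-case sequence string.
    return header, "".join(lines).upper()


def long_enough(sequence, threshold=25):
    """True if the sequence has more than `threshold` building blocks."""
    return len(sequence) > threshold


# Arbitrary example sequence, used only to exercise the helpers.
fasta_text = """>example
MKTAYIAKQRQISFVKSHFS
RQLEERLGLIEVQ
"""

header, sequence = parse_fasta(fasta_text)
print(header, len(sequence), long_enough(sequence))
print(long_enough("ACGTACGT"))
```

The check only covers length; it does not validate that the characters are legal amino acid or nucleotide codes.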

Basic Search Options (Infographic)

Click on the image to access the full infographic ...

Documentation

Interface Description

From any page on the site, a Basic Search can be run by entering a search term or sequence in the search box in the Top Bar. First select the type of search you wish to perform (i.e., include or exclude CSMs); see Figure 1.

Figure 1: Basic Search showing options to include or exclude CSMs

The toggle switch within the textbox can be used to include or exclude CSMs. By default this toggle switch is turned off. To include CSMs, turn the toggle switch ‘on’—i.e., slide the white circle in the switch towards the right so that it is cyan colored. To exclude CSMs, turn the switch ‘off’—i.e., slide the switch towards the left so that it is colored gray.

Click on the magnifying lens icon on the right end of the text box to launch the search.
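For readers who prefer to script their searches, a Basic Search roughly corresponds to a full-text query against the RCSB PDB Search API. The sketch below only builds a JSON request body and prints it; it does not contact any server. The field names (`full_text` service, `return_type`, `results_content_type`) reflect the Search API as understood here and should be verified against the current API documentation before use. The helper name `build_basic_search_payload` is hypothetical.

```python
import json


def build_basic_search_payload(text, include_csm=False):
    """Build a request body for a full-text search (hypothetical helper).

    include_csm mimics the "Include CSM" toggle: off by default.
    """
    content_types = ["experimental"]
    if include_csm:
        content_types.append("computational")
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"results_content_type": content_types},
    }


payload = build_basic_search_payload("citrate synthase", include_csm=True)
print(json.dumps(payload, indent=2))
```

Keeping the toggle as a single boolean argument mirrors the interface: the query text stays the same and only the set of searched content types changes.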

As you enter a term or phrase, you will see suggestions listed in a box that appears below the search box.

Here are some tips for executing a Basic search in the PDB Archive mode:

  • Full-text search:
    • You can type a word or phrase in the top bar search box and click the Search icon, or hit the 'Enter' key. This will perform a 'full-text' search against multiple fields in the database to find matches for the entered word or phrase.
    • The Basic Search is meant to be broad and inclusive. It uses text-based information present in various fields of archival PDBx/mmCIF data and data from external resources mapped to PDB structures.
    • The search examines all of the words across multiple fields, and a term may match more than one field. This can cause distantly related matches to appear in the search results. For example, a search for citrate can find enzymes with this word as part of their names (e.g., citrate synthase, citrate lyase) and also structures with the small molecule citrate bound.
    • The Basic Search full-text query language supports the following syntax for the Boolean operators AND, OR, and NOT:
      • By default, a search with multiple keywords is executed using a Boolean OR - i.e., a search for Word1 Word2 will find matches in the PDB archive with either Word1 or Word2. You can also use a | (or pipe) sign to join search words with OR. For example, a search for Citrate Synthase, which may also be written as Citrate | Synthase, will find matches in the PDB archive for entries with either Citrate or Synthase in one or more fields. Note that the word OR typed in the top bar search box is interpreted as a search term and not as a Boolean operator.
      • A + (or plus) sign can be used to execute a Boolean AND operation - i.e., a search for Word1 + Word2 will find matches in the PDB archive with both Word1 and Word2. For example, a search for Citrate + Synthase will find PDB entries with both Citrate and Synthase. The matched words may be separated by other words in the text, appear in a different order in the sentence, or may even be matched in different fields. Note that the word AND typed in the top bar search box is interpreted as a search term and not as a Boolean operator.
      • “ ” (or quotation) marks around several words signify a phrase for searching - i.e., searching for “Word1 Word2” will find matches in the PDB archive that include Word1 and Word2 next to each other, in that order, with no other words between them. For example, “Citrate Synthase” will find PDB entries with the phrase in one or more fields.
      • A - (or minus) sign placed in front of Word1 is executed as a Boolean NOT - i.e., the search will find matches in the PDB archive that do not contain Word1. Note that the word NOT typed in the top bar search box is interpreted as a search term rather than a Boolean operator. Also note that placing quotes around -Word1 will search for the phrase containing a dash, i.e., it will find matches for the search term “-Word1”.
      • ( ) (or parentheses) around words and signs in the query specify the order in which search terms are evaluated. Two examples are listed below:
        • A search for -(Word1 + Word2) will find matches in the PDB archive that do not include both Word1 and Word2. For example, -(Citrate + Synthase) excludes entries that contain both citrate and synthase in one or more fields; the results may still include entries with only citrate or only synthase.
        • A search for -(Word1 | Word2) will find matches in the PDB archive that include neither Word1 nor Word2. For example, -(Citrate | Synthase) will return entries that do not contain Citrate, Synthase, or any combination of these words in any field.
  • Auto-suggestion lists:
    • As you type query word(s) or phrases in the top bar search box, a list of suggestions appears in a box below, grouped by attribute or field name, indicating the specific field in which the search term was found.
    • Click on any term from the auto-suggest list to execute a search where the selected term matches the specified attribute.
    • A Basic Search may produce a long list of auto-suggestions. The lists in each group of auto-suggestions are organized alphabetically, and only a few top matches are listed. Completing the word(s) in the query can help refine or shorten the lists and show more relevant matches. See also the Advanced Search options to refine the query results.
  • Advanced Query Builder options:

A tabular summary of the symbols that can be used to combine search terms with Boolean operators

Action                          Operator               Description                                                             Example
OR                              Multiple keywords, |   Finds entries containing either Word1 or Word2.                         Citrate Synthase; Citrate | Synthase
AND                             + (plus sign)          Finds entries containing both Word1 and Word2 anywhere in the entry.    Citrate + Synthase
NOT                             - (minus sign)         Finds entries where Word1 is not found anywhere in the entry.           -Citrate (note: searching for “-Citrate” with quotes returns entries containing the phrase -Citrate)
Indicate order of search terms  ( ) (parentheses)      Parentheses around search terms indicate the order of the search.       -(Citrate+Synthase); -(Citrate | Synthase)
Search for a phrase             " " (quotation marks)  Quotes around a search term find entries containing that exact phrase.  “Citrate Synthase”
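The operator semantics summarized above can be tried on a handful of toy records. This is a minimal sketch of the Boolean logic only, not the parser or matching rules used by the site: the record IDs and texts are invented, matching is done on whole lower-cased words, and the phrase match is a crude substring test.

```python
def tokens(text):
    """Lower-case a text and split it into a set of words."""
    return set(text.lower().split())


def word(w):
    return lambda text: w.lower() in tokens(text)


def phrase(p):
    # Crude phrase match: case-insensitive substring test.
    return lambda text: p.lower() in text.lower()


def any_of(*preds):   # Word1 | Word2
    return lambda text: any(p(text) for p in preds)


def all_of(*preds):   # Word1 + Word2
    return lambda text: all(p(text) for p in preds)


def none_of(pred):    # -Word1 or -( ... )
    return lambda text: not pred(text)


# Invented records standing in for searchable entry text.
ENTRIES = {
    "E1": "citrate synthase from pig heart",
    "E2": "lysozyme with bound citrate",
    "E3": "ATP synthase rotor ring",
    "E4": "hemoglobin",
    "E5": "synthase inhibited by citrate",
}


def search(pred):
    """Return the sorted IDs of all records matching the predicate."""
    return sorted(k for k, text in ENTRIES.items() if pred(text))


citrate, synthase = word("citrate"), word("synthase")
results = {
    "Citrate | Synthase": search(any_of(citrate, synthase)),
    "Citrate + Synthase": search(all_of(citrate, synthase)),
    '"Citrate Synthase"': search(phrase("citrate synthase")),
    "-Citrate": search(none_of(citrate)),
    "-(Citrate + Synthase)": search(none_of(all_of(citrate, synthase))),
    "-(Citrate | Synthase)": search(none_of(any_of(citrate, synthase))),
}
for query, hits in results.items():
    print(f"{query:24} -> {hits}")
```

Record E5 contains both words but not as an adjacent phrase, so it separates the AND query from the quoted phrase query.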

Here are some tips for executing a Basic search:

  • This search employs an implied “contains words” or “contains phrase” strategy. This means that if the user enters a word or list of words and clicks the Search Icon, the search is processed as “contains words”, and will return results containing any of the words in the webpage, file, or metadata associated with it.
  • If a phrase is selected from the autosuggest list, or entered within quotes (e.g., "set of words") the search is processed as “contains phrase”.
  • Note that if no documents or pages match the query phrase, the query is automatically changed into a “contains words” search.
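The fallback described in the tips above can be sketched in a few lines. This is an illustration of the described behavior only, under the assumption that a phrase match is a case-insensitive substring test and a word match is a whole-word test; the page titles and the function name `search_pages` are invented.

```python
def search_pages(pages, query):
    """Try a "contains phrase" search; fall back to "contains words"."""
    q = query.lower()
    phrase_hits = [p for p in pages if q in p.lower()]
    if phrase_hits:
        return "contains phrase", phrase_hits
    # No page holds the whole phrase: match any individual word instead.
    words = q.split()
    word_hits = [p for p in pages if any(w in p.lower().split() for w in words)]
    return "contains words", word_hits


# Invented page titles used only for this illustration.
PAGES = [
    "Basic Search help page",
    "Sequence search tutorial",
    "Search API basics",
]

print(search_pages(PAGES, "basic search"))
print(search_pages(PAGES, "sequence basics"))
```

The first query matches as a phrase; the second has no phrase match, so each word is tried on its own.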

Search results

The search results are listed as structures, entities, assemblies, or molecular definitions that match the query. By default the search results are ordered by a relevance score for the query options defined.

Relevancy Scoring

The text-search functionality is powered by Elasticsearch, open-source software that enables the construction and execution of highly customizable and complex queries to retrieve specific results relevant to the research question. By default, the results of searches are sorted by a "relevancy score," which Elasticsearch calculates. This score takes into account the frequency of the given search term(s) in different fields of each result (e.g., does the query word/phrase appear in the title, description, or organism) along with how closely the search term(s) match the terms in those fields. The final output from this scoring process is a ranked set of results, in which those with higher calculated relevancy scores are listed first, followed by those with lower relevancy scores. More details about how this search algorithm works may be found in this Elastic blog post (specifically, see the section titled "How documents are ranked in Elasticsearch").
In addition to relevancy scoring, several other options to reorder the results are available, e.g., by release date, by structure quality, or by a priority that shows experimental structures first or last. Note that depending on the sorting option selected, some search results may be ordered such that the CSMs are listed at the top of the results page. Scroll through all the results and/or adjust your query and sorting criteria in order to identify the structures that meet your needs.
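The idea of a relevancy score that depends on how often query terms occur in different fields can be illustrated with a toy calculation. This is deliberately not Elasticsearch's actual formula, and the field weights, record IDs, and record texts are invented; it only shows why a record with the terms in its title can rank above one with the terms in its description.

```python
# Invented weights: a match in the title counts more than elsewhere.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0, "organism": 1.0}


def score(record, query):
    """Sum weighted counts of query terms over the scored fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_tokens = record.get(field, "").lower().split()
        total += weight * sum(field_tokens.count(t) for t in terms)
    return total


# Invented records; the IDs are placeholders, not real PDB IDs.
RECORDS = [
    {"id": "X2", "title": "Lysozyme",
     "description": "crystallized with insulin receptor fragment"},
    {"id": "X1", "title": "Insulin receptor ectodomain",
     "description": "complex with insulin"},
    {"id": "X3", "title": "Hemoglobin", "description": "oxygen carrier"},
]

query = "insulin receptor"
scored = [(score(r, query), r["id"]) for r in RECORDS]
# Keep only matching records and list the highest scores first.
ranked = [rid for s, rid in sorted(scored, reverse=True) if s > 0]
print(ranked)
```

Records with a score of zero are dropped, and the remainder are listed from highest to lowest score, which mirrors the default ordering described above.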

Examples

  1. Basic search for “allosteric regulator” in the 3D Structure mode (PDB structures alone).
  2. Basic search for “allosteric regulator” in the 3D Structure mode (PDB structures and CSMs).
  3. Basic search for “insulin receptor” in the 3D Structure mode (PDB structures and CSMs).
  4. Basic search for a protein sequence in the 3D Structure mode (PDB structures and CSMs).
  5. Basic search for a nucleic acid sequence in the 3D Structure mode (PDB structures and CSMs).


Last updated: 10/31/2024