News

Advanced Search: Sequence Motif

03/30

The RCSB PDB's Advanced Search lets users build queries on specific types of data. To look for structures that contain a particular sequence motif, try one of the techniques below with the Sequence Features>Sequence Motif option. A Sequence Motif can be an exact sequence or a sequence pattern written in regular expression syntax, a compact notation for defining complex sequence patterns. The examples below illustrate each technique:

  •   Short Sequence Fragments

Unlike BLAST or FASTA, the sequence motif search can find short sequence fragments of any length, such as NPPTP.

  •   Wildcard Searches

Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is Proline).
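
To try the same kind of pattern on a local sequence, the 'X' wildcard can be mapped onto the '.' metacharacter of a general-purpose regular expression engine. The sketch below uses Python's standard re module. The helper name find_motif and the test sequence are made up for illustration, and this is not a description of how the RCSB PDB search is implemented.

```python
import re

def find_motif(sequence, motif):
    """Return (start, matched_text) for each non-overlapping motif hit.

    Assumes 'X' appears in the motif only as the wildcard residue.
    """
    pattern = motif.replace("X", ".")  # 'X' (any residue) -> '.' (any character)
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]

# Made-up sequence containing one proline-rich stretch
hits = find_motif("MSGAPPLPAKE", "XPPXP")
print(hits)  # [(3, 'APPLP')]
```

Start positions are 0-based, following Python's convention.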

  •   Multiples of Variable Residues

The {n} notation can be used, where n is the number of variable residues. To query a motif with 7 variable residues between W and G, and 20 variable residues between G and L, try WX{7}GX{20}L.

  •   Ranges of Variable Residues

The {n,m} notation can be used to indicate ranges of variable residues, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as: CX{2,4}CX{12}HX{3,5}H.
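
The {n} and {n,m} quantifiers have the same meaning in most regular expression engines, so a pattern can be checked locally before it is submitted. The following is a minimal sketch with Python's re module, assuming 'X' is rewritten as '.'. The test sequence is invented and built so that each spacer length is visible.

```python
import re

# 'X' replaced by '.', quantifiers kept exactly as written in the motif
zinc_finger = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

# Made-up sequence: C, 2 residues, C, 12 residues, H, 3 residues, H
seq = "MK" + "C" + "PE" + "C" + "GKAFSRSDELTR" + "H" + "IRT" + "H" + "TG"

match = zinc_finger.search(seq)
if match:
    # 0-based start, exclusive end, and the matched fragment
    print(match.start(), match.end(), match.group())

# Only one residue between the two cysteines: below the {2,4} minimum
too_short = zinc_finger.search("CPCGKAFSRSDELTRHIRTH")
print(too_short)  # None
```

Because the spacers are ranges, the same pattern accepts matches between 21 and 25 residues long.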

  •   Motifs at the Beginning of a Sequence

The '^' operator searches for sequence motifs at the beginning of a protein sequence. Two ways of looking for sequences with N-terminal histidine tags are: ^HHHHHH and ^H{6}.
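
As a quick local check that the two forms are equivalent, the sketch below runs both against two invented sequences using Python's re module, where '^' likewise anchors a pattern to the start of the string. The variable names and sequences are illustrative only.

```python
import re

tagged = "HHHHHHSSGLVPRGSH"    # made-up sequence starting with six histidines
internal = "MSSGHHHHHHLVPRGS"  # same run of histidines, but not at the N-terminus

patterns = ("^HHHHHH", "^H{6}")

# For each pattern: (matches the tagged sequence, matches the internal one)
results = {
    p: (bool(re.search(p, tagged)), bool(re.search(p, internal)))
    for p in patterns
}

for p, (at_start, inside) in results.items():
    print(p, at_start, inside)
```

Both patterns accept the first sequence and reject the second. Note that either form also matches a sequence that begins with more than six histidines.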

  •   Alternative Residues  

Square brackets specify alternative residues at a particular position. To search for a Walker A (P-loop) motif that binds ATP or GTP, try: [AG]XXXXGK[ST]

The search will look for sequences with A or G, followed by 4 variable residues, then G and K, and finally S or T.
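
Square brackets carry over unchanged to general-purpose regular expressions, so only the 'X' wildcards need rewriting to test this motif locally. The sketch below again uses Python's re module. The sequence fragment is illustrative, and the variable names are assumptions of this example.

```python
import re

motif = "[AG]XXXXGK[ST]"
# Brackets pass through unchanged; only the wildcard 'X' becomes '.'
pattern = re.compile(motif.replace("X", "."))

seq = "MTEYKLVVVGAGGVGKSALTIQ"  # illustrative fragment containing a P-loop

m = pattern.search(seq)
if m:
    hit = m.group()
    # First residue is A or G, last residue is S or T
    print(m.start(), hit, hit[0], hit[-1])
```

The replace step assumes no 'X' appears inside the brackets; a motif that lists X as an explicit alternative would need a more careful conversion.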

Questions about this feature may be sent to info@rcsb.org.

