Advanced Search: Sequence Motif

Examples of zinc fingers from the Molecule of the Month
The RCSB PDB's Advanced Search lets users build queries on specific types of data. To look for structures that contain a particular Sequence Motif, try one of these techniques with the Sequence Features>Sequence Motif option. A Sequence Motif can be an exact sequence or a sequence pattern written in regular expression syntax, a compact notation for defining complex sequence patterns. Click on the sequence to run the example queries:
- Short Sequence Fragments
Use the sequence motif search to find short sequence fragments of any length, such as NPPTP
- Wildcard Searches
Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is Proline)
- Multiples of Variable Residues
The {n} notation can be used, where n is the number of variable residues. To query a motif with 7 variable residues between W and G, and 20 variable residues between G and L, try WX{7}GX{20}L
- Ranges of Variable Residues
The {n,m} notation can be used to indicate ranges of variable residues, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as CX{2,4}CX{12}HX{3,5}H
- Motifs at the Beginning of a Sequence
The '^' operator searches for sequence motifs at the beginning of a protein sequence. Two ways of looking for sequences with N-terminal histidine tags are ^HHHHHH and ^H{6}
- Alternative Residues
Square brackets specify alternative residues at a particular position. To search for a Walker (P loop) motif that binds ATP or GTP, use [AG]XXXXGK[ST]
The search will look for sequences with A or G, followed by 4 variable residues, then G K, and finally S or T.
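
To get a feel for how these patterns behave before running a search, the notation above can be approximated with ordinary regular expressions. The Python sketch below is only an illustration, not the RCSB PDB implementation: it assumes that X stands for any single residue and that the rest of the pattern ({n}, {n,m}, ^, and square brackets) can be passed to a standard regex engine unchanged. The helper names motif_to_regex and find_motif, and the test sequences, are made up for this example.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert the motif notation into a compiled Python regex.

    Assumption: 'X' is a wildcard for any one residue, so it becomes '.'.
    All other symbols are already valid regex syntax and are kept as-is.
    (An 'X' placed inside square brackets is not handled by this sketch.)
    """
    return re.compile(motif.upper().replace("X", "."))


def find_motif(motif: str, sequence: str) -> list:
    """Return (start_index, matched_text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to exercise each kind of pattern.
    examples = [
        ("NPPTP", "MKNPPTPLL"),
        ("XPPXP", "AAPPLPRR"),
        ("^H{6}", "HHHHHHMKT"),
        ("[AG]XXXXGK[ST]", "LLGPPGSGKTTL"),
    ]
    for motif, sequence in examples:
        print(motif, sequence, find_motif(motif, sequence))
```

Matches are reported with zero-based start positions. Because the '^' anchor is passed through unchanged, ^H{6} matches only when the six histidines are at the start of the sequence.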