News
Advanced Search: Sequence Motif
03/30
The RCSB PDB's Advanced Search lets users build queries on specific types of data. To look for structures containing a particular Sequence Motif, try one of these techniques with the Sequence Features>Sequence Motif option. A Sequence Motif can be an exact sequence or a sequence pattern expressed in regular expression syntax, a compact notation for defining complex sequence patterns. Click on the sequence to run the example queries:
- Short Sequence Fragments
The sequence motif search, unlike BLAST or FASTA, can search for short sequence fragments of any length, such as NPPTP
- Wildcard Searches
Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is proline).
- Multiples of Variable Residues
The {n} notation can be used, where n is the number of variable
residues. To query a motif with 7 variable residues between W and G,
and 20 variable residues between G and L, try WX{7}GX{20}L
- Ranges of Variable Residues
The {n,m} notation can be used to indicate ranges of variable
residues, where n is the minimum and m the maximum number of
repetitions. For example, the zinc finger motif that binds Zn in a
DNA-binding domain can be expressed as: CX{2,4}CX{12}HX{3,5}H
- Motifs at the Beginning of a Sequence
The '^' operator searches for sequence motifs at the beginning of a
protein sequence. Two ways of looking for sequences with N-terminal
histidine tags are: ^HHHHHH and ^H{6}
- Alternative Residues
Square brackets specify alternative residues at a particular
position. To search for a Walker (P loop) motif that binds ATP or GTP,
try: [AG]XXXXGK[ST]
The search will look for sequences with A or G, followed by 4 variable residues, then G K, and finally S or T.
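For readers who want to try these patterns on their own sequences before running a search, the notation above can be approximated locally with Python's standard re module. This is only a sketch, not the RCSB PDB implementation: the helper names motif_to_regex and find_motif are hypothetical, the test sequences are made up for illustration, and the code assumes that 'X' never appears inside square brackets.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a motif in the notation above into a compiled Python regex.

    Assumption: 'X' means "any single residue" and does not occur inside
    square brackets. The other constructs ({n}, {n,m}, ^, [..]) are already
    valid regular expression syntax and are passed through unchanged.
    """
    return re.compile(motif.replace("X", "."))


def find_motif(motif: str, sequence: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Made-up toy sequences, one per technique described above.
    examples = [
        ("NPPTP", "MKNPPTPLA"),                         # short fragment
        ("XPPXP", "AAPPLPRR"),                          # wildcard
        ("WX{7}GX{20}L", "W" + "A" * 7 + "G" + "A" * 20 + "L"),
        ("CX{2,4}CX{12}HX{3,5}H", "CAAC" + "A" * 12 + "HAAAH"),
        ("^H{6}", "HHHHHHMK"),                          # N-terminal His tag
        ("[AG]XXXXGK[ST]", "MGAAAAGKSL"),               # Walker (P loop)
    ]
    for motif, sequence in examples:
        print(motif, find_motif(motif, sequence))
```

Because re.finditer reports non-overlapping matches only, a sequence with overlapping occurrences of a motif will yield fewer hits here than a scan that tests every start position.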
Questions about this feature may be sent to info@rcsb.org.