News

Developers: Switch to New Sequence Cluster Files by April 12

04/04

Sequence clustering files based on polymer entity IDs are now available to replace the files organized by chain identifiers.

RCSB.org has introduced new files that contain the results of the weekly clustering of protein sequences in the PDB by MMseqs2 at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. These files use polymer entity identifiers instead of chain identifiers to avoid redundancy, since chains with identical sequences in an entry share a single entity. The files are plain text, with one cluster per line, sorted from largest cluster to smallest.
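For developers updating their scripts, the following is a minimal Python sketch of reading a file in the one-cluster-per-line layout described above. It rests on assumptions not stated in this announcement: that the members on a line are separated by whitespace, and that entity identifiers look like `1ABC_1` (entry ID, underscore, entity number). The function names and the sample data are hypothetical and for illustration only.

```python
from typing import Dict, Iterable, List


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Return one list of member IDs per non-empty line."""
    clusters = []
    for line in lines:
        # Assumption: IDs on a line are separated by whitespace.
        members = line.split()
        if members:
            clusters.append(members)
    return clusters


def index_by_member(clusters: List[List[str]]) -> Dict[str, int]:
    """Map each member ID to the 0-based index of its cluster."""
    return {
        member: idx
        for idx, cluster in enumerate(clusters)
        for member in cluster
    }


# Hypothetical sample in the assumed layout, largest cluster first.
sample = """\
1ABC_1 2DEF_1 3GHI_2
4JKL_1 5MNO_1
6PQR_3
"""

clusters = parse_clusters(sample.splitlines())
lookup = index_by_member(clusters)
```

With a real file, `parse_clusters` could be given an open file handle instead of `sample.splitlines()`. Because the files are sorted from largest cluster to smallest, the index returned by `lookup` also serves as a size rank.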

Files containing chain-based clustering will be updated only until April 12, 2022. Users should migrate to the new entity-based files as soon as possible.

This change enables more efficient delivery of sequence clustering data.
