PDB Statistics: Growth in the Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 90% Sequence Identity

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over the history of the archive, and the yearly bars (dark blue) show how many new protein sequences were added in each year.
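Each cumulative value is the running total of the yearly additions, so the cumulative column can be reproduced from the yearly column. A minimal Python sketch using the first ten yearly values from the table on this page (the variable names are illustrative only):

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1985,
# copied from the 90% identity table on this page.
years = list(range(1976, 1986))
new_per_year = [13, 10, 3, 6, 4, 8, 18, 11, 11, 12]

# A cumulative bar is the running sum of all yearly bars up to that year.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}: +{new} -> {total}")
```

Applying the same running sum to the full yearly column reproduces the final total of 83,467.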

Note: In the statistics table, the total number of sequence clusters links to the corresponding sequence cluster group search results page. To balance accuracy against performance, these numbers are calculated with a default precision threshold, so when the count approaches or exceeds 10,000, the statistics may differ slightly from the actual non-redundant group search result. Use the group search results page for an exact count; use this statistics page for the overall trend.
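The page does not say how the precision threshold is implemented. Purely as a hypothetical illustration of why an approximate count can drift from an exact one at large sizes, the sketch below counts distinct keys exactly up to a threshold and then switches to linear counting, a probabilistic estimate based on a fixed-size bitmap. The class name, the 10,000 default, the bitmap size, and the choice of linear counting are all assumptions for this example, not the PDB's actual implementation.

```python
import hashlib
import math


class ThresholdDistinctCounter:
    """Hypothetical distinct counter: exact up to `threshold`, estimated above it.

    Below the threshold, keys are kept in a set (exact count). Once the set
    grows past the threshold, every key is hashed into a fixed-size bitmap and
    the count is estimated with linear counting:
        n_hat = -m * ln(empty_bits / m)
    """

    def __init__(self, threshold=10_000, bitmap_bits=1 << 18):
        self.threshold = threshold
        self.m = bitmap_bits  # number of bits in the bitmap (multiple of 8)
        self.exact = set()  # set to None after switching to approximate mode
        self.bitmap = bytearray(bitmap_bits // 8)

    def _set_bit(self, key):
        # Hash the key to 64 bits, then map it to one bit position in the bitmap.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
        i = int.from_bytes(digest, "big") % self.m
        self.bitmap[i >> 3] |= 1 << (i & 7)

    def add(self, key):
        if self.exact is not None:
            self.exact.add(key)
            if len(self.exact) > self.threshold:
                # Threshold exceeded: replay the stored keys into the bitmap
                # and drop the exact set so memory stays bounded.
                for k in self.exact:
                    self._set_bit(k)
                self.exact = None
        else:
            self._set_bit(key)

    def count(self):
        if self.exact is not None:
            return len(self.exact)  # exact below the threshold
        ones = sum(bin(byte).count("1") for byte in self.bitmap)
        empty = max(self.m - ones, 1)  # avoid log(0) if the bitmap saturates
        return round(-self.m * math.log(empty / self.m))


if __name__ == "__main__":
    counter = ThresholdDistinctCounter()
    for i in range(50_000):
        counter.add(f"cluster-{i}")
    # Close to, but not necessarily exactly, 50000.
    print(counter.count())
```

Keeping an exact set below the threshold and a fixed-size structure above it trades a small error for bounded memory, which is one way an accuracy/performance balance like the one described in the note could arise.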


Sequence cluster level: 90% identity

Year   New protein sequences   Total protein sequences (cumulative)
1976   13                      13
1977   10                      23
1978   3                       26
1979   6                       32
1980   4                       36
1981   8                       44
1982   18                      62
1983   11                      73
1984   11                      84
1985   12                      96
1986   9                       105
1987   11                      116
1988   25                      141
1989   45                      186
1990   48                      234
1991   54                      288
1992   66                      354
1993   217                     571
1994   427                     998
1995   322                     1,320
1996   385                     1,705
1997   531                     2,236
1998   705                     2,941
1999   836                     3,777
2000   937                     4,714
2001   999                     5,713
2002   1060                    6,773
2003   1480                    8,253
2004   2043                    10,296
2005   2250                    12,546
2006   2517                    15,063
2007   2867                    17,930
2008   2637                    20,567
2009   2691                    23,258
2010   2718                    25,976
2011   2490                    28,466
2012   2694                    31,160
2013   2903                    34,063
2014   3580                    37,643
2015   2924                    40,567
2016   3402                    43,969
2017   3600                    47,569
2018   3465                    51,034
2019   3698                    54,732
2020   4570                    59,302
2021   4010                    63,312
2022   4923                    68,235
2023   4629                    72,864
2024   4792                    77,656
2025   5503                    83,159
2026   308                     83,467