PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive, and it can be viewed at several levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over the history of the archive. The yearly bars (dark blue) show how many new protein sequences were added in each year.
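The two kinds of bars are linked by a running sum: each year's cumulative total is the previous year's total plus that year's new sequences. A minimal Python sketch of this relationship, using the first five years of the 95% identity table (the variable names are illustrative, not part of any PDB tool):

```python
from itertools import accumulate

# New protein sequences per year at 95% identity, 1976-1980 (from the table on this page).
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [13, 13, 3, 6, 3]

# A cumulative bar is the running sum of all yearly bars up to and including that year.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(year, new, total)
```

The printed totals (13, 26, 29, 35, 38) match the "Total" column for those years.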

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search results page. For performance reasons, the statistics are calculated with a default precision threshold, so when the count approaches or exceeds 10,000 it may differ slightly from the number of non-redundant groups returned by the group search. The group search results page gives the exact count; the statistics page shows the overall trend.

Year   New protein sequences   Total protein sequences (cumulative)
1976   13                      13
1977   13                      26
1978   3                       29
1979   6                       35
1980   3                       38
1981   9                       47
1982   18                      65
1983   11                      76
1984   11                      87
1985   10                      97
1986   9                       106
1987   10                      116
1988   23                      139
1989   45                      184
1990   47                      231
1991   51                      282
1992   65                      347
1993   228                     575
1994   469                     1,044
1995   367                     1,411
1996   400                     1,811
1997   573                     2,384
1998   774                     3,158
1999   974                     4,132
2000   1,072                   5,204
2001   1,117                   6,321
2002   1,184                   7,505
2003   1,673                   9,178
2004   2,294                   11,472
2005   2,568                   14,040
2006   2,910                   16,950
2007   3,294                   20,244
2008   3,047                   23,291
2009   3,161                   26,452
2010   3,195                   29,647
2011   3,020                   32,667
2012   3,167                   35,834
2013   3,465                   39,299
2014   4,284                   43,583
2015   3,613                   47,196
2016   4,249                   51,445
2017   4,490                   55,935
2018   4,229                   60,164
2019   4,694                   64,858
2020   5,695                   70,553
2021   5,076                   75,629
2022   4,488                   80,117
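Because the totals are cumulative, the table can be cross-checked by differencing: each year's new count should equal its total minus the previous year's total. A minimal sketch of that check for 2018 to 2022, with values taken from the table above and illustrative variable names:

```python
# Cumulative totals at 95% identity, 2017-2022 (from the table above).
totals = {2017: 55_935, 2018: 60_164, 2019: 64_858, 2020: 70_553, 2021: 75_629, 2022: 80_117}
# Yearly new counts reported in the same table, 2018-2022.
reported_new = {2018: 4_229, 2019: 4_694, 2020: 5_695, 2021: 5_076, 2022: 4_488}

years = sorted(totals)
# Recover each year's new count as the difference between consecutive cumulative totals.
derived_new = {year: totals[year] - totals[prev] for prev, year in zip(years, years[1:])}

print(derived_new == reported_new)  # True
```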