PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in the number of unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 95% identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    13                                 26
1978    3                                  29
1979    6                                  35
1980    3                                  38
1981    10                                 48
1982    18                                 66
1983    11                                 77
1984    11                                 88
1985    10                                 98
1986    9                                  107
1987    10                                 117
1988    23                                 140
1989    46                                 186
1990    47                                 233
1991    51                                 284
1992    65                                 349
1993    229                                578
1994    466                                1044
1995    368                                1412
1996    398                                1810
1997    571                                2381
1998    773                                3154
1999    976                                4130
2000    1070                               5200
2001    1118                               6318
2002    1181                               7499
2003    1675                               9174
2004    2291                               11465
2005    2571                               14036
2006    2912                               16948
2007    3292                               20240
2008    3049                               23289
2009    3167                               26456
2010    3197                               29653
2011    3019                               32672
2012    3161                               35833
2013    3471                               39304
2014    4293                               43597
2015    3605                               47202
2016    4104                               51306
2017    4506                               55812
2018    4276                               60088
2019    4729                               64817
2020    5703                               70520
2021    5118                               75638
2022    3048                               78686
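
The two count columns are related in a simple way: each year's total is the running sum of the yearly new-sequence counts up to and including that year. The following is a minimal Python sketch of that relationship, using only the first five rows of the table. The names `yearly` and `cumulative_totals` are illustrative assumptions, not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences) for the first five table rows
yearly = [(1976, 13), (1977, 13), (1978, 3), (1979, 6), (1980, 3)]


def cumulative_totals(rows):
    """Return (year, new, total) tuples; total is the running sum of new."""
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


if __name__ == "__main__":
    for year, new, total in cumulative_totals(yearly):
        print(year, new, total)
```

Extending `yearly` with the remaining rows should reproduce the total column, which is a quick way to check a transcribed copy of the table for errors.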