PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 90% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  12                               25
1978  3                                28
1979  5                                33
1980  3                                36
1981  8                                44
1982  18                               62
1983  10                               72
1984  11                               83
1985  10                               93
1986  9                                102
1987  10                               112
1988  23                               135
1989  45                               180
1990  43                               223
1991  49                               272
1992  65                               337
1993  216                              553
1994  433                              986
1995  346                              1,332
1996  384                              1,716
1997  551                              2,267
1998  732                              2,999
1999  917                              3,916
2000  1025                             4,941
2001  1072                             6,013
2002  1134                             7,147
2003  1596                             8,743
2004  2202                             10,945
2005  2497                             13,442
2006  2815                             16,257
2007  3178                             19,435
2008  2912                             22,347
2009  3030                             25,377
2010  3042                             28,419
2011  2864                             31,283
2012  3000                             34,283
2013  3241                             37,524
2014  4024                             41,548
2015  3363                             44,911
2016  3771                             48,682
2017  4056                             52,738
2018  3878                             56,616
2019  4338                             60,954
2020  5217                             66,171
2021  4535                             70,706
2022  4899                             75,605
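
The two count columns are related by a running sum: each year's total equals the previous total plus that year's new sequences. The following is a minimal Python sketch of that relationship, using the first five rows of the table as input. The names `new_per_year`, `reported_totals`, and `cumulative_totals` are illustrative assumptions and are not part of any RCSB PDB tool or API.

```python
from itertools import accumulate

# (year, new sequences) for the first five years, copied from the table.
new_per_year = [(1976, 13), (1977, 12), (1978, 3), (1979, 5), (1980, 3)]

# Totals listed in the table for the same five years.
reported_totals = [13, 25, 28, 33, 36]


def cumulative_totals(rows):
    """Return the running sum of the yearly counts (hypothetical helper)."""
    return list(accumulate(count for _, count in rows))


computed = cumulative_totals(new_per_year)

# Print year, new count, and computed cumulative total side by side.
for (year, new), total in zip(new_per_year, computed):
    print(f"{year}  {new:>5}  {total:>7,}")

# True when the computed running sum matches the table's totals.
print(computed == reported_totals)
```

The same check can be extended to the full table by adding the remaining rows to both lists.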