PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in the number of unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 50% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  12                               12
1977  11                               23
1978  3                                26
1979  2                                28
1980  2                                30
1981  7                                37
1982  18                               55
1983  5                                60
1984  11                               71
1985  8                                79
1986  8                                87
1987  9                                96
1988  19                               115
1989  33                               148
1990  30                               178
1991  37                               215
1992  51                               266
1993  153                              419
1994  326                              745
1995  263                              1,008
1996  297                              1,305
1997  457                              1,762
1998  536                              2,298
1999  717                              3,015
2000  817                              3,832
2001  847                              4,679
2002  911                              5,590
2003  1,295                            6,885
2004  1,825                            8,710
2005  2,065                            10,775
2006  2,319                            13,094
2007  2,504                            15,598
2008  2,339                            17,937
2009  2,379                            20,316
2010  2,393                            22,709
2011  2,139                            24,848
2012  2,268                            27,116
2013  2,373                            29,489
2014  2,764                            32,253
2015  2,455                            34,708
2016  2,687                            37,395
2017  2,803                            40,198
2018  2,789                            42,987
2019  3,007                            45,994
2020  3,495                            49,489
2021  2,889                            52,378
2022  3,568                            55,946
2023  803                              56,749
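
The total for each year is the running sum of the yearly additions up to and including that year. As a minimal sketch of that relationship in Python, the snippet below recomputes the totals for 1976 through 1980 from the yearly counts listed in the table. The variable names are illustrative and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly additions for 1976-1980, copied from the table.
new_per_year = {1976: 12, 1977: 11, 1978: 3, 1979: 2, 1980: 2}

# A running sum of the yearly additions gives the cumulative total per year.
cumulative = dict(zip(new_per_year, accumulate(new_per_year.values())))

for year, total in cumulative.items():
    print(year, new_per_year[year], total)
```

The same running sum applied to the full column of yearly counts should reproduce the cumulative column, which is a quick way to check a downloaded copy of the table for transcription errors.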