PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the running total of unique protein sequences (number of polymeric entities) over the history of the archive. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics counts are calculated with a default precision threshold, so they may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page gives the accurate count; the statistics page shows the trend.

Sequence cluster level: 95% identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    10                                 23
1978    3                                  26
1979    6                                  32
1980    4                                  36
1981    10                                 46
1982    18                                 64
1983    11                                 75
1984    11                                 86
1985    12                                 98
1986    9                                  107
1987    11                                 118
1988    25                                 143
1989    46                                 189
1990    52                                 241
1991    57                                 298
1992    66                                 364
1993    232                                596
1994    462                                1,058
1995    345                                1,403
1996    406                                1,809
1997    563                                2,372
1998    757                                3,129
1999    895                                4,024
2000    1003                               5,027
2001    1046                               6,073
2002    1109                               7,182
2003    1556                               8,738
2004    2118                               10,856
2005    2337                               13,193
2006    2640                               15,833
2007    2964                               18,797
2008    2754                               21,551
2009    2820                               24,371
2010    2869                               27,240
2011    2631                               29,871
2012    2890                               32,761
2013    3085                               35,846
2014    3806                               39,652
2015    3141                               42,793
2016    3725                               46,518
2017    3890                               50,408
2018    3726                               54,134
2019    4123                               58,257
2020    4949                               63,206
2021    4434                               67,640
2022    5413                               73,053
2023    5161                               78,214
2024    5541                               83,755
2025    4351                               88,106
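
The total column in the table is the running sum of the yearly column. The minimal Python sketch below illustrates that relationship using the 1976 to 1980 rows. The function name cumulative_totals and the dictionary layout are illustrative assumptions, not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the 1976-1980 rows of the table.
new_per_year = {1976: 13, 1977: 10, 1978: 3, 1979: 6, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total}, processing years in ascending order.

    Hypothetical helper for illustration only.
    """
    years = sorted(yearly)
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, new_per_year[year], total)
```

Extending new_per_year with the remaining rows gives a quick way to check that each total in the table equals the previous total plus that year's new sequences.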