PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity; the figures on this page use 95% identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.
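The relationship between the yearly and cumulative values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are hypothetical, and the counts are the first five rows of the table on this page, not output from any PDB API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
# The dictionary name is illustrative only.
new_per_year = {1976: 13, 1977: 10, 1978: 3, 1979: 6, 1980: 4}

# Each cumulative value is the previous total plus that year's new sequences.
cumulative = dict(zip(new_per_year, accumulate(new_per_year.values())))

for year, total in cumulative.items():
    print(year, new_per_year[year], total)
```

Each printed row should match the corresponding year, new-sequence count, and total in the table.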

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the counts are calculated with a default precision threshold, so a statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page is intended to show the trend.

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976                                 13                                   13
1977                                 10                                   23
1978                                  3                                   26
1979                                  6                                   32
1980                                  4                                   36
1981                                 10                                   46
1982                                 18                                   64
1983                                 11                                   75
1984                                 11                                   86
1985                                 12                                   98
1986                                  9                                  107
1987                                 11                                  118
1988                                 25                                  143
1989                                 46                                  189
1990                                 52                                  241
1991                                 57                                  298
1992                                 65                                  363
1993                                232                                  595
1994                                462                                1,057
1995                                345                                1,402
1996                                406                                1,808
1997                                563                                2,371
1998                                757                                3,128
1999                                895                                4,023
2000                              1,003                                5,026
2001                              1,046                                6,072
2002                              1,108                                7,180
2003                              1,557                                8,737
2004                              2,116                               10,853
2005                              2,337                               13,190
2006                              2,641                               15,831
2007                              2,963                               18,794
2008                              2,756                               21,550
2009                              2,813                               24,363
2010                              2,873                               27,236
2011                              2,634                               29,870
2012                              2,892                               32,762
2013                              3,088                               35,850
2014                              3,803                               39,653
2015                              3,140                               42,793
2016                              3,724                               46,517
2017                              4,019                               50,536
2018                              3,712                               54,248
2019                              4,111                               58,359
2020                              5,022                               63,381
2021                              4,506                               67,887
2022                              5,487                               73,374
2023                              5,236                               78,610
2024                              5,459                               84,069
2025                              4,972                               89,041
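
One way to read the trend from the table is to express each year's new sequences as a fraction of the total at the start of that year. The following is a minimal sketch, assuming the last three table rows as input; the function name `relative_growth` is hypothetical and is not part of any PDB tool.

```python
# (year, new sequences, cumulative total) for the last three rows of the table.
rows = [(2023, 5236, 78610), (2024, 5459, 84069), (2025, 4972, 89041)]


def relative_growth(new, total):
    """Return new sequences as a fraction of the total before that year."""
    previous_total = total - new
    return new / previous_total


# Map each year to its relative growth.
growth = {year: relative_growth(new, total) for year, new, total in rows}

for year, rate in growth.items():
    print(f"{year}: {rate:.1%}")
```

Because the statistics counts are approximate above roughly 10,000, rates computed this way are best treated as indicative of the trend and not as exact values.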