PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth over time in the total number of unique protein sequences (counted as polymeric entities). The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  9                                22
1978  3                                25
1979  2                                27
1980  4                                31
1981  8                                39
1982  17                               56
1983  9                                65
1984  10                               75
1985  11                               86
1986  8                                94
1987  9                                103
1988  20                               123
1989  33                               156
1990  34                               190
1991  45                               235
1992  53                               288
1993  161                              449
1994  315                              764
1995  250                              1,014
1996  292                              1,306
1997  413                              1,719
1998  519                              2,238
1999  636                              2,874
2000  735                              3,609
2001  778                              4,387
2002  813                              5,200
2003  1184                             6,384
2004  1612                             7,996
2005  1754                             9,750
2006  1966                             11,716
2007  2141                             13,857
2008  1989                             15,846
2009  1964                             17,810
2010  1943                             19,753
2011  1709                             21,462
2012  1839                             23,301
2013  1936                             25,237
2014  2252                             27,489
2015  1855                             29,344
2016  2132                             31,476
2017  2153                             33,629
2018  2130                             35,759
2019  2256                             38,015
2020  2749                             40,764
2021  2210                             42,974
2022  2843                             45,817
2023  2723                             48,540
2024  2616                             51,156
2025  2196                             53,352
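
The relationship between the two data columns can be illustrated with a short sketch: the total for a given year is the running sum of the yearly new-sequence counts up to and including that year. The Python snippet below is only an illustration, not how the statistics page computes its numbers. The variable names are hypothetical, and the input values are the 1976 to 1980 rows of the table.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [13, 9, 3, 2, 4]

# The cumulative column is the running sum of the yearly column.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}  new={new:>2}  total={total:>2}")
```

The resulting totals (13, 22, 25, 27, 31) match the table's totals for those years. The same check can be applied to any row by adding that year's new count to the previous year's total.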