PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 30% identity

Year	Number of New Protein Sequences	Total Number of Protein Sequences
1976	11	11
1977	9	20
1978	3	23
1979	2	25
1980	4	29
1981	7	36
1982	14	50
1983	7	57
1984	11	68
1985	11	79
1986	8	87
1987	8	95
1988	20	115
1989	27	142
1990	33	175
1991	41	216
1992	52	268
1993	145	413
1994	296	709
1995	225	934
1996	264	1,198
1997	383	1,581
1998	461	2,042
1999	557	2,599
2000	632	3,231
2001	680	3,911
2002	695	4,606
2003	967	5,573
2004	1,385	6,958
2005	1,453	8,411
2006	1,596	10,007
2007	1,694	11,701
2008	1,556	13,257
2009	1,476	14,733
2010	1,421	16,154
2011	1,249	17,403
2012	1,361	18,764
2013	1,435	20,199
2014	1,677	21,876
2015	1,390	23,266
2016	1,599	24,865
2017	1,642	26,507
2018	1,641	28,148
2019	1,663	29,811
2020	2,066	31,877
2021	1,634	33,511
2022	2,041	35,552
2023	1,986	37,538
2024	2,017	39,555
2025	1,798	41,353
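
As a quick consistency check, each value in the total column should equal the running sum of the new-sequence column up to that year. The Python sketch below illustrates this with the first five years of the table (1976 to 1980). The names new_per_year, listed_totals, and cumulative_totals are illustrative only and are not part of any RCSB PDB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences) for 1976-1980, copied from the table
new_per_year = [(1976, 11), (1977, 9), (1978, 3), (1979, 2), (1980, 4)]

# Totals listed in the table for the same years
listed_totals = [11, 20, 23, 25, 29]


def cumulative_totals(rows):
    """Return the running totals of the per-year counts, in row order."""
    return list(accumulate(count for _, count in rows))


computed = cumulative_totals(new_per_year)

# Print each year with its new count and the recomputed running total
for (year, new), total in zip(new_per_year, computed):
    print(f"{year}  new={new:>3}  total={total:>3}")

# The recomputed totals should match the totals listed in the table
assert computed == listed_totals
```

The same check can be extended to the full table by adding the remaining rows to new_per_year and listed_totals.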