PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures, from the beginning of the PDB archive onward. It can be viewed at several levels of sequence identity. The cumulative bars show how the number of unique protein sequences (number of polymeric entities) has grown over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
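The relationship between the yearly and cumulative numbers can be sketched as a running sum. The example below is only an illustration, not the code behind this page: it uses the 1976 to 1980 yearly counts from the table on this page, and the variable names are assumptions introduced here.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980,
# copied from the table on this page (30% identity).
new_per_year = {1976: 11, 1977: 9, 1978: 3, 1979: 2, 1980: 4}

# The cumulative total for a year is the running sum of all
# yearly counts up to and including that year.
cumulative = dict(zip(new_per_year, accumulate(new_per_year.values())))

for year, total in cumulative.items():
    print(year, new_per_year[year], total)
```

Each printed row gives the year, the yearly count, and the running total, which can be compared with the corresponding rows of the table.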

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so a statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 30% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  11                               11
1977  9                                20
1978  3                                23
1979  2                                25
1980  4                                29
1981  7                                36
1982  14                               50
1983  7                                57
1984  11                               68
1985  11                               79
1986  8                                87
1987  8                                95
1988  20                               115
1989  28                               143
1990  33                               176
1991  41                               217
1992  53                               270
1993  144                              414
1994  295                              709
1995  225                              934
1996  264                              1,198
1997  382                              1,580
1998  461                              2,041
1999  559                              2,600
2000  632                              3,232
2001  681                              3,913
2002  694                              4,607
2003  968                              5,575
2004  1,385                            6,960
2005  1,454                            8,414
2006  1,598                            10,012
2007  1,695                            11,707
2008  1,557                            13,264
2009  1,479                            14,743
2010  1,421                            16,164
2011  1,253                            17,417
2012  1,360                            18,777
2013  1,434                            20,211
2014  1,678                            21,889
2015  1,389                            23,278
2016  1,597                            24,875
2017  1,639                            26,514
2018  1,643                            28,157
2019  1,658                            29,815
2020  2,066                            31,881
2021  1,639                            33,520
2022  2,038                            35,558
2023  1,987                            37,545
2024  2,018                            39,563
2025  1,562                            41,125