PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page provides the trend.

Sequence cluster level: 30% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  11                               11
1977  9                                20
1978  3                                23
1979  2                                25
1980  4                                29
1981  7                                36
1982  14                               50
1983  7                                57
1984  11                               68
1985  10                               78
1986  8                                86
1987  8                                94
1988  20                               114
1989  27                               141
1990  33                               174
1991  41                               215
1992  52                               267
1993  146                              413
1994  296                              709
1995  224                              933
1996  265                              1,198
1997  383                              1,581
1998  461                              2,042
1999  557                              2,599
2000  632                              3,231
2001  680                              3,911
2002  694                              4,605
2003  967                              5,572
2004  1,384                            6,956
2005  1,455                            8,411
2006  1,595                            10,006
2007  1,697                            11,703
2008  1,555                            13,258
2009  1,481                            14,739
2010  1,420                            16,159
2011  1,248                            17,407
2012  1,363                            18,770
2013  1,436                            20,206
2014  1,676                            21,882
2015  1,390                            23,272
2016  1,597                            24,869
2017  1,642                            26,511
2018  1,639                            28,150
2019  1,664                            29,814
2020  2,064                            31,878
2021  1,639                            33,517
2022  2,044                            35,561
2023  1,985                            37,546
2024  2,018                            39,564
2025  1,934                            41,498
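
The relationship between the yearly and cumulative figures can be illustrated with a short Python sketch: each cumulative total is the running sum of the yearly counts up to that year. The sketch uses only the 1976 to 1980 figures from this page. The names `yearly_new`, `reported_totals`, and `cumulative_totals` are hypothetical and chosen for illustration; this is not how the statistics page itself computes its numbers.

```python
from itertools import accumulate

# (year, new protein sequences) for 1976-1980, copied from the figures on this page.
yearly_new = [(1976, 11), (1977, 9), (1978, 3), (1979, 2), (1980, 4)]

# Cumulative totals reported on this page for the same years.
reported_totals = [11, 20, 23, 25, 29]


def cumulative_totals(rows):
    """Return (year, running total) pairs for a list of (year, new count) rows.

    Hypothetical helper: assumes rows are already sorted by year.
    """
    years = [year for year, _ in rows]
    counts = [count for _, count in rows]
    return list(zip(years, accumulate(counts)))


computed = cumulative_totals(yearly_new)

# Compare each computed running total with the reported total.
for (year, total), reported in zip(computed, reported_totals):
    status = "ok" if total == reported else "mismatch"
    print(f"{year}: computed {total}, reported {reported} ({status})")
```

The same check can be extended to later years. Because of the precision threshold described in the note, reported totals near or above 10,000 may differ slightly from the group search result counts.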