PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the total number of unique protein sequences (number of polymeric entities) released up to and including each year. The yearly bars (dark blue) show how many new protein sequences were added in that year.
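The relationship between the yearly and cumulative values can be illustrated with a minimal sketch. It assumes only that each cumulative total is the running sum of the yearly counts, and it uses the 1976 to 1980 figures from the table on this page; the variable names are illustrative and not part of any PDB tool.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980,
# copied from the statistics table on this page.
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [12, 9, 3, 2, 3]

# Assumption: the cumulative total for a year is the running sum
# of all yearly counts up to and including that year.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}  new={new:>3}  total={total:>3}")
```

Under that assumption, the computed totals should reproduce the "Total Number of Protein Sequences" column for the same years.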

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics are calculated with a default precision threshold, so a count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 30% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976                               12                                 12
1977                                9                                 21
1978                                3                                 24
1979                                2                                 26
1980                                3                                 29
1981                                7                                 36
1982                               15                                 51
1983                                7                                 58
1984                               10                                 68
1985                               10                                 78
1986                                8                                 86
1987                                9                                 95
1988                               17                                112
1989                               26                                138
1990                               32                                170
1991                               43                                213
1992                               52                                265
1993                              141                                406
1994                              294                                700
1995                              222                                922
1996                              264                              1,186
1997                              379                              1,565
1998                              457                              2,022
1999                              556                              2,578
2000                              639                              3,217
2001                              681                              3,898
2002                              699                              4,597
2003                              976                              5,573
2004                            1,385                              6,958
2005                            1,453                              8,411
2006                            1,602                             10,013
2007                            1,688                             11,701
2008                            1,578                             13,279
2009                            1,485                             14,764
2010                            1,430                             16,194
2011                            1,247                             17,441
2012                            1,373                             18,814
2013                            1,440                             20,254
2014                            1,683                             21,937
2015                            1,387                             23,324
2016                            1,586                             24,910
2017                            1,644                             26,554
2018                            1,625                             28,179
2019                            1,671                             29,850
2020                            2,086                             31,936
2021                            1,640                             33,576
2022                            2,044                             35,620
2023                            2,004                             37,624
2024                            2,034                             39,658
2025                            2,103                             41,761