PDB Statistics: Growth in the Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 70% Sequence Identity

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several different levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
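Because each cumulative bar is the running sum of the yearly bars, the two series are related by simple accumulation. The sketch below illustrates this, assuming the yearly counts for 1976 to 1980 from the table on this page; the function name `cumulative_totals` is hypothetical and not part of any RCSB tool or API.

```python
from itertools import accumulate


def cumulative_totals(new_per_year: dict[int, int]) -> dict[int, int]:
    """Return the running total of new sequences for each year, in year order."""
    years = sorted(new_per_year)
    # accumulate() yields the running sum: 13, 13+8, 13+8+3, ...
    totals = accumulate(new_per_year[y] for y in years)
    return dict(zip(years, totals))


# Yearly counts of new protein sequences (70% identity), taken from the table.
new_sequences = {1976: 13, 1977: 8, 1978: 3, 1979: 2, 1980: 4}

print(cumulative_totals(new_sequences))
# {1976: 13, 1977: 21, 1978: 24, 1979: 26, 1980: 30}
```

The printed totals match the "Total Number of Protein Sequences" column for those years.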

Note: In the statistics table, the total number of sequence clusters links to the sequence cluster group search results page. To balance performance, the statistics are calculated with a default precision threshold, so when the count approaches or exceeds 10,000 it may differ slightly from the actual non-redundant group search result. Use the group search results page for an exact count and this statistics page for the overall trend.

Sequence cluster level: 70% identity

Year | Number of New Protein Sequences | Total Number of Protein Sequences
1976 | 13 | 13
1977 | 8 | 21
1978 | 3 | 24
1979 | 2 | 26
1980 | 4 | 30
1981 | 8 | 38
1982 | 17 | 55
1983 | 9 | 64
1984 | 10 | 74
1985 | 12 | 86
1986 | 9 | 95
1987 | 11 | 106
1988 | 23 | 129
1989 | 40 | 169
1990 | 40 | 209
1991 | 50 | 259
1992 | 60 | 319
1993 | 181 | 500
1994 | 347 | 847
1995 | 275 | 1,122
1996 | 329 | 1,451
1997 | 458 | 1,909
1998 | 590 | 2,499
1999 | 707 | 3,206
2000 | 815 | 4,021
2001 | 871 | 4,892
2002 | 929 | 5,821
2003 | 1,316 | 7,137
2004 | 1,820 | 8,957
2005 | 1,998 | 10,955
2006 | 2,230 | 13,185
2007 | 2,454 | 15,639
2008 | 2,287 | 17,926
2009 | 2,301 | 20,227
2010 | 2,299 | 22,526
2011 | 2,054 | 24,580
2012 | 2,203 | 26,783
2013 | 2,335 | 29,118
2014 | 2,822 | 31,940
2015 | 2,294 | 34,234
2016 | 2,568 | 36,802
2017 | 2,678 | 39,480
2018 | 2,608 | 42,088
2019 | 2,784 | 44,872
2020 | 3,449 | 48,321
2021 | 2,730 | 51,051
2022 | 3,526 | 54,577
2023 | 3,446 | 58,023
2024 | 3,513 | 61,536
2025 | 3,378 | 64,914
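Going the other way, the yearly counts can be recovered from the cumulative totals by differencing consecutive years, which also serves as a quick consistency check on the table. This is a minimal sketch, assuming the totals for 2020 to 2025 listed above; the helper `yearly_additions` is hypothetical.

```python
def yearly_additions(totals: dict[int, int]) -> dict[int, int]:
    """Difference consecutive cumulative totals; the earliest year is only a baseline."""
    years = sorted(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(years, years[1:])}


# Cumulative totals (70% identity) from the table; 2020 is the baseline year.
totals = {2020: 48321, 2021: 51051, 2022: 54577, 2023: 58023, 2024: 61536, 2025: 64914}
# Yearly counts as listed in the table, for comparison.
reported_new = {2021: 2730, 2022: 3526, 2023: 3446, 2024: 3513, 2025: 3378}

derived = yearly_additions(totals)
assert derived == reported_new, "table rows are not internally consistent"

# Mean yearly additions over 2021-2025; equals (64914 - 48321) / 5.
print(sum(derived.values()) / len(derived))  # 3318.6
```

Summing the differences over a window telescopes to the last total minus the baseline total, so the five-year mean needs only two rows of the table.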