This page displays the complete listing of the human and murine protease degradomes.
Christopher M. Overall, Eric M. Tam, Reinhild Kappelhoff, Andrea Connor, Tom Ewart, Charlotte J. Morrison, Xose Puente, Carlos López-Otín, and Arun Seth
Eric M. Tam, Charlotte J. Morrison, Yi I. Wu, M. Sharon Stack, and Christopher M. Overall
Carlos López-Otín and Christopher M. Overall
Nature Reviews Proteomics Collection. September 2004. www.nature.com/reviews/focus/proteomics
Degradomics
Human and mouse proteases are divided into five classes, which are subdivided into families according to the MEROPS database criteria (Tables S1–S5). We have provided the MEROPS code for every enzyme for which one is available. In some conflicting cases, different codes were previously assigned to human and mouse protease genes that this work shows to be true orthologues. In these cases, the human code is proposed for both orthologues. Genes encoding protease-like proteins with changes in residues crucial for proteolytic activity are indicated by ‘np’ (non-protease homologue) after the code.
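The code convention described above can be illustrated with a minimal Python sketch. It assumes the codes are written as they appear in the tables on this page (for example `C02.971np`, `Cx1.xxx`): one class letter, a two-character family suffix, a three-digit member number or `xxx` where no number has been assigned, and an optional `np` suffix. The names `ProteaseCode` and `parse_code` are hypothetical and are not part of MEROPS or of any published tool.

```python
import re
from typing import NamedTuple

# The five catalytic classes covered by the degradome tables.
CLASS_NAMES = {
    "A": "aspartic",
    "C": "cysteine",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
}

# Assumed layout: class letter, two family characters (digits or 'x'),
# a dot, three digits or 'xxx', and an optional 'np' suffix.
CODE_RE = re.compile(
    r"^(?P<cls>[ACMST])(?P<fam>[0-9x]{2})\.(?P<member>[0-9]{3}|xxx)(?P<np>np)?$"
)


class ProteaseCode(NamedTuple):
    catalytic_class: str   # e.g. "cysteine"
    family: str            # e.g. "C02" or "Cx1"
    assigned: bool         # False when the member part is 'xxx'
    non_protease: bool     # True when the code carries the 'np' suffix


def parse_code(code: str) -> ProteaseCode:
    """Split a table code into class, family and flags."""
    match = CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    return ProteaseCode(
        catalytic_class=CLASS_NAMES[match["cls"]],
        family=match["cls"] + match["fam"],
        assigned=match["member"] != "xxx",
        non_protease=match["np"] is not None,
    )


if __name__ == "__main__":
    for example in ("C02.971np", "Cx1.xxx", "M01.003"):
        print(example, parse_code(example))
```

Grouping parsed codes by `family` and filtering on `non_protease` would give one way to separate predicted active proteases from non-protease homologues within each family.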
The LocusLink or nucleotide accession number is provided for each protease. The information for human enzymes is labelled in green and for mouse in yellow. Genes that are absent from human or mouse are labelled in red. Genes that have been inactivated by mutation in one species, but are functional in the other, are labelled in pink. Although these specific pseudogenes have been included in the Tables to emphasize the human–mouse difference, they have not been incorporated into the final counts of protease genes. Genes that have been verified experimentally, but the sequence of which is missing from the available genome sequences, are indicated in red and in parentheses. ‘Y’ indicates that the corresponding human and mouse genes are syntenic. The percentage identity between orthologous proteases is also shown.
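As a rough illustration of how a table row could be read programmatically, the following Python sketch parses one pipe-delimited row into a record. It assumes every row has ten cells in the order code, peptidase name, human gene, human LocusLink or accession, human locus, mouse gene, mouse LocusLink or accession, mouse locus, synteny flag and percentage identity, and that an absent entry is written as an empty cell. The names `OrthologueRow`, `parse_row` and `presence` are hypothetical. Markers such as a leading `#` or parentheses around a locus are kept as plain text rather than interpreted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrthologueRow:
    code: str
    peptidase: str
    human_gene: Optional[str]
    mouse_gene: Optional[str]
    syntenic: bool
    identity: Optional[int]  # percentage identity, if listed


def parse_row(line: str) -> OrthologueRow:
    """Parse one ten-cell table row (assumed layout, see lead-in)."""
    text = line.strip()
    # Drop exactly one leading and one trailing pipe, keeping empty cells.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != 10:
        raise ValueError(f"expected 10 cells, got {len(cells)}")
    code, name, h_gene, _h_id, _h_locus, m_gene, _m_id, _m_locus, synt, ident = cells
    return OrthologueRow(
        code=code,
        peptidase=name,
        human_gene=h_gene or None,
        mouse_gene=m_gene or None,
        syntenic=synt.lower() == "y",
        identity=int(ident) if ident else None,
    )


def presence(row: OrthologueRow) -> str:
    """Report whether a gene is listed for both species or only one."""
    if row.human_gene and row.mouse_gene:
        return "both"
    return "human only" if row.human_gene else "mouse only"


if __name__ == "__main__":
    row = parse_row(
        "| C02.002 | calpain 2 | CAPN2 | 824 | 1q42 | Capn2 | 12334 | 1H4 | y | 93 |"
    )
    print(row.peptidase, presence(row), row.identity)
```

Rows whose empty cells have been collapsed into a run of pipes would need to be normalized to ten cells before this parser accepts them.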
The aspartic proteases are divided into four families: A01, A02, A22 and Ax1. There are several pepsinogen A isozymogens encoded by highly related genes (>95% identity) that form part of a cluster located at 11q12. The individual pepsinogen A isozymogens result from haplotypes that contain different numbers of genes (ranging from 1 to 4)1,2. In agreement with other databases, this region has been annotated as a single gene in human. According to the criteria discussed above, we have assigned mouse pepsinogen F as the orthologue of human pepsinogen A, despite notable divergence in their structure and regulation3. Ren2 is absent in some strains of laboratory mice. The gene that encodes prochymosin has been inactivated by mutations and frameshifts in the human genome and is classified as a pseudogene, although in mouse and other species it is functional4.
The genes DDI1, DDI2, DDI-RP, NRIP2 and NRIP3 are included in the family A02 that contains predicted retroviral-like aspartic proteases5. All of these have mouse orthologues at syntenic regions, and are not embedded in endogenous retroviral elements. The human and mouse genomes also contain several aspartic protease-related sequences derived from endogenous retroviruses, but we have not annotated these as human or mouse proteases. In this regard, it is remarkable that most of the retroviruses embedded in both genomes have suffered inactivating mutations, also affecting the putative proteases that are encoded by these viral elements. However, HERV-K113, for example, which is located at 19p13 in ~30% of the human population, has intact open-reading frames for all viral proteins, including the corresponding aspartic protease, and remains capable of reinfecting humans today6. The catalogue of aspartic proteases also includes a new family that is derived from the protein prolactin inducible protein/gross cystic disease fluid protein-15 (PIP/GCDFP15), which has recently been characterized as a protease belonging to this class of enzymes7. The four PIP-related proteins lack residues proposed to be essential for PIP proteolytic activity and have been classified as non-protease homologues.
Cysteine proteases
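| Code | Peptidase | Human gene | Human LocusLink | Human locus | Mouse gene | Mouse LocusLink | Mouse locus | Syntenic | Identity (%) |
|---|---|---|---|---|---|---|---|---|---|
| C02.002 | calpain 2 | CAPN2 | 824 | 1q42 | Capn2 | 12334 | 1H4 | y | 93 |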
| C02.004 | calpain 3 | CAPN3 | 825 | 15q15 | Capn3 | 12335 | 2F1 | y | 93 |
| C02.011 | calpain 5 | CAPN5 | 726 | 11q13 | Capn5 | 12337 | 7F1 | y | 92 |
| C02.971np | calpain 6 | CAPN6 | 827 | Xq23 | Capn6 | 12338 | XF2 | y | 95 |
| C02.008 | calpain 7 | CAPN7 | 23473 | 3p25 | Capn7 | 12339 | 14B | y | 95 |
| C02.007 | calpain 8 | CAPN8 | AA043093 | (1q42) | Capn8 | 170725 | 1H4 | y | 72 |
| C02.006 | calpain 9 | CAPN9 | 10753 | 1q42 | Capn9 | 73647 | 8E2 | y | 85 |
| C02.018 | calpain 10 | CAPN10 | 11132 | 2q37 | Capn10 | 23830 | 1D | y | 81 |
| C02.013 | calpain 11 | CAPN11 | 11131 | 6p21 | Capn11 | 103998 | 17C | y | 83 |
| C02.017 | calpain 12 | CAPN12 | 147968 | 19q13 | Capn12 | 60594 | 7A3 | y | 87 |
| C02.020 | calpain 13 | CAPN13 | 92291 | 2p23 | Capn13 | 240159 | 17E2 | y | 62 |
| C02.xxx | calpain 14 | CAPN14 | 114773 | 2p23 | | | | | |
| C02.010 | calpain 15/Sol protein | SOLH | 6650 | 16p13 | Solh | 50817 | 17B1 | y | 89 |
| C12.001 | ubiquitin C-terminal hydrolase 1 | UCHL1 | 7345 | 4p14 | Uchl1 | 22223 | 5D | y | 94 |
| C12.003 | ubiquitin C-terminal hydrolase 3 | UCHL3 | 7347 | 13q22 | Uchl3 | 50933 | 14E2 | y | 98 |
| C12.004 | ubiquitin C-term. hydrolase BAP1 | BAP1 | 8314 | 3p21 | Bap1 | 104416 | 14B | y | 93 |
| C12.005 | ubiquitin C-terminal hydrolase 5 | UCHL5 | 51377 | 1q31 | Uchl5 | 56207 | 1F | y | 96 |
| C12.007 | ubiquitin C-terminal hydrolase 4 | | | | Uchl4 | 93841 | 9D | | |
| C12.xxx | cylindromatosis protein | CYLD1 | 1540 | 16q12 | Cyld1 | 74256 | 8C4 | y | 95 |
| C13.004 | legumain | LGMN | 5641 | 14q32 | Lgmn | 19141 | 12F1 | y | 82 |
| C13.xxx | legumain-2 | LGMN2 | 122199 | 13q21 | | | | | |
| C13.005 | hGPI8 | PIGK | 10026 | 1p31 | Pigk | 66613 | 3H4 | y | 94 |
| C14.001 | caspase-1 | CASP1 | 834 | 11q22 | Casp1 | 12362 | 9A1 | y | 62 |
| C14.006 | caspase-2 | CASP2 | 835 | 7q34 | Casp2 | 12366 | 6B2 | y | 89 |
| C14.003 | caspase-3 | CASP3 | 836 | 4q35 | Casp3 | 12367 | 8B2 | y | 87 |
| C14.007 | caspase-4/11 | CASP4 | 837 | 11q22 | Casp11 | 12363 | 9A1 | y | 60 |
| C14.008 | caspase-5 | CASP5 | 838 | 11q22 | | | | | |
| C14.005 | caspase-6 | CASP6 | 839 | 4q25 | Casp6 | 12368 | 3H1 | y | 90 |
| C19.055 | USP47 | USP47 | 55031 | 11p15 | Usp47 | 320745 | 7F2 | y | 94 |
| C19.068 | USP48 | USP48 | 84196 | 1p36 | Usp48 | 170707 | 4D3 | y | 95 |
| C19.073np | USP49 | USP49 | 25862 | 6p21 | Usp49 | 224836 | 17C | y | 80 |
| C19.058np | USP50 | USP50 | AI990110 | 15q21 | Usp50 | 75083 | 2F2 | y | 75 |
| C19.065 | USP51 | USP51 | BF741256 | Xp11 | | | | | |
| C19.xxxnp | USP52 | USP52 | 9924 | 12q13 | Usp52 | 103135 | 10D3 | y | 97 |
| C19.031 | DUB-1 | | | | Dub1 | 13531 | 7F2 | | |
| C19.032 | DUB-2 | | | | Dub2 | 13532 | 7F1 | | |
| C19.xxx | DUB2a | | | | Dub3 | AF393638 | 7F1 | | |
| C19.xxx | DUB2a-like | | | | Dub4 | AF393637 | 7F1 | | |
| C19.xxx | DUB2a-like2 | | | | Dub5 | BAC40791 | 7F1 | | |
| C19.xxx | DUB6 | | | | Dub6 | BN000117 | 7F1 | | |
| C26.001 | γ-glutamyl hydrolase | GGH | 8836 | 8q12 | Ggh | 14590 | 4A3 | y | 69 |
| C44.001 | Gln-PRPP amidotransferase | PPAT | 5471 | 4q12 | Ppat | 231327 | 5E1 | y | 93 |
| C44.971np | Gln-fructose-6-P transamidase 1 | GFPT1 | 2673 | 2p13 | Gfpt1 | 14583 | 6D2 | y | 99 |
| C44.972np | Gln-fructose-6-P transamidase 2 | GFPT2 | 9945 | 5q35 | Gfpt2 | 14584 | 11B1 | y | 98 |
| C44.973np | Gln-fructose-6-P transamidase 3 | GFPT3 | 203431 | Xq21 | #Gfpt3 | | XC3 | y | |
| C46.002 | sonic hedgehog protein | SHH | 6469 | 7q36 | Shh | 20423 | 5A3 | y | 92 |
| C46.003 | indian hedgehog protein | IHH | 3549 | 2q35 | Ihh | 16147 | 1C3 | y | 95 |
| C46.004 | desert hedgehog protein | DHH | 50846 | 12q13 | Dhh | 13363 | 15F2 | y | 97 |
| C48.002 | sentrin/SUMO protease 1 | SENP1 | 29843 | 12q13 | Senp1 | 223870 | 15F2 | y | 88 |
| C48.007 | sentrin/SUMO protease 2 | SENP2 | 59343 | 3q27 | Senp2 | 75826 | 16B1 | y | 71 |
| C48.003 | sentrin/SUMO protease 3 | SENP3 | 26168 | 17p13 | Senp3 | 80886 | 11B4 | y | 95 |
| C48.008 | sentrin/SUMO protease 5 | SENP5 | 205564 | 3q29 | Senp5 | AK043171 | 16B2 | y | 71 |
| C48.004 | sentrin/SUMO protease 6 | SENP6 | 26054 | 6q14 | Senp6 | 215351 | 9E2 | y | 81 |
| C48.009 | sentrin/SUMO protease 7 | SENP7 | 57337 | 3q12 | Senp7 | 72869 | 16B1 | y | 87 |
| Cx1.xxx | CGI-77 | CGI77 | 51633 | 8q21 | Cgi77 | 72201 | 4A2 | y | 87 |
| Cx1.xxxnp | CGI-77b | | | | Cgi77b | 236778 | XA3 | | |
| Cx2.xxxnp | HetF-like | HETFL | 23331 | 22q12 | Hetfl | 209683 | 5F | y | 85 |
The cysteine proteases belong to 16 different families, and include proteins such as hedgehog family members, the protease function of which is only used for the autolytic processing of their respective precursors8. The C01 family is largely expanded in the mouse as a result of the presence of placental cathepsins and testins. We have annotated two further mouse testins, including testin-3, which was the first member of this subfamily predicted to be a functional protease. There are two functional human cathepsin L-like genes (CTSL and CTSL2) at 9q21, and a single gene in the mouse, which is more closely related to CTSL2. The cylindromatosis protein contains a ubiquitin C-terminal hydrolase domain and has been included in the C12 family. The genes for calpain 14, caspase 5 and caspase 10 are absent in mice, and the human gene for caspase 12 has been inactivated and is therefore classified as a pseudogene. We have annotated a second human legumain-like gene that is absent in mouse.
The C19 family of ubiquitin-specific proteases (USPs) is large and complex. We have annotated 21 human members (USP30, 31, 34–52) and assigned their corresponding mouse orthologues. We have not found mouse orthologues for human USP6, -13, -34, -37, -42 and -51. USP17 is located within the RS447 human megasatellite at 4p15 (ref. 9). This region is highly polymorphic in the human genome, containing a variable number of USP17-related intronless tandemly repeated sequences (>95% identical), which have probably been generated by retrotransposition. Forty-four distinct alleles in 74 unrelated chromosomes containing 20–103 copies of the RS447 unit have been identified10. We have also identified several USP17-related sequences in a cluster located at 8p25. This cluster would contain at least seven USP17-like (USP17L) intronless genes (three of these are classified as non-protease homologues) and pseudogenes. The proteins encoded by these polymorphic and variable regions have been annotated as two single proteases (USP17 and USP17L) in this table. The closest relatives of USP17 genes in the mouse genome are those that code for proteins called DUBs (deubiquitinating enzymes). DUB1, DUB2, and DUB2A have been extensively characterized as members of a novel group of cytokine-inducible deubiquitylating enzymes that are produced by lymphocytes11–13. We have annotated three further members of this subfamily of haematopoietic proteases. The classification of mouse DUBs as orthologues of human USP17 genes is doubtful because, despite sequence similarities, their syntenic relationship is unclear. Accordingly, we have tentatively classified them as paralogous genes.
We have annotated six members of the C48 family of SUMO-1 proteases in the mouse genome, which are absent in the human genome. We have also included a family of recently described cysteine proteases with deubiquitylating activity containing the OTU-protease domain and tentatively called otubains14,15. This family should comprise 14 orthologues and one specific member in both human and mouse. All of them contain characteristic features of active proteases with the exception of TRABID and murine Cgi77b. The last protease included in our list of cysteine proteases is called HetF-like and forms part of the superfamily of caspase-haemoglobinase fold proteases16. Human and mouse HetF-like have a serine residue instead of the active-site cysteine present in cysteine proteases, and have been classified as non-protease homologues.
Metalloproteases
| Code | Peptidase | Human gene | Human LocusLink | Human locus | Mouse gene | Mouse LocusLink | Mouse locus | Syntenic | Identity (%) |
|---|---|---|---|---|---|---|---|---|---|
| M01.003 | aminopeptidase A | ENPEP | 2028 | 4q26 | Enpep | 13809 | 3H1 | y | 77 |
| M01.014 | aminopeptidase B | RNPEP | 6051 | 1q32 | Rnpep | 215615 | 1F | y | 86 |
| M01.023 | aminopeptidase MAMS | AMPEP | 64167 | 5q15 | | | | | |
| M01.001 | aminopeptidase N | ANPEP | 290 | 15q25 | Anpep | 16790 | 7D2 | y | 76 |
| M01.018 | aminopeptidase PILS | ARTS1 | 51752 | 5q21 | Arts1 | 80898 | 13C1 | y | 85 |
| M01.004 | leukotriene A4 hydrolase | LTA4H | 4048 | 12q23 | Lta4h | 16993 | 10C2 | y | 92 |
| M01.008 | pyroglutamyl-peptidase II | TRHDE | 29953 | 12q21 | Trhde | 237553 | 10D1 | y | 94 |
| M01.010 | cytosol alanyl aminopeptidase | NPEPPS | 9520 | 17q21 | Psa | 19155 | 11D | y | 97 |
| M01.011 | leucyl-cystinyl aminopeptidase | LNPEP | 4012 | 5q15 | Lnpep | 266720 | 13C1 | y | 88 |
| M01.022 | aminopeptidase B-like 1 | RNPEPL1 | 57140 | 2q37 | Rnpepl1 | 98480 | 1D | y | 95 |
| M01.028 | aminopeptidase O | AOPEP | 84909 | 9q22 | Aopep | BAC31943 | 13B3 | y | 72 |
| M01.027 | aminopeptidase Q | AQPEP | BG623101 | 5q23 | Aqpep | 74574 | 18C | y | 68 |
| M01.972np | TBP-associated factor 2 | TAF2 | 6873 | 8q24 | Taf2 | 319944 | 15D | y | 99 |
| M02.001 | angiotensin-converting enzyme 1 | ACE | 1636 | 17q23 | Ace | 11421 | 11E1 | y | 83 |
| M02.006 | angiotensin-converting enzyme 2 | ACE2 | 59272 | Xp21 | Ace2 | 70008 | XF5 | y | 82 |
| M02.971np | angiotensin-converting enzyme 3 | #ACE3 | | 17q23 | Ace3 | 217246 | 11E1 | y | |
| M03.001 | thimet oligopeptidase | THOP1 | 7064 | 19p13 | Thop1 | 50492 | 10C1 | y | 89 |
| M03.002 | neurolysin | NLN | 57486 | 5q13 | Nln | 75805 | 13D1 | y | 90 |
| M03.006 | mitochondrial intermediate peptidase | MIPEP | 4285 | 13q12 | Mipep | 70478 | 14C3 | y | 84 |
| M08.003 | leishmanolysin-2 | LMLN | 89782 | 3q29 | Lmln | 239833 | 16B2 | y | 73 |
| M10.034 | collagenase-like B | | | | Mcolb | 83996 | 9A1 | | |
| M10.001 | collagenase 1 | MMP1 | 4312 | 11q22 | Mcola | 83995 | 9A1 | y | 59 |
| M10.003 | gelatinase A | MMP2 | 4313 | 16q22 | Mmp2 | 17390 | 8C5 | y | 95 |
| M10.005 | stromelysin 1 | MMP3 | 4314 | 11q22 | Mmp3 | 17392 | 9A1 | y | 76 |
| M10.008 | matrilysin | MMP7 | 4316 | 11q22 | Mmp7 | 17393 | 9A1 | y | 70 |
| M13.091 | PHEX endopeptidase | PHEX | 5251 | Xp22 | Phex | 18675 | XF4 | y | 96 |
| M14.001 | carboxypeptidase A1 | CPA1 | 1357 | 7q32 | Cpa1 | 109697 | 6A3 | y | 74 |
| M14.002 | carboxypeptidase A2 | CPA2 | 1358 | 7q32 | Cpa2 | 232680 | 6A3 | y | 86 |
| M14.010 | carboxypeptidase A3 | CPA3 | 1359 | 3q24 | Cpa3 | 12873 | 3A3 | y | 81 |
| M14.017 | carboxypeptidase A4 | CPA4 | 51200 | 7q32 | Cpa4 | 215225 | 6A3 | y | 84 |
| M14.020 | carboxypeptidase A5 | CPA5 | 93979 | 7q32 | Cpa5 | 76649 | 1A3 | y | 84 |
| M14.018 | carboxypeptidase A6 | CPA6 | 57094 | 8q13 | Cpa6 | 329093 | 1A3 | y | 86 |
| M14.003 | carboxypeptidase B | CPB1 | 1360 | 3q25 | Cpb1 | 76703 | 3A3 | y | 72 |
| M14.009 | carboxypeptidase U | CPB2 | 1361 | 13q14 | Cpb2 | 56373 | 14D2 | y | 82 |
| M14.021 | carboxypeptidase O | CPO | 130749 | 2q33 | #Cpo | 269201 | 1C2 | y | |
| M14.005 | carboxypeptidase E | CPE | 1363 | 4q32 | Cpe | 12876 | 8B3 | y | 97 |
| M14.004 | carboxypeptidase N | CPN | 1369 | 10q25 | Cpn | 93721 | 19D1 | y | 66 |
| M14.006 | carboxypeptidase M | CPM | 1368 | 12q15 | Cpm | 70574 | 10D2 | y | 79 |
| M14.011 | carboxypeptidase D | CPD | 1362 | 17q11 | Cpd | 12874 | 11B4 | y | 93 |
| M14.012 | carboxypeptidase Z | CPZ | 8532 | 4p16 | Cpz | 242939 | 5B1 | y | 82 |
| M14.015np | carboxypeptidase X1 | CPX1 | 56265 | 20p13 | Cpx1 | 56264 | 2F3 | y | 86 |
| M14.019np | carboxypeptidase X2 | CPX2 | 119587 | 10q26 | Cpx2 | 55987 | 7F4 | y | 89 |
| M14.951np | adipocyte-enhancer binding prot. 1 | AEBP1 | 165 | 7p13 | Aebp1 | 11568 | 11A1 | y | 90 |
| M16.002 | insulysin | IDE | 3416 | 10q24 | Ide | 15925 | 19C3 | y | 97 |
| M16.003 | mitochondrial processing pept. β-sub | PMPCB | 9512 | 7q22 | Pmpcb | 73078 | 5A3 | y | 90 |
| M16.005 | nardilysin | NRD1 | 4898 | 1p32 | Nrd1 | 230598 | 4C7 | y | 93 |
| M16.009 | pitrilysin metalloprotease 1 | PITRM1 | 10531 | 10p15 | Pitrm1 | 69617 | 13A1 | y | 86 |
| M16.971np | mitochondrial processing protease | INPP5E | 23203 | 9q34 | Inpp5e | 66865 | 2A3 | y | 91 |
| M16.973np | UCR1 | UQCRC1 | 7384 | 3p21 | Uqcrc1 | 22273 | 9F2 | y | 88 |
| M16.974np | UCR2 | UQCRC2 | 7385 | 16p12 | Uqcrc2 | 67003 | 7F3 | y | 85 |
| M16.976np | mitoch. processing protease-like | AMPP | 133083 | 4q22 | | | | | |
| M17.001 | leucyl aminopeptidase | LAP3 | 51056 | 4p15 | Lap3 | 66988 | 5B3 | y | 90 |
| M17.006 | aminopeptidase-like 1 | NPEPL1 | 79716 | 20q13 | | | | | |
| M18.002 | aspartyl aminopeptidase | DNPEP | 23549 | 2q36 | Dnpep | 13437 | 1C3 | y | 90 |
| M19.001 | membrane dipeptidase | DPEP1 | 1800 | 16q24 | Dpep1 | 13479 | 8E2 | y | 73 |
| M19.002 | membrane dipeptidase 2 | DPEP2 | 64174 | 16q22 | Dpep2 | 244632 | 8D2 | y | 70 |
| M19.004 | membrane dipeptidase 3 | DPEP3 | 64180 | 16q22 | Dpep3 | 71854 | 8D2 | y | 73 |
| M20.005 | glu-carboxypeptidase-like 1 | CPGL | 55748 | 18q22 | Cpgl | 66054 | 18E3 | y | 91 |
| M20.006 | glu-carboxypeptidase-like 2 | CPGL2 | 84735 | 18q22 | Cpgl2 | 240478 | 18E3 | y | 73 |
| M20.971np | HmrA-like protease | HMRALP | 135293 | 6q15 | Hmralp | 242377 | 4A5 | y | 83 |
| M20.973np | aminoacylase | ACY1 | 95 | 3p21 | Acy1 | 109652 | 9F1 | y | 85 |
| M22.003 | O-sialoglycoprotein endopeptidase | OSGEP | 55644 | 14q11 | Osgep | 66246 | 14C1 | y | 93 |
| M22.004 | O-sialoglycoprotein endopeptidase 2 | OSGEP2 | 64172 | 2q32 | Osgep2 | 72085 | 1C1 | y | 84 |
| M24.001 | methionyl aminopeptidase I | METAP1 | 23173 | 4q24 | Metap1 | 75624 | 3H2 | y | 92 |
| M24.002 | methionyl aminopeptidase II | METAP2 | 10988 | 12q23 | Metap2 | 56307 | 10C3 | y | 88 |
| M24.028 | methionyl aminopeptidase-like 1 | METAPL1 | 254042 | 2q31 | Metapl1 | 66559 | 2C3 | y | 95 |
| M24.005 | X-prolyl aminopeptidase 2 | XPNPEP2 | 7512 | Xq26 | Xpnpep2 | 170745 | XA3 | y | 81 |
| M24.007 | X-Pro dipeptidase | PEPD | 5184 | 19q13 | Pepd | 18624 | 7B1 | y | 90 |
| M24.009 | aminopeptidase P1 | XPNPEP1 | 7511 | 10q25 | Xpnpep1 | 170750 | 19D2 | y | 81 |
| M24.026 | aminopeptidase P homologue | PEPP | 63929 | 22q13 | Pepp | 321003 | 15E3 | y | 93 |
| M24.973np | proliferation-association protein 1 | PA2G4 | 5036 | 12q13 | Pa2g4 | 18813 | 10D3 | y | 98 |
| M24.974np | suppressor of Ty 16 homologue | SUPT16H | 11198 | 14q11 | Supt16h | 114741 | 14C1 | y | 98 |
| M28.010 | glutamate carboxypeptidase II | FOLH1 | 2346 | 11p11 | Folh1 | 53320 | 7E1 | y | 85 |
| M28.011 | NAALADASE L peptidase | NAALADL | 10004 | 11q13 | Naaladl | BN000129 | 19A | y | 80 |
| M28.012 | NAALADASE II | NAALAD2 | 10003 | 11q14 | Naalad2 | 72560 | 9A3 | y | 89 |
| M28.975np | NAALADASE III | NAALAD3 | 254827 | 3q26 | Naalad3 | 229149 | 3A3 | y | 63 |
| M28.014 | plasma Glu-carboxypeptidase | PGCP | 10404 | 8q22 | Pgcp | 54381 | 15B3 | y | 93 |
| M28.018 | Ojeda peptidase | OJP | 79956 | 9p24 | Ojp | BAC38286 | 19C2 | y | 87 |
| M28.972np | transferrin receptor protein | TFRC | 7037 | 3q29 | Trfr | 22042 | 16B3 | y | 77 |
| M28.973np | transferrin receptor 2 protein | TFR2 | 7036 | 7q22 | Trfr2 | 50765 | 5G1 | y | 84 |
| M28.974np | glutaminyl cyclase | QPCT | 25797 | 2p22 | Qpct | 70536 | 17E3 | y | 81 |
| M28.016 | glutaminyl cyclase 2 | QPCT2 | 54814 | 19q13 | Qpct2 | 67369 | 7A2 | y | 84 |
| M38.972np | dihydroorotase | CAD | 790 | 2p23 | Cad | 69719 | 5B1 | y | 94 |
| M38.973np | dihydropyrimidinase | DPYS | 1807 | 8q22 | Dpys | 64705 | 15C | y | 88 |
| M38.xxxnp | dihydropyrimidinase-related prot. 1 | CRMP1 | 1400 | 4p16 | Crmp1 | 12933 | 5B2 | y | 96 |
| M38.xxxnp | dihydropyrimidinase-related prot. 2 | DPYSL2 | 1808 | 8p21 | Dpysl2 | 12934 | 14D1 | y | 98 |
| M38.xxxnp | dihydropyrimidinase-related prot. 3 | DPYSL3 | 1809 | 5q32 | Dpysl3 | 22240 | 18B3 | y | 98 |
| M38.xxxnp | dihydropyrimidinase-related prot. 4 | DPYSL4 | 10570 | 10q26 | Dpysl4 | 26757 | 7F5 | y | 93 |
| M38.xxxnp | dihydropyrimidinase-related prot. 5 | DPYSL5 | 56896 | 2p23 | Dpysl5 | 65254 | 5B1 | y | 98 |
| M41.004 | i-AAA protease | YME1L1 | 10730 | 10p12 | Yme1l1 | 27377 | 2A3 | y | 95 |
| M41.006 | paraplegin | SPG7 | 6687 | 16q24 | Spg7 | 234847 | 8E2 | y | 89 |
| M41.010 | Afg3-like protein 1 | #AFG3L1 | 172 | 16q24 | Afg3l1 | 114896 | 8E2 | y | |
| M41.007 | Afg3-like protein 2 | AFG3L2 | 10939 | 18p11 | Afg3l2 | 69597 | 18E1 | y | 94 |
| M43.004 | pappalysin-1 | PAPPA | 5069 | 9q32 | Pappa | 18491 | 4C1 | y | 93 |
| M43.005 | pappalysin-2 | PLAC3 | 60676 | 1q25 | Plac3 | 240848 | 1H1 | y | 78 |
| M47.001 | procol. III N-endopeptidase | PCOLN3 | 5119 | 16q24 | #Pcoln3 | BI690732 | 8E2 | y | |
| M48.003 | FACE-1/ZMPSTE24 | FACE1 | 10269 | 1p34 | Face1 | 230709 | 4D1 | y | 91 |
| M48.017 | VVML | VVML | 115209 | 1p32 | Vvml | 67013 | 4C6 | y | 71 |
| M49.001 | dipeptidyl-peptidase III | DPP3 | 10072 | 11q13 | Dpp3 | 75221 | 19A | y | 92 |
| M50.001 | S2P protease | MBTPS2 | 51360 | Xp22 | Mbtps2 | 270669 | XF4 | y | 97 |
The metalloproteases belong to 26 distinct families. The M01 family contains 13 members in human and 12 in mouse, which lacks aminopeptidase MAMS. We propose the names aminopeptidases O and Q for the M01 proteases previously annotated as human hypothetical proteins FLJ14675 and BG623101. We have also identified orthologues for these genes located at mouse chromosomes 13B3 and 18C. In the M02 family, we have tentatively annotated a mouse gene for a third angiotensin-converting enzyme-like protein (Ace3), which is located at chromosome 11E1. We have classified Ace3 as a non-protease homologue because it contains the HQMGH sequence instead of the consensus Zn-binding HExxH motif. No expressed sequence tags (ESTs) have been found for mouse Ace3, which could be an inactive pseudogene, although the locus is apparently complete and conserved in the rat. The corresponding human gene is a pseudogene as a result of the accumulation of stop codons and frameshifts.
There are some differences between human and mouse members of the M10 family of matrix metalloproteases (MMPs). Mouse McolB, a diverging counterpart of human MMP1, is absent in human, whereas human matrilysin-2 (MMP26) is absent from mouse, although there are some gaps in the mouse genome region that could contain this missing gene. MMP23 has been recently duplicated in the human genome17, generating two closely related genes, MMP23A and MMP23B. This region is artefactually collapsed in the available public and private genome sequences owing to the high sequence identity between the two genes, and is erroneously considered as containing a single gene. Apparently, there is a single mouse MMP23 gene, although the possibility that this region is duplicated in the mouse genome and has also been computer-collapsed cannot be ruled out. In the family M12, we have annotated a new member within the meprin/tolloid subfamily18.
The ADAM (a disintegrin and metalloprotease) subfamily of M12 metalloproteases19 shows important differences between the two organisms. The genes for ADAM-1, -3, -4, -5, -6 and -25 are pseudogenes in human but active genes in mouse. ADAM-1 and -6 are duplicated in mouse, whereas ADAM-20 is duplicated in human (ADAM-20 and ADAM-21). Also, testases, a subgroup of ADAMs located at 8B1, are mouse specific. We have annotated five further members of this family (testases 5–9), although they are intronless and their functional relevance remains to be shown. The group of ADAMTSs (ADAMs with thrombospondin domains) is completed with the inclusion of human and mouse ADAMTS-20. In the M14 family of carboxypeptidases, we have found that mouse carboxypeptidase O has been specifically inactivated by mutation and is annotated as a pseudogene20. Dihydroorotase and several dihydropyrimidinases have been included as non-protease homologues of bacterial isoaspartyl dipeptidases. The gene that encodes procollagen III N-endopeptidase is inactivated in mouse, thereby representing an interesting difference between the human and mouse degradomes, as there are no other functional members in the M47 family that could compensate for this specific loss in mouse. We have annotated 14 human and 13 mouse proteins in the recently described M67 family of metalloisopeptidases21,22. All of them contain the JAMM motif, although some lack conserved residues that are predicted to be essential for proteolytic activity, and have therefore been classified as non-protease homologues.
There are doubts about the ascription of the FACE-2/RCE1 prenyl endopeptidase to the cysteine or metalloprotease classes of enzymes23; however, in agreement with recent structural comparisons24, we have included it as the only human and mouse representative of a new family of membrane-bound metalloproteases. Finally, we have included three aminoacylases in our catalogue of metalloproteases. These enzymes are not, strictly speaking, proteases because they cleave peptide bonds that connect an acyl derivative with an amino acid25. However, the structure of ACY1 clearly allows its inclusion in the M20 family of metalloproteases, whereas those of ACY2 and ACY3 have also been proposed to be part of a superfamily of metalloproteases that contains members of the M14 family of carboxypeptidases26.
Serine proteases
| Code | Peptidase | Human gene | Human LocusLink | Human locus | Mouse gene | Mouse LocusLink | Mouse locus | Syntenic | Identity (%) |
|---|---|---|---|---|---|---|---|---|---|
| S01.192 | complement component C1ra | C1R | 715 | 12p13 | C1ra | 50909 | 6F2 | y | 81 |
| S01.xxx | complement component C1rb | | | | C1rb | AF459018 | (6F2) | | |
| S01.193 | complement component C1sa | C1S | 716 | 12p13 | C1sa | 50908 | 6F2 | y | 74 |
| S01.xxx | complement component C1sb | | | | C1sb | 317677 | 6F2 | | |
| S01.191 | complement factor D | DF | 1675 | 19p13 | Df | 11537 | 10C1 | y | 67 |
| S01.xxx | complement factor D-like | DF2 | 199783 | 19p13 | Df2 | 270746 | 10C1 | y | 79 |
| S01.199 | complement factor I | IF | 3426 | 4q25 | If | 12630 | 3H1 | y | 69 |
| S01.198 | MASP1/3 | MASP1/3 | 5648 | 3q29 | Masp1/3 | 17174 | 16B1 | y | 86 |
| S01.229 | MASP2 | MASP2 | 10747 | 1p36 | Masp2 | 17175 | 4E1 | y | 81 |
| S01.237 | neurotrypsin | PRSS12 | 8492 | 4q28 | Prss12 | 19142 | 3G3 | y | 82 |
| S01.231 | u-plasminogen activator | PLAU | 5328 | 10q22 | Plau | 18792 | 14B | y | 69 |
| S01.232 | t-plasminogen activator | PLAT | 5327 | 8p11 | Plat | 18791 | 8A3 | y | 80 |
| S01.233 | plasminogen | PLG | 5340 | 6q26 | Plg | 18815 | 17A2 | y | 79 |
| S01.976np | hepatocyte growth factor | HGF | 3082 | 7q21 | Hgf | 15234 | 5A3 | y | 91 |
| S01.975np | macrophage-stimulating protein | MSP | 4485 | 3p21 | Msp | 15235 | 9F2 | y | 80 |
| S01.999np | apolipoprotein | LPA | 4018 | 6q26 | | | | | |
| S01.223 | acrosin | ACR | 49 | 22q13 | Acr | 11434 | 15F1 | y | 68 |
| S01.972np | haptoglobin-1 | HP | 3240 | 16q22 | Hp | 15439 | 8D3 | y | 79 |
| S01.974np | haptoglobin-related protein | HPR | 3250 | 16q22 | | | | | |
| S01.277 | osteoblast serine protease | HTRA1 | 5654 | 10q26 | Htra1 | 56213 | 7F4 | y | 91 |
| S01.278 | HTRA2 | HTRA2 | 27429 | 2p12 | Htra2 | 64704 | 6D1 | y | 84 |
| S01.284 | HTRA3 | HTRA3 | 94031 | 4p16 | Htra3 | 78558 | 5B1 | y | 86 |
| S01.285 | HTRA4 | HTRA4 | 203100 | 8p11 | Htra4 | 66943 | 8A3 | y | 66 |
| S01.309 | umbilical vein protease | SPUVE | 11098 | 11q14 | Spuve | 76453 | 7E1 | y | 90 |
| S01.994np | similar to SPUVE | SPUVE2 | 167681 | 6q14 | Spuve2 | 244954 | 9E3 | y | 77 |
| S01.104 | plasma-kallikrein-like 1 | KLKBL1 | XP_116753 | 8p23 | Klkbl1 | 74215 | (14C3) | | 66 |
| S01.415 | plasma-kallikrein-like 2 | KLKBL2 | 203074 | 8p23 | Klkbl2 | 71037 | 14C3 | y | 71 |
| S01.419 | plasma-kallikrein-like 3 | #KLKBL3 | | 8p23 | Klkbl3 | 73382 | 14C3 | y | |
| S01.992np | plasma-kallikrein-like 4 | KLKBL4 | 221191 | 16q21 | Klkbl4 | BN000132 | 8C5 | y | 62 |
| S01.286 | similar to Arabidopsis Ser-prot. | SASP | 219743 | 10q22 | Sasp | 71767 | 10B4 | y | 80 |
| S01.991np | chymase-like serine protease | | | | Clsp | 75106 | XC3 | | |
| S08.063 | site-1 protease | MBTPS1 | 8720 | 16q23 | Mbtps1 | 56453 | 8E1 | y | 96 |
| S08.039 | proprotein convertase 9 | PCSK9 | 255738 | 1p32 | Pcsk9 | 100102 | 4C7 | y | 73 |
| S08.090 | tripeptidyl-peptidase II | TPP2 | 7174 | 13q33 | Tpp2 | 22019 | 1C1 | y | 95 |
| S08.072 | proprotein convertase 1 | PCSK1 | 5122 | 5q15 | Pcsk1 | 18548 | 13C1 | y | 93 |
| S08.073 | proprotein convertase 2 | PCSK2 | 5126 | 20p12 | Pcsk2 | 18549 | 2H1 | y | 97 |
| S08.071 | furin | PCSK3 | 5045 | 15q26 | Pcsk3 | 18550 | 7D2 | y | 94 |
| S08.074 | proprotein convertase 4 | PCSK4 | 5124 | 19p13 | Pcsk4 | 18551 | 10C1 | y | 82 |
| S08.076 | proprotein convertase 5 | PCSK5 | 5125 | 9q21 | Pcsk5 | 18552 | 19B | y | 92 |
| S08.075 | PACE4 proprotein convertase | PCSK6 | 5046 | 15q26 | Pcsk6 | 18553 | 7C | y | 93 |
| S08.077 | proprotein convertase 7 | PCSK7 | 9159 | 11q23 | Pcsk7 | 18554 | 9B | y | 88 |
| S09.001 | prolyl oligopeptidase | PREP | 5550 | 6q22 | Prep | 19072 | 10B2 | y | 96 |
| S09.015 | prolyl-oligopeptidase 2 | PREP2 | 9581 | 2p21 | Prep2 | 213760 | 17E4 | y | 94 |
| S09.003 | dipeptidyl-peptidase 4 | DPP4 | 1803 | 2q24 | CD26 | 13482 | 2C3 | y | 85 |
| S09.973np | dipeptidyl-peptidase 6 | DPP6 | 1804 | 7q36 | Dpp6 | 13483 | 5A3 | y | 91 |
| S09.018 | dipeptidyl-peptidase 8 | DPP8 | 54878 | 15q23 | Dpp8 | 74388 | 9D | y | 95 |
| S09.019 | dipeptidyl-peptidase 9 | DPP9 | 91039 | 19p13 | Dpp9 | 224897 | 17D | y | 89 |
| S09.974np | dipeptidyl-peptidase 10 | DPP10 | 57628 | 2q14 | Dpp10 | 269109 | 1E2 | y | 88 |
| S09.007 | Seprase | FAP | 2191 | 2q24 | Fap | 14089 | 2C3 | y | 90 |
| S09.004 | acylaminoacyl-peptidase | APEH | 327 | 3p21 | Apeh | 235606 | 9F2 | y | 91 |
| S09.055 | CGI-67 protein | CGI-67 | 51104 | 9q21 | Cgi-67 | BN000127 | 19C1 | y | 98 |
| S09.052 | CGI-67-like protease-1 | CGI-67L1 | 81926 | 19p13 | Cgi-67l1 | 216169 | 10C1 | y | 93 |
| S09.053 | CGI-67-like protease-2 | CGI-67L2 | 58489 | 15q25 | Cgi-67l2 | 70178 | 7D3 | y | 97 |
| S09.051 | BEM46-like 1 | BEM46L1 | 84945 | 13q33 | Bem46l1 | 68904 | 8A2 | y | 97 |
| S09.054 | BEM46-like 2 | BEM46L2 | 26090 | 20p11 | Bem46l2 | 76192 | 2H1 | y | 90 |
| S09.xxx | BEM46-like 3 | BEM46L3 | BG74273 | 14q22 | Bem46l3 | 278594 | 12C3 | y | 78 |
| S10.002 | lysosomal carboxypeptidase A | PPGB | 5476 | 20q13 | Ppgb | 19025 | 2H3 | y | 87 |
| S10.003 | vitellogenic carboxypeptidase-L | CPVL | 54504 | 7p15 | Cpvl | 71287 | 6B3 | y | 76 |
| S10.013 | serine carboxypeptidase 1 | RISC | 59342 | 17q23 | Risc | 74617 | 11C | y | 82 |
| S12.004 | β-lactamase | LACTB | 114294 | 15q22 | Lactb | 80907 | 9D | y | 85 |
| S14.003 | endopeptidase Clp | CLPP | 8192 | 19p13 | Clpp | 53895 | 17E1 | y | 87 |
| S16.002 | PIM1 endopeptidase | PRSS15 | 9361 | 19p13 | Prss15 | 74142 | 17E1 | y | 88 |
| S16.006 | PIM2 endopeptidase | PIM2 | 83752 | 16q21 | Pim2 | 66887 | 8C4 | y | 95 |
| S26.009 | signalase 18 kDa component | SPC18 | 23478 | 15q25 | Spc18 | 56529 | 7D2 | y | 98 |
| S26.010 | signalase 21 kDa component | SPC21 | 90701 | 18q21 | Spc21 | 66286 | 18E1 | y | 98 |
| S26.xxx | signalase-like 1 | SPCL1 | 158326 | 9p22 | Spcl1 | 230344 | 4C3 | y | 76 |
| S26.012 | mitoc. inner membrane protease 2 | IMMP2L | 83943 | 7q31 | Immp2l | 93757 | 12B3 | y | 90 |
| S26.013 | mitochondrial signal peptidase | IMMP1 | 196294 | 11p13 | Immp1 | 66541 | 2E3 | y | 95 |
| S26.xxx | lactotransferrin | LTF | 4057 | 3p21 | Ltf | 17002 | 9F2 | y | 70 |
| S28.001 | lysosomal Pro-X carboxypeptidase | PRCP | 5547 | 11q14 | Prcp | 72461 | 7E2 | y | 77 |
| S28.002 | dipeptidyl-peptidase II | DPP7 | 29952 | (9q24) | Dpp7 | 83768 | 2A3 | y | 80 |
| S28.003 | thymus-specific serine peptidase | PRSS16 | 10279 | 6p21 | Prss16 | 54373 | 13A3 | y | 79 |
| S33.009 | αβ-hydrolase dom. containing 4 | ABHD4 | 63874 | 14q11 | Abhd4 | 105501 | 14C1 | y | 96 |
| S33.971np | epoxide hydrolase | EPHX1 | 2052 | 1q42 | Ephx1 | 13849 | 1H4 | y | 83 |
| S33.972np | Mesoderm specific transcript hom. | MEST | 4232 | 7q32 | Mest | 17294 | 6A3 | y | 97 |
| S33.974np | epoxide hydrolase related protein | EPHXRP | 253152 | 1p22 | Ephxrp | 243192 | 5E | y | 87 |
| S33.xxxnp | CGI-58 | CGI-58 | 51099 | 3p21 | Cgi-58 | 67469 | 9F4 | y | 94 |
| S53.003 | tripeptidyl-peptidase I | CLN2 | 1200 | 11p15 | Cln2 | 12751 | 7F1 | y | 88 |
Most serine proteases belong to the S01 family, but there are representatives of 13 further serine protease families in the human and mouse degradomes. All differences between human and mouse serine proteases correspond to changes in members of this densely populated family. The kallikreins are duplicated in mouse almost entirely: there are 28 members in mouse and 15 in human. The genes for mastin, implantation serine protease-2 (ISP-2), intestinal serine protease (DISP-1), and testis serine proteases TESP-2 and -3 are inactivated in human, hence their classification as pseudogenes. The absence of genes for human DISP-2, ISP-1 and TESP-1, together with the finding that human DISP-1, ISP-2, TESP-2 and TESP-3 are pseudogenes, indicates that the functions performed by ISP, DISP and TESP proteases might be mouse-specific. We have also annotated several new members of the testis-specific serine protease (TESSP) subfamily, with TESSP-3, -4 and -6 being pseudogenes in human and active genes in mouse. Mast-cell proteases (Mcpt), granzymes (Gzm), trypsins and human-airway trypsin-like (HAT-like) proteases are expanded in mouse; two tryptases, an ovochymase-like protease and a form of pancreatic elastase are only present in human. Two well-known non-protease homologues, apolipoprotein (a) (LPA) and haptoglobin-related protein, are absent in mouse. Further characteristic features of the mouse degradome include the duplication of complement factors C1r and C1s, and the presence of an extra functional member of the plasma-kallikrein-like subfamily (Klkbl3), and of a non-protease homologue called Clsp (chymase-like serine protease).
We have included in the catalogue of serine proteases a series of proteins such as lactoferrin, reelin and tumour rejection antigen (gp96), which have been recently reported to have this kind of proteolytic activity27–29. On the basis of structural analysis, lactoferrin has been tentatively classified as a member of the S26 family of serine proteases, whereas reelin, gp96 and their close relatives have been preliminarily ascribed to two Sx families of presently unclassified serine proteases. Gene Ontology annotation of the human proteome also predicts a series of serine proteases with minimal relationship to other members of this class of enzymes. They include torsin, NSP (novel serine protease) and Ufd1L (ubiquitin fusion degradation protein 1 homologue), but owing to the absence of enough evidence to support their ascription as serine proteases, they have not been included in the present version of the human and mouse degradomes.
Threonine proteases
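| Code | Peptidase | Human gene | Human LocusLink | Human locus | Mouse gene | Mouse LocusLink | Mouse locus | Syntenic | Identity (%) |
|---|---|---|---|---|---|---|---|---|---|
| T03.016 | γ-glutamyltransferase m-3 | GGTL4 | 91227 | 22q11 | | | | | |
| T03.002 | γ-glutamyltransferase 5 | GGTLA1 | 220522 | 22q11 | | | | | |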
The most recently identified catalytic class of proteases, the threonine proteases30, are classified into three families: T01, containing the proteasome components; T02, composed of three distinct glycosylasparaginases; and T03, including diverse γ-glutamyltransferases (GGTs). All members of the T01 and T02 families are conserved between human and mouse. There are, however, some differences in the number of GGT genes clustered in a region of chromosome 22, which has undergone successive duplications31. As a consequence of this dynamic evolution, there are four GGT genes in this region of the human genome but only one in the corresponding region of the mouse genome (10B5). An additional GGT gene located at 20q11 is conserved in the mouse genome at an equivalent position (2H2).