Overall Lab Protease Degradome
and Degradomics Home Page and Papers

This page displays the complete listing of the human and murine protease degradomes.


Christopher M. Overall, Eric M. Tam, Reinhild Kappelhoff, Andrea Connor, Tom Ewart, Charlotte J. Morrison, Xose Puente, Carlos López-Otín, and Arun Seth
Protease Degradomics: Mass Spectrometry Discovery of Protease Substrates and the CLIP-CHIP, a Dedicated DNA Microarray of all Human Proteases and Inhibitors
Biological Chemistry 385 (6), 493-504 (2004).

[Preprint/PDF]

CLIP-CHIP. The first dedicated and complete human protease and inhibitor DNA microarray.

Eric M. Tam, Charlotte J. Morrison, Yi I. Wu, M. Sharon Stack, and Christopher M. Overall
Membrane Protease Proteomics: Isotope-coded Affinity Tag MS Identification of Undescribed MT1-Matrix Metalloproteinase Substrates
Proc. Natl. Acad. Sci. USA 101(18), 6917-6922 (2004).
[Preprint/PDF]

Protease substrate discovery: Flow diagram of MS/MS analysis of ICAT labelled proteins in cells transfected with MT1-MMP


Xose S. Puente, Luis M. Sanchez, Christopher M. Overall and Carlos López-Otín
Human and Mouse Proteases: A Comparative Genomic Approach
Nature Reviews Genetics 4, 544-558 (2003). [Reprint (PDF)]
[HTML]
Summary [HTML]
Degradomic Tables [HTML]


Protease Wheel: Comparison of Human and Murine Proteases

Carlos López-Otín and Christopher M. Overall
Protease Degradomics:
A New Challenge for Proteomics

Nature Reviews Molecular Cell Biology 3, 509-519 (2002).

[Reprint/PDF]

Nature Reviews Proteomics Collection. September 2004

www.nature.com/reviews/focus/proteomics

Editorial [pdf] [HTML]

Index [HTML]

 

Degradomics

Human and mouse proteases are divided into five classes, which are subdivided into families according to the MEROPS database criteria (Tables S1–S5). We have provided the MEROPS code for every enzyme for which one is available. In some conflicting cases, different codes had previously been assigned to human and mouse protease genes that were shown in this work to be true orthologues; in these cases, the human code is proposed for both orthologues. The genes encoding protease-like proteins that show changes in residues crucial for proteolytic activity are indicated as ‘np’ (non-protease homologues) after the code.

The LocusLink or nucleotide accession number is provided for each protease. The information for human enzymes is labelled in green and for mouse in yellow. Genes that are absent from human or mouse are labelled in red. Genes that have been inactivated by mutation in one species, but are functional in the other, are labelled in pink. Although these specific pseudogenes have been included in the Tables to emphasize the human–mouse difference, they have not been incorporated into the final counts of protease genes. Genes that have been verified experimentally, but whose sequence is missing from the available genome sequences, are indicated in red and in parentheses. ‘Y’ indicates that the corresponding human and mouse genes are syntenic. The percentage identity between orthologous proteases is also shown.
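The code convention described above can be illustrated with a minimal Python sketch. It assumes that every code in the tables has the shape seen on this page: a class letter, a two-character family (with `x` for provisional families such as Cx1), a three-character member identifier (or `xxx` where none is listed) and an optional `np` suffix. The names `MeropsCode` and `parse_code` are hypothetical helpers and are not part of MEROPS or of the original tables.

```python
import re
from typing import NamedTuple, Optional


class MeropsCode(NamedTuple):
    catalytic_class: str  # class letter: A, C, M, S or T
    family: str           # e.g. "C02"; provisional families look like "Cx1"
    member: str           # e.g. "002", or "xxx" where no identifier is listed
    non_protease: bool    # True when the code carries the "np" suffix


# Assumed shape of the codes printed in the tables on this page.
_CODE = re.compile(r"^([ACMST])([0-9x][0-9])\.([0-9]{3}|xxx)(np)?$")


def parse_code(code: str) -> Optional[MeropsCode]:
    """Split a table code into class, family, member and np flag."""
    match = _CODE.match(code)
    if match is None:
        return None  # not a code in the assumed format
    cls, fam, member, np_flag = match.groups()
    return MeropsCode(cls, cls + fam, member, np_flag is not None)
```

For example, `parse_code("C02.971np")` yields family `C02` with the non-protease flag set, while a string that does not follow the assumed shape returns `None`.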

Aspartic proteases

These are divided into four families: A01, A02, A22 and Ax1. There are several pepsinogen A isozymogens encoded by highly related genes (>95% identities) that form part of a cluster located at 11q12. The individual pepsinogen A isozymogens result from haplotypes that contain different numbers of genes (ranging from 1 to 4) [1,2]. In agreement with other databases, this region has been annotated as a single gene in human. According to the criteria discussed above, we have assigned mouse pepsinogen F as the orthologue of human pepsinogen A, despite notable divergence of their structure and regulation [3]. Ren2 is absent in some strains of laboratory mice. The gene that encodes prochymosin has been inactivated by mutations and frameshifts in the human genome and is classified as a pseudogene, although in mouse and other species it is functional [4].

The genes DDI1, DDI2, DDI-RP, NRIP2 and NRIP3 are included in the family A02, which contains predicted retroviral-like aspartic proteases [5]. All of these have mouse orthologues at syntenic regions, and are not embedded in endogenous retroviral elements. The human and mouse genomes also contain several aspartic protease-related sequences derived from endogenous retroviruses, but we have not annotated these as human or mouse proteases. In this regard, it is remarkable that most of the retroviruses embedded in both genomes have suffered inactivating mutations, which also affect the putative proteases encoded by these viral elements. However, HERV-K113, for example, which is located at 19p13 in ~30% of the human population, has intact open-reading frames for all viral proteins, including the corresponding aspartic protease, and remains capable of reinfecting humans today [6]. The catalogue of aspartic proteases also includes a new family that is derived from the protein prolactin-inducible protein/gross cystic disease fluid protein-15 (PIP/GCDFP15), which has recently been characterized as a protease belonging to this class of enzymes [7]. The four PIP-related proteins lack residues proposed to be essential for PIP proteolytic activity and have been classified as non-protease homologues.

Cysteine proteases

                                                                                                                                                     
C02.002 calpain 2 CAPN2 824 1q42 Capn2 12334 1H4 y 93
C02.004 calpain 3 CAPN3 825 15q15 Capn3 12335 2F1 y 93
C02.011 calpain 5 CAPN5 726 11q13 Capn5 12337 7F1 y 92
C02.971np calpain 6 CAPN6 827 Xq23 Capn6 12338 XF2 y 95
C02.008 calpain 7 CAPN7 23473 3p25 Capn7 12339 14B y 95
C02.007 calpain 8 CAPN8 AA043093 (1q42) Capn8 170725 1H4 y 72
C02.006 calpain 9 CAPN9 10753 1q42 Capn9 73647 8E2 y 85
C02.018 calpain 10 CAPN10 11132 2q37 Capn10 23830 1D y 81
C02.013 calpain 11 CAPN11 11131 6p21 Capn11 103998 17C y 83
C02.017 calpain 12 CAPN12 147968 19q13 Capn12 60594 7A3 y 87
C02.020 calpain 13 CAPN13 92291 2p23 Capn13 240159 17E2 y 62
C02.xxx calpain 14 CAPN14 114773 2p23
C02.010 calpain 15/Sol protein SOLH 6650 16p13 Solh 50817 17B1 y 89
C12.001 ubiquitin C-terminal hydrolase 1 UCHL1 7345 4p14 Uchl1 22223 5D y 94
C12.003 ubiquitin C-terminal hydrolase 3 UCHL3 7347 13q22 Uchl3 50933 14E2 y 98
C12.004 ubiquitin C-term. hydrolase BAP1 BAP1 8314 3p21 Bap1 104416 14B y 93
C12.005 ubiquitin C-terminal hydrolase 5 UCHL5 51377 1q31 Uchl5 56207 1F y 96
C12.007 ubiquitin C-terminal hydrolase 4 Uchl4 93841 9D
C12.xxx cylindromatosis protein CYLD1 1540 16q12 Cyld1 74256 8C4 y 95
C13.004 legumain LGMN 5641 14q32 Lgmn 19141 12F1 y 82
C13.xxx legumain-2 LGMN2 122199 13q21
C13.005 hGPI8 PIGK 10026 1p31 Pigk 66613 3H4 y 94
C14.001 caspase-1 CASP1 834 11q22 Casp1 12362 9A1 y 62
C14.006 caspase-2 CASP2 835 7q34 Casp2 12366 6B2 y 89
C14.003 caspase-3 CASP3 836 4q35 Casp3 12367 8B2 y 87
C14.007 caspase-4/11 CASP4 837 11q22 Casp11 12363 9A1 y 60
C14.008 caspase-5 CASP5 838 11q22
C14.005 caspase-6 CASP6 839 4q25 Casp6 12368 3H1 y 90
                                                                                                                                                                                                                                                                    
C19.055 USP47 USP47 55031 11p15 Usp47 320745 7F2 y 94
C19.068 USP48 USP48 84196 1p36 Usp48 170707 4D3 y 95
C19.073np USP49 USP49 25862 6p21 Usp49 224836 17C y 80
C19.058np USP50 USP50 AI990110 15q21 Usp50 75083 2F2 y 75
C19.065 USP51 USP51 BF741256 Xp11
C19.xxxnp USP52 USP52 9924 12q13 Usp52 103135 10D3 y 97
C19.031 DUB-1 Dub1 13531 7F2
C19.032 DUB-2 Dub2 13532 7F1
C19.xxx DUB2a Dub3 AF393638 7F1
C19.xxx DUB2a-like Dub4 AF393637 7F1
C19.xxx DUB2a-like2 Dub5 BAC40791 7F1
C19.xxx DUB6 Dub6 BN000117 7F1
C26.001 γ-glutamyl hydrolase GGH 8836 8q12 Ggh 14590 4A3 y 69
C44.001 Gln-PRPP amidotransferase PPAT 5471 4q12 Ppat 231327 5E1 y 93
C44.971np Gln-fructose-6-P transamidase 1 GFPT1 2673 2p13 Gfpt1 14583 6D2 y 99
C44.972np Gln-fructose-6-P transamidase 2 GFPT2 9945 5q35 Gfpt2 14584 11B1 y 98
C44.973np Gln-fructose-6-P transamidase 3 GFPT3 203431 Xq21 #Gfpt3 XC3 y
C46.002 sonic hedgehog protein SHH 6469 7q36 Shh 20423 5A3 y 92
C46.003 indian hedgehog protein IHH 3549 2q35 Ihh 16147 1C3 y 95
C46.004 desert hedgehog protein DHH 50846 12q13 Dhh 13363 15F2 y 97
C48.002 sentrin/SUMO protease 1 SENP1 29843 12q13 Senp1 223870 15F2 y 88
C48.007 sentrin/SUMO protease 2 SENP2 59343 3q27 Senp2 75826 16B1 y 71
C48.003 sentrin/SUMO protease 3 SENP3 26168 17p13 Senp3 80886 11B4 y 95
C48.008 sentrin/SUMO protease 5 SENP5 205564 3q29 Senp5 AK043171 16B2 y 71
C48.004 sentrin/SUMO protease 6 SENP6 26054 6q14 Senp6 215351 9E2 y 81
C48.009 sentrin/SUMO protease 7 SENP7 57337 3q12 Senp7 72869 16B1 y 87
                                            
Cx1.xxx CGI-77 CGI77 51633 8q21 Cgi77 72201 4A2 y 87
Cx1.xxxnp CGI-77b Cgi77b 236778 XA3
Cx2.xxxnp HetF-like HETFL 23331 22q12 Hetfl 209683 5F y 85
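The rows above are whitespace-separated, and the peptidase name may span several words. The following Python sketch shows one way to split a complete row, assuming the column order code, peptidase name, human gene, human LocusLink or accession, human locus, mouse gene, mouse LocusLink or accession, mouse locus, synteny flag and percentage identity. It deliberately returns `None` for rows where a species entry, the synteny flag or the identity value is missing, because those rows cannot be split unambiguously by position alone. `OrthologueRow` and `parse_full_row` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class OrthologueRow(NamedTuple):
    code: str
    name: str
    human_gene: str
    human_id: str
    human_locus: str
    mouse_gene: str
    mouse_id: str
    mouse_locus: str
    identity: int  # percentage identity between the orthologues


def parse_full_row(line: str) -> Optional[OrthologueRow]:
    """Parse a row that lists both orthologues, the synteny flag and identity."""
    tokens = line.split()
    # Need the code, at least one name word and eight trailing fields.
    if len(tokens) < 10:
        return None
    # Only accept rows ending in "<y> <integer>"; anything else is incomplete.
    if tokens[-2].lower() != "y" or not tokens[-1].isdigit():
        return None
    code = tokens[0]
    name = " ".join(tokens[1:-8])
    h_gene, h_id, h_locus, m_gene, m_id, m_locus, _flag, identity = tokens[-8:]
    return OrthologueRow(code, name, h_gene, h_id, h_locus,
                         m_gene, m_id, m_locus, int(identity))
```

Counting the eight fixed fields from the right is what lets multi-word names such as "ubiquitin C-terminal hydrolase 1" be recovered intact.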

The cysteine proteases belong to 16 different families, and include proteins such as hedgehog family members, the protease function of which is only used for the autolytic processing of their respective precursors [8]. The C01 family is largely expanded in the mouse as a result of the presence of placental cathepsins and testins. We have annotated two further mouse testins, including testin-3, which was the first member of this subfamily predicted to be a functional protease. There are two functional human cathepsin L-like genes (CTSL and CTSL2) at 9q21, and a single gene in the mouse, which is more closely related to CTSL2. The cylindromatosis protein contains a ubiquitin C-terminal hydrolase domain and has been included in the C12 family. The genes for calpain 14, caspase 5 and caspase 10 are absent in mice, and the human gene for caspase 12 has been inactivated and is therefore classified as a pseudogene. We have annotated a second human legumain-like gene that is absent in mouse.

The C19 family of ubiquitin-specific proteases (USPs) is large and complex. We have annotated 21 human members (USP30, 31, 34–52) and assigned their corresponding mouse orthologues. We have not found mouse orthologues for human USP6, -13, -34, -37, -42 and -51. USP17 is located within the RS447 human megasatellite at 4p15 [9]. This region is highly polymorphic in the human genome, containing a variable number of USP17-related intronless tandemly repeated sequences (>95% identical), which have probably been generated by retrotransposition. Forty-four distinct alleles in 74 unrelated chromosomes containing 20–103 copies of the RS447 unit have been identified [10]. We have also identified several USP17-related sequences in a cluster located at 8p25. This cluster would contain at least seven USP17-like (USP17L) intronless genes (three of these are classified as non-protease homologues) and pseudogenes. The proteins encoded by these polymorphic and variable regions have been annotated as two single proteases (USP17 and USP17L) in this table. The closest relatives of USP17 genes in the mouse genome are those that code for proteins called DUBs (deubiquitinating enzymes). DUB1, DUB2 and DUB2A have been extensively characterized as members of a novel group of cytokine-inducible deubiquitylating enzymes that are produced by lymphocytes [11–13]. We have annotated three further members of this subfamily of haematopoietic proteases. The classification of mouse DUBs as orthologues of human USP17 genes is doubtful because, despite sequence similarities, their syntenic relationship is unclear. Accordingly, we have tentatively classified them as paralogous genes.

We have annotated six members of the C48 family of SUMO-1 proteases in the mouse genome, which are absent in the human genome. We have also included a family of recently described cysteine proteases with deubiquitylating activity that contain the OTU-protease domain and are tentatively called otubains [14,15]. This family should comprise 14 orthologues and one specific member in both human and mouse. All of them contain characteristic features of active proteases, with the exception of TRABID and murine Cgi77b. The last protease included in our list of cysteine proteases is called HetF-like and forms part of the superfamily of caspase-haemoglobinase fold proteases [16]. Human and mouse HetF-like have a serine residue instead of the active-site cysteine present in cysteine proteases, and have been classified as non-protease homologues.

Metalloproteases

Table S3 | Metalloproteases

Columns: Code; Peptidase; Human Gene; LocusLink; Locus; Mouse Gene; LocusLink; Locus; Syntenic; Identity
M01.003 aminopeptidase A ENPEP 2028 4q26 Enpep 13809 3H1 y 77
M01.014 aminopeptidase B RNPEP 6051 1q32 Rnpep 215615 1F y 86
M01.023 aminopeptidase MAMS AMPEP 64167 5q15
M01.001 aminopeptidase N ANPEP 290 15q25 Anpep 16790 7D2 y 76
M01.018 aminopeptidase PILS ARTS1 51752 5q21 Arts1 80898 13C1 y 85
M01.004 leukotriene A4 hydrolase LTA4H 4048 12q23 Lta4h 16993 10C2 y 92
M01.008 pyroglutamyl-peptidase II TRHDE 29953 12q21 Trhde 237553 10D1 y 94
M01.010 cytosol alanyl aminopeptidase NPEPPS 9520 17q21 Psa 19155 11D y 97
M01.011 leucyl-cystinyl aminopeptidase LNPEP 4012 5q15 Lnpep 266720 13C1 y 88
M01.022 aminopeptidase B-like 1 RNPEPL1 57140 2q37 Rnpepl1 98480 1D y 95
M01.028 aminopeptidase O AOPEP 84909 9q22 Aopep BAC31943 13B3 y 72
M01.027 aminopeptidase Q AQPEP BG623101 5q23 Aqpep 74574 18C y 68
M01.972np TBP-associated factor 2 TAF2 6873 8q24 Taf2 319944 15D y 99
M02.001 angiotensin-converting enzyme 1 ACE 1636 17q23 Ace 11421 11E1 y 83
M02.006 angiotensin-converting enzyme 2 ACE2 59272 Xp21 Ace2 70008 XF5 y 82
M02.971np angiotensin-converting enzyme 3 #ACE3 17q23 Ace3 217246 11E1 y
M03.001 thimet oligopeptidase THOP1 7064 19p13 Thop1 50492 10C1 y 89
M03.002 neurolysin NLN 57486 5q13 Nln 75805 13D1 y 90
M03.006 mitochondrial intermediate peptidase MIPEP 4285 13q12 Mipep 70478 14C3 y 84
M08.003 leishmanolysin-2 LMLN 89782 3q29 Lmln 239833 16B2 y 73
M10.034 collagenase-like B Mcolb 83996 9A1
M10.001 collagenase 1 MMP1 4312 11q22 Mcola 83995 9A1 y 59
M10.003 gelatinase A MMP2 4313 16q22 Mmp2 17390 8C5 y 95
M10.005 stromelysin 1 MMP3 4314 11q22 Mmp3 17392 9A1 y 76
M10.008 matrilysin MMP7 4316 11q22 Mmp7 17393 9A1 y 70
                                                                                                                                         
M13.091 PHEX endopeptidase PHEX 5251 Xp22 Phex 18675 XF4 y 96
M14.001 carboxypeptidase A1 CPA1 1357 7q32 Cpa1 109697 6A3 y 74
M14.002 carboxypeptidase A2 CPA2 1358 7q32 Cpa2 232680 6A3 y 86
M14.010 carboxypeptidase A3 CPA3 1359 3q24 Cpa3 12873 3A3 y 81
M14.017 carboxypeptidase A4 CPA4 51200 7q32 Cpa4 215225 6A3 y 84
M14.020 carboxypeptidase A5 CPA5 93979 7q32 Cpa5 76649 1A3 y 84
M14.018 carboxypeptidase A6 CPA6 57094 8q13 Cpa6 329093 1A3 y 86
M14.003 carboxypeptidase B CPB1 1360 3q25 Cpb1 76703 3A3 y 72
M14.009 carboxypeptidase U CPB2 1361 13q14 Cpb2 56373 14D2 y 82
M14.021 carboxypeptidase O CPO 130749 2q33 #Cpo 269201 1C2 y
M14.005 carboxypeptidase E CPE 1363 4q32 Cpe 12876 8B3 y 97
M14.004 carboxypeptidase N CPN 1369 10q25 Cpn 93721 19D1 y 66
M14.006 carboxypeptidase M CPM 1368 12q15 Cpm 70574 10D2 y 79
M14.011 carboxypeptidase D CPD 1362 17q11 Cpd 12874 11B4 y 93
M14.012 carboxypeptidase Z CPZ 8532 4p16 Cpz 242939 5B1 y 82
M14.015np carboxypeptidase X1 CPX1 56265 20p13 Cpx1 56264 2F3 y 86
M14.019np carboxypeptidase X2 CPX2 119587 10q26 Cpx2 55987 7F4 y 89
M14.951np adipocyte-enhancer binding prot. 1 AEBP1 165 7p13 Aebp1 11568 11A1 y 90
M16.002 insulysin IDE 3416 10q24 Ide 15925 19C3 y 97
M16.003 mitochondrial processing pept. β-sub PMPCB 9512 7q22 Pmpcb 73078 5A3 y 90
M16.005 nardilysin NRD1 4898 1p32 Nrd1 230598 4C7 y 93
M16.009 pitrilysin metalloprotease 1 PITRM1 10531 10p15 Pitrm1 69617 13A1 y 86
M16.971np mitochondrial processing protease INPP5E 23203 9q34 Inpp5e 66865 2A3 y 91
M16.973np UCR1 UQCRC1 7384 3p21 Uqcrc1 22273 9F2 y 88
M16.974np UCR2 UQCRC2 7385 16p12 Uqcrc2 67003 7F3 y 85
M16.976np mitoch. processing protease-like AMPP 133083 4q22
M17.001 leucyl aminopeptidase LAP3 51056 4p15 Lap3 66988 5B3 y 90
                                                                                                                                                                                                  
M17.006 aminopeptidase-like 1 NPEPL1 79716 20q13
M18.002 aspartyl aminopeptidase DNPEP 23549 2q36 Dnpep 13437 1C3 y 90
M19.001 membrane dipeptidase DPEP1 1800 16q24 Dpep1 13479 8E2 y 73
M19.002 membrane dipeptidase 2 DPEP2 64174 16q22 Dpep2 244632 8D2 y 70
M19.004 membrane dipeptidase 3 DPEP3 64180 16q22 Dpep3 71854 8D2 y 73
M20.005 glu-carboxypeptidase-like 1 CPGL 55748 18q22 Cpgl 66054 18E3 y 91
M20.006 glu-carboxypeptidase-like 2 CPGL2 84735 18q22 Cpgl2 240478 18E3 y 73
M20.971np HmrA-like protease HMRALP 135293 6q15 Hmralp 242377 4A5 y 83
M20.973np aminoacylase ACY1 95 3p21 Acy1 109652 9F1 y 85
M22.003 O-sialoglycoprotein endopeptidase OSGEP 55644 14q11 Osgep 66246 14C1 y 93
M22.004 O-sialoglycoprotein endopeptidase 2 OSGEP2 64172 2q32 Osgep2 72085 1C1 y 84
M24.001 methionyl aminopeptidase I METAP1 23173 4q24 Metap1 75624 3H2 y 92
M24.002 methionyl aminopeptidase II METAP2 10988 12q23 Metap2 56307 10C3 y 88
M24.028 methionyl aminopeptidase-like 1 METAPL1 254042 2q31 Metapl1 66559 2C3 y 95
M24.005 X-prolyl aminopeptidase 2 XPNPEP2 7512 Xq26 Xpnpep2 170745 XA3 y 81
M24.007 X-Pro dipeptidase PEPD 5184 19q13 Pepd 18624 7B1 y 90
M24.009 aminopeptidase P1 XPNPEPL 7511 10q25 Xpnpep1 170750 19D2 y 81
M24.026 aminopeptidase P homologue PEPP 63929 22q13 Pepp 321003 15E3 y 93
M24.973np proliferation-association protein 1 PA2G4 5036 12q13 Pa2g4 18813 10D3 y 98
M24.974np suppressor of Ty 16 homologue SUPT16H 11198 14q11 Supt16h 114741 14C1 y 98
M28.010 glutamate carboxypeptidase II FOLH1 2346 11p11 Folh1 53320 7E1 y 85
M28.011 NAALADASE L peptidase NAALADL 10004 11q13 NAALADL BN000129 19A y 80
M28.012 NAALADASE II NAALAD2 10003 11q14 Naalad2 72560 9A3 y 89
M28.975np NAALADASE III NAALAD3 254827 3q26 Naalad3 229149 3A3 y 63
M28.014 plasma Glu-carboxypeptidase PGCP 10404 8q22 Pgcp 54381 15B3 y 93
                                                                                                                                                                                                                                                     
M28.018 Ojeda peptidase OJP 79956 9p24 Ojp BAC38286 19C2 y 87
M28.972np transferrin receptor protein TFRC 7037 3q29 Trfr 22042 16B3 y 77
M28.973np transferrin receptor 2 protein TFR2 7036 7q22 Trfr2 50765 5G1 y 84
M28.974np glutaminyl cyclase QPCT 25797 2p22 Qpct 70536 17E3 y 81
M28.016 glutaminyl cyclase 2 QPCT2 54814 19q13 Qpct2 67369 7A2 y 84
M38.972np dihydroorotase CAD 790 2p23 Cad 69719 5B1 y 94
M38.973np dihydropyrimidinase DPYS 1807 8q22 Dpys 64705 15C y 88
M38.xxxnp dihydropyrimidinase-related prot. 1 CRMP1 1400 4p16 Crmp1 12933 5B2 y 96
M38.xxxnp dihydropyrimidinase-related prot. 2 DPYSL2 1808 8p21 Dpysl2 12934 14D1 y 98
M38.xxxnp dihydropyrimidinase-related prot. 3 DPYSL3 1809 5q32 Dpysl3 22240 18B3 y 98
M38.xxxnp dihydropyrimidinase-related prot. 4 DPYSL4 10570 10q26 Dpysl4 26757 7F5 y 93
M38.xxxnp dihydropyrimidinase-related prot. 5 DPYSL5 56896 2p23 Dpysl5 65254 5B1 y 98
M41.004 i-AAA protease YME1L1 10730 10p12 Yme1l1 27377 2A3 y 95
M41.006 paraplegin SPG7 6687 16q24 Spg7 234847 8E2 y 89
M41.010 Afg3-like protein 1 #AFG3L1 172 16q24 Afg3l1 114896 8E2 y
M41.007 Afg3-like protein 2 AFG3L2 10939 18p11 Afg3l2 69597 18E1 y 94
M43.004 pappalysin-1 PAPPA 5069 9q32 Pappa 18491 4C1 y 93
M43.005 pappalysin-2 PLAC3 60676 1q25 Plac3 240848 1H1 y 78
M47.001 procol. III N-endopeptidase PCOLN3 5119 16q24 #Pcoln3 BI690732 8E2 y
M48.003 FACE-1/ZMPSTE24 FACE1 10269 1p34 Face1 230709 4D1 y 91
M48.017 VVML VVML 115209 1p32 Vvml 67013 4C6 y 71
M49.001 dipeptidyl-peptidase III DPP3 10072 11q13 Dpp3 75221 19A y 92
M50.001 S2P protease MBTPS2 51360 Xp22 Mbtps2 270669 XF4 y 97

These belong to 26 distinct families. The M01 family contains 13 members in human and 12 in mouse, which lacks aminopeptidase MAMS. We propose the names aminopeptidases O and Q for the M01 proteases previously annotated as human hypothetical proteins FLJ14675 and BG623101. We have also identified orthologues for these genes located at mouse chromosomes 13B3 and 18C. In the M02 family, we have tentatively annotated a mouse gene for a third angiotensin-converting enzyme-like (Ace3), which is located at chromosome 11E1. We have classified Ace3 as a non-protease homologue because it contains the HQMGH sequence instead of the consensus Zn-binding HExxH motif. No expressed sequence tags (ESTs) have been found for mouse Ace3, which could be an inactive pseudogene, although the locus is apparently complete and conserved in the rat. The corresponding human gene is a pseudogene as a result of the accumulation of stop codons and frameshifts.
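The motif argument used for Ace3 can be expressed as a simple pattern check. The sketch below is only an illustration of the HExxH criterion mentioned above (histidine, glutamate, any two residues, histidine); it is not the annotation procedure used to build the tables, and the function name `has_hexxh` is hypothetical.

```python
import re

# Consensus Zn-binding motif: His-Glu-x-x-His, where x is any residue.
HEXXH = re.compile(r"HE[A-Z]{2}H")


def has_hexxh(segment: str) -> bool:
    """Return True if the one-letter protein segment contains an HExxH motif."""
    return HEXXH.search(segment.upper()) is not None
```

Applied to the mouse Ace3 segment quoted above, `has_hexxh("HQMGH")` is false because glutamine replaces the glutamate at the second position, which is the basis for the non-protease homologue classification.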

There are some differences between human and mouse members of the M10 family of matrix metalloproteases (MMPs). Mouse McolB, a diverging counterpart of human MMP1, is absent in human, whereas human matrilysin-2 (MMP26) is absent from mouse, although there are some gaps in the mouse genome region that could contain this missing gene. MMP23 has been recently duplicated in the human genome [17], generating two closely related genes, MMP23A and MMP23B. This region is artefactually collapsed in the available public and private genome sequences owing to the high sequence identity between the two genes, and is erroneously considered to contain a single gene. Apparently, there is a single mouse MMP23 gene, although the possibility that this region is duplicated in the mouse genome and has also been computer-collapsed cannot be ruled out. In the family M12, we have annotated a new member within the meprin/tolloid subfamily [18].

The ADAM (a disintegrin and metalloprotease) subfamily of M12 metalloproteases [19] shows important differences between the two organisms. The genes for ADAM-1, -3, -4, -5, -6 and -25 are pseudogenes in the human but active genes in the mouse. ADAM-1 and -6 are duplicated in mouse, whereas ADAM-20 is duplicated in human (ADAM-20 and ADAM-21). Also, testases, a subgroup of ADAMs located at 8B1, are mouse specific. We have annotated five further members of this family (testases 5–9), although they are intronless and their functional relevance remains to be shown. The group of ADAMTSs (ADAMs with thrombospondin domains) is completed with the inclusion of human and mouse ADAMTS-20. In the M14 family of carboxypeptidases, we have found that mouse carboxypeptidase O has been specifically inactivated by mutation and is annotated as a pseudogene [20]. Dihydroorotase and several dihydropyrimidinases have been included as non-protease homologues of bacterial isoaspartyl dipeptidases. The gene that encodes procollagen III N-endopeptidase is inactivated in mouse, which represents an interesting difference between the human and mouse degradomes, as there are no other functional members in the M47 family that could compensate for this specific loss in mouse. We have annotated 14 human and 13 mouse proteins in the recently described M67 family of metalloisopeptidases [21,22]. All of them contain the JAMM motif, although some lack conserved residues that are predicted to be essential for proteolytic activity, and have therefore been classified as non-protease homologues.

There are doubts about the ascription of the FACE-2/RCE1 prenyl endopeptidase to the cysteine or metalloprotease classes of enzymes [23]; however, in agreement with recent structural comparisons [24], we have included it as the only human and mouse representative of a new family of membrane-bound metalloproteases. Finally, we have included three aminoacylases in our catalogue of metalloproteases. These enzymes are not, strictly speaking, proteases because they cleave peptide bonds that connect an acyl derivative with an amino acid [25]. However, the structure of ACY1 clearly allows its inclusion in the M20 family of metalloproteases, whereas those of ACY2 and ACY3 have also been proposed to be part of a superfamily of metalloproteases that contains members of the M14 family of carboxypeptidases [26].

Serine proteases

                                                                                                                                                                                                                          
S01.192 complement component C1ra C1R 715 12p13 C1ra 50909 6F2 y 81
S01.xxx complement component C1rb C1rb AF459018 (6F2)
S01.193 complement component C1sa C1S 716 12p13 C1sa 50908 6F2 y 74
S01.xxx complement component C1sb C1sb 317677 6F2
S01.191 complement factor D DF 1675 19p13 Df 11537 10C1 y 67
S01.xxx complement factor D-like DF2 199783 19p13 Df2 270746 10C1 y 79
S01.199 complement factor I IF 3426 4q25 If 12630 3H1 y 69
S01.198 MASP1/3 MASP1/3 5648 3q29 Masp1/3 17174 16B1 y 86
S01.229 MASP2 MASP2 10747 1p36 Masp2 17175 4E1 y 81
S01.237 neurotrypsin PRSS12 8492 4q28 Prss12 19142 3G3 y 82
S01.231 u-plasminogen activator PLAU 5328 10q22 Plau 18792 14B y 69
S01.232 t-plasminogen activator PLAT 5327 8p11 Plat 18791 8A3 y 80
S01.233 plasminogen PLG 5340 6q26 Plg 18815 17A2 y 79
S01.976np hepatocyte growth factor HGF 3082 7q21 Hgf 15234 5A3 y 91
S01.975np macrophage-stimulating protein MSP 4485 3p21 Msp 15235 9F2 y 80
S01.999np apolipoprotein LPA 4018 6q26
S01.223 acrosin ACR 49 22q13 Acr 11434 15F1 y 68
S01.972np haptoglobin-1 HP 3240 16q22 Hp 15439 8D3 y 79
S01.974np haptoglobin-related protein HPR 3250 16q22
S01.277 osteoblast serine protease HTRA1 5654 10q26 Htra1 56213 7F4 y 91
S01.278 HTRA2 HTRA2 27429 2p12 Htra2 64704 6D1 y 84
S01.284 HTRA3 HTRA3 94031 4p16 Htra3 78558 5B1 y 86
S01.285 HTRA4 HTRA4 203100 8p11 Htra4 66943 8A3 y 66
S01.309 umbilical vein protease SPUVE 11098 11q14 Spuve 76453 7E1 y 90
S01.994np similar to SPUVE SPUVE2 167681 6q14 Spuve2 244954 9E3 y 77
S01.104 plasma-kallikrein-like 1 KLKBL1 XP_116753 8p23 Klkbl1 74215 (14C3) 66
S01.415 plasma-kallikrein-like 2 KLKBL2 203074 8p23 Klkbl2 71037 14C3 y 71
S01.419 plasma-kallikrein-like 3 #KLKBL3 8p23 Klkbl3 73382 14C3 y
                                                                                                                                                                                                                                
S01.992np plasma-kallikrein-like 4 KLKBL4 221191 16q21 Klkbl4 BN000132 8C5 y 62
S01.286 similar to Arabidopsis Ser-prot. SASP 219743 10q22 Sasp 71767 10B4 y 80
S01.991np chymase-like serine protease Clsp 75106 XC3
S08.063 site-1 protease MBTPS1 8720 16q23 Mbtps1 56453 8E1 y 96
S08.039 proprotein convertase 9 PCSK9 255738 1p32 Pcsk9 100102 4C7 y 73
S08.090 tripeptidyl-peptidase II TPP2 7174 13q33 Tpp2 22019 1C1 y 95
S08.072 proprotein convertase 1 PCSK1 5122 5q15 Pcsk1 18548 13C1 y 93
S08.073 proprotein convertase 2 PCSK2 5126 20p12 Pcsk2 18549 2H1 y 97
S08.071 furin PCSK3 5045 15q26 Pcsk3 18550 7D2 y 94
S08.074 proprotein convertase 4 PCSK4 5124 19p13 Pcsk4 18551 10C1 y 82
S08.076 proprotein convertase 5 PCSK5 5125 9q21 Pcsk5 18552 19B y 92
S08.075 PACE4 proprotein convertase PCSK6 5046 15q26 Pcsk6 18553 7C y 93
S08.077 proprotein convertase 7 PCSK7 9159 11q23 Pcsk7 18554 9B y 88
S09.001 prolyl oligopeptidase PREP 5550 6q22 Prep 19072 10B2 y 96
S09.015 prolyl-oligopeptidase 2 PREP2 9581 2p21 Prep2 213760 17E4 y 94
S09.003 dipeptidyl-peptidase 4 DPP4 1803 2q24 CD26 13482 2C3 y 85
S09.973np dipeptidyl-peptidase 6 DPP6 1804 7q36 Dpp6 13483 5A3 y 91
S09.018 dipeptidyl-peptidase 8 DPP8 54878 15q23 Dpp8 74388 9D y 95
S09.019 dipeptidyl-peptidase 9 DPP9 91039 19p13 Dpp9 224897 17D y 89
S09.974np dipeptidyl-peptidase 10 DPP10 57628 2q14 Dpp10 269109 1E2 y 88
S09.007 Seprase FAP 2191 2q24 Fap 14089 2C3 y 90
S09.004 acylaminoacyl-peptidase APEH 327 3p21 Apeh 235606 9F2 y 91
S09.055 CGI-67 protein CGI-67 51104 9q21 Cgi-67 BN000127 19C1 y 98
S09.052 CGI-67-like protease-1 CGI-67L1 81926 19p13 Cgi-67l1 216169 10C1 y 93
S09.053 CGI-67-like protease-2 CGI-67L2 58489 15q25 Cgi-67l2 70178 7D3 y 97
S09.051 BEM46-like 1 BEM46L1 84945 13q33 Bem46l1 68904 8A2 y 97
                                                                                                                                                                                                                                                                             
S09.054 BEM46-like 2 BEM46L2 26090 20p11 Bem46l2 76192 2H1 y 90
S09.xxx BEM46-like 3 BEM46L3 BG74273 14q22 Bem46l3 278594 12C3 y 78
S10.002 lysosomal carboxypeptidase A PPGB 5476 20q13 Ppgb 19025 2H3 y 87
S10.003 vitellogenic carboxypeptidase-L CPVL 54504 7p15 Cpvl 71287 6B3 y 76
S10.013 serine carboxypeptidase 1 RISC 59342 17q23 Risc 74617 11C y 82
S12.004 β-lactamase LACTB 114294 15q22 Lactb 80907 9D y 85
S14.003 endopeptidase Clp CLPP 8192 19p13 Clpp 53895 17E1 y 87
S16.002 PIM1 endopeptidase PRSS15 9361 19p13 Prss15 74142 17E1 y 88
S16.006 PIM2 endopeptidase PIM2 83752 16q21 Pim2 66887 8C4 y 95
S26.009 signalase 18 kDa component SPC18 23478 15q25 Spc18 56529 7D2 y 98
S26.010 signalase 21 kDa component SPC21 90701 18q21 Spc21 66286 18E1 y 98
S26.xxx signalase-like 1 SPCL1 158326 9p22 Spcl1 230344 4C3 y 76
S26.012 mitoc. inner membrane protease 2 IMMP2L 83943 7q31 Immp2l 93757 12B3 y 90
S26.013 mitochondrial signal peptidase IMMP1 196294 11p13 Immp1 66541 2E3 y 95
S26.xxx lactotransferrin LTF 4057 3p21 Ltf 17002 9F2 y 70
S28.001 lysosomal Pro-X carboxypeptidase PRCP 5547 11q14 Prcp 72461 7E2 y 77
S28.002 dipeptidyl-peptidase II DPP7 29952 (9q24) Dpp7 83768 2A3 y 80
S28.003 thymus-specific serine peptidase PRSS16 10279 6p21 Prss16 54373 13A3 y 79
S33.009 αβ-hydrolase dom. containing 4 ABHD4 63874 14q11 Abhd4 105501 14C1 y 96
S33.971np epoxyde hydrolase EPHX1 2052 1q42 Ephx1 13849 1H4 y 83
S33.972np Mesoderm specific transcript hom. MEST 4232 7q32 Mest 17294 6A3 y 97
S33.974np epoxyde hydrolase related protein EPHXRP 253152 1p22 Ephxrp 243192 5E y 87
S33.xxxnp CGI-58 CGI-58 51099 3p21 Cgi-58 67469 9F4 y 94
S53.003 tripeptidyl-peptidase I CLN2 1200 11p15 Cln2 12751 7F1 y 88

Most of these belong to the S01 family, but there are representatives of 13 further serine protease families in the human and mouse degradomes. All differences between human and mouse serine proteases correspond to changes in members of this densely populated family. The kallikreins are duplicated in mouse almost entirely: there are 28 members in mouse and 15 in human. The genes for mastin, implantation serine protease-2 (ISP-2), intestinal serine protease (DISP-1), and testis serine proteases TESP-2 and -3 are inactivated in human, hence their classification as pseudogenes. The absence of genes for human DISP-2, ISP-1 and TESP-1, together with the finding that human DISP-1, ISP-2, TESP-2 and TESP-3 are pseudogenes, indicates that the functions performed by ISP, DISP and TESP proteases might be mouse-specific. We have also annotated several new members of the testis-specific serine protease (TESSP) subfamily, with TESSP-3, -4 and -6 being pseudogenes in human and active genes in mouse. Mast-cell proteases (Mcpt), granzymes (Gzm), trypsins and human-airway trypsin-like (HAT-like) proteases are expanded in mouse; two tryptases, an ovochymase-like protease and a form of pancreatic elastase are only present in human. Two well-known non-protease homologues, apolipoprotein (a) (LPA) and haptoglobin-related protein, are absent in mouse. Further characteristic features of the mouse degradome include the duplication of complement factors C1r and C1s, and the presence of an extra functional member of the plasma-kallikrein-like subfamily (Klkbl3) and of a non-protease homologue called Clsp (chymase-like serine protease).

We have included in the catalogue of serine proteases a series of proteins, such as lactoferrin, reelin and tumour rejection antigen (gp96), which have recently been reported to have this kind of proteolytic activity (refs 27–29). On the basis of structural analysis, lactoferrin has been tentatively classified as a member of the S26 family of serine proteases, whereas reelin, gp96 and their close relatives have been preliminarily ascribed to two Sx families of presently unclassified serine proteases. Gene Ontology annotation of the human proteome also predicts a series of serine proteases with minimal relationship to other members of this class of enzymes. They include torsin, NSP (novel serine protease) and Ufd1L (ubiquitin fusion degradation protein 1 homologue), but because there is not enough evidence to support their ascription as serine proteases, they have not been included in the present version of the human and mouse degradomes.

Threonine proteases

                             
T03.016 γ-glutamyltransferase m-3 GGTL4 91227 22q11
T03.002 γ-glutamyltransferase 5 GGTLA1 220522 22q11

The most recently identified catalytic class of proteases, the threonine proteases (ref. 30), are classified into three families: T01, containing the proteasome components; T02, composed of three distinct glycosylasparaginases; and T03, including diverse γ-glutamyltransferases (GGTs). All members of the T01 and T02 families are conserved between human and mouse. There are, however, some differences in the number of GGT genes clustered in a region of chromosome 22, which has undergone successive duplications (ref. 31). As a consequence of this dynamic evolution, there are four GGT genes in this region of the human genome but only one in the corresponding region of the mouse genome (10B5). An additional GGT gene located at 20q11 is conserved in the mouse genome at an equivalent position (2H2).
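For readers who want to work with the table rows programmatically, the following is a minimal Python sketch of how one row might be split into fields. The column order is an assumption inferred from the rows themselves and is not stated on this page: code, protease name, human gene, human LocusID, human locus, then optionally mouse gene, mouse LocusID, mouse locus, a "y" flag and a percentage. The interpretation of the last two columns as an orthologue flag and percent identity is also an assumption. The names `DegradomeRow`, `ROW_PATTERN` and `parse_row` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DegradomeRow:
    """One table row; the field meanings are assumptions, not a documented schema."""
    code: str
    name: str
    human_gene: str
    human_locus_id: int
    human_locus: str
    mouse_gene: Optional[str] = None
    mouse_locus_id: Optional[int] = None
    mouse_locus: Optional[str] = None
    percent_identity: Optional[int] = None


# Assumed layout: code, free-text name, human gene, human LocusID, human locus,
# then an optional mouse block: gene, LocusID, locus, "y", percentage.
ROW_PATTERN = re.compile(
    r"^(?P<code>\S+)\s+(?P<name>.+?)\s+"
    r"(?P<hgene>\S+)\s+(?P<hid>\d+)\s+(?P<hlocus>\S+)"
    r"(?:\s+(?P<mgene>\S+)\s+(?P<mid>\d+)\s+(?P<mlocus>\S+)"
    r"\s+y\s+(?P<ident>\d+))?$"
)


def parse_row(line: str) -> Optional[DegradomeRow]:
    """Parse one flattened row; return None if it does not fit the assumed layout."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    has_mouse = match["mgene"] is not None
    return DegradomeRow(
        code=match["code"],
        name=match["name"],
        human_gene=match["hgene"],
        human_locus_id=int(match["hid"]),
        human_locus=match["hlocus"],
        mouse_gene=match["mgene"],
        mouse_locus_id=int(match["mid"]) if has_mouse else None,
        mouse_locus=match["mlocus"],
        percent_identity=int(match["ident"]) if has_mouse else None,
    )


if __name__ == "__main__":
    # Rows copied from the tables on this page.
    print(parse_row("S53.003 tripeptidyl-peptidase I CLN2 1200 11p15 Cln2 12751 7F1 y 88"))
    print(parse_row("T03.016 γ-glutamyltransferase m-3 GGTL4 91227 22q11"))
```

Rows that list only the human columns, such as the two GGT entries above, leave the mouse fields empty. A regular expression is used because protease names contain spaces, so splitting on whitespace alone would not separate the name from the gene symbol. Rows that use a different layout are returned as `None` instead of being guessed at.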

  1. Evers, M. P. et al. Nucleotide sequence comparison of five human pepsinogen A (PGA) genes: evolution of the PGA multigene family. Genomics 4, 232–239 (1989).
  2. Taggart, R. T., Mohandas, T. K., Shows, T. B. & Bell, G. I. Variable numbers of pepsinogen genes are located in the centromeric region of human chromosome 11 and determine the high-frequency electrophoretic polymorphism. Proc. Natl Acad. Sci. USA 82, 6240–6244 (1985).
  3. Chen, X., Rosenfeld, C. S., Roberts, R. M. & Green, J. A. An aspartic proteinase expressed in the yolk sac and neonatal stomach of the mouse. Biol. Reprod. 65, 1092–1101 (2001).
  4. Ord, T., Kolmer, M., Villems, R. & Saarma, M. Structure of the human genomic region homologous to the bovine prochymosin-encoding gene. Gene 91, 241–246 (1990).
  5. Krylov, D. M. & Koonin, E. V. A novel family of predicted retroviral-like aspartyl proteases with a possible key role in eukaryotic cell cycle control. Curr. Biol. 11, 584 (2001).
  6. Turner, G. et al. Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr. Biol. 11, 1531–1535 (2001).
  7. Caputo, E., Manco, G., Mandrich, L. & Guardiola, J. A novel aspartyl proteinase from apocrine epithelia and breast tumors. J. Biol. Chem. 275, 7935–7941 (2000).
  8. Lee, J. J. et al. Autoproteolysis in hedgehog protein biogenesis. Science 266, 1528–1537 (1994).
  9. Gondo, Y. et al. Human megasatellite DNA RS447: copy-number polymorphisms and interspecies conservation. Genomics 54, 39–49 (1998).
  10. Okada, T. et al. Unstable transmission of the RS447 human megasatellite tandem repetitive sequence that contains the USP17 deubiquitinating enzyme gene. Hum. Genet. 110, 302–313 (2002).
  11. Zhu, Y., Carroll, M., Papa, F. R., Hochstrasser, M. & D'Andrea, A. D. DUB-1, a deubiquitinating enzyme with growth-suppressing activity. Proc. Natl Acad. Sci. USA 93, 3275–3279 (1996).
  12. Zhu, Y. et al. DUB-2 is a member of a novel family of cytokine-inducible deubiquitinating enzymes. J. Biol. Chem. 272, 51–57 (1997).
  13. Baek, K. H., Mondoux, M. A., Jaster, R., Fire-Levin, E. & D'Andrea, A. D. DUB-2A, a new member of the DUB subfamily of hematopoietic deubiquitinating enzymes. Blood 98, 636–642 (2001).
  14. Evans, P. C. et al. A novel type of deubiquitinating enzyme. J. Biol. Chem. (in the press).
  15. Balakirev, M. Y., Tcherniuk, S. O., Jaquinod, M. & Chroboczek, J. Otubains: a new family of cysteine proteases in the ubiquitin pathway. EMBO Rep. 4, 517–522 (2003).
  16. Aravind, L. & Koonin, E. V. Classification of the caspase-hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins 46, 355–367 (2002).
  17. Gururajan, R. et al. Duplication of a genomic region containing the Cdc2L1-2 and MMP21-22 genes on human chromosome 1p36.3 and their linkage to D1Z2. Genome Res. 8, 929–939 (1998).
  18. Bertenshaw, G. P., Norcum, M. T. & Bond, J. S. Structure of homo- and hetero-oligomeric meprin metalloproteases: dimers, tetramers, and high molecular mass multimers. J. Biol. Chem. 278, 2522–2532 (2003).
  19. Seals, D. F. & Courtneidge, S. A. The ADAMs family of metalloproteases: multidomain proteins with multiple functions. Genes Dev. 17, 7–30 (2003).
  20. Wei, S. et al. Identification and characterization of three members of the human metallocarboxypeptidase gene family. J. Biol. Chem. 277, 14954–14964 (2002).
  21. Verma, R. et al. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 298, 611–615 (2002).
  22. Yao, T. & Cohen, R. E. A cryptic protease couples deubiquitination and degradation by the proteasome. Nature 419, 403–407 (2002).
  23. Cadiñanos, J. et al. Identification, functional expression and enzymatic analysis of two distinct CaaX proteases from Caenorhabditis elegans. Biochem. J. 370, 1047–1054 (2003).
  24. Pei, J. & Grishin, N. V. Type II CAAX prenyl endopeptidases belong to a novel superfamily of putative membrane-bound metalloproteases. Trends Biochem. Sci. 26, 275–277 (2001).
  25. Biagini, A. & Puigserver, A. Sequence analysis of the aminoacylase-1 family: a new proposed signature for metalloexopeptidases. Comp. Biochem. Physiol. B 128, 469–481 (2001).
  26. Makarova, K. S. & Grishin, N. V. The Zn-peptidase superfamily: functional convergence after evolutionary divergence. J. Mol. Biol. 292, 11–17 (1999).
  27. Hendrixson, D. R. et al. Human milk lactoferrin is a serine protease that cleaves Haemophilus surface proteins at arginine-rich sites. Mol. Microbiol. 47, 607–617 (2003).
  28. Quattrocchi, C. C. et al. Reelin is a serine protease of the extracellular matrix. J. Biol. Chem. 277, 303–309 (2002).
  29. Menoret, A., Li, Z., Niswonger, M. L., Altmeyer, A. & Srivastava, P. K. An endoplasmic reticulum protein implicated in chaperoning peptides to major histocompatibility of class I is an aminopeptidase. J. Biol. Chem. 276, 33313–33318 (2001).
  30. Seemuller, E. et al. Proteasome from Thermoplasma acidophilum: a threonine protease. Science 268, 579–582 (1995).
  31. Courtay, C., Heisterkamp, N., Siest, G. & Groffen, J. Expression of multiple γ-glutamyltransferase genes in man. Biochem. J. 297, 503–508 (1994).