Overall Lab Protease Degradome & Degradomics Home Page & Papers
This page displays the complete listing of the human and murine protease degradomes.
Human and mouse proteases are divided into five classes, which are subdivided into families according to the MEROPS database criteria (Tables S1–S5). We have provided the MEROPS code for every enzyme for which one is available. There are some conflicting cases in which different codes have previously been assigned to human and mouse protease genes that were shown in this work to be true orthologues. In these cases, the human code is proposed for both orthologues. The genes encoding protease-like proteins that show changes in residues crucial for proteolytic activity are indicated as ‘np’ (non-protease homologues) after the code.
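As a minimal sketch of how such annotations could be handled programmatically, the following Python snippet splits a table entry into its family, catalytic class and non-protease flag. The entry layout (a code optionally followed by an 'np' token), the `parse_entry` name and the example strings are assumptions made for illustration; they are not taken from Tables S1–S5.

```python
import re
from typing import NamedTuple

# The five catalytic classes, keyed by the first letter of the family name.
CLASS_BY_LETTER = {
    "A": "aspartic",
    "C": "cysteine",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
}


class Entry(NamedTuple):
    family: str           # e.g. "A01"
    catalytic_class: str  # e.g. "aspartic"
    non_protease: bool    # True when the entry carries the 'np' marker


# Assumed layout: class letter + two characters, an optional ".suffix",
# then an optional trailing "np" marker.
_ENTRY_RE = re.compile(r"^([ACMST])(\w{2})(?:\.(\w+))?(?:\s+(np))?$")


def parse_entry(text: str) -> Entry:
    """Parse one hypothetical table entry such as 'A01.001' or 'Ax1 np'."""
    match = _ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    letter, rest, _suffix, np_flag = match.groups()
    return Entry(letter + rest, CLASS_BY_LETTER[letter], np_flag is not None)
```

Under these assumptions, `parse_entry("Ax1 np")` yields the family `Ax1`, the class `aspartic` and a non-protease flag set to true. Names with a different shape, such as the provisional `Sx` families, would need a looser pattern.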
The LocusLink or nucleotide accession number is provided for each protease. The information for human enzymes is labelled in green and for mouse in yellow. Genes that are absent from human or mouse are labelled in red. Genes that have been inactivated by mutation in one species, but are functional in the other, are labelled in pink. Although these specific pseudogenes have been included in the Tables to emphasize the human–mouse difference, they have not been incorporated into the final counts of protease genes. Genes that have been verified experimentally, but the sequence of which is missing from the available genome sequences, are indicated in red and in parentheses. ‘Y’ indicates that the corresponding human and mouse genes are syntenic. The percentage of identity between orthologous proteases is also shown.
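The counting rule described here (species-specific pseudogenes are listed but excluded from the final totals) can be sketched as follows. The record layout, the status labels and the sample rows are hypothetical and do not reproduce any entry in the tables.

```python
from dataclasses import dataclass
from typing import Literal

# Assumed status vocabulary for one gene in one species.
Status = Literal["functional", "pseudogene", "absent"]


@dataclass(frozen=True)
class Row:
    name: str
    human: Status
    mouse: Status


def count_protease_genes(rows: list[Row]) -> dict[str, int]:
    """Count functional genes per species; pseudogenes and absent genes are skipped."""
    return {
        "human": sum(row.human == "functional" for row in rows),
        "mouse": sum(row.mouse == "functional" for row in rows),
    }


def species_differences(rows: list[Row]) -> list[str]:
    """Return the names of rows whose status differs between the two species."""
    return [row.name for row in rows if row.human != row.mouse]


# Placeholder rows, not real genes.
rows = [
    Row("gene_a", "functional", "functional"),
    Row("gene_b", "pseudogene", "functional"),
    Row("gene_c", "functional", "absent"),
]
```

With these placeholder rows, each species has two counted genes, and `gene_b` and `gene_c` are reported as human–mouse differences even though `gene_b` does not contribute to the human total.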
The aspartic proteases are divided into four families: A01, A02, A22 and Ax1. There are several pepsinogen A isozymogens encoded by highly related genes (>95% identity) that form part of a cluster located at 11q12. The individual pepsinogen A isozymogens result from haplotypes that contain different numbers of genes (ranging from 1 to 4)1,2. In agreement with other databases, this region has been annotated as a single gene in human. According to the criteria discussed above, we have assigned mouse pepsinogen F as the orthologue of human pepsinogen A, despite notable divergence in their structure and regulation3. Ren2 is absent in some strains of laboratory mice. The gene that encodes prochymosin has been inactivated by mutations and frameshifts in the human genome and is classified as a pseudogene, although it is functional in mouse and other species4.
The genes DDI1, DDI2, DDI-RP, NRIP2 and NRIP3 are included in the A02 family, which contains predicted retroviral-like aspartic proteases5. All of these have mouse orthologues at syntenic regions, and are not embedded in endogenous retroviral elements. The human and mouse genomes also contain several aspartic protease-related sequences derived from endogenous retroviruses, but we have not annotated these as human or mouse proteases. In this regard, it is remarkable that most of the retroviruses embedded in both genomes have suffered inactivating mutations, which also affect the putative proteases encoded by these viral elements. However, HERV-K113, for example, which is located at 19p13 in ~30% of the human population, has intact open reading frames for all viral proteins, including the corresponding aspartic protease, and remains capable of reinfecting humans today6. The catalogue of aspartic proteases also includes a new family that is derived from the prolactin-inducible protein/gross cystic disease fluid protein-15 (PIP/GCDFP15), which has recently been characterized as a protease belonging to this class of enzymes7. The four PIP-related proteins lack residues proposed to be essential for PIP proteolytic activity and have been classified as non-protease homologues.
The cysteine proteases belong to 16 different families, and include proteins such as hedgehog family members, the protease function of which is only used for the autolytic processing of their respective precursors8. The C01 family is greatly expanded in the mouse as a result of the presence of placental cathepsins and testins. We have annotated two further mouse testins, including testin-3, which was the first member of this subfamily predicted to be a functional protease. There are two functional human cathepsin L-like genes (CTSL and CTSL2) at 9q21, and a single gene in the mouse, which is more closely related to CTSL2. The cylindromatosis protein contains a ubiquitin C-terminal hydrolase domain and has been included in the C12 family. The genes for calpain 14, caspase 5 and caspase 10 are absent in mice, and the human gene for caspase 12 has been inactivated and is therefore classified as a pseudogene. We have annotated a second human legumain-like gene that is absent in mouse.
The C19 family of ubiquitin-specific proteases (USPs) is large and complex. We have annotated 21 human members (USP30, 31, 34–52) and assigned their corresponding mouse orthologues. We have not found mouse orthologues for human USP6, -13, -34, -37, -42 and -51. USP17 is located within the RS447 human megasatellite at 4p15 (ref. 9). This region is highly polymorphic in the human genome, containing a variable number of USP17-related intronless tandemly repeated sequences (>95% identical), which have probably been generated by retrotransposition. Forty-four distinct alleles in 74 unrelated chromosomes containing 20–103 copies of the RS447 unit have been identified10. We have also identified several USP17-related sequences in a cluster located at 8p25. This cluster would contain at least seven USP17-like (USP17L) intronless genes (three of these are classified as non-protease homologues) and pseudogenes. The proteins encoded by these polymorphic and variable regions have been annotated as two single proteases (USP17 and USP17L) in this table. The closest relatives of USP17 genes in the mouse genome are those that code for proteins called DUBs (deubiquitinating enzymes). DUB1, DUB2 and DUB2A have been extensively characterized as members of a novel group of cytokine-inducible deubiquitylating enzymes that are produced by lymphocytes11–13. We have annotated three further members of this subfamily of haematopoietic proteases. The classification of mouse DUBs as orthologues of human USP17 genes is doubtful because, despite sequence similarities, their syntenic relationship is unclear. Accordingly, we have tentatively classified them as paralogous genes.
We have annotated six members of the C48 family of SUMO-1 proteases in the mouse genome, which are absent in the human genome. We have also included a family of recently described cysteine proteases with deubiquitylating activity containing the OTU-protease domain and tentatively called otubains14,15. This family should comprise 14 orthologues and one specific member in both human and mouse. All of them contain characteristic features of active proteases with the exception of TRABID and murine Cgi77b. The last protease included in our list of cysteine proteases is called HetF-like and forms part of the superfamily of caspase-haemoglobinase fold proteases16. Human and mouse HetF-like have a serine residue instead of the active-site cysteine present in cysteine proteases, and have been classified as non-protease homologues.
The metalloproteases belong to 26 distinct families. The M01 family contains 13 members in human and 12 in mouse, which lacks aminopeptidase MAMS. We propose the names aminopeptidases O and Q for the M01 proteases previously annotated as human hypothetical proteins FLJ14675 and BG623101. We have also identified orthologues of these genes, located at mouse chromosomes 13B3 and 18C. In the M02 family, we have tentatively annotated a mouse gene for a third angiotensin-converting enzyme-like protein (Ace3), which is located at chromosome 11E1. We have classified Ace3 as a non-protease homologue because it contains the HQMGH sequence instead of the consensus Zn-binding HExxH motif. No expressed sequence tags (ESTs) have been found for mouse Ace3, which could be an inactive pseudogene, although the locus is apparently complete and conserved in the rat. The corresponding human gene is a pseudogene as a result of the accumulation of stop codons and frameshifts.
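The motif comparison behind the Ace3 classification can be illustrated with a short pattern match. This is only a sketch: the function name is hypothetical, HEMGH is used simply as an example of a sequence that fits the consensus, and a motif match alone does not establish catalytic activity.

```python
import re

# HExxH in one-letter amino-acid code: His, Glu, any two residues, His.
HEXXH = re.compile(r"HE[A-Z]{2}H")


def has_hexxh_motif(sequence: str) -> bool:
    """Return True if the sequence contains the consensus HExxH motif."""
    return HEXXH.search(sequence.upper()) is not None


print(has_hexxh_motif("HQMGH"))  # False: Gln replaces the Glu of the consensus
print(has_hexxh_motif("HEMGH"))  # True: fits the HExxH consensus
```

The same check can be run over a full-length protein sequence, because `search` scans for the motif at any position rather than requiring it at the start.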
There are some differences between human and mouse members of the M10 family of matrix metalloproteases (MMPs). Mouse McolB, a diverging counterpart of human MMP1, is absent in human, whereas human matrilysin-2 (MMP26) is absent from mouse, although there are some gaps in the mouse genome region that could contain this missing gene. MMP23 has recently been duplicated in the human genome17, generating two closely related genes, MMP23A and MMP23B. This region is artefactually collapsed in the available public and private genome sequences owing to the high sequence identity between the two genes, and is erroneously considered to contain a single gene. Apparently, there is a single mouse MMP23 gene, although the possibility that this region is duplicated in the mouse genome and has also been computer-collapsed cannot be ruled out. In the M12 family, we have annotated a new member within the meprin/tolloid subfamily18.
The ADAM (a disintegrin and metalloprotease) subfamily of M12 metalloproteases19 shows important differences between the two organisms. The genes for ADAM-1, -3, -4, -5, -6 and -25 are pseudogenes in human but active genes in mouse. ADAM-1 and -6 are duplicated in mouse, whereas ADAM-20 is duplicated in human (ADAM-20 and ADAM-21). Also, testases, a subgroup of ADAMs located at 8B1, are mouse specific. We have annotated five further members of this family (testases 5–9), although they are intronless and their functional relevance remains to be shown. The group of ADAMTSs (ADAMs with thrombospondin domains) is completed with the inclusion of human and mouse ADAMTS-20. In the M14 family of carboxypeptidases, we have found that mouse carboxypeptidase O has been specifically inactivated by mutation and is annotated as a pseudogene20. Dihydroorotase and several dihydropyrimidinases have been included as non-protease homologues of bacterial isoaspartyl dipeptidases. The gene that encodes procollagen III N-endopeptidase is inactivated in mouse, which represents an interesting difference between the human and mouse degradomes, as there are no other functional members in the M47 family that could compensate for this specific loss in mouse. We have annotated 14 human and 13 mouse proteins in the recently described M67 family of metalloisopeptidases21,22. All of them contain the JAMM motif, although some lack conserved residues that are predicted to be essential for proteolytic activity, and have therefore been classified as non-protease homologues.
There are doubts about the ascription of the FACE-2/RCE1 prenyl endopeptidase to the cysteine or metalloprotease classes of enzymes23; however, in agreement with recent structural comparisons24, we have included it as the only human and mouse representative of a new family of membrane-bound metalloproteases. Finally, we have included three aminoacylases in our catalogue of metalloproteases. These enzymes are not, strictly speaking, proteases because they cleave peptide bonds that connect an acyl derivative with an amino acid25. However, the structure of ACY1 clearly allows its inclusion in the M20 family of metalloproteases, whereas those of ACY2 and ACY3 have also been proposed to be part of a superfamily of metalloproteases that contains members of the M14 family of carboxypeptidases26.
Most serine proteases belong to the S01 family, but there are representatives of 13 further serine protease families in the human and mouse degradomes. All differences between human and mouse serine proteases correspond to changes in members of this densely populated family. The kallikreins are almost entirely duplicated in mouse: there are 28 members in mouse and 15 in human. The genes for mastin, implantation serine protease-2 (ISP-2), intestinal serine protease (DISP-1), and testis serine proteases TESP-2 and -3 are inactivated in human, hence their classification as pseudogenes. The absence of genes for human DISP-2, ISP-1 and TESP-1, together with the finding that human DISP-1, ISP-2, TESP-2 and TESP-3 are pseudogenes, indicates that the functions performed by ISP, DISP and TESP proteases might be mouse-specific. We have also annotated several new members of the testis-specific serine protease (TESSP) subfamily, with TESSP-3, -4 and -6 being pseudogenes in human and active genes in mouse. Mast-cell proteases (Mcpt), granzymes (Gzm), trypsins and human-airway trypsin-like (HAT-like) proteases are expanded in mouse; two tryptases, an ovochymase-like protease and a form of pancreatic elastase are only present in human. Two well-known non-protease homologues, apolipoprotein (a) (LPA) and haptoglobin-related protein, are absent in mouse. Further characteristic features of the mouse degradome include the duplication of complement factors C1r and C1s, the presence of an extra functional member of the plasma kallikrein-like subfamily (Klkbl3), and the presence of a non-protease homologue called Clsp (chymase-like serine protease).
We have included in the catalogue of serine proteases a series of proteins, such as lactoferrin, reelin and tumour rejection antigen (gp96), that have recently been reported to have this kind of proteolytic activity27–29. On the basis of structural analysis, lactoferrin has been tentatively classified as a member of the S26 family of serine proteases, whereas reelin, gp96 and their close relatives have been preliminarily ascribed to two Sx families of presently unclassified serine proteases. Gene Ontology annotation of the human proteome also predicts a series of serine proteases with minimal relationship to other members of this class of enzymes. They include torsin, NSP (novel serine protease) and Ufd1L (ubiquitin fusion degradation protein 1 homologue), but because there is not enough evidence to support their ascription as serine proteases, they have not been included in the present version of the human and mouse degradomes.
The most recently identified catalytic class of proteases, the threonine proteases30, are classified into three families: T01, containing the proteasome components; T02, composed of three distinct glycosylasparaginases; and T03, including diverse γ-glutamyltransferases (GGTs). All members of the T01 and T02 families are conserved between human and mouse. There are, however, some differences in the number of GGT genes clustered in a region of chromosome 22, which has undergone successive duplications31. As a consequence of this dynamic evolution, there are four GGT genes in this region of the human genome but only one in the corresponding region of the mouse genome (10B5). An additional GGT gene located at 20q11 is conserved in the mouse genome at an equivalent position (2H2).