The first, second, third and fourth order Markov chain analysis on the amino-acid sequence of human dopamine β-hydroxylase

Abstract

The repeated amino-acid sequences in human dopamine β-hydroxylase (DBH) may be indispensable for DBH activity, because such repetitions cannot simply be attributed to random chance. The amino acid sequence of human DBH was analysed as two-, three-, four- and five-amino-acid sequences, and their probabilities in human DBH were calculated. First, second, third and fourth order Markov chains were used to calculate the transition probabilities for two-, three-, four- and five-amino-acid sequences, respectively. The longest repeated sequence is glycine-isoleucine-leucine-glutamic acid-glutamic acid, which appears twice in DBH. The results suggest that the amino acids with a high Markov transition probability may serve as potential targets of new drugs, because they are unlikely to change into other amino acids.

Main

Human dopamine β-hydroxylase (DBH, EC 1.14.17.1) is the enzyme that catalyses the conversion of dopamine into noradrenaline, and its activity has been found to be altered in schizophrenia.1, 2, 3, 4 Further analysis of the primary structure of human DBH is important for understanding its activity and its role in schizophrenia. Human DBH is composed of 603 amino acids.5, 6 Any two consecutive amino acids form a two-amino-acid sequence, thus a total of 602 two-amino-acid sequences can be constructed, ie the first and second, the second and third, the third and fourth, etc. Likewise, any three consecutive amino acids form a three-amino-acid sequence, thus a total of 601 three-amino-acid sequences can be constructed, ie the first, second and third; the second, third and fourth; etc. In the same way, a total of 600 four-amino-acid sequences and a total of 599 five-amino-acid sequences can be constructed. The aim of such a study is to view amino-acid sequences as possible ‘words’ in an unknown ‘language’ in an abstract sense and to determine: (i) whether an amino acid ‘word’ can be constructed from three amino acids, just as a DNA message is constructed from three elements (at this stage an amino acid ‘word’ can be considered to be constructed from any sequence of amino acids); and (ii) whether there are ‘punctuation’ and ‘space’ sequences in the amino acid sequence of a protein. We do not know where an amino acid ‘word’ begins and finishes; consequently an amino acid ‘word’ can begin and finish anywhere.
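The counting of overlapping sequences described above can be sketched in a few lines of Python. This is only an illustration: the function name `windows` and the placeholder string `toy` are assumptions introduced here, and `toy` is not the real DBH sequence; only its length of 603 matters.

```python
def windows(sequence: str, k: int) -> list[str]:
    """Return every overlapping k-residue window, read from left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Placeholder of the same length as human DBH (603 residues).
# It is NOT the real DBH sequence; it only reproduces the window counts.
toy = "ACDEFGHIKLMNPQRSTVWY" * 30 + "ACD"

for k in (2, 3, 4, 5):
    # A sequence of length L yields L - k + 1 overlapping windows.
    print(k, len(windows(toy, k)))
```

For a sequence of 603 residues this gives 602, 601, 600 and 599 windows for k = 2, 3, 4 and 5.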

Each position in a two-amino-acid sequence can be occupied by any one of 20 amino acids, thus there are 400 (20²) possible two-amino-acid sequences. Any two-amino-acid sequence, whether or not it appears in human DBH, is one of these 400 possible sequences. If each two-amino-acid sequence had the same probability of occurring in human DBH, one would expect each two-amino-acid sequence to appear about 1.505 times (602/400). Similarly, each position in a three-amino-acid sequence can be occupied by any one of 20 amino acids, thus there are 8000 (20³) possible three-amino-acid sequences. If each three-amino-acid sequence had the same probability of occurring in human DBH, one would expect each three-amino-acid sequence to appear about 0.075 times (601/8000). Moreover, there are 160 000 (20⁴) possible four-amino-acid sequences and 3 200 000 (20⁵) possible five-amino-acid sequences. Not surprisingly, some kinds of sequences may not appear at all in human DBH, not only because human DBH is not long enough to hold all possible combinations, but also, more importantly, because the evolutionary process favours some particular amino-acid sequences, some of which would appear more frequently.
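The expected number of appearances under the equal-probability assumption is simply the number of windows divided by the number of possible sequences. A minimal sketch follows; the function name `expected_uniform_count` is an assumption introduced for illustration.

```python
N_AMINO_ACIDS = 20  # number of standard amino acids considered in the text


def expected_uniform_count(protein_length: int, k: int) -> float:
    """Expected appearances of one particular k-residue sequence if all
    20**k possible sequences were equally likely."""
    n_windows = protein_length - k + 1   # e.g. 602 for k = 2
    n_possible = N_AMINO_ACIDS ** k      # e.g. 400 for k = 2
    return n_windows / n_possible


for k in (2, 3, 4, 5):
    print(k, N_AMINO_ACIDS ** k, expected_uniform_count(603, k))
```

For a protein of 603 residues, the values for k = 2 and k = 3 are 602/400 and 601/8000, the figures quoted above.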

In a two-amino-acid sequence, the question of which amino acid is more likely to follow a preceding amino acid is also interesting. In an ideally random situation, every amino acid is equally possible, thus the probability of following a preceding amino acid is 1/20. There are 39 alanines (A) in human DBH, so an ‘A’ would have a probability of 0.065 (39/602) of following a preceding amino acid of another kind, for example, a glutamine (Q). This predicted probability equals the probability found in the real sequence, therefore the fact that an ‘A’ follows a preceding ‘Q’ can be explained by a purely random mechanism. Similarly, an ‘A’ would have a probability of 0.063 (38/602) of following a preceding ‘A’ according to a purely random mechanism, but an ‘A’ has a probability of 0.077 of following a preceding ‘A’ in the real sequence. This real probability is what the Markov chain calculates (the first order Markov chain transition probability). This issue is also important for the understanding of DBH structure and activity.

One hundred and seventeen of 400 (29.250%) possible two-amino-acid sequences do not exist in human DBH. This means that these 117 kinds of sequences are not needed for DBH function. As 117 kinds of two-amino-acid sequences do not exist, some of the remaining 283 kinds of two-amino-acid sequences appear more than once. These repeated sequences would not be considered to occur by chance. Due to the limitation of space, Table 1 shows only the two-amino-acid sequences that appear more than three times; when a two-amino-acid sequence appears more than three times, its probability of appearance is larger than 0.005 (3/602). The first order Markov chain transition probability is the conditional probability that the second amino acid occurs in a two-amino-acid sequence, given the occurrence of the first amino acid, ie P(second amino acid|first amino acid). The analogous question for two-letter words in the English language is how large the probability is that the letter ‘e’ appears given that the first letter is ‘w’. The first order Markov transition probability for the second amino acid in a two-amino-acid sequence, given a certain kind of first amino acid, is shown in parentheses in Table 1. For example, if the first amino acid in a two-amino-acid sequence is ‘A’, then the probability that the second amino acid is ‘G’ is 0.128. Of 602 measured first order Markov transition probabilities for the second amino acid in two-amino-acid sequences, 2 (0.332%) predicted conditional probabilities (QA and PR) match the measured first order Markov transition probabilities and therefore can be explained by a purely random mechanism.
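The first order transition probability P(second amino acid|first amino acid) can be estimated by dividing the count of each two-amino-acid sequence by the count of sequences that start with the same first amino acid. The sketch below assumes this counting convention; the function name `first_order_transitions` and the short string `toy` are hypothetical and are not DBH data.

```python
from collections import Counter


def first_order_transitions(sequence: str) -> dict[str, float]:
    """Map each observed pair 'ab' to the estimate of P(b | a)."""
    pair_counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    # Only residues that have a successor can start a pair,
    # so the last residue is excluded from the denominator.
    first_counts = Counter(sequence[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


toy = "AGAAGQAG"  # illustrative toy sequence, not the DBH sequence
probs = first_order_transitions(toy)
print(probs["AG"])  # 'A' starts four pairs, three of which are 'AG'
```

With this convention the transition probabilities out of any one amino acid sum to 1, which is what makes them conditional probabilities rather than plain frequencies of appearance.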

Table 1 The two-amino-acid sequences that appear more than three times, their probabilities in DBH and their first order Markov chain transition probabilities (in parentheses)

Seven thousand four hundred and thirty-five of 8000 (92.938%) possible three-amino-acid sequences do not exist in human DBH; these sequences are therefore not needed for human DBH function. Among the remaining 565 kinds of sequences, some appear more than once (Table 2), which should not be attributed to chance. It can be seen that ‘QLL’ and ‘LEE’ are the three-amino-acid sequences that appear most frequently; each appears three times in human DBH with a probability of 0.005 (3/601). The transition probability is the conditional probability that the third amino acid occurs in a three-amino-acid sequence given the occurrence of the first two amino acids, ie P(third amino acid|first and second amino acids). The second order Markov chain transition probability for the third amino acid in three-amino-acid sequences is shown in parentheses in Table 2. It can be seen, for example, that when the first two amino acids are ‘LD’, the third amino acid is always ‘P’ and none of the other 19 amino acids appears. Similar cases can be seen in ‘PRE’, ‘PNI’, ‘PHF’ and ‘SLE’. No predicted conditional probability matches the measured second order Markov transition probability.

Table 2 The three-amino-acid sequences that appear more than once, their probabilities in DBH and their second order Markov chain transition probabilities (in parentheses)

Five hundred and ninety-seven of 160 000 (0.373%) possible four-amino-acid sequences exist in human DBH; the number 597 is close to the 600 four-amino-acid sequences that can be constructed from human DBH (see Introduction). This means that the repetition of four-amino-acid sequences is very rare. The other possible kinds of four-amino-acid sequences are naturally not needed. There are only three sequences that appear twice in human DBH, ie ‘GILE’, ‘ILEE’ and ‘LEEP’, each with the same appearance probability of 0.003 (2/600). The third order Markov transition probabilities for ‘GILE’, ‘ILEE’ and ‘LEEP’ given the occurrence of the first three amino acids are 1.000, 1.000 and 0.667, respectively. It is unlikely that the repetition of ‘GILE’, ‘ILEE’ and ‘LEEP’ occurs by chance, because such a probability is extremely low, ie 1/160 000. These sequences should play a very important role in human DBH function.

Five hundred and ninety-eight of 3 200 000 (0.019%) possible five-amino-acid sequences exist in human DBH; the number 598 is only one less than the 599 five-amino-acid sequences that can be constructed from human DBH (see Introduction). This means that only one five-amino-acid sequence is repeated, and suggests that the other possible kinds of five-amino-acid sequences are not needed for DBH function. The repeated sequence is ‘GILEE’. ‘GILEE’ is in fact composed of the overlapping sequences ‘GILE’ and ‘ILEE’, and ‘LEEP’ is also involved. The appearance probability of ‘GILEE’ is 0.003 (2/599) and its fourth order Markov transition probability given the occurrence of the first four amino acids is 1.000. The repetition of ‘GILEE’ is also unlikely to occur by chance, because of its extremely low probability (1/3 200 000). The two ‘GILEE’ sequences are located at positions 157 to 161 and 468 to 472 in human DBH. These two regions are somewhat near to the two potential glycosylation sites (170 ‘N’ and 552 ‘N’) in human DBH.
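Locating repeated sequences and their positions, as done above for ‘GILEE’, can be sketched as follows. The function name `repeated_windows` and the string `toy` are assumptions made for illustration; `toy` merely contains ‘GILEE’ twice and is not the DBH sequence, so the positions it yields are not those reported above.

```python
from collections import defaultdict


def repeated_windows(sequence: str, k: int) -> dict[str, list[int]]:
    """Return each k-residue sequence that appears more than once,
    together with its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i + 1)  # convert to 1-based
    return {w: p for w, p in positions.items() if len(p) > 1}


toy = "MKGILEEPTWGILEEAQ"  # illustrative toy sequence, not the DBH sequence
print(repeated_windows(toy, 5))
print(repeated_windows(toy, 4))
```

The number of distinct kinds of k-residue sequences is the number of windows minus the number of repeat appearances, which is how 599 windows with a single repeated five-amino-acid sequence give 598 kinds.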

The results show that the Markov transition probability increases from two-amino-acid sequences to five-amino-acid sequences, therefore the random chance for an amino acid to follow an arbitrary amino acid decreases as the length of the amino-acid sequence increases. Following this observation, it is likely that mutations seldom occur at amino acids with a high Markov transition probability. Therefore, the amino acids with a high Markov transition probability may serve as potential targets of new drugs, because they are unlikely to change into other amino acids.

At this stage, it is still difficult to establish a definitive relationship between the DBH primary structure and a possible functional role in schizophrenia. Nevertheless, more effort is needed in future to assess this relationship with more sophisticated models.

Methods

A Markov chain is used to calculate the transition probability from one state to another state. For example, the first order Markov chain deals with the transition from the first state to the second state. In the case of a two-amino-acid sequence, the second amino acid is unlikely to be any amino acid chosen randomly, but is likely to be an amino acid from some subset of the 20 kinds of amino acids. This defines a conditional probability, ie which kind of amino acid is most likely to be the second amino acid in a two-amino-acid sequence given that the first amino acid is of a certain kind. If a three-amino-acid sequence is considered, the second order Markov chain can be defined, ie which kind of amino acid is most likely to be the third amino acid in a three-amino-acid sequence given that the first two amino acids are of certain kinds, and so on.7, 8, 9, 10
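A transition probability of any order can be estimated in the same way: count how often the given preceding amino acids are followed by the amino acid of interest, and divide by how often they are followed by anything. The sketch below assumes this estimator; the function name `transition_probability` and the string `toy` are hypothetical and are not DBH data.

```python
def transition_probability(sequence: str, prefix: str, nxt: str) -> float:
    """Estimate P(nxt | prefix); the Markov order is len(prefix)."""
    k = len(prefix)
    # Only start positions that leave room for a following residue count.
    starts = range(len(sequence) - k)
    n_prefix = sum(1 for i in starts if sequence[i:i + k] == prefix)
    n_full = sum(1 for i in starts if sequence[i:i + k + 1] == prefix + nxt)
    if n_prefix == 0:
        raise ValueError(f"{prefix!r} is never followed by a residue")
    return n_full / n_prefix


toy = "LEEPALEEPGLEEQ"  # illustrative toy sequence, not the DBH sequence
print(transition_probability(toy, "LE", "E"))   # second order
print(transition_probability(toy, "LEE", "P"))  # third order
```

In this toy string ‘LE’ is always followed by ‘E’, while ‘LEE’ is followed by ‘P’ in two of its three appearances, so the two estimates are 1 and 2/3.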

In order to compare the predicted conditional probability with the measured Markov transition probability, the predicted conditional probability for an amino acid to follow a preceding amino acid is calculated according to the random mechanism as stated in the Introduction. For example, there are 39 alanines and 28 arginines in human DBH. The predicted conditional probabilities for ‘AA’ and ‘RA’, ie for an ‘A’ as the second amino acid to follow an ‘A’ and an ‘R’, are 38/602 and 39/602, respectively; the predicted conditional probabilities for ‘AR’ and ‘RR’, ie for an ‘R’ as the second amino acid to follow an ‘A’ and an ‘R’, are 28/602 and 27/602, respectively. Likewise, the predicted conditional probability for an ‘A’ as the third amino acid in a three-amino-acid sequence to follow ‘AA’ is 37/601. The numbers of predicted conditional probabilities are identical to the numbers of possible two-, three-, four- and five-amino-acid sequences, eg 400 (20²) for two-amino-acid sequences.
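The examples above follow one rule: the numerator is the number of residues of the required kind that are not already used by the preceding amino acids, and the denominator is the number of remaining residues. A minimal sketch of that rule is given below. The function name `predicted_conditional` is an assumption, and only the two residue counts stated in the text (39 alanines and 28 arginines) are used.

```python
from fractions import Fraction


def predicted_conditional(residue_counts: dict[str, int],
                          protein_length: int,
                          prefix: str,
                          nxt: str) -> Fraction:
    """Predicted probability that `nxt` follows `prefix` under the purely
    random mechanism described in the text."""
    # Residues of kind `nxt` already present in the prefix are not available.
    available = residue_counts[nxt] - prefix.count(nxt)
    remaining = protein_length - len(prefix)
    return Fraction(available, remaining)


counts = {"A": 39, "R": 28}  # residue counts quoted in the text
for prefix, nxt in [("A", "A"), ("R", "A"), ("A", "R"), ("R", "R"), ("AA", "A")]:
    p = predicted_conditional(counts, 603, prefix, nxt)
    print(prefix + nxt, p, round(float(p), 3))
```

Exact fractions are used so that the values can be compared with 38/602, 39/602, 28/602, 27/602 and 37/601 without rounding; `Fraction` prints them in lowest terms.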

References

  1. Bowers MB. Central dopamine turnover in schizophrenic syndromes. Arch Gen Psychiatry 1974; 31: 50–54

  2. Sternberg DE, van Kammen DP, Lerner P, Bunney WE. Schizophrenia: dopamine β-hydroxylase activity and treatment response. Science 1982; 216: 1423–1425

  3. van Kammen DP, Mann LS, Sternberg DE, Scheinin M, Ninan PT, Marder SR et al. Dopamine-β-hydroxylase activity and homovanillic acid in spinal fluid of schizophrenics with brain atrophy. Science 1983; 220: 974–977

  4. DeLisi LE, Wise CD, Potkin SG, Zalcman S, Phelps BH, Lovenberg W et al. Dopamine-β-hydroxylase, monoamino oxidase and schizophrenia. Biol Psychiatry 1980; 15: 895–907

  5. Kobayashi K, Kurosawa Y, Fukita K, Nagatsu T. Human dopamine beta-hydroxylase gene: two mRNA types having different 3′-terminal regions are produced through alternative polyadenylation. Nucleic Acids Res 1989; 17: 1089–1102

  6. Lamouroux A, Vigny A, Faucon Biguet N, Darmon MC, Franck R, Henry J-P et al. The primary structure of human dopamine-beta-hydroxylase: insights into the relationship between the soluble and the membrane-bound forms of the enzyme. EMBO J 1987; 6: 3931–3937

  7. Ash RB. Information Theory. Interscience: New York 1965

  8. Csiszár I, Körner J. Information Theory. Academic Press: New York 1981

  9. Feller W. An Introduction to Probability Theory and Its Applications, 3rd edn, Vol I. John Wiley and Sons: New York 1968

  10. van der Lubbe JCA. Information Theory. Cambridge University Press: Cambridge 1997

Acknowledgements

The author wishes to thank S-M Yan MD, PhD at the Department of Pathology, University of Udine, Italy for helpful discussion. The Electronic Engineer P Cossettini at the Centre for Advanced Research in Space Optics, Trieste, Italy is kindly acknowledged. Special thanks go to two anonymous Referees for their insightful comments and for correcting the English in the previous version of the manuscript.

Author information

Correspondence to G Wu.

Keywords

  • dopamine β-hydroxylase
  • Markov chain
  • schizophrenia
