DNA METHYLATION AND CANCER

Addition of a methyl group to the 5-carbon of cytosine in CpG dinucleotides is the only significant covalent modification found in DNA from mammalian cells. It has been postulated as an important homeostatic mechanism for about four decades, since the mosaic organization of eukaryotic genomes was recognized, in which CG-rich segments are separated by AT-rich counterparts (for recent reviews see 1, 2). The recent surge of interest in epigenetics as a whole, and in DNA methylation in particular, can be attributed to the urgent need for functional dissection and annotation of DNA sequences after the completion of the human and various model organism genome projects. It has been amply demonstrated that the tissue-specific gene expression pattern during development is controlled exclusively by heritable mechanisms that involve no change in DNA sequence (epigenetics) in somatic cells, except for those of the immune system. The necessity of maintaining the DNA methylation profile in a time- and space-specific, ordered manner has been confirmed by genetic studies with knock-out mice for each of the three DNA methyltransferase genes (DNMT1, DNMT3a and DNMT3b) 3, 4.

Although genetic defects in the genes required for establishment and maintenance of DNA methylation profiles have not been directly linked to cancer formation, it is generally accepted that cancers suffer from widespread aberrations in DNA methylation that have profound etiological implications. In other words, cancer is also an epigenetic disease. Some genetic defects in cancer cells have been directly attributed to hypermethylation that leads to loss of expression of DNA repair genes such as hMLH1 and MGMT (O6-methylguanine DNA methyltransferase) 5. The protein encoded by the MGMT gene removes alkyl groups fortuitously added to the guanine (G) base of DNA, thereby preventing G to A mutations. In colorectal cancer cells in which a hypermethylated promoter CpG island was found in parallel with transcriptional silencing of the MGMT gene, G to A mutations were prevalent in both the ras proto-oncogene and the p53 tumor suppressor gene 6, 7. The biochemical tendency of methylated, but not unmethylated, C to convert to T has been suggested as a key mechanism both for CpG depletion in the genomes of higher eukaryotes during evolution and for the C to T mutations in the tumor suppressor gene p53 in human cancer (http://www-p53.iarc.fr/index.html). Furthermore, genetically manipulated mice with a reduced level of the DNMT1 protein suffer from global demethylation of the genome and an increase in point mutations and tumor formation 8,9,10. Etiologically significant events tend to cluster in the early phase of carcinogenesis. Indeed, hypermethylation of the promoter CpG island of the p16INK4a tumor suppressor gene has been detected in the sputum DNA of lung cancer patients as early as 35 months before diagnosis 11.
In mutated alleles, neither hypermethylation of the promoter CpG island nor transcriptional silencing occurs 11, suggesting that defects in DNA methylation are independent of genetic flaws, although frequent cross-talk between them takes place. It is well established that the epigenetic makeup of a cell is much more amenable than its genetic counterpart to environmental influences, including nutrients 12, strengthening the notion that in carcinogenesis epigenetic disturbance probably precedes genetic defects.

The major epigenetic reprogramming of DNA methylation in higher eukaryotes occurs at two stages: the maturation of germ cells and early embryonic development 13, 14. Thereafter, the DNA methylation pattern in somatic cells gradually evolves during cell differentiation and aging. During mitosis and meiosis, the DNA methylation pattern is reliably passed to the next generation by a mechanism similar to the semi-conservative replication of DNA. Aging in higher organisms is characterized by a decrease in the overall level of DNA methylation and an increase in methylation of promoter CpG regions 13, 15. Stochastic events also take place and confer individuality on the DNA methylation pattern of somatic cells of the same tissue origin, which may contribute to the wide variation in embryonic development of eggs cloned with somatic nuclei 16. To reach the malignant state, however, cells must undergo further drastic changes in DNA methylation. A dramatic reduction of the overall level of DNA methylation, down to 25% to 33% of normal, is commonly seen in cancer cells. This increases the transcription and transposition activity of the normally methylated, transcriptionally silenced repetitive sequences, which make up as much as 40% of the genome 17, and in turn brings about genome instability at both the chromosomal and primary sequence levels, an important hallmark of the cancerous state. Local demethylation of promoter CpG islands has been linked to activation of otherwise transcriptionally silenced genes, including proto-oncogenes 17. Local hypermethylation of promoter CpG islands has been repeatedly reported as an equally important alternative to mutation or deletion for inactivation of tumor suppressor genes 18. Not all changes in methylation pattern contribute to tumorigenesis; some are rather a readout of the dysfunctional epigenetic homeostasis of cancer cells.
Nevertheless, all consistent changes should be valuable for tumor staging and classification in the clinic. It is therefore desirable to profile all consistent changes in DNA methylation in any given type of cancer and then to demonstrate their pathological implications.

During the last decade, an enormous amount of genetic, biochemical and molecular biological information on cancer formation has accumulated. The cure of cancer has, however, not benefited, as neither the survival rate nor the quality of life of cancer patients in the Western world has improved much 19. Nor has there been much progress in the staging and classification of solid tumors, which rely almost exclusively on clinical, pathological, biochemical, serological and imaging observations 20. This unsatisfactory state has been largely attributed to the inherent complexity of the problem, which is linked to the huge heterogeneity of the cancer mass in every aspect, including its adaptability to a changing environment. There are other reasons. The gene coding for the p53 protein is the most frequently mutated gene in cancer and is required for cell normality 21. The designated database (http://www-p53.iarc.fr/index.html) currently holds 19,806 somatic mutations, 264 germline mutations and functional data on 423 mutant proteins of TP53. Long before the completion of the human genome project, it was therefore expected that sampling p53 mutations would be valuable in clinical cancer practice. The reality is, however, not so encouraging. Why? The human p53 gene consists of 11 exons and is 19179 bp in length (www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Display&DB=gene). Even without taking into account the phenotypically important mutations in the upstream and downstream flanking sequences that are crucial to its transcription, it is unrealistic to profile comprehensively all possible single-nucleotide changes within the approximately 20 kb gene sequence in cancer cells, even with the most powerful platform technologies currently available. This is the so-called “multiple targets in a single gene” problem inherent in genetic biomarkers.
Establishing expression profiles at either the mRNA or the protein level faces other difficulties. Gene expression is frequently affected by factors irrelevant to the tumor, both known (those related to biological rhythm, for example) and unknown. The cellular heterogeneity of clinical materials presents a formidable challenge too. An exceptional standard of cell purity is therefore demanded for establishing genetic and expression profiles in clinical cancer samples.

By contrast, many advantages have been identified for DNA methylation profiling in cancers by the methylation-specific PCR (MSP) method. It is possible to detect one tumor cell among as many as 10^4 normal cells, provided that the tumor cells assume a homogeneous methylation pattern opposite to that of their normal counterparts. Hence, the undesirable presence of non-tumorous cells in the clinical sample can be largely tolerated. A correlation between the hypermethylated status of the promoter CpG island and the transcriptionally inert status of the tumor suppressor gene has been demonstrated in the majority of cases, suggesting that the methylation status of the promoter CpG island is a valuable indicator of the expression state of the target gene (a single target in a single gene). Furthermore, the DNA methylation pattern is rather stable biochemically as well as biologically, and is not changed by the non-tumorous factors that cause profound fluctuations in the levels of mRNA or protein in cells. Hence, using the DNA methylation pattern as a sensitive and reliable indicator of cancer status in the clinical setting is both theoretically sound and practically more feasible (Tab. 1).

Table 1 The characteristics of the molecular biomarkers.

To realize this great potential, a comprehensive methylation profile covering more targets and a large patient cohort is needed for any given type of human tumor. Unfortunately, few, if any, existing methylation profiles qualify. Our efforts remain at a very early phase of discovery even for the most studied tumor types, colorectal cancer and lung cancer 22, 23. Primary hepatocellular carcinoma (HCC), which preferentially affects people in China, demands more action from Chinese biomedical scientists.

OUR UNDERSTANDING OF THE ROLE AND POTENTIAL OF THE DNA METHYLATION MEDIATED ETIOLOGICAL MECHANISMS IN LIVER CANCER

HCC is one of the most aggressive human malignancies (http://www-depdb.iarc.fr/globocan/GLOBOframe.htm), (http://www-dep.iarc.fr/dataava/infodata.htm). Although it ranks fifth in occurrence, it ranks fourth in mortality worldwide. It also ranks first in tumor-caused death in mainland China 24. The difficulties in early diagnosis and clinical treatment, such as its inherent as well as adaptive resistance to common chemotherapeutic drugs, make it a devastating health threat to people in China and in many countries of Far East Asia and Africa. Three years ago, we started to analyze the DNA methylation profile in liver cancer and three other types of cancer: non-small cell lung cancer (NSCLC), malignant glioma and colorectal cancer (25,26,27,28,29,30,31,32,33 and our unpublished observations by Yu J et al). Transcriptional regulation of a number of genes involved in liver cancer formation has also been studied 33,34,35,36,37.

Identification of the critical CpG methylation for transcription silencing of the MAGEA1 gene 37

A typical CpG island is 500-1000 bp in length and contains up to several dozen CpG dinucleotides. Not all of these CpG dinucleotides are critical for methylation-mediated control of transcription. A CpG that lies within a critical cis-element bound by its cognate transcription factor is more crucial for gene transcription than other CpGs. The MAGEA1 gene is hypermethylated in normal liver tissues but demethylated 26 and present as a serum protein in over 75% of liver cancer patients, thus providing a good model system for identifying the CpGs critical for transcriptional regulation. Among the 19 CpG dinucleotides within the promoter, only the -30 CpG has been found to be important for methylation-mediated transcriptional control. The supporting observations are as follows: 1) this CpG is among the five CpG dinucleotides methylated in a cell line in which the gene is transcriptionally silenced; 2) in vitro methylation of this CpG by M.SssI methyltransferase down-regulates the promoter activity by up to 66%; 3) methylation of this CpG diminishes a novel DNA-protein interaction. Identification of the proteins involved should provide more insight into the DNA methylation-mediated control of MAGEA1 gene transcription.

Identification of the novel tumor associated genes with altered methylation pattern in liver cancer (our unpublished observations by He Y et al)

Until a few years ago, most tumor-associated genes were identified by genetic (mutation, deletion and translocation) and/or biochemical methods 38. Jones and his colleagues then used the “methylation sensitive arbitrarily primed PCR” approach 39 to identify several tumor-associated genes characterized by altered DNA methylation patterns. Using this method, we have identified 22 regions, from more than two thousand target bands, whose methylation patterns in a liver cancer cell line differ from those in normal liver tissues. Among them, two candidate liver cancer associated genes were identified, which were hypermethylated in over 75% of both cancer tissues and established tumor cell lines of liver origin, but unmethylated in normal liver tissues. Cell lines expressing either of these two genes under tet-off regulatable promoters have been established in our laboratory. Ongoing efforts aim at functional annotation of these two candidate genes in cell culture as well as in animal tumor models.

Methylation profiling in liver cancer and other tumors by MSP

MSP exploits the drastic difference between methylated and unmethylated cytosine in their susceptibility to deamination under bisulphite treatment 40. As a result, unmethylated C is converted to U, which is read as T after PCR, whereas methylated C remains unchanged. However, the procedures, including primer design, bisulphite treatment and the PCR reaction, remain empirical. We have analyzed the methylation status of over one hundred genes in liver cancer in the last three years, with a success rate of over four fifths of all attempts. Both false negative and false positive PCR reactions account for the failures. Using PCR on targets methylated in vitro by M.SssI as a positive control, we could conclude that the promoter CpG islands of the CDH1, p16INK4a, PTEN and RASSF1C genes were unmethylated in normal healthy liver tissue (Fig. 1A). Before proceeding to large-scale methylation profiling of clinical samples, the PCR products for each new target were sequenced to confirm their identity (the CDH1 and p16INK4a genes, Fig. 1B).
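
The conversion rule that MSP relies on can be illustrated with a minimal Python sketch. It assumes an idealized, complete conversion and looks at one strand only; the toy sequence and the function names are hypothetical and are not taken from any of the primer sets used in this work.

```python
def cpg_positions(seq):
    """Return the indices of every C that is immediately followed by G."""
    s = seq.upper()
    return {i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"}


def bisulfite_convert(seq, methylated_positions):
    """Idealized bisulphite conversion followed by PCR, on one strand.

    Every unmethylated C is read as T; a C whose index is listed in
    methylated_positions is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical toy sequence (assumption, not a real promoter fragment).
toy = "ACGTCCGATCGA"
unmethylated = bisulfite_convert(toy, set())
fully_methylated = bisulfite_convert(toy, cpg_positions(toy))
print(unmethylated)      # template recognized by the "U" primer pair
print(fully_methylated)  # template recognized by the "M" primer pair
```

The two converted templates differ only at the CpG cytosines, which is why primers placed over several CpG sites can discriminate the methylated from the unmethylated allele.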

Figure 1

Pilot experiments for the methylation-specific PCR reactions. (A) DNA from normal liver tissue was methylated in vitro with M.SssI methyltransferase (+), followed by MSP analysis with the primer pairs specific to the methylated and unmethylated alleles. −, untreated DNA; U, primers for the unmethylated target; M, primers for the methylated target. (B) Sequence verification of the methylated and unmethylated alleles of the CDH1 and p16INK4a genes.

With this well-verified MSP procedure, we have methylation-profiled 92 targets (the list of targets is available on request) in samples from liver cancer patients (cohorts of 26 to 30 patients) 26,27, 30, 31 and our unpublished observations (He Y et al). The targets were selected for their implication in cancer formation, and over two thirds of them were investigated for the first time. To exclude non-tumorous changes, liver tissues from four healthy donors were collected as the normal control. Among the 92 targets, 32 exhibited changes in methylation pattern at various frequencies: 7 targets (MAGEA1, ASPH, OXCT, MTHFD2, SRP72, ENO3, and MDFI) had a reduced level of methylation and 25 (RASSF1A, GSPT1, SALL3, OCT6, CFTR, AR, p73, cyclin a1, MYOD1, p16INK4a, ABO, DBCCR1, ITGA9, IRF7, LRP6, PENK, WT1, CDH13, DKC1, CSPG2, GALR2, p57KIP2, MT1A, HIC1 and CAT) had an increased level of methylation (Fig. 2). No correlation was observed between the methylation changes and known clinical and pathological parameters such as tumor staging and classification, age and HBV infection. Further verification is underway with a large patient cohort.

Figure 2

The altered methylation pattern of the promoter CpG islands of genes in liver cancer. (A) The detailed profile of the altered methylation pattern in the clinical samples. (B) Graphic presentation of the data in (A) (C, cancer tissues; N, the neighboring non-cancerous tissues), in comparison with the pattern in the normal liver tissues (C). Empty box: homogeneously unmethylated; filled box: homogeneously methylated; grey box: heterogeneously methylated.

Tumor is a systems disease. The pathologically defined neighbouring non-cancerous tissues are likely to be undergoing the early-stage changes of carcinogenesis (Fig. 3A). Taking normal healthy liver tissue (M) as the reference (Fig. 3B), we therefore defined targets whose frequencies of change differ significantly between the tumor (C) and the neighbouring tissues (N) as late-phase genes, and all others as early-phase genes, with respect to changes in DNA methylation. The early-phase genes are CSPG2, OXCT, cyclin a1, RASSF1A, ABO, WT1, GALR2, p57KIP2, MAGEA1, MT1A, CDH13, MYOD1, DKC1, and HIC1; the late-phase genes are DBCCR1, PENK, IRF7, GSPT1, p73, OCT6, p16INK4a, SALL3, and AR.
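
As an illustration of this early/late distinction, the following Python sketch compares the frequency of change in tumor and neighbouring tissue with a Pearson χ2 test on a 2 × 2 table. The 0.05 significance threshold, the absence of a continuity correction, the function names and the example counts are all assumptions made for the sketch; they are not values or procedures reported in this study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def classify_phase(changed_c, total_c, changed_n, total_n, alpha=0.05):
    """Label a target 'late' if C and N differ significantly, else 'early'."""
    _, p = chi_square_2x2(
        changed_c, total_c - changed_c,
        changed_n, total_n - changed_n,
    )
    return "late" if p < alpha else "early"


# Hypothetical counts: target altered in 20 of 28 tumors and 5 of 28 neighbours.
print(classify_phase(20, 28, 5, 28))
# Hypothetical counts: target altered in 15 of 28 tumors and 14 of 28 neighbours.
print(classify_phase(15, 28, 14, 28))
```

With cohorts as small as those described here, an exact test may be preferable when expected cell counts are low; the χ2 form is used in the sketch only because the figure legend reports χ2 and p-values.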

Figure 3

Phase-specific alteration in methylation of the promoter CpG islands of genes in liver cancer. (A) A schematic presentation of the concept of phase-specific methylation during carcinogenesis of liver cancer. (B) The early-phase genes display a similar frequency of changes in both tissues, while the late-phase genes change at a significantly higher rate in cancer than in the neighbouring non-cancerous tissues. Both χ2 and p-values for each gene have been calculated and are shown in the tables. Genes in bold italic are decreased in methylation in cancer. C, cancer tissues; N, the neighbouring non-cancerous tissues, and M, the normal liver tissues.

The availability of the methylation profile of as many as 32 genes in HCC has made it possible to detect concordant behaviour of the targets by the mathematical method of “association rule discovery” 41. This information should provide valuable guidance for target selection in assays that test for the tumorous state with DNA from patients' body fluids (blood, saliva, ascites, etc.) and discharges (stool, sputum and cells shed in urine). The genes RASSF1A, GSTP1 and SALL3 or OCT6 make up the most informative three-target set for HCC. The detection rate in HCC cases was 100% for each single target and 84.6% for all three targets (Tab. 2).
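
The kind of quantities that association rule methods work with can be sketched in Python as follows. This is only a minimal illustration of support, confidence and panel selection under stated assumptions: the sample data and the gene labels G1 to G4 are hypothetical placeholders, the function names are invented for the sketch, and the procedure of reference 41 may differ in detail.

```python
from itertools import combinations


def support(samples, genes):
    """Fraction of samples in which every gene in `genes` is altered."""
    genes = set(genes)
    return sum(genes <= s for s in samples) / len(samples)


def confidence(samples, antecedent, consequent):
    """Estimated P(consequent altered | antecedent altered)."""
    base = support(samples, antecedent)
    if base == 0:
        return 0.0
    return support(samples, set(antecedent) | set(consequent)) / base


def any_hit_rate(samples, panel):
    """Fraction of samples in which at least one gene of the panel is altered."""
    panel = set(panel)
    return sum(bool(panel & s) for s in samples) / len(samples)


def best_panel(samples, candidates, size=3):
    """Panel of `size` genes with the highest any-hit rate (first one on ties)."""
    return max(
        combinations(sorted(candidates), size),
        key=lambda panel: any_hit_rate(samples, panel),
    )


# Hypothetical data: each set lists the targets found altered in one patient.
samples = [{"G1", "G2"}, {"G1"}, {"G2", "G3"}, {"G4"}, {"G1", "G2", "G3"}]
print(support(samples, {"G1", "G2"}))
print(confidence(samples, {"G1"}, {"G2"}))
print(best_panel(samples, {"G1", "G2", "G3", "G4"}))
```

An exhaustive search over three-gene panels is feasible for a few dozen targets; for profiles with hundreds of targets a pruning strategy would be needed.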

Table 2 The concordant behaviour of the methylation in liver cancer

The altered methylation pattern in HCC has been compared with those of two other common solid cancers in China: non-small cell lung cancer (NSCLC) and malignant glioma (Fig. 4). A couple of targets showed similar frequencies of change in all three types of cancer. The p73 gene was hypermethylated in 67.85% of HCC, 47.17% of malignant glioma and 68.57% of NSCLC. Other targets exhibited distinct, tumor-specific methylation patterns. For instance, the p16INK4a gene was significantly methylated in both HCC (53.85%) and NSCLC (42.8%), but only marginally methylated in malignant glioma (1.89%). The CDH13 gene was hypermethylated in 21.43% of liver cancer, 5.66% of malignant glioma and 71.42% of NSCLC patients. The CDH1 gene was hypermethylated at a significantly higher rate in NSCLC (22.86%) and malignant glioma (32.08%) than in HCC (0%). It is clear that alteration in DNA methylation is specific to tumor type, suggesting its potential for differential diagnosis of tumors when circulating DNA is used for testing. By contrast, mutations of tumor-associated genes have never been shown to be tumor-type specific. Therefore, detection of mutations in the ras proto-oncogene and the p53 tumor suppressor gene in circulating DNA would not be able to determine tumor origin.

Figure 4

The tumour type specificity of the altered DNA methylation pattern.

Although we have established the DNA methylation pattern of over 30 targets in four common types of cancer in China, realizing the great potential of the DNA methylation pattern for cancer diagnosis and prognosis demands an enormous amount of work in both more extensive profiling and mechanistic delineation.

THE FUTURE PERSPECTIVES

We have witnessed a recent surge of interest in DNA methylation in the biomedical field. An international consortium (http://www.epigenome.org) was set up in 1998 to compile the methylation profile, at the sequence level, of the promoter regions of all human genes in seven major tissues 42, 43. Significant progress has been made 44. It is generally believed that such information will improve our understanding of DNA methylation-mediated regulation of gene transcription during cell differentiation and will serve as a better guide for personalized medicine than genetic polymorphism. A European consortium was also formed in 2004 for an epigenome project (EPIGENETIC PLASTICITY OF THE GENOME) covering the following eight areas of epigenetics other than DNA methylation: 1. chromatin modification; 2. nucleosome dynamics; 3. non-coding RNA and gene silencing; 4. Xi and imprinting; 5. transcriptional memory; 6. assembly and nuclear organization; 7. cell fate and disease; and 8. epigenomic maps (http://www.epigenome-noe.net/). The completion of these two big projects could become a revolutionary landmark.

We are initiating an effort at large-scale methylation profiling of genes in HCC, with the ultimate goal of using it for cancer staging and classification. This project certainly needs the involvement of scientists at the laboratory bench as well as clinicians at the patient's bedside. Contributions from bioinformatics and the development of high-throughput technology are also in demand.

MOLECULAR STAGING AND CLASSIFICATION OF LIVER CANCER BASED UPON THE ALTERED PATTERN OF DNA METHYLATION: WHAT WE HAVE LEARNT AND PLANNED TO DO

There are approximately 29000 CpG islands and 25000 genes in the human genome 45. DNA methylation-mediated control of transcription takes place in about 40-50% of tissue-specific genes and in the majority of house-keeping genes that possess a promoter CpG island 46. No fewer than 1358 genes have been implicated in carcinogenesis of human cancers (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). Clinical application of altered DNA methylation profiles to monitor cancer development and treatment requires analysis of more targets. Ultimately, the capability should be acquired to profile, in clinical samples, the entire array of promoter CpG islands (approximately 10000) and of all CpG islands (approximately 29000) in the whole genome.

We have selected 1358 tumor-related genes by searching the NCBI database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) with the following key words: apoptosis, drug resistance, oncogene, tumor suppressor, DNA repair, genetic imprinting and mitosis. Using the CpG island identification software (http://www.uscnorris.com/cpgislands/cpg.cgi) 47, we found that over 70% of the genes in this list contain a promoter CpG island. The region within the promoter CpG island that is evolutionarily conserved between human and mouse (http://www.ncbi.nlm.nih.gov/genome/seq/MmBlast.html) was chosen as the target of MSP analysis, and the primer pairs for methylation profiling were then designed within it (http://micro-gen.ouhsc.edu/cgi-bin/primer3_www.cgi). A total of 675 genes have been selected for the forthcoming methylation profiling of liver cancer samples.
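
The sequence criteria on which CpG island identification is typically based can be sketched in Python as follows. The default thresholds (length of at least 500 bp, GC content of at least 55%, observed/expected CpG ratio of at least 0.65) follow the commonly cited Takai and Jones criteria; whether the web tool cited above was run with exactly these settings is an assumption, and the function names are invented for the sketch.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected = (number of CpG * length) / (number of C * number of G).
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=500, min_gc=0.55, min_obs_exp=0.65):
    """Check one candidate region against the three threshold criteria."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp


# Artificial test sequences (assumptions, not genomic data).
print(looks_like_cpg_island("CG" * 300))  # long and CpG-rich
print(looks_like_cpg_island("AT" * 300))  # long but without C or G
print(looks_like_cpg_island("CG" * 10))   # CpG-rich but too short
```

A genome-scale search would slide such a check along the sequence and merge adjacent qualifying windows; the sketch evaluates a single candidate region only.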

It is necessary to analyze the expression of those genes that show an HCC-specific altered DNA methylation pattern in the cancer tissue. Cellular heterogeneity in cancer tissues limits the use of biochemical methods that take RNA or protein as targets. It is possible to use a cell culture system treated with a general DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine, but it is impractical to analyze more than a dozen genes this way because of its limited capacity. The immunochemical approach on tissue arrays is hence the method of choice for correlating methylation with the expression state of the target genes. Quality clinical information is extremely important, too. In addition to a large collection of HCC and paired neighbouring non-cancerous tissues, samples representing the precancerous stage of HCC, such as cirrhotic livers, are also being collected. A detailed clinical profile, including information on post-surgery chemotherapeutic regimens and survival, should also be provided by the clinical team.

The MSP method has many advantages, but its capacity is rather small because of its laborious manual procedure. It is impossible to use the MSP method to obtain decent profiles of more than 500 genes in a cohort of more than 300 patients. For the ultimate goal of obtaining a complete methylation profile of the promoter CpG islands (over 10000) as well as of the CpG islands of the whole genome (approximately 29000), an alternative method should be attempted. To date, the most promising method is based on affinity chromatography, which extracts the methylated CpG-rich DNA fraction by using the methyl-CpG binding domain of the MeCP2 protein 48, 49. The DNA fractions enriched in methylated CpG from normal and malignant cells will be individually labelled with different fluorescent dyes. To avoid the uncertainty caused by cellular heterogeneity in cancer samples, the initial array analyses will be carried out with DNA from normal liver tissues and from established liver cancer cell lines. The informative targets will then be verified by MSP analysis in clinical samples.

The questions raised are whether, and when, we can make a difference to the survival and quality of life of cancer patients. It will certainly be a long journey.