Background: Metabolic Associated Fatty Liver Disease (MAFLD) is a major health burden worldwide. Over 95% of patients who progress to hepatocellular carcinoma (HCC) do not survive beyond five years after diagnosis. Management of the disease is complicated by the lack of noninvasive diagnostic tools that can detect it in its early stages, and only a few treatment options are available. We aimed to identify novel genes that can be targeted for the diagnosis and therapy of HCC. We hypothesised that: i) the analysis of genome-wide association study (GWAS) data can reveal novel combinations of genetic variants and genes that are associated with HCC in patient cohorts; ii) the integrated analysis of transcriptomics data from tumor and surrounding non-tumor tissue can reveal genes important in HCC; and iii) the identified genes can be predictive factors for the development of HCC.
Methods: Transcriptomics data and genotype data (GWAS summary statistics) from liver cancer studies were accessed from both public and in-house databases. The data were pre-processed following standard procedures. Only studies with substantive metadata available were considered for the final analysis. GWAS summary statistics were used because we were unable to access the underlying GWAS data.
Two data analysis pipelines were developed. The first pipeline detects gene-gene interactions from GWAS data and was designed to address the challenges involved in the analysis of this type of data. To develop it, we used genomics data from patients with inflammatory bowel disease. The pipeline minimises filtering of single nucleotide polymorphisms (SNPs) in the final analysis, accounts for non-linearity in SNP data and for dependencies between SNPs within a gene, and yields gene-gene interactions. The second pipeline supports the efficient application of genome-scale metabolic modelling (GEM); to develop it, we used data from a diet experiment in Cyp51 knockout mice that studied the development of MAFLD. It addresses the challenges involved in extracting GEMs, such as choosing the most appropriate model extraction method and the thresholds that yield the most relevant models. We applied this pipeline to integrate transcriptomics data into a human reference model, Human-GEM, yielding personalised models that were used for in silico simulations. Using these models, we performed reaction and subsystem enrichment analysis and identified candidate genes from significantly perturbed reactions.
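As an illustration of the subsystem enrichment step, the sketch below computes an over-representation p-value for one subsystem with a one-sided hypergeometric test. The choice of test, the function name and all counts are assumptions made for this example; the abstract does not state which enrichment statistic was used.

```python
from math import comb


def subsystem_enrichment_p(n_total, n_in_subsystem, n_perturbed, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_total        -- reactions in the model (hypothetical count)
    n_in_subsystem -- reactions annotated to the subsystem of interest
    n_perturbed    -- reactions called significantly perturbed
    n_overlap      -- perturbed reactions that fall in the subsystem
    """
    upper = min(n_in_subsystem, n_perturbed)
    denom = comb(n_total, n_perturbed)
    # Sum the probability of observing n_overlap or more hits by chance.
    tail = sum(
        comb(n_in_subsystem, k) * comb(n_total - n_in_subsystem, n_perturbed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


# Toy numbers only: 40 of 200 perturbed reactions fall in a 120-reaction
# subsystem of a 5000-reaction model.
p_value = subsystem_enrichment_p(5000, 120, 200, 40)
```

In practice the p-values for all subsystems would also need correction for multiple testing.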
We performed a meta-analysis of differential gene expression between tumor and surrounding non-tumor tissue using data from the integrative molecular database of hepatocellular carcinoma (HCCDB). First, differential gene expression analysis was carried out for each data set by fitting linear models with the limma package in R. Random-effects models were then fitted for each gene to combine the results across data sets. Genes with a p-value below 0.05 and a mean |log2FC| >= 1 were considered significantly differentially expressed. These genes were then used for KEGG pathway enrichment analysis and for network analysis in the STRING database. Candidate genes from hypothesis i and from the meta-analysis in hypothesis ii were combined to form the final list of candidate genes. Genes identified as potential candidate markers were validated in a human cohort using RT-qPCR.
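The per-gene combination step can be sketched as follows, assuming a DerSimonian-Laird random-effects estimator applied to per-data-set log2 fold changes and their variances. The estimator, the function names and the example values are assumptions for illustration; the analysis itself was done in R, and the selection rule here applies the |log2FC| >= 1 cut-off to the pooled estimate, which is one possible reading of "mean |log2FC|".

```python
import math


def random_effects_meta(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird estimator.

    Returns (pooled effect, standard error, tau^2, two-sided p-value).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    # Two-sided p-value from a normal approximation.
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))
    return pooled, se, tau2, p


def is_differentially_expressed(pooled, p, alpha=0.05, min_abs_lfc=1.0):
    """Selection rule from the text: p < 0.05 and |log2FC| >= 1."""
    return p < alpha and abs(pooled) >= min_abs_lfc


# Hypothetical log2 fold changes and variances for one gene in three data sets.
pooled, se, tau2, p = random_effects_meta([1.4, 1.7, 1.2], [0.04, 0.05, 0.03])
selected = is_differentially_expressed(pooled, p)
```

Because thousands of genes are tested, a multiple-testing adjustment of the p-values would normally precede the selection step.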
Results and conclusions: By combining multiple computational modelling and data mining approaches, followed by experimental validation in human samples, we identified the following genes as relevant in HCC: ACSL1, ACSL4, ACSM3, GABRP, HAO1, IYD, PIPOX, PROZ, RDH5, APOF, DCN, LPA, GCKR, E2F7, CIDEB, and OXT. Several biological processes, such as fatty acid metabolism, complement and coagulation cascades, chemical carcinogenesis, and retinol metabolism, were identified as key pathways in HCC. Integration of transcriptomics data into a reference human genome-scale metabolic model revealed that fatty acid activation, purine metabolism, and vitamin D and vitamin E metabolism are key processes in the development of HCC and therefore need to be explored further for the development of new therapies. The genes LPA, DCN, APOF, PROZ, and PIPOX are expressed in blood and hence are good candidates for developing diagnostics. Our computational analysis and validation in a human cohort provide the first evidence that the GABRP gene is important in HCC in humans and that, together with the IYD and RDH5 genes, it should be investigated further in human studies.