by Dimitri Perrin, Heather J. Ruskin, Martin Crane and Ray Walshe

The field of epigenetics looks at changes in the chromosomal structure that affect gene expression without altering DNA sequence. A large-scale modelling project to better understand these mechanisms is gaining momentum.

Early advances in genetics led to the all-genetic paradigm: phenotype (an organism's characteristics/behaviour) is determined by genotype (its genetic make-up). This was later amended and expressed by the well-known formula P = G + E, encompassing the notion that the visible characteristics of a living organism (the phenotype, P) are a combination of hereditary genetic factors (the genotype, G) and environmental factors (E). However, this formula fails to explain why, in diseases such as schizophrenia, we still observe differences between identical twins. Furthermore, environmental factors (such as smoking and air quality for lung cancer) are only rarely identified. The formula also fails to explain cell differentiation from a single fertilized cell.

In the wake of early work by Waddington, more recent results have emphasized that the expression of the genotype can be altered without any change in the DNA sequence. This phenomenon has been termed epigenetics. To form the chromosome, DNA strands wrap around nucleosomes, each of which is a cluster of nine proteins (histones), as detailed in Figure 1. Epigenetic mechanisms involve inherited alterations in these two structures (the DNA and the histones), eg through attachment of a functional group (methyl, acetyl or phosphate) to the amino acids. These 'stable alterations' arise during development and cell proliferation and persist through cell division. While the information within the genetic material is not changed, the instructions for its assembly and interpretation may be. Modelling this new paradigm, P = G + E + EpiG, is the object of our study.

Figure 1: A nucleosome, the fundamental subunit of the chromosome (adapted from C. Brenner, PhD thesis, Université Libre de Bruxelles, 2005).

To our knowledge, no previous efforts have sought to model directly the mechanisms that affect epigenetic changes. Biological research on epigenetic phenomena is ongoing, but while some very promising articles are being published, most still contain only qualitative descriptions of epigenetic changes. This is not ideal when trying to develop computer-based models, but it is also not unusual. Over a decade ago the basics of HIV infection were understood, but quantitative data were sparse. Yet as early as 1992, differential equation models were proposed, while cell-mediated micro-models date from the 1990s. As more data have become available, these models have improved in sophistication, incorporating features such as shape-space formalism and massively multi-agent, parallel systems.

As a first step, we propose a microscopic model for chromatin structures. Current biological results make it clear that each unit (eg histone, DNA strand or amino acid) has a distinct role in epigenetic changes, and that this role can vary depending on the type or location of the unit (eg which particular amino acid, what part of the DNA strand etc). For efficiency, this is best modelled using an object-oriented approach and a C++ implementation. The main objective of this early model is to provide a description and hierarchy for epigenetic changes at the cell level, as well as an investigation into the dynamics and time scales of the changes. These results will then be used to 'feed into' other models. Approaches already in development include agent-based modelling of cell differentiation and complex recurrent networks of cancer initiation by epigenetic changes.
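To make the object-oriented idea concrete, the following is a minimal C++ sketch of how such units could be arranged in a hierarchy. It is an illustration only, not the project's code: every class name (AminoAcid, Histone, Nucleosome), the Mark enumeration, and the choice to pass the number of histones, the tail length and the number of DNA sites as constructor parameters are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; they only illustrate the hierarchy
// amino acid -> histone -> nucleosome described in the text.

// The three functional groups named in the article.
enum class Mark { Methyl, Acetyl, Phosphate };

// One modifiable amino acid; it records which marks are attached.
class AminoAcid {
public:
    void attach(Mark m) { marks_[index(m)] = true; }
    void detach(Mark m) { marks_[index(m)] = false; }
    bool has(Mark m) const { return marks_[index(m)]; }

private:
    static std::size_t index(Mark m) { return static_cast<std::size_t>(m); }
    std::array<bool, 3> marks_{};  // value-initialised: no marks attached
};

// A histone, reduced here to a sequence of modifiable amino acids.
class Histone {
public:
    explicit Histone(std::size_t tailLength) : tail_(tailLength) {}

    AminoAcid& residue(std::size_t i) {
        assert(i < tail_.size());
        return tail_[i];
    }

    // Number of amino acids on this histone carrying mark m.
    std::size_t count(Mark m) const {
        std::size_t n = 0;
        for (const AminoAcid& a : tail_) {
            if (a.has(m)) ++n;
        }
        return n;
    }

private:
    std::vector<AminoAcid> tail_;
};

// A nucleosome: a cluster of histones plus the DNA segment wrapped around it.
// DNA is reduced to a list of sites that are either methylated or not.
class Nucleosome {
public:
    Nucleosome(std::size_t histones, std::size_t tailLength, std::size_t dnaSites)
        : histones_(histones, Histone(tailLength)), dnaMethylated_(dnaSites, false) {}

    Histone& histone(std::size_t i) {
        assert(i < histones_.size());
        return histones_[i];
    }

    void methylateDna(std::size_t site) {
        assert(site < dnaMethylated_.size());
        dnaMethylated_[site] = true;
    }

    // Total number of histone amino acids carrying mark m.
    std::size_t count(Mark m) const {
        std::size_t n = 0;
        for (const Histone& h : histones_) n += h.count(m);
        return n;
    }

    // Number of methylated DNA sites on this segment.
    std::size_t methylatedDnaSites() const {
        std::size_t n = 0;
        for (bool methylated : dnaMethylated_) {
            if (methylated) ++n;
        }
        return n;
    }

private:
    std::vector<Histone> histones_;
    std::vector<bool> dnaMethylated_;
};
```

Keeping each unit as its own class means that type- or location-specific behaviour (for instance, a rule that applies only to one amino acid position) can later be added in one place, and that aggregate quantities such as the number of methyl marks per nucleosome are obtained by walking the hierarchy.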

Another early model uses Bayesian networks. These represent a set of variables and their probabilistic dependencies and are constructed as directed acyclic graphs, in which nodes represent variables and arcs encode conditional dependencies between the variables. The variables can be of any type, ie a measured parameter, a latent variable or even a hypothesis. These networks can be used for inference, parameter estimation and refinement, and structure learning. This approach has been successfully used in medicine (eg breast cancer diagnosis) and biology (eg protein structure prediction), and epigenetic mechanisms appear amenable to such techniques.

Though still in its infancy, the project is gaining momentum and early work on the different approaches looks very promising. Active involvement from biologists and medical researchers is currently being sought in order to secure access to data and guarantee model realism (as highlighted by a presentation at the International Agency for Research on Cancer in early December 2007). Previous modelling experience from the group promises sensible integration of the various approaches and efficient implementations. Several publications and presentations are expected in the coming year, all of which will appear on the group's Web site (link below).

Links:
http://www.computing.dcu.ie/~dperrin/
http://www.computing.dcu.ie/~msc/publications.shtml

Please contact:
Dimitri Perrin
School of Computing, Dublin City University, Ireland
Tel: +353 1 700 8449
E-mail: dimitri.perrin@computing.dcu.ie
