Dr J J Hensman
University of Sheffield
Bayesian models of expression in the transcriptome for clinical RNA-seq
Motor neurone diseases
Background: RNA-Seq technology is enabling investigation of gene expression at the transcript level, including the identification of alternatively spliced isoforms. In Motor Neurone Disease (MND), alternative splicing has been strongly implicated as a pathogenic mechanism. Applying RNA-Seq to clinical data, and to MND in particular, requires new statistical methodologies that tackle the challenges specific to such data. Bayesian statistical methods are desirable for RNA-Seq because they account for the uncertainty in quantifying transcript expression, but existing approaches are prohibitively slow for large datasets.

Aims & Objectives:
1) To develop practical algorithms for transcript quantification from RNA-Seq in the Bayesian statistical framework.
2) To build statistical models *around* the transcript quantification problem, addressing problems specific to clinical data.
3) To use the developed algorithms to investigate the effects of splicing in Motor Neurone Disease.

Methodology: The Bayesian statistical framework will be the cornerstone of the project. Whilst Bayesian methods are often computationally demanding, I shall make use of approximate posterior inference, building on recent work in this area to produce fast algorithms for the analysis of RNA-Seq data. I will collaborate closely with clinical and wet-lab staff in the SITraN neuroscience facility, giving my work immediate impact on research into MND.

Scientific opportunities: The quantification of transcripts in RNA-Seq bears a close resemblance to Latent Dirichlet Allocation (LDA), a statistical model used for the analysis of text corpora. Investigating this link will allow knowledge from that field to be transferred to RNA-Seq, enabling statistical advances in how such data are processed.
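To make the resemblance to LDA concrete, the following is a minimal sketch rather than the project's actual algorithm. It assumes that each read plays the role of a word, each transcript the role of a topic, and that the per-read likelihoods under each transcript are already known and held fixed. Under those assumptions, mean-field variational inference for the transcript proportions uses the same form of update as LDA applies to a single document. The function names, the `alpha` prior parameter and the toy likelihood table are all hypothetical and chosen only for illustration.

```python
import math


def digamma(x):
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def vb_transcript_proportions(likelihood, alpha=1.0, n_iter=200):
    """Mean-field variational Bayes for a read-to-transcript mixture model.

    likelihood[n][k] is an assumed, precomputed p(read n | transcript k);
    every read must have non-zero likelihood under at least one transcript.
    Returns the parameters gamma of the approximate Dirichlet posterior
    over transcript proportions.
    """
    n_reads = len(likelihood)
    n_tx = len(likelihood[0])
    gamma = [alpha + n_reads / n_tx] * n_tx  # start from an even split
    for _ in range(n_iter):
        # exp(E[log theta_k]) up to a constant that cancels on normalising
        weights = [math.exp(digamma(g)) for g in gamma]
        counts = [0.0] * n_tx
        for row in likelihood:
            unnorm = [l * w for l, w in zip(row, weights)]
            total = sum(unnorm)
            for k, u in enumerate(unnorm):
                counts[k] += u / total  # soft assignment of this read
        gamma = [alpha + c for c in counts]
    return gamma


# Toy data (hypothetical): 5 reads, 2 transcripts; 1.0 marks compatibility.
toy_likelihood = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
gamma = vb_transcript_proportions(toy_likelihood)
posterior_mean = [g / sum(gamma) for g in gamma]
print(posterior_mean)
```

The Dirichlet parameters returned here carry the uncertainty in the estimated proportions, which is the property that motivates a Bayesian treatment; a realistic model would also need to account for transcript length, sequencing bias and multiple samples, none of which this sketch attempts.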