Bayesian Discovery of Regression Structures: a tool kit for genetic epidemiology and integrative genomics analyses

Sylvia Richardson

0 Collaborator(s)

Funding source

Medical Research Council (MRC)

Finding important health determinants through regression analyses is a fundamental approach in all the health sciences. In this project, we focus on two domains, genetic epidemiology and integrative genomics, where advances are made by taking full advantage of new high-throughput technologies, leading to the collection of a vast set of explanatory variables. In these domains, desirable statistical outputs are reproducible regression models that select only a few relevant predictors (e.g. risk factors, SNPs, transcripts) from a very large set of possible candidates, together with a sound assessment of how uncertain their role is. Our approach is to build upon the unifying Bayesian hierarchical modelling paradigm to construct parsimonious regression models that can translate the underlying biology and facilitate the interpretation of results. The team of investigators has recently completed the development of a sophisticated algorithm, the Evolutionary Stochastic Search algorithm, which efficiently implements a Bayesian variable selection procedure for linear regression models in spaces containing thousands of predictors. The project's aim is to capitalise on this foundational work and substantially expand it to build a powerful and versatile tool kit of regression models applicable to a wide range of "cross-omics" analyses, i.e. analyses that involve two or more different types of "omics" data, each of large dimension. Such cross-analyses will become a major focus of research in functional genomics in the years to come, in parallel with the advent of new biotechnologies. We will develop a set of models aimed at Bayesian variable selection (i) in the presence of interactions; (ii) with multiple responses; (iii) including biologically structured prior knowledge.
The scope of the algorithms will be considerably expanded by integrating new parallel computing techniques and a novel software architecture (CUDA, Compute Unified Device Architecture) that greatly reduce computing time. We will use the methods to discover new associations and structures in three challenging case studies concerning: (a) the genetic regulation of lipid mechanisms in a large Finnish cohort; (b) multifactorial pathways in breast cancer; and (c) the genetic influence on brain activation in psychotic patients. These case studies are embedded in large collaborative projects coordinated by the epidemiology investigators and have been chosen to highlight different facets of the tool kit modules. The computer programs implemented in the tool kit will be open source and made publicly available. Dissemination plans will benefit from the extensive network of collaborators in the case studies and will also include two purpose-designed workshops.

Related projects

Vicki Maltby

Characterization of Epigenetic Profiles in Patients with Multiple Sclerosis

Kathryn M Rexrode

The Effects of Vitamin D on Mammographic Density and Breast Tissue

Manuela A Orjuela

Unmetabolized Folic Acid and Retinoblastoma

Joann G Elmore

Digital Pathology Accuracy Viewing Behavior and Image Characterization

Jonine L. Bernstein

MRI Background Parenchymal Enhancement as a Risk Factor for Breast Cancer

David E Goldgar

A Comprehensive Approach to Breast Cancer Susceptibility Across the Risk Spectrum

Antonio Fojo

Clinical, Basic, Translational and Kinetic Studies of Drug Action and Resistance

David Rueda

Single Molecule Imaging

Naoto Ueno

Understanding EGFR's Role in TNBC by an Animal Model