Computational Bioinformatics & Bio-imaging Laboratory (CBIL)



Machine Learning to Identify Complex Interactions in Genome-Wide Association Data (NIH/NHLBI R01HL090567)

(PI: David Herrington, Wake Forest University; Co-PI: Yue Wang, Virginia Tech; Co-PI: David Miller, Penn State University)

The focus of this application is the development and validation of new computational approaches to identify complex interactions among genetic and environmental factors (features) which could be used to help identify individuals at high risk for a specific disease or dysfunction, and provide novel insights into the pathophysiology of the conditions in question.

Specific Aims of the application include: 1) to adapt a variety of statistical machine learning methods to the analysis of simulated high-density genome scan and environmental exposure data and to evaluate their ability to identify SNPs and environmental factors that are jointly predictive of a binary trait; 2) to apply the described feature selection and model building techniques to the genome-wide SNP genotype data collected from two NHLBI-funded genome-wide association studies: a) the SNPs and Atherosclerosis (SEA) study, predicting premature atherosclerosis, and b) the Cholesterol and Pharmacogenetics of Statins (CAPS) study, predicting LDL cholesterol; 3) to develop a study-specific, publicly accessible website designed to help disseminate the methods and results of the project; and 4) to support the NIH-wide Genes and Environment Initiative (GEI).
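As a toy illustration of what "jointly predictive of a binary trait" can mean, the sketch below builds a small hypothetical cohort in which two binary SNP indicators determine the trait only in combination (an XOR pattern), and compares the mutual information of each SNP alone with that of the pair. The data, variable names, and the choice of mutual information as the score are assumptions made for illustration; this is not the project's MECPM method or its simulation software.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical cohort: 25 individuals for each combination of two binary
# SNP indicators (e.g. carrier / non-carrier). The trait is the XOR of the
# two indicators, so neither SNP is informative on its own.
snp_a, snp_b, trait = [], [], []
for a in (0, 1):
    for b in (0, 1):
        for _ in range(25):
            snp_a.append(a)
            snp_b.append(b)
            trait.append(a ^ b)

mi_a = mutual_information(snp_a, trait)                      # single SNP
mi_b = mutual_information(snp_b, trait)                      # single SNP
mi_pair = mutual_information(list(zip(snp_a, snp_b)), trait)  # SNP pair

print(f"SNP A alone: {mi_a:.3f} bits")
print(f"SNP B alone: {mi_b:.3f} bits")
print(f"SNP A and B jointly: {mi_pair:.3f} bits")
```

In this constructed example each SNP alone carries no information about the trait, while the pair determines it completely. A purely marginal, one-SNP-at-a-time scan would miss such a signal, which is why methods that search over combinations of features are of interest.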

This proposal represents a unique collaboration focusing on the development of new methods to more effectively identify interacting genetic and environmental factors that account for variation in risk for common cardiovascular and other disease phenotypes. If the risk is determined in part by a gene-environment interaction, a preventive intervention could include altering the environmental exposure. Furthermore, determining specific genetic and/or environmental factors that jointly influence risk may reveal new biologic pathways that would be appropriate targets for novel therapeutic interventions. Together, improved risk stratification and new pathophysiologic insights would be expected to reduce the burden of disease and accelerate the realization of true personalized medicine.

Relevance of this research to public health: This project aims to develop new approaches to identify the relationship between genetic and environmental factors, which could then be used to identify people at high risk for a disease. Determining specific genetic and/or environmental factors that influence a person's risk of disease may help doctors reduce that risk and may reveal new treatments.

Learning Maximum Entropy Probability Models for Characterizing Multilocus Genomic Interactions

Supplementary Information: the MECPM-SNP software and test datasets are available as a free download: MECPM-SNP Package

SNP Simulation

Supplementary Information on SNP Simulation: SNP Simulation

Existing Methods Software

Reimplementations of some existing methods: Interacting SNP Detection


  • G. Yu, D. Herrington, C.D. Langefeld, and Y. Wang, "Detection of complex interactions of multi-locus SNPs," Proc. IEEE Machine Learning for Signal Processing, Cancun, Mexico, pp. 85-90, 2008.

  • DJ Miller, Y Zhang, G Yu, Y Liu, L Chen, CD Langefeld, D Herrington, and Y Wang, "An Algorithm for Learning Maximum Entropy Probability Models of Disease Risk That Efficiently Searches and Sparingly Encodes Multilocus Genomic Interactions," Bioinformatics, 25(19):2478-2485, 2009.

  • L Chen, G Yu, DJ Miller, L Song, CD Langefeld, D Herrington, Y Liu, and Y Wang, "A Ground Truth Based Comparative Study on Detecting Epistatic SNPs," Proc. IEEE International Conference on Bioinformatics & Biomedicine, November 2009.

  • X Yuan, J Zhang, and Y Wang, "Probability theory-based SNP association study method for identifying susceptibility loci and genetic disease models in human case-control data," IEEE Trans. on NanoBioscience, 2010 (in press).

  • X Yuan, J Zhang, and Y Wang, "Mutual information and linkage disequilibrium based SNP association study by grouping case-control," Genes & Genomics, 2010 (accepted).

  • X Yuan, J Zhang, and Y Wang, "Simulating linkage disequilibrium structures in a human population for SNP association studies," Biochemical Genetics, 2010 (accepted).




    Copyright ©2004, Computational Bioinformatics and Bioimaging Laboratory (CBIL), Advanced Research Institute, Virginia Tech.

    Last Updated: 03/03/2009.