Supplementary Information on SNP Simulation http://www.cbil.ece.vt.edu/software/SNP Simulation.pdf
VIsual and Statistical Data Analyzer (VISDA)
Multivariate visualization has proven to be a powerful and critical tool for the analysis and interpretation of complex data. To reveal all of the interesting patterns within a data set, we have developed a VIsual and Statistical Data Analyzer (VISDA) for cluster modeling, discovery, and visualization. The model-supported exploration of high-dimensional data space is achieved through two complementary schemes: dimensionality reduction by discriminatory component analysis and cluster formation by soft data clustering, whose parameters are estimated using the weighted Fisher criterion and the expectation-maximization algorithm. VISDA uses an adaptive boosting of discriminatory subspaces involving hierarchical mixture modeling of the data set. The hierarchical mixture model, selected optimally by the minimum description length criterion, allows the complete data set to be visualized at the top level and then partitioned, with clusters and subclusters of data points visualized at deeper levels. Each subspace model is linear, while the complete hierarchy maintains overall nonlinearity.
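As a rough illustration of two of the ingredients named above, soft clustering by expectation-maximization and model selection by minimum description length, the following sketch fits diagonal-covariance Gaussian mixtures to synthetic two-dimensional data and compares their description lengths. It is not the VISDA implementation: the function names, the synthetic data, the diagonal covariance, the deterministic initialization, and the two-part penalty 0.5 * (number of parameters) * log(N) are all assumptions made for the example, and the discriminatory projection and the hierarchy are left out.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def e_step(data, weights, means, variances):
    """Return soft memberships (one row per point) and the log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp for numerical stability
        norm = top + math.log(sum(math.exp(l - top) for l in logs))
        resp.append([math.exp(l - norm) for l in logs])
        loglik += norm
    return resp, loglik


def fit_mixture(data, k, iters=100, var_floor=1e-3):
    """EM for a k-component mixture (hypothetical helper, fixed iteration count)."""
    n, d = len(data), len(data[0])
    # Assumed deterministic start: means at evenly spaced ranks along axis 0.
    ordered = sorted(data, key=lambda p: p[0])
    means = [list(ordered[int((j + 0.5) * n / k)]) for j in range(k)]
    grand = [sum(p[i] for p in data) / n for i in range(d)]
    spread = [max(sum((p[i] - grand[i]) ** 2 for p in data) / n, var_floor)
              for i in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp, _ = e_step(data, weights, means, variances)
        for j in range(k):  # M-step: responsibility-weighted estimates
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, data)) / nj
                        for i in range(d)]
            variances[j] = [
                max(sum(r[j] * (p[i] - means[j][i]) ** 2
                        for r, p in zip(resp, data)) / nj, var_floor)
                for i in range(d)
            ]
    return e_step(data, weights, means, variances)


def mdl_score(loglik, k, d, n):
    """Assumed two-part description length: fit cost plus parameter cost."""
    n_params = k * 2 * d + (k - 1)  # means, variances, free mixing weights
    return -loglik + 0.5 * n_params * math.log(n)


def make_demo_data(seed=0):
    """Two well-separated synthetic clusters of 100 points each."""
    rng = random.Random(seed)
    a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    b = [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)]
    return a + b


data = make_demo_data()
scores, fits = {}, {}
for k in (1, 2, 3):
    fits[k], loglik = fit_mixture(data, k)
    scores[k] = mdl_score(loglik, k, len(data[0]), len(data))

best_k = min(scores, key=scores.get)  # smallest description length wins
print("description lengths:", {k: round(s, 1) for k, s in scores.items()})
print("selected number of clusters:", best_k)
```

Each row of `fits[k]` holds one data point's soft memberships across the k clusters, which is what distinguishes soft clustering from a hard partition. In a hierarchical setting, the same fit-and-score step would presumably be repeated inside each cluster at the next level.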
The main application of VISDA is multivariate cluster modeling, discovery, and visualization, particularly for data sets in high-dimensional spaces. Many real-world problems, once formulated, amount to exploring the hidden structure of the data in one way or another. Applications can be found in biomedicine, bio-defense, intelligence analysis, market analysis, etc. For example, VISDA can be used to define new cancer subtypes based on their gene expression patterns, or to discover correlations between biological agents and environmental changes.
VISDA can navigate a high-dimensional data set to discover its hidden cluster structure, and then model and visualize what it finds. Compared with existing methods, it is particularly effective on highly complex data sets. To reveal all of the hidden clusters, our exploration of high-dimensional data space is both statistically principled and visually insightful. Our method combines the power of statistical methods with the human gift for pattern recognition, and can progressively capture all interesting aspects of the data set. To the best of our knowledge, it represents the state of the art in visual statistical data analysis and exploration. VISDA incorporates the most advanced theory, methods, and algorithms in statistical learning. It works in both unsupervised and supervised scenarios. VISDA has recently been adopted as one of the core data analysis components by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) through its new initiative, the cancer Biomedical Informatics Grid (caBIG) (http://caBIG.nci.nih.gov).
PICA-ISG-THC Software (supported by the NIH under Grants EB000830, CA109872) http://www.cbil.ece.vt.edu/software/PICA_Demo.zip,
http://www.cbil.ece.vt.edu/software/PICA_Demo_Readme.doc,
http://www.cbil.ece.vt.edu/software/A Tutorial on ISG-PICA.doc
ISN-INR-CPN Software (supported by the NIH under Grants EB000830, CA109872) http://www.cbil.ece.vt.edu/software/CPN-Tutorial.doc,
http://www.cbil.ece.vt.edu/software/dchipCPN.exe
Supplementary Information on oMLP (Bioinformatics-2005-1602) http://www.cbil.ece.vt.edu/software/Appendices_oMLP_Bioinformatics.pdf,
http://www.cbil.ece.vt.edu/software/Appendix A1_oMLP_Bioinformatics.pdf
Copyright ©2004, Computational Bioinformatics and Bioimaging Laboratory (CBIL), Alexandria Research Institute, Virginia Tech. Jointly with The Catholic University of America.
Last Updated: 03/22/2004. Suggestions/Comments - Webmaster