Computational Bioinformatics and Bioimaging Laboratory


caBIG VISDA - Introduction

Multivariate visualization has proven to be a powerful and essential tool for the analysis and interpretation of complex data. To reveal the interesting patterns within a data set, we have developed the VIsual and Statistical Data Analyzer (VISDA) for cluster modeling, discovery, and visualization. Model-supported exploration of the high-dimensional data space is achieved through two complementary schemes: dimensionality reduction by discriminatory component analysis and cluster formation by soft data clustering, whose parameters are estimated using the weighted Fisher criterion and the expectation-maximization algorithm. VISDA uses adaptive boosting of discriminatory subspaces together with hierarchical mixture modeling of the data set. The hierarchical mixture model, selected optimally by the minimum description length criterion, allows the complete data set to be visualized at the top level and then partitions the data set, with clusters and subclusters of data points visualized at deeper levels. Each subspace model is linear, while the complete hierarchy maintains overall nonlinearity.
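As a rough illustration of the soft-clustering and model-selection ideas described above, the following is a minimal sketch and not VISDA's actual implementation. It assumes the data have already been projected to one dimension, fits Gaussian mixtures with the expectation-maximization algorithm, and compares candidate models using one common two-part form of the minimum description length criterion (negative log-likelihood plus half the parameter count times log n). The function names (`fit_mixture`, `description_length`), the toy data, and the exact penalty form are assumptions made for this example; the discriminatory component analysis and weighted Fisher criterion steps are not shown.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return soft memberships (responsibilities) and the data log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p) + 1e-300  # guard against underflow to zero
        resp.append([pj / total for pj in p])
        loglik += math.log(total)
    return resp, loglik


def fit_mixture(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    n = len(data)
    lo, hi = min(data), max(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Deterministic start: equal weights, means spread over the data range.
    weights = [1.0 / k] * k
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    variances = [grand_var] * k
    for _ in range(iters):
        resp, _ll = e_step(data, weights, means, variances)
        # M-step: re-estimate each component from its soft memberships.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-3)  # floor keeps components from collapsing
    resp, loglik = e_step(data, weights, means, variances)
    return {"weights": weights, "means": means, "variances": variances,
            "resp": resp, "loglik": loglik}


def description_length(loglik, k, n):
    """Two-part MDL score (assumed form): lower is better."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -loglik + 0.5 * n_params * math.log(n)


# Toy data: two well-separated groups along a single projected axis.
rng = random.Random(42)
data = ([rng.gauss(0.0, 0.5) for _ in range(100)]
        + [rng.gauss(5.0, 0.5) for _ in range(100)])

models = {k: fit_mixture(data, k) for k in (1, 2, 3)}
mdl = {k: description_length(models[k]["loglik"], k, len(data)) for k in models}
best_k = min(mdl, key=mdl.get)

for k in sorted(mdl):
    print(f"k={k}  description length={mdl[k]:.1f}")
print("selected number of clusters:", best_k)
```

Each row of `resp` holds one point's soft membership across the clusters, which is what distinguishes soft clustering from a hard assignment. On this toy data the two-component model yields a shorter description length than the single-component model, since the likelihood gain from separating the two groups outweighs the penalty for the extra parameters.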

The main application of VISDA is multivariate cluster modeling, discovery, and visualization, particularly for data sets in high-dimensional space. Many real-world problems, once formulated, amount to exploring the hidden structure of the data in one way or another. Applications can be found in biomedicine, bio-defense, intelligence analysis, market analysis, and other fields. Examples include defining new cancer subtypes based on their gene expression patterns and discovering correlations between biological agents and environmental changes.

VISDA can navigate a high-dimensional data set to discover its hidden cluster structure, and then model and visualize the discovery. Compared with existing methods, it is particularly effective when dealing with highly complex data sets. To reveal the hidden clusters, our exploration of high-dimensional data space is both statistically principled and visually insightful. Our method combines the power of statistical methods with the human gift for pattern recognition, and is capable of progressively capturing the interesting aspects of the data set. To the best of our knowledge, it represents the state of the art in visual statistical data analysis and exploration. VISDA incorporates advanced theory, methods, and algorithms from statistical learning, and it works in both unsupervised and supervised scenarios. VISDA has recently been adopted as one of the core data analysis components by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) through its cancer Biomedical Informatics Grid (caBIG) initiative. (Please see the caBIG web site for more information.)


caBIG VISDA - Tool Download Files


caBIG VISDA - Documentation


caBIG VISDA - Demo Files & Exercises


caBIG VISDA - Publication

  • J. Wang, H. Li, Y. Zhu, M. Yousef, M. Nebozhyn, M. Showe, L. Showe, J. Xuan, R. Clarke, and Y. Wang, "VISDA: An open-source caBIG analytical tool for data clustering and beyond," Bioinformatics, vol. 23, no. 15, pp. 2024-2027, 2007.

  • Y. Zhu, H. Li, D. J. Miller, Z. Wang, J. Xuan, R. Clarke, E. P. Hoffman, and Y. Wang, "caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data," BMC Bioinformatics, 9:383, 2008.






Copyright ©2004, Computational Bioinformatics and Bioimaging Laboratory (CBIL), Advanced Research Institute, Virginia Tech.

Last Updated: 02/17/2009.