PhenoCloud – Interactive phenotype data exploration

Phenomics data are usually collected from numerous sources, represent many kinds of characteristics, and span relatively long periods. Such data far exceed what the human eye can directly perceive. Biologists therefore face the challenge of developing efficient and robust computational tools that reduce large, diverse phenomics data into representations that can be interpreted in a biological context. Moreover, there are currently few tools that allow researchers to wander through phenomics data and make discoveries by following intuition or simple serendipity.

As phenotype data continue to grow in complexity, diversity, and volume, existing systems find it increasingly difficult to maintain a highly interactive experience. New, intuitive exploration tools are needed that support visualization, mapping, and synchronized adjustment, lowering the barrier to entry for researchers. We develop instantly interactive data mining tools designed and optimized for exploratory analysis of massive phenomics data. This direction is novel because traditional data mining aims to find highly interesting results at the cost of being computationally demanding and time-consuming, and is therefore poorly suited for people without a computing background who want to explore large data sets.

We have developed an active learning software package with an interactive interface for determining the sampling rate of gene expression experiments (a sketch of the general idea follows below). We have also developed a multi-heatmap visualization tool called OLIVER, which supports six types of data exploration: Observe, Link, Investigate, Visualize, Explore, and Relate. The main workflow of OLIVER consists of four major steps: visualize multiple heatmaps, sort and cluster them, select genes in each heatmap with statistical tests, and map gene selections between heatmaps; a minimal sketch of this workflow also follows below. Using integrative approximation and multi-dimensional visualization methods, OLIVER enables biologists to quickly integrate and compare large amounts of phenomics and genomics data with modest equipment and a modest technical background.
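The text above does not detail the active learning package's internals, so the following is only a sketch of one standard approach to choosing sampling times: uncertainty sampling with a Gaussian process, which measures next wherever the model is least certain. The `measure` function, the RBF kernel, and all parameters here are hypothetical stand-ins, not the package's actual interface.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def measure(t):
    """Stand-in for running the actual expression experiment at time t."""
    return np.sin(t) + rng.normal(scale=0.1)

candidate_times = np.linspace(0, 10, 200)
measured_t = [0.0, 10.0]                      # start from the two endpoints
measured_y = [measure(t) for t in measured_t]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
for _ in range(6):                            # actively choose six more time points
    gp.fit(np.array(measured_t)[:, None], np.array(measured_y))
    _, std = gp.predict(candidate_times[:, None], return_std=True)
    t_next = float(candidate_times[np.argmax(std)])   # most uncertain time
    measured_t.append(t_next)
    measured_y.append(measure(t_next))

print(sorted(round(t, 2) for t in measured_t))
```

With only the endpoints measured, uncertainty peaks in the middle of the time course, so the loop naturally fills in the largest gaps first; an interactive interface would let the user stop once the predicted curve stabilizes.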
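As a concrete illustration of the four-step workflow, the minimal sketch below builds two heatmaps over shared gene IDs, sorts rows by hierarchical clustering, selects genes with a t-test, and maps that selection onto the second heatmap. It uses only standard numpy/scipy/matplotlib calls and approximates the workflow rather than reproducing OLIVER's actual code; every name in it is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]

# Two data sets sharing the same gene IDs (e.g., expression and phenotype matrices).
expr = rng.normal(size=(50, 12))
pheno = rng.normal(size=(50, 8))

# Steps 1-2: visualize each matrix as a heatmap, with rows sorted by clustering.
def cluster_order(matrix):
    return leaves_list(linkage(matrix, method="average"))

expr_order = cluster_order(expr)
pheno_order = cluster_order(pheno)

# Step 3: select genes in the expression heatmap with a statistical test
# (here, a two-sample t-test between the first and last six columns).
t, p = ttest_ind(expr[:, :6], expr[:, 6:], axis=1)
selected = {genes[i] for i in np.flatnonzero(p < 0.05)}

# Step 4: map the selection onto the second heatmap via shared gene IDs.
fig, axes = plt.subplots(1, 2, figsize=(10, 6))
for ax, mat, order, title in [(axes[0], expr, expr_order, "expression"),
                              (axes[1], pheno, pheno_order, "phenotype")]:
    ax.imshow(mat[order], aspect="auto", cmap="viridis")
    ax.set_title(title)
    for row, idx in enumerate(order):          # highlight mapped selections
        if genes[idx] in selected:
            ax.axhline(row, color="red", lw=0.5, alpha=0.7)
plt.show()
```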

The goal has now shifted to developing new techniques that instantly generate high-quality results and present them understandably, interactively, and adaptively, allowing users to rapidly steer the method toward the most informative areas. Our system includes three perceptually-aware optimizations; illustrative sketches of all three follow below. First, we model human perception as perceptual functions. The system automatically approximates data transformations that are perceptually indistinguishable, avoiding unnecessary computation; it can therefore remain interactive while visualizing big phenomics data that would otherwise require computationally expensive processing. Second, we model users' operations as feedback functions. Keeping the human in the loop is key to enabling the system to silently adjust itself on the fly. Building on existing work that provides guidelines for using human perceptual insights to justify approximation algorithms, we build an active-learning-based phenotype data visualization system that learns from the user's actions, iteratively adjusts its models and parameters, and extracts constraints from human-computer interaction. Third, we develop fast anytime approximation algorithms that exploit the high redundancy of biomedical data. With a coarse-to-fine procedure, common tasks such as search and match can be finished in sublinear time.
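For the first optimization, a perceptual function can be as simple as a threshold on the rendered difference: if no pixel of the new image differs from the cached one by more than one just-noticeable difference (JND), the expensive redraw is skipped. This is a minimal sketch assuming a JND of 1/255 in normalized intensity and a trivial color mapping; the perceptual functions in the actual system are not specified above.

```python
import numpy as np

JND = 1.0 / 255.0  # assumed just-noticeable difference in normalized intensity

def colormap(values):
    """Normalize values to [0, 1] as a stand-in for a real color mapping."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo + 1e-12)

def render_if_needed(new_values, cached_img):
    """Model the perceptual function as a max-abs pixel difference: if the new
    image is within one JND of the cached one everywhere, the change is
    invisible and the redraw can be skipped."""
    img = colormap(new_values)
    if cached_img is not None and np.abs(img - cached_img).max() < JND:
        return cached_img, False   # perceptually indistinguishable: reuse
    return img, True               # visible change: redraw

data = np.random.default_rng(2).normal(size=(200, 200))
img, redrew = render_if_needed(data, None)            # first draw
img2, redrew2 = render_if_needed(data + 1e-7, img)    # invisible change, skipped
print(redrew, redrew2)  # True False
```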
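For the second optimization, one classical way to turn user actions into a feedback function is Rocchio-style relevance feedback, which nudges the current query toward items the user marked interesting and away from items they dismissed. The actual update rule in our system is not specified above; this sketch only illustrates the general idea, and the weights `alpha` and `beta` are assumed.

```python
import numpy as np

def feedback_update(query, liked, disliked, alpha=0.8, beta=0.2):
    """Rocchio-style update: move the query toward profiles the user marked
    as interesting and away from ones they dismissed."""
    q = query.astype(float).copy()
    if len(liked):
        q += alpha * np.mean(liked, axis=0)
    if len(disliked):
        q -= beta * np.mean(disliked, axis=0)
    return q

# Usage: the user likes two profiles and dismisses one; the next search round
# ranks items against the adjusted query.
rng = np.random.default_rng(4)
query = rng.normal(size=8)
liked = rng.normal(loc=1.0, size=(2, 8))
disliked = rng.normal(loc=-1.0, size=(1, 8))
print(feedback_update(query, liked, disliked))
```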
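For the third optimization, a coarse-to-fine anytime search can prune candidates at cheap low resolutions and stop early once the answer stabilizes, on the premise that further refinement would be perceptually indistinguishable to the user. The downsampling scheme, pruning fraction, and stopping rule below are illustrative assumptions, not the system's actual algorithm.

```python
import numpy as np

def coarsen(profiles, factor):
    """Average consecutive columns to get a cheaper, low-resolution view."""
    n = profiles.shape[1] // factor * factor
    return profiles[:, :n].reshape(profiles.shape[0], -1, factor).mean(axis=2)

def anytime_match(query, profiles, levels=(16, 4, 1), keep=0.1):
    """Coarse-to-fine anytime search: each level prunes the candidate set,
    stopping early once the top match stops changing."""
    candidates = np.arange(profiles.shape[0])
    prev_best = -1
    for factor in levels:
        q = coarsen(query[None, :], factor)[0]
        d = np.linalg.norm(coarsen(profiles[candidates], factor) - q, axis=1)
        order = np.argsort(d)
        best = int(candidates[order[0]])
        if best == prev_best:              # answer is stable: stop refining
            return best
        prev_best = best
        # keep only the most promising fraction for the finer next level
        candidates = candidates[order[: max(1, int(len(candidates) * keep))]]
    return prev_best

# Usage: find the stored profile most similar to a perturbed copy of row 1234.
rng = np.random.default_rng(1)
profiles = rng.normal(size=(10_000, 64))
query = profiles[1234] + rng.normal(scale=0.01, size=64)
print(anytime_match(query, profiles))      # expected to print 1234
```

Because each level examines only a shrinking fraction of candidates at a coarser resolution, most of the work happens on small, cheap representations, which is what makes sublinear behavior plausible on highly redundant data.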

Ultimately, with interactive visualization, exploring phenotype and genotype data will become a pleasant journey. Researchers will be able to quickly grasp highly interesting results from large-scale phenomics data, potentially leading to advances in understanding biological machinery or breakthroughs in biotechnology.
