PhenoEncode – Phenome-genome-environment interaction discovery
Understanding how particular genotypes interact with the environment to produce specific phenotypic properties is a central goal of modern biology. However, associating phenotypes with genotype-environment interactions is generally difficult, because large numbers of genes and gene products contribute to multiple phenotypes in concert with complex and dynamic environmental influences. We are developing, testing and applying phenome-genome-environment interaction discovery algorithms to identify emergent phenotype-genotype patterns under dynamic environmental conditions. We will take full advantage of available multi-omics data and will optimize the computational models using domain knowledge.
Phenotype-genotype-environment interaction identification
Exploring the complex interactions among phenome, genome, and environment can lead to key scientific advances, such as new drugs, more effective aging treatments, and increased crop yields. Building on the relationships between phenotype and environment, we have developed Dynamic Filter, a tool for identifying outliers in phenotype data. It characterizes abnormalities caused by errors in the measurement system, which are difficult to remove during data collection, and thereby distinguishes such errors from more interesting cases of altered biological responses. Specifically, Dynamic Filter derives a theoretical curve representing the relationship between light intensity and photosynthetic efficiency, adjusts the curve to fit the phenotype data via optimization, and examines the deviations (residuals) of individual phenotype values from the theoretical curve. Patterns in the residuals indicate abnormalities, and the optimized theoretical curves reveal true biological outliers.
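The fit-then-inspect-residuals idea can be illustrated with a minimal sketch. This is not Dynamic Filter's implementation: the saturating curve `eff = a + b * exp(-k * light)`, the grid search over `k`, the robust z-score rule with its 3.5 threshold, and the function names `fit_light_response` and `flag_outliers` are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_light_response(light, eff, k_grid=None):
    """Fit the hypothetical curve eff = a + b * exp(-k * light).

    For a fixed k the model is linear in (a, b), so ordinary least squares
    gives a and b directly; the k with the smallest residual sum of squares
    over the grid is kept.
    """
    light = np.asarray(light, dtype=float)
    eff = np.asarray(eff, dtype=float)
    if k_grid is None:
        k_grid = np.linspace(1e-4, 0.05, 500)
    best = None
    for k in k_grid:
        design = np.column_stack([np.ones_like(light), np.exp(-k * light)])
        coef, *_ = np.linalg.lstsq(design, eff, rcond=None)
        sse = float(np.sum((eff - design @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], k)
    _, a, b, k = best
    return a, b, k


def flag_outliers(light, eff, threshold=3.5):
    """Return residuals and a boolean mask of points far from the fitted curve.

    Points are flagged with a robust z-score, 0.6745 * (r - median) / MAD,
    so a few large residuals do not inflate the scale estimate.
    """
    light = np.asarray(light, dtype=float)
    eff = np.asarray(eff, dtype=float)
    a, b, k = fit_light_response(light, eff)
    residuals = eff - (a + b * np.exp(-k * light))
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    if mad == 0:
        return residuals, np.zeros(residuals.shape, dtype=bool)
    robust_z = 0.6745 * (residuals - med) / mad
    return residuals, np.abs(robust_z) > threshold


# Usage on synthetic data: 50 light levels, one injected low reading.
rng = np.random.default_rng(0)
light = np.linspace(0, 1000, 50)
eff = 0.4 + 0.4 * np.exp(-0.003 * light) + rng.normal(0, 0.01, light.size)
eff[20] -= 0.2  # simulated measurement-system error
residuals, is_outlier = flag_outliers(light, eff)
print("flagged indices:", np.flatnonzero(is_outlier))
```

Because the curve is linear in `a` and `b` once `k` is fixed, a one-dimensional grid over `k` stands in for a general nonlinear optimizer; a real pipeline would likely use a proper optimizer and a biologically derived curve.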
Our current research aims to learn the functional relationships between phenotypes and environmental parameters, with the ultimate goal of learning the relationships among genotypes. Many models assume a single function that describes the average relationship. It would be more precise to adopt multiple functions, each describing the phenotype-environment relationship in a fixed environment, and connected to one another according to the rules of phenotypic plasticity. Using Bayes' theorem, we will develop robust curve-fitting algorithms to estimate the parameters of these functions.
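As a minimal illustration of Bayesian parameter estimation for such functions, the sketch below uses conjugate Bayesian linear regression: a Gaussian prior on the weights and Gaussian noise with known variance, so the posterior has a closed form. The linear model, the shared prior, the noise variance, the environment labels `env_A`/`env_B`, and the function name `bayes_linear_posterior` are all hypothetical; the sketch does not model how the per-environment functions are linked by phenotypic plasticity.

```python
import numpy as np


def bayes_linear_posterior(X, y, prior_mean, prior_cov, noise_var):
    """Gaussian posterior over weights w in y = X @ w + e, e ~ N(0, noise_var).

    With a Gaussian prior w ~ N(prior_mean, prior_cov), Bayes' theorem gives
    a Gaussian posterior whose precision is the prior precision plus
    X^T X / noise_var.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov


# Usage: two hypothetical fixed environments, each with its own linear
# phenotype-vs-light function, fitted separately under one shared prior.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
prior_mean = np.zeros(2)
prior_cov = 10.0 * np.eye(2)
observations = {
    "env_A": 1.0 + 2.0 * x,  # noise-free toy data
    "env_B": 1.5 + 1.0 * x,
}
posteriors = {
    env: bayes_linear_posterior(X, y, prior_mean, prior_cov, noise_var=0.01)
    for env, y in observations.items()
}
for env, (mean, cov) in posteriors.items():
    print(env, "posterior mean:", np.round(mean, 3))
```

The posterior covariance quantifies parameter uncertainty in each environment; with sparse or noisy data the estimates shrink toward the prior, which is one source of robustness in Bayesian curve fitting.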
We will also develop schemes to evaluate the performance of our methods. Specifically, to test whether an algorithm is robust to noise, we will add random noise to the phenotype data and measure how much it changes the similarity between plants, using precision/recall or Kendall's tau. To determine whether a model captures the overall pattern in phenotype measures, we will identify the most similar and the most dissimilar pairs of plants and query biological databases to check whether the genotypes of the similar (dissimilar) plants are likely to play similar (different) roles.
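The noise-robustness check can be sketched as follows: compute pairwise plant similarities before and after adding noise, then compare the two rankings with Kendall's tau and with precision over the top-k most similar pairs. The similarity measure (negative Euclidean distance), the noise scale, `k`, and all function names are hypothetical choices for illustration.

```python
import itertools

import numpy as np


def pairwise_similarity(profiles):
    """Similarity of every pair of plants (rows), as negative Euclidean distance."""
    profiles = np.asarray(profiles, dtype=float)
    pairs = list(itertools.combinations(range(len(profiles)), 2))
    sims = np.array([-np.linalg.norm(profiles[i] - profiles[j]) for i, j in pairs])
    return pairs, sims


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score vectors (O(n^2) loop)."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def precision_at_k(pairs, sims_ref, sims_noisy, k):
    """Fraction of the noisy top-k most similar pairs that are also top-k in the reference."""
    top_ref = {pairs[i] for i in np.argsort(-sims_ref)[:k]}
    top_noisy = {pairs[i] for i in np.argsort(-sims_noisy)[:k]}
    return len(top_ref & top_noisy) / k


# Usage: 8 hypothetical plants x 5 phenotype measures, perturbed by noise.
rng = np.random.default_rng(1)
profiles = rng.normal(size=(8, 5))
noisy = profiles + rng.normal(scale=0.1, size=profiles.shape)
pairs, s_ref = pairwise_similarity(profiles)
_, s_noisy = pairwise_similarity(noisy)
print("Kendall tau:", round(kendall_tau(s_ref, s_noisy), 3))
print("precision@5:", precision_at_k(pairs, s_ref, s_noisy, k=5))
```

When the reference and noisy top-k lists have the same length, precision and recall coincide; a tau near 1 means the noise barely reorders the pairwise similarities.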
Dynamic RIL/GWAS with phenomics and genomics data
With the rapid development of advanced phenotyping tools, there is growing recognition of the value of modeling phenotype dynamics and studying how phenotypes change in response to genetic and environmental perturbations. However, traditional recombinant inbred line (RIL) and genome-wide association study (GWAS) platforms focus primarily on steady-state phenotypes measured under a single condition or at a single time point, even though in the real world many phenotypes are correlated with one another and change over time and across conditions. To address this need, we propose a dynamic RIL/GWAS model that estimates key parameters of the dynamic system and tests the association of genetic variants with multiple temporal phenotypes.
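One simple way to test a marker against a whole trajectory rather than a single time point is a permutation test on the distance between the two genotype groups' mean trajectories. The sketch below is a hypothetical illustration, not the proposed dynamic model: it assumes homozygous RILs with the two parental alleles coded 0/1, and the statistic, the permutation count, and the function names are illustrative choices.

```python
import numpy as np


def trajectory_stat(pheno, geno):
    """Squared Euclidean distance between the two genotype groups' mean trajectories."""
    diff = pheno[geno == 0].mean(axis=0) - pheno[geno == 1].mean(axis=0)
    return float(diff @ diff)


def permutation_pvalue(pheno, geno, n_perm=999, seed=0):
    """Permutation p-value for association between one marker and a trajectory.

    pheno: (n_lines, n_features) array; columns can be time points of one
    phenotype or time points of several phenotypes concatenated.
    geno:  (n_lines,) array of 0/1 parental alleles at one marker.
    """
    pheno = np.asarray(pheno, dtype=float)
    geno = np.asarray(geno)
    rng = np.random.default_rng(seed)
    observed = trajectory_stat(pheno, geno)
    exceed = sum(
        trajectory_stat(pheno, rng.permutation(geno)) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: 40 hypothetical lines measured at 10 time points; the allele coded 1
# adds an effect that grows over time.
rng = np.random.default_rng(2)
geno = np.tile([0, 1], 20)
time = np.arange(10)
pheno = rng.normal(0.0, 0.1, size=(40, 10)) + 0.1 * np.outer(geno, time)
stat, p = permutation_pvalue(pheno, geno)
print(f"statistic = {stat:.3f}, permutation p = {p:.3f}")
```

Permuting genotype labels keeps the correlation between time points intact, so the test does not assume independent measurements; a genome-wide scan would repeat it per marker and correct for multiple testing.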
This work is based on our recent progress in modeling temporal phenotypes for early plant disease detection (in preparation). The rationale is that the particular disease we study mainly affects plant metabolism, which may disturb photosynthesis phenotypes in diseased plants even at an early stage. A simple linear regression showed differences between the phenotypes of diseased and normal plants, but these differences were not statistically significant. However, by modeling temporal phenotypes as continuous functions of time or conditions using a kernel smoother, our new model separates diseased and normal plants at an early stage with precision as high as 95%. This work allows us to extend current RIL/GWAS models toward understanding how genetic variation and environmental perturbation act together to dynamically alter regulation and metabolism, leading to the emergence of complex phenotypes.
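The text does not specify which kernel smoother the model uses; as an assumption, the sketch below uses the Nadaraya-Watson estimator with a Gaussian kernel to turn irregularly sampled measurements into a continuous curve evaluated on a common time grid. The bandwidth, the synthetic data, and the function name `kernel_smooth` are hypothetical, and the downstream classifier that separates diseased from normal plants is not shown.

```python
import numpy as np


def kernel_smooth(t_obs, y_obs, t_eval, bandwidth):
    """Nadaraya-Watson estimate of y(t) with a Gaussian kernel.

    Each smoothed value is a weighted average of the observations, with
    weights exp(-0.5 * ((t - t_i) / bandwidth) ** 2).
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_eval = np.asarray(t_eval, dtype=float)
    weights = np.exp(-0.5 * ((t_eval[:, None] - t_obs[None, :]) / bandwidth) ** 2)
    return weights @ y_obs / weights.sum(axis=1)


# Usage: two hypothetical plants measured at different, irregular times are
# put on a common grid so their curves can be compared point by point.
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 10.0, 21)
t_plant1 = np.sort(rng.uniform(0.0, 10.0, 30))
t_plant2 = np.sort(rng.uniform(0.0, 10.0, 25))
y_plant1 = 0.8 - 0.02 * t_plant1 + rng.normal(0.0, 0.02, t_plant1.size)
y_plant2 = 0.8 - 0.04 * t_plant2 + rng.normal(0.0, 0.02, t_plant2.size)
curve1 = kernel_smooth(t_plant1, y_plant1, grid, bandwidth=1.0)
curve2 = kernel_smooth(t_plant2, y_plant2, grid, bandwidth=1.0)
print("max gap between smoothed curves:", round(float(np.max(np.abs(curve1 - curve2))), 3))
```

Smoothing averages out point-wise noise and yields fixed-length feature vectors per plant, which is one plausible reason a curve-based model can detect differences that point-wise regression misses; the bandwidth trades noise suppression against the ability to follow rapid changes.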