NETSIM: A Novel Tool for Measuring Functional Similarities Using Gene Ontology and Gene Co-Function Networks

MSU-DOE Plant Research Lab, Michigan State University

Developed by Jiajie Peng (1,2), Sahra Uygun (2,3), Taehyong Kim (4), Yadong Wang (1), Seung Y. Rhee (4), Jin Chen (2,5)
1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
2 MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI 48824
3 Genetics Program, Michigan State University, East Lansing, MI 48824
4 Carnegie Institution for Science, Department of Plant Biology, 260 Panama St, Stanford, CA 94305
5 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824

Gene Ontology (GO)-based similarity measures, which rely on the GO structure and its annotations, have been used widely to measure functional similarity between genes. However, since the GO structure is taxon-neutral, relying only on it may not account for similarities between functions with taxon-specific relationships that are not explicitly represented in GO. Likewise, relying only on GO annotations reduces the power of GO-based similarity because of the limited number of genes that are annotated to GO in most organisms.

We introduce a novel approach called NETSIM (network-based similarity measure) that measures functional similarities between genes or GO terms by incorporating information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that we can improve the accuracy of GO similarity measurements by incorporating additional biological information from gene co-function networks. Even for genomes with sparser gene annotation data such as Arabidopsis, NETSIM outperforms the existing measures. We used NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families.

NETSIM measures the similarity between all pairs of GO terms within a category and between all pairs of genes annotated to those GO terms in four steps. First, it calculates the functional distance between the pair of gene sets annotated to a pair of GO terms, using a gene co-function network. Second, it calculates GO term similarity based on the annotations to the common parent term, but propagates only the annotations to the terms that lie on the paths from the two GO terms to the common parent term. Third, it computes the similarity between the two GO terms by combining the functional distance of the annotated genes in the co-function network with the path-constrained GO annotation. Finally, once the similarity between all pairs of GO terms has been determined, gene-to-gene functional similarity is calculated from the similarities of the GO terms annotated to each gene.
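The first and last of the four steps above can be illustrated with a small Java sketch. This page does not give NETSIM's formulas, so the sketch substitutes two common stand-ins and should not be read as the published method: the gene-set distance is assumed to be the mean shortest-path hop count in an unweighted network, and gene-to-gene similarity is assumed to be a best-match average over term similarities. The class name NetsimSketch and all of its methods are hypothetical and are not part of netsim.jar.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; not the NETSIM implementation.
class NetsimSketch {
    // Undirected, unweighted co-function network: gene -> neighbouring genes.
    private final Map<String, Set<String>> network = new HashMap<>();
    // Symmetric term-term similarity scores, keyed by an ordered term pair.
    private final Map<String, Double> termSim = new HashMap<>();

    void addEdge(String g1, String g2) {
        network.computeIfAbsent(g1, k -> new HashSet<>()).add(g2);
        network.computeIfAbsent(g2, k -> new HashSet<>()).add(g1);
    }

    void setTermSimilarity(String t1, String t2, double score) {
        termSim.put(key(t1, t2), score);
    }

    private static String key(String t1, String t2) {
        return t1.compareTo(t2) <= 0 ? t1 + "|" + t2 : t2 + "|" + t1;
    }

    // A term is fully similar to itself; unknown pairs default to 0.
    double termSimilarity(String t1, String t2) {
        if (t1.equals(t2)) return 1.0;
        return termSim.getOrDefault(key(t1, t2), 0.0);
    }

    // Breadth-first search: number of edges between two genes, or -1 if unreachable.
    int hops(String from, String to) {
        if (from.equals(to)) return 0;
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.remove();
            for (String next : network.getOrDefault(cur, Collections.<String>emptySet())) {
                if (dist.containsKey(next)) continue;
                int d = dist.get(cur) + 1;
                if (next.equals(to)) return d;
                dist.put(next, d);
                queue.add(next);
            }
        }
        return -1;
    }

    // Step 1 (assumed form): mean hop count over all reachable cross pairs.
    double geneSetDistance(Set<String> a, Set<String> b) {
        long total = 0;
        int pairs = 0;
        for (String x : a) {
            for (String y : b) {
                int d = hops(x, y);
                if (d >= 0) { total += d; pairs++; }
            }
        }
        return pairs == 0 ? Double.POSITIVE_INFINITY : (double) total / pairs;
    }

    // Step 4 (assumed form): best-match average of term similarities.
    double geneSimilarity(Set<String> termsA, Set<String> termsB) {
        if (termsA.isEmpty() || termsB.isEmpty()) return 0.0;
        double sum = 0.0;
        for (String t : termsA) sum += bestMatch(t, termsB);
        for (String t : termsB) sum += bestMatch(t, termsA);
        return sum / (termsA.size() + termsB.size());
    }

    private double bestMatch(String term, Set<String> others) {
        double best = 0.0;
        for (String o : others) best = Math.max(best, termSimilarity(term, o));
        return best;
    }
}
```

In this sketch the term similarities are supplied by the caller through setTermSimilarity; in NETSIM they would come from the second and third steps, which combine the network distance with the path-constrained annotations.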

NETSIM User Manual

NETSIM was implemented in Java (JDK 1.6) using the JUNG library (jung.sourceforge.net) (O’Madadhain, et al., 2005) (Supplementary information section S1). The current version provides two jar packages for calculating gene similarity: one for yeast and Arabidopsis, and one for human. Both packages have been tested on Windows 7. To use NETSIM, place the NETSIM files, the JUNG library files and your data file in the same folder and run "java -jar netsim.jar" in a DOS command window. Please see the readme file in the download for details.
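For users who want to start NETSIM from their own Java code rather than from a command window, the following is a minimal sketch. It only wraps the command quoted above and adds no arguments, since any options are documented in the readme and not on this page. The class NetsimLauncher and its methods are hypothetical helpers, not part of the NETSIM packages.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it only wraps the command given in the manual.
class NetsimLauncher {
    // The command from the manual, with no extra arguments assumed.
    static List<String> command() {
        return Arrays.asList("java", "-jar", "netsim.jar");
    }

    // True if netsim.jar exists as a regular file in the given folder.
    static boolean jarPresent(File folder) {
        return new File(folder, "netsim.jar").isFile();
    }

    // Runs NETSIM with the folder as working directory and returns its exit code.
    static int run(File folder) throws IOException, InterruptedException {
        if (!jarPresent(folder)) {
            throw new IOException("netsim.jar not found in " + folder.getAbsolutePath());
        }
        Process process = new ProcessBuilder(command())
                .directory(folder)   // the folder holding the jar, the JUNG files and the data
                .inheritIO()         // show NETSIM's console output in this process
                .start();
        return process.waitFor();
    }
}
```

Calling NetsimLauncher.run(new File("path/to/folder")) assumes that folder is where the package was unzipped and where the data file was placed.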

Download

Supplementary document: the supplementary document of the NETSIM paper and its supplementary figures and tables.

Package for yeast and Arabidopsis: yeastArabidopsisPackage.zip. This package computes term-term and gene-gene similarities for genes in yeast and Arabidopsis. The data used in our experiments are in the "data" sub-folder, and the JUNG library files are in the "lib" sub-folder. The netsim.jar, GO data, JUNG library and the readme file are all included in the zip file.

Package for human: humanPackage.zip. This package computes term-term and gene-gene similarities for human genes. The netsim.jar, GO data, JUNG library and the readme file are all included in the zip file.

Contact

If you have any questions, please contact Jiajie Peng (jiajiepeng@hit.edu.cn).

Website created on Nov 1, 2013; revised on Nov 25, 2013.