Living cells realize complex gene expression programs that are mediated by regulatory proteins called transcription factors (TFs). TFs control the differential expression of target genes (either individually or in groups) in the context of transcriptional regulatory networks. Deciphering how TFs control the expression of target genes is a challenging task, especially when multiple TFs participate collaboratively in transcriptional regulation. To this end, we model the underlying regulatory interactions in terms of their directions (activation or repression) and their corresponding logical roles (necessary and/or sufficient) with a new algorithm called mTRIM.
mTRIM can efficiently predict the regulatory interactions for all possible collaborative TFs in a transcriptional regulatory network (TRN). We make this feasible in two steps. First, we develop an EM-based Bayesian inference approach to identify all significant single-regulatory interactions, in which an individual TF functions independently of the presence of other TFs. In an insignificant interaction, by contrast, the TF either requires collaboration with other TFs to drive its target genes or is a non-deterministic TF. Second, to discover multiple-regulatory interactions, every insignificant individual regulatory interaction is treated as a seed from which 2-TF candidates are generated. Significant 2-regulatory interactions (defined later) are reported, while the insignificant ones are used to generate 3-TF candidates. This process repeats until all interactions are found. Specifically, at the k-th iteration, we combine all insignificant (k-1)-regulatory interactions from the previous step into k-TF candidates, and then compute their affinity scores and p-values using a time-series gene expression dataset.
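The level-wise search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mTRIM's actual code: the names `grow_candidates`, `levelwise_search` and `is_significant` are hypothetical, the significance test is a caller-supplied placeholder standing in for the EM-based Bayesian inference and p-value computation, and the rule that two insignificant (k-1)-TF sets are merged when their union has exactly k TFs is an assumed reading of "combining". Target genes are left out for brevity.

```python
from itertools import combinations


def grow_candidates(insignificant, k):
    """Combine insignificant (k-1)-TF sets into k-TF candidate sets.

    Assumption: two sets are merged only if their union has exactly k TFs.
    """
    candidates = set()
    for a, b in combinations(insignificant, 2):
        union = a | b
        if len(union) == k:
            candidates.add(union)
    return candidates


def levelwise_search(tfs, is_significant, max_size=None):
    """Return the TF sets reported as significant, level by level.

    is_significant is a hypothetical callback; in mTRIM this decision
    comes from affinity scores and p-values, which are not modeled here.
    """
    max_size = max_size or len(tfs)
    reported = []
    level = {frozenset([tf]) for tf in tfs}  # start from single TFs
    k = 1
    while level and k <= max_size:
        insignificant = set()
        for candidate in level:
            if is_significant(candidate):
                reported.append(candidate)    # reported, not extended
            else:
                insignificant.add(candidate)  # seed for the next level
        k += 1
        level = grow_candidates(insignificant, k)
    return reported


if __name__ == "__main__":
    # Toy oracle: TF "A" acts alone, "B" and "C" only act together.
    significant = {frozenset({"A"}), frozenset({"B", "C"})}
    found = levelwise_search(["A", "B", "C", "D"], lambda s: s in significant)
    for tf_set in found:
        print(sorted(tf_set))
```

Because only insignificant sets are extended, a TF that is already significant on its own never appears in a larger candidate, which keeps the number of candidates far below the number of all possible TF subsets.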
Our experiments on yeast reveal 1,655 regulatory interactions in which TFs act on common target genes individually or collaboratively. Validation of mTRIM on genetic interaction and TF single/double knockout data shows that our algorithm performs significantly better than existing ones.
mTRIM runs on Linux with the command "./mTRIM". It reads a TF-DNA binding dataset, a time-series gene expression dataset, and a gene clustering result from the current directory.
The TF-DNA binding dataset is a matrix where the first row contains the TFs separated by commas and the first column contains the gene names. Each entry e(i,j) is a p-value indicating the significance of TF i binding to gene j. An example of the TF-DNA binding dataset is here.
The time-series gene expression dataset is a matrix where each row is a gene name followed by its gene expression values, separated by commas. An example of the time-series gene expression dataset is here.
The gene clustering dataset is a two-column matrix, where the first column contains the gene names and the second column contains the ID of the cluster each gene belongs to. An example of the gene clustering dataset is here.
By default, mTRIM uses a p-value cutoff of 0.05 for the binding data and a cutoff of +/-0.35 to binarize gene expression values. To change the thresholds, please update the corresponding lines in the configuration file "configure.txt" in the current directory of mTRIM. An example of the configuration file is here.
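To make the input formats and default thresholds concrete, here is a minimal Python sketch that parses a binding matrix and an expression matrix and applies the two cutoffs. It is not part of mTRIM, and several details are assumptions: the function names are hypothetical, the binding header is assumed to start with a corner cell before the TF names, a binding is kept when its p-value is strictly below 0.05, and the +/-0.35 cutoff is read as mapping values at or above 0.35 to 1, values at or below -0.35 to -1, and everything in between to 0.

```python
import csv
import io

P_CUTOFF = 0.05     # default binding p-value cutoff named in the text
EXPR_CUTOFF = 0.35  # default expression cutoff named in the text


def read_binding(handle, p_cutoff=P_CUTOFF):
    """Map each gene to the set of TFs whose binding p-value passes the cutoff."""
    rows = csv.reader(handle)
    header = next(rows)
    tfs = header[1:]  # assumption: first header cell is a corner placeholder
    bound = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        gene, pvals = row[0], row[1:]
        bound[gene] = {tf for tf, p in zip(tfs, pvals) if float(p) < p_cutoff}
    return bound


def discretize_expression(handle, cutoff=EXPR_CUTOFF):
    """Map each gene to a list of -1/0/1 levels, one per time point."""
    levels = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        gene, values = row[0], [float(v) for v in row[1:]]
        levels[gene] = [
            1 if v >= cutoff else -1 if v <= -cutoff else 0 for v in values
        ]
    return levels


if __name__ == "__main__":
    # Made-up toy data, only to show the expected file layout.
    binding_csv = io.StringIO("gene,TF1,TF2\nG1,0.01,0.2\nG2,0.5,0.04\n")
    expression_csv = io.StringIO("G1,0.5,-0.1,-0.4\nG2,0.35,0.0,-0.35\n")
    print(read_binding(binding_csv))
    print(discretize_expression(expression_csv))
```

With real data, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the cutoffs would be set to match the values in "configure.txt".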
All significant regulatory interactions are saved in the file "TF_Patterns.txt".
An executable version and its source code are both provided.
For any questions, please contact Sherine Awad at firstname.lastname@example.org