A novel method for predicting activity of cis-regulatory modules, based on a diverse training set

We present a novel strategy to improve the above-mentioned approach: to predict whether a CRM drives a specific gene expression pattern, assess how similar the CRM is not only to other CRMs with similar activity but also to CRMs with distinct activities. We use a state-of-the-art statistical method to quantify a CRM’s sequence similarity to many different training sets of CRMs, and employ a classification algorithm to integrate these similarity scores into a single prediction of the CRM’s activity. This strategy is shown to significantly improve CRM activity prediction over current approaches.

Availability and Implementation: Our implementation of the new method, called IMMBoost, is freely available as source code at https://github.com/weiyangedward/IMMBoost.

Contact: sinhas@illinois.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
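The two-stage idea described in the abstract (score a CRM against several training sets, then let a classifier combine the scores) can be illustrated with a minimal sketch. This is not the IMMBoost implementation: a cosine similarity between k-mer frequency profiles stands in for the statistical similarity method, a plain logistic regression stands in for the classification algorithm, and every function name, parameter, and toy sequence below is hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seqs, k=3):
    """Normalized k-mer frequencies pooled over a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values()) or 1
    return {kmer: c / total for kmer, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse k-mer profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def similarity_features(seq, training_sets, k=3):
    """One similarity score per training set, in sorted-name order."""
    target = kmer_profile([seq], k)
    return [cosine(target, kmer_profile(training_sets[name], k))
            for name in sorted(training_sets)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(x, w, b):
    """Probability that a CRM with feature vector x has the target activity."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy training sets, one per expression pattern (made-up sequences).
TRAINING_SETS = {
    "pattern_A": ["ACGACGACGACG", "ACGACGTACGACG"],
    "pattern_B": ["TTGTTGTTGTTG", "TTGTTGATTGTTG"],
}
# Labeled CRMs for the target activity "pattern_A" (1 = active, 0 = not).
LABELED = [
    ("ACGACGACGTTACG", 1), ("CGACGACGACGA", 1),
    ("TTGTTGTTGACTTG", 0), ("GTTGTTGTTGTT", 0),
]

if __name__ == "__main__":
    X = [similarity_features(s, TRAINING_SETS) for s, _ in LABELED]
    y = [label for _, label in LABELED]
    w, b = train_logistic(X, y)
    query = similarity_features("ACGACGACGACGA", TRAINING_SETS)
    print("P(active in pattern_A) =", round(predict_proba(query, w, b), 3))
```

The point of the sketch is the feature design: the classifier sees the similarity to the CRMs of the target pattern and also the similarity to CRMs of other patterns, so a low score against a distinct activity can count as evidence just as a high score against the target activity does.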
Source: Bioinformatics. Category: Bioinformatics. Tags: Genome Analysis. Source Type: research.