Practical aspects of gene regulatory inference via conditional inference forests from expression data

ABSTRACT Gene regulatory network (GRN) inference is an active area of research that helps to explain the complex interplay between biological molecules. We propose a novel framework for constructing such GRNs, based on Conditional Inference Forests (CIFs) as proposed by Strobl et al. Our framework uses ensembles of Conditional Inference Trees (CITs) and selects an appropriate aggregation scheme for variable selection prior to network construction. We show on synthetic microarray data that using the original implementation of CIFs with the conditional permutation scheme (CIFcond) may improve performance compared to Breiman's implementation of Random Forests (RF). Across five network scenarios obtained from the DREAM4 challenge, CIFcond performed best among all newly introduced CIF‐based methods. Networks derived from well‐tuned CIFs by simply averaging P‐values over tree ensembles (CIFmean) are particularly attractive, because they combine adequate performance with computational efficiency. Moreover, thresholds for variable selection are based on significance levels for P‐values and, hence, do not need to be tuned. From a practical point of view, our extensive simulations show the potential advantages of CIFmean‐based methods. Although more work is needed to improve speed, especially when fully exploiting the advantages of CITs in the context of heterogeneous and correlated data, we have shown that CIF methodology can be flexibly ins...
Source: Genetic Epidemiology - Category: Epidemiology - Tags: Research Article - Source Type: research
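
The CIFmean idea mentioned in the abstract (average P-values over the tree ensemble, then keep edges below a significance level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cifmean_edges`, the input layout (one dictionary per tree mapping a hypothetical (regulator, target) pair to a P-value), the gene names, and the P-values themselves are all assumptions made for illustration. Fitting the conditional inference trees is not shown.

```python
from statistics import mean


def cifmean_edges(tree_pvalues, alpha=0.05):
    """Average per-tree P-values and keep edges below a significance level.

    tree_pvalues: list of dicts, one per tree, each mapping a
    (regulator, target) pair to a P-value. This layout is an assumption
    of the sketch; every tree is assumed to report every candidate pair.
    """
    if not tree_pvalues:
        return {}
    edges = {}
    # Candidate pairs are taken from the first tree (see assumption above).
    for pair in tree_pvalues[0]:
        # Plain arithmetic mean of the P-values across the ensemble.
        p_bar = mean(tree[pair] for tree in tree_pvalues)
        # Fixed significance threshold instead of a tuned cutoff.
        if p_bar < alpha:
            edges[pair] = p_bar
    return edges


# Made-up P-values for three trees and two candidate edges.
trees = [
    {("G1", "G2"): 0.01, ("G3", "G2"): 0.40},
    {("G1", "G2"): 0.03, ("G3", "G2"): 0.20},
    {("G1", "G2"): 0.02, ("G3", "G2"): 0.60},
]
edges = cifmean_edges(trees, alpha=0.05)
print(sorted(edges))
```

With these illustrative numbers only the G1 to G2 edge has a mean P-value below 0.05, so it is the only edge retained. The threshold `alpha` is a conventional significance level rather than a tuned parameter, which is the practical advantage the abstract points to.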