Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP ‐Seq data

Model‐based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of next‐generation sequencing (NGS) experiments, for example, the signal observed in the data might be produced by two (or more) different biological processes operating together and a gene could participate in both (or all) of them. We propose a novel approach to cluster NGS discrete data, coming from a ChIP‐Seq experiment, with a mixture model, allowing each unit to belong potentially to more than one group: these multiple allocation clusters can be flexibly defined via a function combining the features of the original groups without introducing new parameters. The formulation naturally gives rise to a ‘zero‐inflation group’ in which values close to zero can be allocated, acting as a correction for the abundance of zeros that manifest in this type of data. We take into account the spatial dependency between observations, which is described through a latent conditional autoregressive process that can reflect different dependency patterns. We assess the performance of our model within a simulation environment and then we apply it to ChIP‐seq real data.
Source: Biometrical Journal - Category: Biotechnology Authors: Tags: Research Paper Source Type: research