L1 regularization facilitates detection of cell type-specific parameters in dynamical systems
In conclusion, the approach constitutes a general method to infer an overarching model with a minimum number of individual parameters for the particular models. Availability and Implementation: A MATLAB implementation is provided within the freely available, open-source modeling environment Data2Dynamics. Source code for all examples is provided online at http://www.data2dynamics.org/. Contact: bernhard.steiert@fdm.uni-freiburg.de
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Steiert, B., Timmer, J., Kreutz, C. Tags: SYSTEMS Source Type: research
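
To illustrate why an L1 penalty yields a minimum number of individual parameters, the sketch below applies soft-thresholding, the proximal operator of the L1 penalty, to a set of parameter differences between two cell types. It is not the Data2Dynamics implementation: the parameter names, the values, the penalty strength `lam`, and the choice to penalize log fold-changes are all assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Proximal operator of lam * |value|.

    Shrinks the value toward zero by lam and sets it to exactly zero
    when its magnitude does not exceed lam.
    """
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical log fold-changes of kinetic parameters between two cell types.
log_fold_changes = {"k_on": 0.05, "k_off": -1.2, "k_deg": 0.3}
lam = 0.4  # hypothetical penalty strength

shrunk = {name: soft_threshold(v, lam) for name, v in log_fold_changes.items()}

# Parameters whose difference survives the penalty are treated as
# cell type-specific; all others are shared by the overarching model.
cell_type_specific = [name for name, v in shrunk.items() if v != 0.0]
print(cell_type_specific)
```

Differences smaller than the penalty strength collapse to exactly zero, so only parameters with strong support in the data remain individual to a cell type.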

A probabilistic model for detecting rigid domains in protein structures
We present a probabilistic model for detecting rigid-body movements in protein structures. Our model aims to approximate alternative conformational states by a few structural parts that are rigidly transformed under the action of a rotation and a translation. By using Bayesian inference and Markov chain Monte Carlo sampling, we estimate all parameters of the model, including a segmentation of the protein into rigid domains, the structures of the domains themselves, and the rigid transformations that generate the observed structures. We find that our Gibbs sampling algorithm can also estimate the optimal number of rigid dom...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Nguyen, T., Habeck, M. Tags: PROTEINS Source Type: research
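
As a minimal sketch of what "rigidly transformed under the action of a rotation and a translation" means, the code below moves a small set of coordinates with one rotation and one translation and checks that all pairwise distances within the set are unchanged. It does not reproduce the Bayesian model or the Gibbs sampler from the abstract; the coordinates, the rotation axis (z), the angle, and the translation vector are hypothetical.

```python
import math
from itertools import combinations


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by the given angle in radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def rigid_transform(points, angle, translation):
    """Apply the same rotation and translation to every point of a domain."""
    tx, ty, tz = translation
    moved = []
    for point in points:
        x, y, z = rotate_z(point, angle)
        moved.append((x + tx, y + ty, z + tz))
    return moved


# Hypothetical coordinates of three atoms that form one rigid domain.
domain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5)]
moved = rigid_transform(domain, math.pi / 2, (10.0, 0.0, -2.0))

# A rigid transformation preserves every pairwise distance inside the domain.
distances_preserved = all(
    math.isclose(math.dist(domain[i], domain[j]), math.dist(moved[i], moved[j]))
    for i, j in combinations(range(len(domain)), 2)
)
print(distances_preserved)
```

Residues whose internal distances stay fixed between two conformations in this way are candidates for belonging to the same rigid domain.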

Simulated linear test applied to quantitative proteomics
Motivation: Omics studies aim to find significant changes due to biological or functional perturbation. However, gene and protein expression profiling experiments contain inherent technical variation. In discovery proteomics studies where the number of samples is typically small, technical variation plays an important role because it contributes considerably to the observed variation. Previous methods place both technical and biological variations in tightly integrated mathematical models that are difficult to adapt for different technological platforms. Our aim is to derive a statistical framework that allows the inclusio...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Pham, T. V., Jimenez, C. R. Tags: PROTEINS Source Type: research

PEPSI-Dock: a detailed data-driven protein-protein interaction potential accelerated by polar Fourier correlation
Motivation: Docking prediction algorithms aim to find the native conformation of a complex of proteins from knowledge of their unbound structures. They rely on a combination of sampling and scoring methods, adapted to different scales. Polynomial Expansion of Protein Structures and Interactions for Docking (PEPSI-Dock) improves the accuracy of the first stage of the docking pipeline, which in turn sharpens the final predictions. Indeed, PEPSI-Dock benefits from the precision of a very detailed data-driven model of the binding free energy used with a global and exhaustive rigid-body search space. As well as being accurate, o...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Neveu, E., Ritchie, D. W., Popov, P., Grudinin, S. Tags: PROTEINS Source Type: research

Patterns of amino acid conservation in human and animal immunodeficiency viruses
Motivation: Due to their high genomic variability, RNA viruses and retroviruses present a unique opportunity for detailed study of molecular evolution. Lentiviruses, with HIV being a notable example, are one of the best studied viral groups: hundreds of thousands of sequences are available together with experimentally resolved three-dimensional structures for most viral proteins. In this work, we use these data to study specific patterns of evolution of the viral proteins, and their relationship to protein interactions and immunogenicity. Results: We propose a method for identification of two types of surface residues clus...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Voitenko, O. S., Dhroso, A., Feldmann, A., Korkin, D., Kalinina, O. V. Tags: PROTEINS Source Type: research

SWORD--a highly efficient protein database search
Motivation: Protein database search is one of the fundamental problems in bioinformatics. For decades, it has been explored and solved using different exact and heuristic approaches. However, exponential growth of data in recent years has brought significant challenges in improving already existing algorithms. BLAST has been the most successful tool for protein database search, but is also becoming a bottleneck in many applications. Due to that, many different approaches have been developed to complement or replace it. In this article, we present SWORD, an efficient protein database search implementation that runs 8–...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Vaser, R., Pavlovic, D., Sikic, M. Tags: PROTEINS Source Type: research

AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields
This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence–structure relationships in a hierarchical manner, but also correlations among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood approach, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Wang, S., Ma, J., Xu, J. Tags: PROTEINS Source Type: research
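
The motivation for using AUC on imbalanced data can be sketched with the pairwise-ranking definition of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This is only the evaluation metric, not the DeepCNF training procedure, which would need a differentiable approximation that is not shown here. The scores and labels below are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (positive, negative) pairs in which the
    positive example is scored higher; ties count as half a win.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue labels: 1 = disordered (rare), 0 = ordered (common).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.25, 0.15, 0.05]

# A predictor that gives every residue the same score always predicts the
# majority class when thresholded, yet it ranks no better than chance.
constant_scores = [0.0] * len(labels)
majority_accuracy = labels.count(0) / len(labels)

print(auc(scores, labels), auc(constant_scores, labels), majority_accuracy)
```

The constant predictor reaches high accuracy simply by following the majority class, while its AUC stays at chance level, which is why a ranking-based objective is attractive when one class is rare.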

PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins
Motivation: The PRED-TMBB method is based on Hidden Markov Models and is capable of predicting the topology of beta-barrel outer membrane proteins and discriminating them from water-soluble ones. Here, we present an updated version of the method, PRED-TMBB2, with several newly developed features that improve its performance. The inclusion of a properly defined end state allows for better modeling of the beta-barrel domain, while different emission probabilities for the adjacent residues in strands are used to incorporate knowledge concerning the asymmetric amino acid distribution occurring there. Furthermore, the training wa...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Tsirigos, K. D., Elofsson, A., Bagos, P. G. Tags: PROTEINS Source Type: research

ModuleAlign: module-based global alignment of protein-protein interaction networks
Motivation: As an increasing amount of protein–protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. Results: In this work, we present a novel global network alignment algorith...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Hashemifar, S., Ma, J., Naveed, H., Canzar, S., Xu, J. Tags: PROTEINS Source Type: research

Snowball: strain aware gene assembly of metagenomes
Motivation: Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. Results: We have developed Snowball, a novel strain aware gene assembler for shotgun metagenomic data that does not require closely related reference genomes to be available. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Gregor, I., Schönhuth, A., McHardy, A. C. Tags: GENES Source Type: research

DeepChrome: deep-learning for predicting gene expression from histone modifications
Motivation: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing ‘epigenetic drugs’ for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a un...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Singh, R., Lanchantin, J., Robins, G., Qi, Y. Tags: GENES Source Type: research

PeakXus: comprehensive transcription factor binding site discovery from ChIP-Nexus and ChIP-Exo experiments
We describe a peak caller PeakXus that is specifically designed to leverage the increased resolution of ChIP-exo/Nexus and developed with the aim of making as few assumptions of the data as possible to allow discoveries of novel binding patterns. We apply PeakXus to ChIP-Nexus and ChIP-exo experiments performed both in Homo sapiens and in Drosophila melanogaster cell lines. We show that PeakXus consistently finds more peaks overlapping with a TF-specific recognition sequence than published methods. As an application example we demonstrate how PeakXus can be coupled with unique molecular identifiers (UMIs) to measure the ef...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Hartonen, T., Sahu, B., Dave, K., Kivioja, T., Taipale, J. Tags: GENES Source Type: research

XGSA: A statistical method for cross-species gene set analysis
In this study, we show that not accounting for the complex homology structure when comparing gene sets in two species can lead to false positive discoveries, especially when comparing gene sets that have complex gene homology relationships. To overcome this bias, we propose a straightforward statistical approach, called XGSA, that explicitly takes the cross-species homology mapping into consideration when doing gene set analysis. Simulation experiments confirm that XGSA can avoid false positive discoveries, while maintaining good statistical power compared to other ad hoc approaches for cross-species gene set analysis. We ...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Djordjevic, D., Kusumi, K., Ho, J. W. K. Tags: GENES Source Type: research

Gene-set association tests for next-generation sequencing data
Motivation: Recently, many methods have been developed for conducting rare-variant association studies for sequencing data. These methods have primarily been based on gene-level associations but have not proven to be as effective as expected. Gene-set-level tests have shown great advantages over gene-level tests in terms of power and robustness, because complex diseases are often caused by multiple genes that make up biological gene sets. Results: Here, we propose several novel gene-set tests that employ rapid and efficient dimensionality reduction. The performance of these tests was investigated using extensive s...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Lee, J., Kim, Y. J., Lee, J., T2D-Genes Consortium, Kim, B.-J., Lee, S., Park, T. Tags: GENES Source Type: research

A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions
Conclusions: UM-MDR provides a very good supplement to existing MDR methods due to its efficiency in achieving significance for every multi-locus model, its power and its flexibility in handling different types of traits. Availability and implementation: An R package "umMDR" and other source code are freely available at http://statgen.snu.ac.kr/software/umMDR/. Contact: tspark@stats.snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Yu, W., Lee, S., Park, T. Tags: GENES Source Type: research