Predicting the errors of predicted local backbone angles and non-local solvent-accessibilities of proteins by deep neural networks
Motivation: Backbone structures and solvent accessible surface area of proteins benefit from continuous real-value prediction because it removes the arbitrariness of defining boundaries between different secondary-structure and solvent-accessibility states. However, the lack of confidence scores for predicted values has limited their applications. Here we investigated whether we can make a reasonable prediction of absolute errors for predicted backbone torsion angles, Cα-atom-based angles and torsion angles, solvent accessibility, contact numbers and half-sphere exposures by employing deep neural networks. ...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Gao, J., Yang, Y., Zhou, Y. Tags: STRUCTURAL BIOINFORMATICS Source Type: research
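
The quantity the abstract above proposes to predict, the absolute error of a predicted angle, can be illustrated with a minimal Python sketch. The function name and the wrap-around handling at ±180° are assumptions made here for illustration; the abstract does not state how the authors compute the error for periodic angles.

```python
def absolute_angle_error(predicted_deg: float, actual_deg: float) -> float:
    """Absolute error between two angles in degrees, respecting periodicity.

    Assumption (not from the abstract): angles are periodic with period
    360 degrees, so -170 and 170 are 20 degrees apart, not 340.
    """
    diff = abs(predicted_deg - actual_deg) % 360.0
    # The shorter way around the circle is the error.
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    # Hypothetical predicted/actual torsion angles, for illustration only.
    for predicted, actual in [(170.0, -170.0), (10.0, 50.0)]:
        print(predicted, actual, absolute_angle_error(predicted, actual))
```

An error-prediction model would then be trained to regress this value; for non-periodic quantities such as solvent accessibility, a plain absolute difference would suffice.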

PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets
Motivation: Protein–protein docking methods are of great importance for understanding interactomes at the structural level. It has become increasingly appealing to use not only experimental structures but also homology models of unbound subunits as input for docking simulations. So far we are missing a large scale assessment of the success of rigid-body free docking methods on homology models. Results: We explored how we could benefit from comparative modelling of unbound subunits to expand docking benchmark datasets. Starting from a collection of 3157 non-redundant, high X-ray resolution heterodimers, we developed t...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Yu, J., Guerois, R. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

DOMINO: development of informative molecular markers for phylogenetic and genome-wide population genetic studies in non-model organisms
Motivation: The development of molecular markers is one of the most important challenges in phylogenetic and genome-wide population genetics studies, especially in studies with non-model organisms. A highly promising approach for obtaining suitable markers is the utilization of genomic partitioning strategies for the simultaneous discovery and genotyping of a large number of markers. Unfortunately, not all markers obtained from these strategies provide enough information for solving multiple evolutionary questions at a reasonable taxonomic resolution. Results: We have developed Development Of Molecular markers In Non-model...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Frias-Lopez, C., Sanchez-Herrero, J. F., Guirao-Rico, S., Mora, E., Arnedo, M. A., Sanchez-Gracia, A., Rozas, J. Tags: PHYLOGENETICS Source Type: research

Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types
Motivation: With the rapid increase of infection resistance to antibiotics, it is urgent to find novel infection therapeutics. In recent years, antimicrobial peptides (AMPs) have been utilized as potential alternatives for infection therapeutics. AMPs are key components of the innate immune system and can protect the host from various pathogenic bacteria. Identifying AMPs and their functional types has led to many studies, and various predictors using machine learning have been developed. However, there is room for improvement; in particular, no predictor takes into account the lack of balance among different functional AM...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Lin, W., Xu, D. Tags: SEQUENCE ANALYSIS Source Type: research

H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids
This article models the polyploid haplotyping problem as an optimal poly-partition problem of the reads, called the Polyploid Balanced Optimal Partition model. For the reads sequenced from a k-ploid genome, the model tries to divide the reads into k groups such that the difference between the reads of the same group is minimized while the difference between the reads of different groups is maximized. When the genotype information is available, the model is extended to the Polyploid Balanced Optimal Partition with Genotype constraint problem. These models are all NP-hard. We propose two heuristic algorithms, H-PoP and H-PoP...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Xie, M., Wu, Q., Wang, J., Jiang, T. Tags: SEQUENCE ANALYSIS Source Type: research
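
The partition objective described in the abstract above (split reads into k groups so that reads within a group agree) can be sketched in Python. This is not the H-PoP or H-PoPG algorithm, whose details are not given in the listing; it is a simple k-means-style heuristic written for illustration. The read encoding (strings over alleles with `-` for uncovered positions), the seeding rule and all function names are assumptions, and the sketch ignores both the "balanced" aspect and the between-group term of the model.

```python
from collections import Counter

GAP = "-"  # assumed marker for a position the read does not cover


def distance(read: str, center: str) -> int:
    """Number of covered positions where a read disagrees with a group consensus."""
    return sum(1 for r, c in zip(read, center) if r != GAP and c != GAP and r != c)


def consensus(reads: list[str], length: int) -> str:
    """Majority allele at each position among the reads of one group."""
    result = []
    for pos in range(length):
        alleles = Counter(r[pos] for r in reads if r[pos] != GAP)
        result.append(alleles.most_common(1)[0][0] if alleles else GAP)
    return "".join(result)


def partition_reads(reads: list[str], k: int, max_iter: int = 20):
    """Assign each read to one of k groups by alternating two steps:
    (1) assign every read to the closest group consensus,
    (2) recompute each group's consensus from its members.
    """
    length = len(reads[0])
    # Naive seeding (an assumption): the first k distinct reads.
    centers = list(dict.fromkeys(reads))[:k]
    if len(centers) < k:
        raise ValueError("need at least k distinct reads")
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda g: distance(read, centers[g])) for read in reads
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for g in range(k):
            members = [r for r, lab in zip(reads, labels) if lab == g]
            if members:  # keep the old consensus if a group became empty
                centers[g] = consensus(members, length)
    return labels, centers


if __name__ == "__main__":
    # Hypothetical reads from a diploid (k = 2) region, for illustration only.
    toy_reads = ["0000", "000-", "-000", "1111", "11-1", "-111"]
    print(partition_reads(toy_reads, k=2))
```

Like any local-search heuristic for an NP-hard partition problem, the result depends on the seeding and may be a local optimum.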

Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V’DJer
We present here V’DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V’DJer’s ability to accurately reconstruct BCR repertoires from short read mRNA-seq data. Availability and Implementation: V’DJer is implemented in C/C++, freely available for academic use and can be downloaded from GitHub: https://github.com/mozack/vdjer Contact: benjamin_vincent@med.unc.edu or park...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Mose, L. E., Selitsky, S. R., Bixby, L. M., Marron, D. L., Iglesia, M. D., Serody, J. S., Perou, C. M., Vincent, B. G., Parker, J. S. Tags: SEQUENCE ANALYSIS Source Type: research

A new correlation clustering method for cancer mutation analysis
Motivation: Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. An improved understanding of the generative mechanisms behind the mutation rules and their influence on gene community behavior is of great importance for the study of cancer. Results: To expand our capability to analyze combinatorial patterns of cancer alterations, we developed a rigorous methodology for cancer mutation pattern discovery based on a new, constrained form of correlation clustering. Our new algorithm, named C3 (Cancer Correlation Clustering), leverages mutual exclusivity of mutations, patien...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Hou, J. P., Emad, A., Puleo, G. J., Ma, J., Milenkovic, O. Tags: GENOME ANALYSIS Source Type: research

CSAM: Compressed SAM format
We describe CSAM (Compressed SAM format), a compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the compressed information. They generate more compact lossless representations than BAM, which is currently the preferred lossless compressed SAM-equivalent format; and are self-contained, that is, they do not depend on any external resources to compress or decompress SAM files. Availability and Implementation: An implementation is available at https://github.com/rcanovas/libCSAM. Contact:...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Canovas, R., Moffat, A., Turpin, A. Tags: GENOME ANALYSIS Source Type: research

ReadXplorer 2--detailed read mapping analysis and visualization from one single source
Motivation: The vast amount of already available and currently generated read mapping data requires comprehensive visualization, and should benefit from bioinformatics tools offering a wide spectrum of analysis functionality from just one source. Appropriate handling of multiple mapped reads during mapping analyses remains an issue that demands improvement. Results: The capabilities of the read mapping analysis and visualization tool ReadXplorer were vastly enhanced. Here, we present an even finer granulated read mapping classification, improving the level of detail for analyses and visualizations. The spectrum of automati...
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Hilker, R., Stadermann, K. B., Schwengers, O., Anisiforov, E., Jaenicke, S., Weisshaar, B., Zimmermann, T., Goesmann, A. Tags: GENOME ANALYSIS Source Type: research

A computational strategy to adjust for copy number in tumor Hi-C data
Motivation: The Hi-C technology was designed to decode the three-dimensional conformation of the genome. Despite progress towards more and more accurate contact maps, several systematic biases have been demonstrated to affect the resulting data matrix. Here we report a new source of bias that can arise in tumor Hi-C data, which is related to the copy number of genomic DNA. To address this bias, we designed a chromosome-adjusted iterative correction method called caICB. Our caICB correction method leads to significant improvements when compared with the original iterative correction in terms of eliminating copy number bias....
Source: Bioinformatics - December 18, 2016 Category: Bioinformatics Authors: Wu, H.-J., Michor, F. Tags: GENOME ANALYSIS Source Type: research
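
For readers unfamiliar with the "original iterative correction" that the abstract above compares against, the following Python sketch shows the general idea of balancing a symmetric contact matrix so that all rows have equal total coverage. It is not the caICB method, whose details are not given in the listing. The square-root damped update, the function name and the toy matrix are assumptions chosen here so that the sketch converges reliably on small examples.

```python
import numpy as np


def iterative_correction(matrix, n_iter: int = 200, tol: float = 1e-10):
    """Balance a symmetric, non-negative contact matrix.

    Returns (corrected, bias) such that
    corrected[i, j] * bias[i] * bias[j] equals the input matrix entry
    and every non-empty row of `corrected` has the same sum.
    """
    m = np.asarray(matrix, dtype=float).copy()
    bias = np.ones(m.shape[0])
    for _ in range(n_iter):
        coverage = m.sum(axis=1)
        nonzero = coverage > 0
        if not nonzero.any():
            break  # empty matrix: nothing to balance
        scale = np.ones_like(coverage)
        # Damped update (an assumption of this sketch): take the square
        # root of the relative coverage instead of the coverage itself.
        scale[nonzero] = np.sqrt(coverage[nonzero] / coverage[nonzero].mean())
        m /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1.0).max() < tol:
            break
    return m, bias


if __name__ == "__main__":
    # Hypothetical 3-bin contact matrix, for illustration only.
    raw = np.array([[4.0, 2.0, 1.0], [2.0, 6.0, 3.0], [1.0, 3.0, 10.0]])
    corrected, bias = iterative_correction(raw)
    print(corrected.sum(axis=1))
    print(bias)
```

Because plain balancing attributes all differences in row coverage to technical bias, a genuine copy number gain would be normalized away along with it, which is the kind of effect the abstract's chromosome-adjusted variant is meant to address.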

MorphoLibJ: integrated library and plugins for mathematical morphology with ImageJ
Motivation: Mathematical morphology (MM) provides many powerful operators for processing 2D and 3D images. However, most MM plugins currently implemented for the popular ImageJ/Fiji platform are limited to the processing of 2D images. Results: The MorphoLibJ library proposes a large collection of generic tools based on MM to process binary and grey-level 2D and 3D images, integrated into user-friendly plugins. We illustrate how MorphoLibJ can facilitate the exploitation of 3D images of plant tissues. Availability and Implementation: MorphoLibJ is freely available at http://imagej.net/MorphoLibJ Contact: david.legland@nante...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Legland, D., Arganda-Carreras, I., Andrey, P. Tags: BIOIMAGE INFORMATICS Source Type: research

LEVER: software tools for segmentation, tracking and lineaging of proliferating cells
The analysis of time-lapse images showing cells dividing to produce clones of related cells is an important application in biological microscopy. Imaging at the temporal resolution required to establish accurate tracking for vertebrate stem or cancer cells often requires the use of transmitted light or phase-contrast microscopy. Processing these images requires automated segmentation, tracking and lineaging algorithms. There is also a need for any errors in the automated processing to be easily identified and quickly corrected. We have developed LEVER, an open source software tool that combines the automated image analysis...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Winter, M., Mankowski, W., Wait, E., Temple, S., Cohen, A. R. Tags: BIOIMAGE INFORMATICS Source Type: research

PcircRNA_finder: a software for circRNA prediction in plants
Motivation: Recent studies reveal an important role of non-coding circular RNA (circRNA) in the control of cellular processes. Because of differences in the organization of plant and mammal genomes, circRNA prediction programs using algorithms developed for animals and humans perform poorly for plants in terms of sensitivity and accuracy. Results: A circRNA prediction software for plants (termed PcircRNA_finder) was developed that is more sensitive in detecting circRNAs than other frequently used programs (such as find_circ and CIRCexplorer), based on analysis of simulated and real rRNA-/RNase R RNA-Seq data from Arabidopsis t...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Chen, L., Yu, Y., Zhang, X., Liu, C., Ye, C., Fan, L. Tags: DATA AND TEXT MINING Source Type: research

VisualGraphX: interactive graph visualization within Galaxy
Motivation: We developed VisualGraphX, a web-based, interactive visualization tool for large-scale graphs. Current graph visualization tools that follow the rich-internet paradigm lack an interactive and scalable visualization of graph-based data. VisualGraphX aims to provide a universal graph visualization tool that empowers the users to efficiently explore the data for themselves at a large scale. It is available as a visualization plugin for the Galaxy platform, such that VisualGraphX can be integrated into custom analysis pipelines. Availability and Implementation: VisualGraphX has been released as a visualization plu...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Schäfer, R. A., Voss, B. Tags: DATA AND TEXT MINING Source Type: research

ReactPRED: a tool to predict and analyze biochemical reactions
Motivation: Biochemical pathway engineering is often used to synthesize or degrade target chemicals. In silico screening of the biochemical transformation space allows predicting the feasible reactions that constitute these pathways. Current enabling tools are customized to predict reactions based on pre-defined biochemical transformations or reaction rule sets. Reaction rule sets are usually curated manually and tailored to specific applications. They are not exhaustive. In addition, current systems are incapable of regulating and refining data with an aim to tune specificity and sensitivity. A robust and flexible tool that al...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Sivakumar, T. V., Giri, V., Park, J. H., Kim, T. Y., Bhaduri, A. Tags: SYSTEMS BIOLOGY Source Type: research