GHap: an R package for genome-wide haplotyping
The GHap R package was designed to call haplotypes from phased marker data. Given user-defined haplotype blocks (HapBlock), the package identifies the different haplotype alleles (HapAllele) present in the data and scores sample haplotype allele genotypes (HapGenotype) based on HapAllele dose (i.e. 0, 1 or 2 copies). The output is not only useful for analyses that can handle multi-allelic markers, but is also conveniently formatted for existing pipelines intended for bi-allelic markers. Availability and implementation: https://cran.r-project.org/package=GHap Contact: ytutsunomiya@gmail.com Supplementary information: Supple...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Utsunomiya, Y. T., Milanesi, M., Utsunomiya, A. T. H., Ajmone-Marsan, P., Garcia, J. F. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research
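
The dose scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the GHap API (GHap is an R package). The function name `score_block`, the input layout, and the toy samples are all assumptions made for illustration. Each sample carries two phased haplotypes for one block, and the dose of a haplotype allele is the number of copies (0, 1 or 2) the sample carries.

```python
from collections import Counter

def score_block(phased):
    """Score haplotype allele doses for one haplotype block.

    phased: dict mapping sample id -> (hap1, hap2), where each haplotype is
    the string of phased marker alleles inside the block (hypothetical layout).
    Returns the sorted list of distinct haplotype alleles and, per sample,
    the dose (0, 1 or 2 copies) of each allele.
    """
    # Every distinct haplotype observed in the block is one haplotype allele.
    alleles = sorted({hap for pair in phased.values() for hap in pair})
    doses = {}
    for sample, pair in phased.items():
        copies = Counter(pair)  # how many of the two haplotypes match each allele
        doses[sample] = {allele: copies[allele] for allele in alleles}
    return alleles, doses

# Toy data, invented for this example only.
phased = {
    "s1": ("ACG", "ACG"),
    "s2": ("ACG", "ATG"),
    "s3": ("ATG", "TTG"),
}
alleles, doses = score_block(phased)
```

Each allele column of `doses` then behaves like a bi-allelic marker coded 0/1/2, which is the property the abstract points to when it mentions pipelines built for bi-allelic markers.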

CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs
With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) datasets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ~1300 mammalian sa...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Arenillas, D. J., Forrest, A. R. R., Kawaji, H., Lassmann, T., The FANTOM Consortium, Wasserman, W. W., Mathelier, A. Tags: GENE EXPRESSION Source Type: research

compendiumdb: an R package for retrieval and storage of functional genomics data
Summary: Currently, the Gene Expression Omnibus (GEO) contains public data of over 1 million samples from more than 40 000 microarray-based functional genomics experiments. This provides a rich source of information for novel biological discoveries. However, unlocking this potential often requires retrieving and storing a large number of expression profiles from a wide range of different studies and platforms. The compendiumdb R package provides an environment for downloading functional genomics data from GEO, parsing the information into a local or remote database and interacting with the database using dedicated R functi...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Nandal, U. K., van Kampen, A. H. C., Moerland, P. D. Tags: GENE EXPRESSION Source Type: research

DBSI server: DNA binding site identifier
Summary: Protein–nucleic acid interactions are among the most important intermolecular interactions in the regulation of cellular events. Identifying residues involved in these interactions from protein structure alone is an important challenge. Here we introduce the webserver interface to DNA Binding Site Identifier (DBSI), a powerful structure-based SVM model for the prediction and visualization of DNA binding sites on protein structures. DBSI has been shown to be a top-performing model to predict DNA binding sites on the surface of a protein or peptide and shows promise in predicting RNA binding sites. Availabilit...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Sukumar, S., Zhu, X., Ericksen, S. S., Mitchell, J. C. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

MetalPredator: a web server to predict iron-sulfur cluster binding proteomes
Motivation: The prediction of the iron–sulfur proteome is highly desirable for biomedical and biological research but a freely available tool to predict iron–sulfur proteins has not been developed yet. Results: We developed a web server to predict iron–sulfur proteins from protein sequence(s). This tool, called MetalPredator, is able to process complete proteomes rapidly with high recall and precision. Availability and Implementation: The web server is freely available at: http://metalweb.cerm.unifi.it/tools/metalpredator/. Contact: andreini@cerm.unifi.it Supplementary information: Supplementary data are ...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Valasatava, Y., Rosato, A., Banci, L., Andreini, C. Tags: SEQUENCE ANALYSIS Source Type: research

Complex heatmaps reveal patterns and correlations in multidimensional genomic data
Summary: Parallel heatmaps with carefully designed annotation graphics are powerful for efficient visualization of patterns and relationships among high dimensional genomic data. Here we present the ComplexHeatmap package that provides rich functionalities for customizing heatmaps, arranging multiple parallel heatmaps and including user-defined annotation graphics. We demonstrate the power of ComplexHeatmap to easily reveal patterns and correlations among multiple sources of information with four real-world datasets. Availability and Implementation: The ComplexHeatmap package and documentation are freely available from the...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Gu, Z., Eils, R., Schlesner, M. Tags: GENOME ANALYSIS Source Type: research

TaggerOne: joint named entity recognition and normalization with semi-Markov Models
Motivation: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. Methods: We propose the firs...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Leaman, R., Lu, Z. Tags: DATA AND TEXT MINING Source Type: research

A knowledge-based approach for predicting gene-disease associations
Motivation: Recent advances in next-generation sequencing technologies have made it possible to rapidly and inexpensively identify gene variations. Knowing the disease association of these gene variations is important for early intervention to treat deadly diseases and provide possible targets to cure these diseases. Genome-wide association studies (GWAS) have identified many individual genes associated with common diseases. To exploit the large amount of data obtained from GWAS and leverage our understanding of common as well as rare diseases, we have developed a knowledge-based approach to predict gene–diseas...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Zhou, H., Skolnick, J. Tags: SYSTEMS BIOLOGY Source Type: research

Weighted mutual information analysis substantially improves domain-based functional network models
Motivation: Functional protein–protein interaction (PPI) networks elucidate molecular pathways underlying complex phenotypes, including those of human diseases. Extrapolation of domain–domain interactions (DDIs) from known PPIs is a major domain-based method for inferring functional PPI networks. However, the protein domain is a functional unit of the protein. Therefore, we should be able to effectively infer functional interactions between proteins based on the co-occurrence of domains. Results: Here, we present a method for inferring accurate functional PPIs based on the similarity of domain composition betwe...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Shim, J. E., Lee, I. Tags: SYSTEMS BIOLOGY Source Type: research

pong: fast analysis and visualization of latent clusters in population genetic data
Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across mul...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Behr, A. A., Liu, K. Z., Liu-Fang, G., Nakka, P., Ramachandran, S. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

Integrated gene set analysis for microRNA studies
Motivation: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. However, major limitations of this approach have already been described at the gene level, and new ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene ...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Garcia-Garcia, F., Panadero, J., Dopazo, J., Montaner, D. Tags: GENE EXPRESSION Source Type: research
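
The last step of the conventional three-step procedure mentioned in the abstract, gene set overrepresentation analysis, is commonly done with a one-sided hypergeometric test. The sketch below shows that conventional test only, not the enhanced method the authors propose. The function name and the argument names are assumptions made for illustration.

```python
from math import comb

def overrep_pvalue(universe_size, set_size, selected_size, overlap):
    """One-sided hypergeometric p-value for gene set overrepresentation.

    universe_size: number of genes considered in total
    set_size:      number of those genes annotated to the gene set
    selected_size: number of selected genes (e.g. targets of the chosen miRNAs)
    overlap:       number of selected genes that fall in the gene set
    Returns P(X >= overlap) when selected_size genes are drawn at random.
    """
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    tail = sum(
        comb(set_size, x) * comb(universe_size - set_size, selected_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers, invented for this example: 3 of 3 selected genes hit a 4-gene set.
p = overrep_pvalue(universe_size=10, set_size=4, selected_size=3, overlap=3)
```

A small p-value suggests the gene set contains more selected genes than random selection would be expected to produce.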

Differential rhythmicity: detecting altered rhythmicity in biological data
We present and benchmark a set of statistical and computational methods for this type of analysis, here termed differential rhythmicity analysis. The methods detect alterations in rhythm amplitude, phase and signal to noise ratio in one set of measurements compared to another. Using these methods, we compared circadian rhythms in liver mRNA expression in mice held under two different lighting conditions: constant darkness and light-dark cycles, respectively. This analysis revealed widespread and reproducible amplitude increases in mice kept in light-dark cycles. Further analysis of the subset of differentially rhythmic tra...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Thaben, P. F., Westermark, P. O. Tags: GENE EXPRESSION Source Type: research
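
To make the quantities in the abstract concrete, the following Python sketch estimates rhythm amplitude and peak time in two conditions and reports their difference. It is a plain cosine fit on synthetic data, not the authors' statistical method, and it performs no significance testing. The function name, the 24 h period, and the simulated series are assumptions made for illustration.

```python
import math

def fit_rhythm(times, values, period=24.0):
    """Fit mean + amplitude * cos(2*pi*(t - peak_time)/period).

    The projection used here equals the least-squares fit when the time
    points are evenly spaced over a whole number of periods.
    """
    n = len(values)
    mean = sum(values) / n
    w = 2 * math.pi / period
    a = 2 / n * sum((v - mean) * math.cos(w * t) for t, v in zip(times, values))
    b = 2 / n * sum((v - mean) * math.sin(w * t) for t, v in zip(times, values))
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / w) % period
    return amplitude, peak_time

# Synthetic hourly series over one day, invented for this example.
times = list(range(24))
dark = [5 + 1.0 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
light_dark = [5 + 3.0 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]

amp_dd, peak_dd = fit_rhythm(times, dark)
amp_ld, peak_ld = fit_rhythm(times, light_dark)
amplitude_change = amp_ld - amp_dd
# Wrap the phase difference into the range [-12, 12) hours.
phase_shift = (peak_ld - peak_dd + 12) % 24 - 12
```

With real measurements the two estimates carry noise, so a difference in amplitude or phase would need a formal test before it is called differential rhythmicity.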

UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling
Motivation: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabil...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Bhattacharya, D., Cao, R., Cheng, J. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

KCMBT: a k-mer Counter based on Multiple Burst Trees
Motivation: A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications. Results: We propose a novel trie-based algorithm for this k-mer counting problem. We compare our devised algorithm k-mer Coun...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Mamun, A.-A., Pal, S., Rajasekaran, S. Tags: SEQUENCE ANALYSIS Source Type: research
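
For readers unfamiliar with the task, the sketch below shows what a k-mer counter computes. It uses a plain hash table (Python's `Counter`) as a baseline and is not the trie-based KCMBT algorithm. The function name and the example sequence are assumptions made for illustration.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every k-length substring of a sequence with a hash table."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 overlapping k-mers (none if n < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Toy sequence, invented for this example.
counts = count_kmers("ACGTACGT", 3)
```

A hash table is simple but memory-hungry on genome-scale inputs, which is the kind of cost that specialised counters aim to reduce.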

Revealing aperiodic aspects of solenoid proteins from sequence information
Motivation: Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins have identified many subtle effects in the length distribution of individual motifs that are important for their functions. However, similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty of detecting them. Results: We developed FAIT, a sequence-based algorithm for the precise assignment of i...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Hrabe, T., Jaroszewski, L., Godzik, A. Tags: SEQUENCE ANALYSIS Source Type: research