NVT: a fast and simple tool for the assessment of RNA-seq normalization strategies
Motivation: Measuring differential gene expression is a common task in the analysis of RNA-Seq data. To identify differentially expressed genes between two samples, it is crucial to normalize the datasets. While multiple normalization methods are available, all of them are based on certain assumptions that may or may not be suitable for the type of data they are applied to. Researchers therefore need to select an adequate normalization strategy for each RNA-Seq experiment. This selection includes exploration of different normalization methods as well as their comparison. Methods that agree with each other most likely repre...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Eder, T., Grebien, F., Rattei, T. Tags: GENE EXPRESSION Source Type: research

shinyGEO: a web-based application for analyzing gene expression omnibus datasets
We describe a web application, shinyGEO, that allows a user to download gene expression data sets directly from GEO in order to perform differential expression and survival analysis for a gene of interest. In addition, shinyGEO supports customized graphics, sample selection, data export and R code generation so that all analyses are reproducible. The availability of shinyGEO makes GEO datasets more accessible to non-bioinformaticians, promising to lead to better understanding of biological processes and genetic diseases such as cancer. Availability and Implementation: Web application and source code are available from http...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Dumas, J., Gargano, M. A., Dancik, G. M. Tags: GENE EXPRESSION Source Type: research

PRODIGY: a web server for predicting the binding affinity of protein-protein complexes
Summary: Gaining insights into the structural determinants of protein–protein interactions holds the key for a deeper understanding of biological functions, diseases and development of therapeutics. An important aspect of this is the ability to accurately predict the binding strength for a given protein–protein complex. Here we present PROtein binDIng enerGY prediction (PRODIGY), a web server to predict the binding affinity of protein–protein complexes from their 3D structure. The PRODIGY server implements our simple but highly effective predictive model based on intermolecular contacts and properties der...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Xue, L. C., Rodrigues, J. P., Kastritis, P. L., Bonvin, A. M., Vangone, A. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

WebSTAR3D: a web server for RNA 3D structural alignment
Summary: The WebSTAR3D web server is a user-friendly online interface for the alignment of RNA 3D structures. The website takes as input two files, each of which can be in either PDB or mmCIF format, containing the desired structures to align, via a PDB code or user upload. In return, the user is presented with a visualization of the aligned structures in Jmol or JSmol, along with the corresponding sequence alignment, and the option to download the nucleotide mapping of the structures and a PDB file containing the aligned, superimposed structures. Availability and Implementation: WebSTAR3D is available at http://rna.uc...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Holzhauser, E., Ge, P., Zhang, S. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

Gene Slider: sequence logo interactive data-visualization for education and research
Summary: Gene Slider helps visualize the conservation and entropy of orthologous DNA and protein sequences by presenting them as one long sequence logo that can be zoomed in and out of, from an overview of the entire sequence down to just a few residues at a time. A search function enables users to find motifs such as cis-elements in promoter regions by simply ‘drawing’ a sequence logo representation of the desired motif as a query. In addition to displaying user-supplied FASTA files, our demonstration version of Gene Slider loads and displays a rich database of 90 000+ conserved non-coding regions across ...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Waese, J., Pasha, A., Wang, T. T., van Weringh, A., Guttman, D. S., Provart, N. J. Tags: SEQUENCE ANALYSIS Source Type: research

oxBS-MLE: an efficient method to estimate 5-methylcytosine and 5-hydroxymethylcytosine in paired bisulfite and oxidative bisulfite treated DNA
Motivation: 5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are important epigenetic regulators of gene expression. 5mC and 5hmC levels can be computationally inferred at single base resolution using sequencing or array data from paired DNA samples that have undergone bisulfite and oxidative bisulfite conversion. Current estimation methods have been shown to produce irregular estimates of 5hmC level or are extremely computation intensive. Results: We developed an efficient method oxBS-MLE based on binomial modeling of paired bisulfite and oxidative bisulfite data from sequencing or array analysis. Evaluation in s...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Xu, Z., Taylor, J. A., Leung, Y.-K., Ho, S.-M., Niu, L. Tags: GENOME ANALYSIS Source Type: research
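
The binomial modeling of paired bisulfite (BS) and oxidative bisulfite (oxBS) data mentioned in the oxBS-MLE abstract can be illustrated with a minimal sketch. It assumes the usual chemistry: BS reads both 5mC and 5hmC as methylated, while oxBS reads only 5mC as methylated. The function name and argument names are hypothetical, and this is a generic constrained maximum-likelihood estimator for two binomial samples, not necessarily the authors' exact method.

```python
def estimate_5mc_5hmc(meth_bs, total_bs, meth_oxbs, total_oxbs):
    """Return (p_5mC, p_5hmC) maximizing a joint binomial likelihood.

    Assumed model (illustrative, not taken from the oxBS-MLE paper):
      meth_bs   ~ Binomial(total_bs,   p_5mC + p_5hmC)
      meth_oxbs ~ Binomial(total_oxbs, p_5mC)
    subject to p_5hmC >= 0.
    """
    if total_bs <= 0 or total_oxbs <= 0:
        raise ValueError("both samples need at least one observation")

    beta_bs = meth_bs / total_bs        # estimates p_5mC + p_5hmC
    beta_oxbs = meth_oxbs / total_oxbs  # estimates p_5mC

    if beta_bs >= beta_oxbs:
        # The unconstrained estimates already satisfy p_5hmC >= 0.
        return beta_oxbs, beta_bs - beta_oxbs

    # Otherwise the naive difference would be negative. The constrained
    # optimum lies on the boundary p_5hmC = 0, where both samples share
    # one methylation probability, estimated by the pooled proportion.
    pooled = (meth_bs + meth_oxbs) / (total_bs + total_oxbs)
    return pooled, 0.0
```

With hypothetical counts of 80/100 methylated signals under BS and 50/100 under oxBS, the sketch attributes 0.5 to 5mC and roughly 0.3 to 5hmC. When the oxBS proportion exceeds the BS proportion, it returns a 5hmC level of zero rather than a negative value.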

FARAO: the flexible all-round annotation organizer
Summary: With decreasing costs of generating DNA sequence data, genome and metagenome projects have become accessible to a wider scientific community. However, extracting meaningful information and visualizing the data remain challenging. Here we introduce FARAO, a highly scalable software tool for the organization, visualization and integration of annotation and read coverage data that can also combine output data from several bioinformatics tools. The capabilities of FARAO can greatly aid analyses of genomic and metagenomic datasets. Availability and Implementation: FARAO is implemented in Perl and is supported under Unix-like oper...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Hammaren, R., Pal, C., Bengtsson-Palme, J. Tags: GENOME ANALYSIS Source Type: research

genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools
Summary: Genotype imputation is now commonly performed following genome-wide genotyping experiments. Imputation increases the density of analyzed genotypes in the dataset, enabling fine-mapping across the genome. However, the process of imputation using the most recent publicly available reference datasets can require considerable computation power and the management of hundreds of large intermediate files. We have developed genipe, a complete genome-wide imputation pipeline which includes automatic reporting, imputed data indexing and management, and a suite of statistical tests for imputed data commonly used in genetic e...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Lemieux Perreault, L.-P., Legault, M.-A., Asselin, G., Dube, M.-P. Tags: GENOME ANALYSIS Source Type: research

CGDM: collaborative genomic data model for molecular profiling data using NoSQL
Motivation: High-throughput molecular profiling has greatly improved patient stratification and mechanistic understanding of diseases. With the increasing amount of data used in translational medicine studies in recent years, there is a need to improve the performance of data warehouses in terms of data retrieval and statistical processing. Both relational and Key Value models have been used for managing molecular profiling data. Key Value models such as SeqWare have been shown to be particularly advantageous in terms of query processing speed for large datasets. However, more improvement can be achieved, particularly thro...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Wang, S., Mares, M. A., Guo, Y.-k. Tags: DATABASES AND ONTOLOGIES Source Type: research

Extensive complementarity between gene function prediction methods
Motivation: The number of sequenced genomes rises steadily but we still lack the knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. Results: Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) ter...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Vidulin, V., Smuc, T., Supek, F. Tags: DATA AND TEXT MINING Source Type: research

Corpus domain effects on distributional semantic modeling of medical terms
Motivation: Automatically quantifying semantic similarity and relatedness between clinical terms is an important aspect of text mining from electronic health records, which are increasingly recognized as valuable sources of phenotypic information for clinical genomics and bioinformatics research. A key obstacle to the development of semantic relatedness measures is the limited availability of large quantities of clinical text to researchers and developers outside of major medical centers. Text from general English and biomedical literature is freely available; however, their validity as a substitute for clinical domain to rep...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Pakhomov, S. V. S., Finley, G., McEwan, R., Wang, Y., Melton, G. B. Tags: DATA AND TEXT MINING Source Type: research

RNAcommender: genome-wide recommendation of RNA-protein interactions
We present RNAcommender, a recommender system capable of suggesting RNA targets to unexplored RNA binding proteins, by propagating the available interaction information taking into account the protein domain composition and the RNA predicted secondary structure. Our results show that RNAcommender is able to successfully suggest RNA interactors for RNA binding proteins using little or no interaction evidence. RNAcommender was tested on a large dataset of human RBP-RNA interactions, showing a good ranking performance (average AUC ROC of 0.75) and significant enrichment of correct recommendations for 75% of the tested RBPs. R...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Corrado, G., Tebaldi, T., Costa, F., Frasconi, P., Passerini, A. Tags: DATA AND TEXT MINING Source Type: research
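
The ranking metric quoted in the RNAcommender abstract (AUC ROC) can be read as the probability that a randomly chosen true interaction is scored above a randomly chosen non-interaction. The sketch below computes it by direct pair counting. The function name and the example scores are hypothetical and are not taken from the RNAcommender evaluation.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher means more likely positive.
    labels: truthy for positives, falsy for negatives.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate RNA targets of one protein.
example_auc = auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Pair counting is quadratic in the number of candidates, which is fine for illustration. A rank-based formulation would be preferable for large candidate lists.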

DTMiner: identification of potential disease targets through biomedical literature mining
In this study, we propose a reliable and efficient framework that takes large biomedical literature repositories as inputs, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. The framework incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts, association detection whereby we extract and evaluate features from gene–disease pairs, and ranking algorithms that estimate how closely the pairs are related. The F1-score of the NER phase is 0.87, which is higher th...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Xu, D., Zhang, M., Xie, Y., Wang, F., Chen, M., Zhu, K. Q., Wei, J. Tags: DATA AND TEXT MINING Source Type: research
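
The F1-score reported for the NER phase in the DTMiner abstract is the harmonic mean of precision and recall. The sketch below shows how it is derived from entity-level counts. The function name and the counts in the example are hypothetical and are not DTMiner results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from raw counts."""
    predicted = true_positives + false_positives  # entities the tagger emitted
    actual = true_positives + false_negatives     # entities in the gold standard
    if predicted == 0 or actual == 0 or true_positives == 0:
        # By convention, return 0 when precision or recall is zero or undefined.
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical tagger output: 8 correct mentions, 2 spurious, 2 missed.
example_f1 = f1_score(8, 2, 2)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tagger cannot reach a high F1 by maximizing recall alone while emitting many spurious mentions.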

Support vector machine model of developmental brain gene expression data for prioritization of Autism risk gene candidates
Motivation: Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders with clinical heterogeneity and a substantial polygenic component. High-throughput methods for ASD risk gene identification produce numerous candidate genes that are time-consuming and expensive to validate. Prioritization methods can identify high-confidence candidates. Previous ASD gene prioritization methods have focused on a priori knowledge, which excludes genes with little functional annotation or no protein product such as long non-coding RNAs (lncRNAs). Results: We have developed a support vector machine (SVM) model, trained usi...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Cogill, S., Wang, L. Tags: DATA AND TEXT MINING Source Type: research

A novel copy number variants kernel association test with application to autism spectrum disorders studies
Motivation: Copy number variants (CNVs) have been implicated in a variety of neurodevelopmental disorders, including autism spectrum disorders, intellectual disability and schizophrenia. Recent advances in high-throughput genomic technologies have enabled rapid discovery of many genetic variants including CNVs. As a result, there is increasing interest in studying the role of CNVs in the etiology of many complex diseases. Despite the availability of an unprecedented wealth of CNV data, methods for testing association between CNVs and disease-related traits are still under-developed due to the low prevalence and complicated...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Zhan, X., Girirajan, S., Zhao, N., Wu, M. C., Ghosh, D. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research