SIFORM: shared informative factor models for integration of multi-platform bioinformatic data
Motivation: High-dimensional omic data derived from different technological platforms have been extensively used to facilitate comprehensive understanding of disease mechanisms and to determine personalized health treatments. Numerous studies have integrated multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations. Results: We propose a statistical framework of shared informative factor models that can jointly analyze multi-platform omic data and explore their associations with a disease phenotype. The common disease-assoc...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: An, X., Hu, J., Do, K.-A. Tags: GENE EXPRESSION Source Type: research

Performance of protein-structure predictions with the physics-based UNRES force field in CASP11
Summary: Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulati...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Krupa, P., Mozolewska, M. A., Wisniewska, M., Yin, Y., He, Y., Sieradzan, A. K., Ganzynkowicz, R., Lipska, A. G., Karczynska, A., Slusarz, M., Slusarz, R., Giełdon, A., Czaplewski, C., Jagieła, D., Zaborowski, B., Scheraga, H. A., Liwo, A. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals
Motivation: More than half of proteins require binding of metal and acid radical ions for their structure and function. Identification of the ion-binding locations is important for understanding the biological functions of proteins. Due to the small size and high versatility of the metal and acid radical ions, however, computational prediction of their binding sites remains difficult. Results: We proposed a new ligand-specific approach devoted to the binding site prediction of 13 metal ion (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, K⁺) and acid radical ion (CO₃²⁻, NO₂⁻, SO₄²⁻, PO₄³⁻) ligands t...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Hu, X., Dong, Q., Yang, J., Zhang, Y. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

SwiSpot: modeling riboswitches by spotting out switching sequences
Motivation: Riboswitches are cis-regulatory elements in mRNA, mostly found in Bacteria, which exhibit two main secondary structure conformations. Although one of them prevents the gene from being expressed, the other conformation allows its expression, and this switching process is typically driven by the presence of a specific ligand. Although there are a handful of known riboswitches, our knowledge in this field has been greatly limited due to our inability to identify their alternate structures from their sequences. Indeed, current methods are not able to predict the presence of the two functionally distinct conformatio...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Barsacchi, M., Novoa, E. M., Kellis, M., Bechini, A. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

Application of the MAFFT sequence alignment program to large data: reexamination of the usefulness of chained guide trees
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally con...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Yamada, K. D., Tomii, K., Katoh, K. Tags: SEQUENCE ANALYSIS Source Type: research

An empirical Bayes method for genotyping and SNP detection using multi-sample next-generation sequencing data
Motivation: The development of next-generation sequencing technology provides an efficient and powerful approach to rare variant detection. To identify genetic variations, the essential question is how to quantify the sequencing error rate in the data. Because of the advantage of easy implementation and the ability to integrate data from different sources, the empirical Bayes method is popularly employed to estimate the sequencing error rate for SNP detection. Results: We propose a novel statistical model to fit the observed non-reference allele frequency data, and utilize the empirical Bayes method for both genotyping and...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Huang, G., Wang, S., Wang, X., You, N. Tags: SEQUENCE ANALYSIS Source Type: research

Towards the knowledge-based design of universal influenza epitope ensemble vaccines
Motivation: Influenza A viral heterogeneity remains a significant threat due to unpredictable antigenic drift in seasonal influenza and antigenic shifts caused by the emergence of novel subtypes. Annual review of multivalent influenza vaccines targets strains of influenza A and B likely to be predominant in future influenza seasons. This does not induce broad, cross protective immunity against emergent subtypes. Better strategies are needed to prevent future pandemics. Cross-protection can be achieved by activating CD8+ and CD4+ T cells against highly conserved regions of the influenza genome. We combine available experime...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Sheikh, Q. M., Gatherer, D., Reche, P. A., Flower, D. R. Tags: SEQUENCE ANALYSIS Source Type: research

deBGA: read alignment with de Bruijn graph-based seed and extension
Motivation: As high-throughput sequencing (HTS) technology becomes ubiquitous and the volume of data continues to rise, HTS read alignment is becoming increasingly rate-limiting, which continues to drive the development of novel read alignment approaches. Moreover, promising novel applications of HTS technology require aligning reads to multiple genomes instead of a single reference; however, it is still not viable for the state-of-the-art aligners to align large numbers of reads to multiple genomes. Results: We propose de Bruijn Graph-based Aligner (deBGA), an innovative graph-based seed-and-extension algorithm to align HTS r...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Liu, B., Guo, H., Brudno, M., Wang, Y. Tags: SEQUENCE ANALYSIS Source Type: research
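The general idea behind de Bruijn graph construction and k-mer seeding can be illustrated with a minimal sketch. This is not the deBGA implementation, whose details are not given in the truncated abstract above. The function names (`build_de_bruijn`, `index_kmers`, `seed_votes`) and the toy sequences are hypothetical, and the extension step of seed-and-extension is omitted.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Build a de Bruijn graph: each k-mer adds an edge between its
    (k-1)-mer prefix and its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def index_kmers(reference, k):
    """Record every start position of each k-mer in the reference."""
    positions = defaultdict(list)
    for i in range(len(reference) - k + 1):
        positions[reference[i:i + k]].append(i)
    return positions


def seed_votes(read, kmer_index, k):
    """Count exact k-mer seed hits per implied read start offset.
    Illustrative only: a real aligner would extend the best seeds
    with a gapped alignment step, which is not shown here."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for pos in kmer_index.get(read[j:j + k], ()):
            votes[pos - j] += 1
    return dict(votes)


# Toy data (hypothetical), k = 3
reference = "ACGTACGGTCA"
graph = build_de_bruijn([reference], 3)
votes = seed_votes("TACGG", index_kmers(reference, 3), 3)
best_offset = max(votes, key=votes.get)
```

In this sketch, the offset with the most seed votes is the candidate alignment position that an extension step would then verify.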

LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads
Motivation: The deluge of current sequenced data has exceeded Moore’s Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. ...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: El-Metwally, S., Zakaria, M., Hamza, T. Tags: GENOME ANALYSIS Source Type: research

A statistical method for the detection of variants from next-generation resequencing of DNA pools
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Bansal, V., Bansal, V., Libiger, O. Tags: CORRIGENDUM Source Type: research

CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets
Summary: CellProfiler Analyst allows the exploration and visualization of image-based data, together with the classification of complex biological phenotypes, via an interactive user interface designed for biologists and data scientists. CellProfiler Analyst 2.0, completely rewritten in Python, builds on these features and adds enhanced supervised machine learning capabilities (Classifier), as well as visualization tools to overview an experiment (Plate Viewer and Image Gallery). Availability and Implementation: CellProfiler Analyst 2.0 is free and open source, available at http://www.cellprofiler.org and from GitHub (http...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Dao, D., Fraser, A. N., Hung, J., Ljosa, V., Singh, S., Carpenter, A. E. Tags: BIOIMAGE INFORMATICS Source Type: research

PhenoScanner: a database of human genotype-phenotype associations
Summary: PhenoScanner is a curated database of publicly available results from large-scale genetic association studies. This tool aims to facilitate ‘phenome scans’, the cross-referencing of genetic variants with many phenotypes, to aid understanding of disease pathways and biology. The database currently contains over 350 million association results and over 10 million unique genetic variants, mostly single nucleotide polymorphisms. It is accompanied by a web-based tool that queries the database for associations with user-specified variants, providing results according to the same effect and non-effect al...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Staley, J. R., Blackshaw, J., Kamat, M. A., Ellis, S., Surendran, P., Sun, B. B., Paul, D. S., Freitag, D., Burgess, S., Danesh, J., Young, R., Butterworth, A. S. Tags: DATABASES AND ONTOLOGIES Source Type: research

SERAPHIM: studying environmental rasters and phylogenetically informed movements
Summary: SERAPHIM ("Studying Environmental Rasters and PHylogenetically Informed Movements") is a suite of computational methods developed to study phylogenetic reconstructions of spatial movement in an environmental context. SERAPHIM extracts the spatio-temporal information contained in estimated phylogenetic trees and uses this information to calculate summary statistics of spatial spread and to visualize dispersal history. Most importantly, SERAPHIM enables users to study the impact of customized environmental variables on the spread of the study organism. Specifically, given an environmental raster, SERAPHIM computes e...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Dellicour, S., Rose, R., Faria, N. R., Lemey, P., Pybus, O. G. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain
We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. Availability and Implementation: ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Biocondu...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Grote, S., Prüfer, K., Kelso, J., Dannemann, M. Tags: GENE EXPRESSION Source Type: research

CONDOP: an R package for CONdition-Dependent Operon Predictions
Summary: The use of high-throughput RNA sequencing to predict dynamic operon structures in prokaryotic genomes has recently gained popularity in bioinformatics. We provide the R implementation of a novel method that uses transcriptomic features extracted from RNA-seq transcriptome profiles to develop ensemble classifiers for condition-dependent operon predictions. The CONDOP package provides a deeper insight into RNA-seq data analysis and allows scientists to highlight the operon organization in the context of transcriptional regulation with a few lines of code. Availability and Implementation: CONDOP is implemented in R a...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Fortino, V., Tagliaferri, R., Greco, D. Tags: GENE EXPRESSION Source Type: research