SDEAP: a splice graph based differential transcript expression analysis tool for population data
Motivation: Differential transcript expression (DTE) analysis without predefined conditions is critical to biological studies. For example, it can be used to discover biomarkers to classify cancer samples into previously unknown subtypes such that better diagnosis and therapy methods can be developed for the subtypes. Although several DTE tools for population data, i.e. data without known biological conditions, have been published, these tools either assume binary conditions in the input population or require the number of conditions as a part of the input. Fixing the number of conditions to binary is unrealistic and may d...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Yang, E.-W., Jiang, T. Tags: GENE EXPRESSION Source Type: research

ChemTreeMap: an interactive map of biochemical similarity in molecular datasets
Motivation: What if you could explain complex chemistry in a simple tree and share that data online with your collaborators? Computational biology often incorporates diverse chemical data to probe a biological question, but the existing tools for chemical data are ill-suited for the very large datasets inherent to bioinformatics. Furthermore, existing visualization methods often require an expert chemist to interpret the patterns. Biologists need an interactive tool for visualizing chemical information in an intuitive, accessible way that facilitates its integration into today’s team-based biological research. Result...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Lu, J., Carlson, H. A. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

Metrics for rapid quality control in RNA structure probing experiments
Motivation: The diverse functionalities of RNA can be attributed to its capacity to form complex and varied structures. The recent proliferation of new structure probing techniques coupled with high-throughput sequencing has helped RNA studies expand in both scope and depth. Despite differences in techniques, most experiments face similar challenges in reproducibility due to the stochastic nature of chemical probing and sequencing. As these protocols expand to transcriptome-wide studies, quality control becomes a more daunting task. General and efficient methodologies are needed to quantify variability and quality in the w...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Choudhary, K., Shih, N. P., Deng, F., Ledda, M., Li, B., Aviran, S. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes
We present a hidden Markov model based approach we call delta-bitscore (DBS) for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant geneti...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Wheeler, N. E., Barquist, L., Kingsley, R. A., Gardner, P. P. Tags: SEQUENCE ANALYSIS Source Type: research

SRinversion: a tool for detecting short inversions by splitting and re-aligning poorly mapped and unmapped sequencing reads
Motivation: Rapid development in sequencing technologies has dramatically improved our ability to detect genetic variants in the human genome. However, current methods have variable sensitivities in detecting different types of genetic variants. One type of genetic variant that is especially hard to detect is the inversion. Analysis of public databases showed that few short inversions have been reported so far. Unlike reads that contain small insertions or deletions, which will be considered through gap alignment, reads carrying short inversions often have poor mapping quality or are unmapped, thus are often not further con...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Chen, R., Lau, Y. L., Zhang, Y., Yang, W. Tags: SEQUENCE ANALYSIS Source Type: research
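
The point about reads carrying short inversions mapping poorly can be illustrated with a toy sketch. This is not SRinversion's algorithm (the abstract above is truncated before the method is described); it is a brute-force illustration, assuming exact string matching stands in for alignment. The function names `reverse_complement` and `find_short_inversion` and the `min_len` parameter are hypothetical.

```python
# Toy illustration only: exact substring search stands in for read alignment.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_short_inversion(read, reference, min_len=2):
    """Look for a read segment whose re-inversion makes the read match.

    Returns (ref_pos, start, end), meaning that reverse-complementing
    read[start:end] yields a string found at reference[ref_pos:].
    Returns None if the read already matches or no such segment exists.
    """
    if read in reference:
        return None  # the read maps as-is, no inversion needed
    n = len(read)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            # Split the read into three pieces and flip the middle one.
            candidate = (
                read[:start]
                + reverse_complement(read[start:end])
                + read[end:]
            )
            pos = reference.find(candidate)
            if pos != -1:
                return pos, start, end
    return None


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACCGATCG"
    original = reference[4:16]
    # Simulate a read that carries a short inversion.
    read = original[:4] + reverse_complement(original[4:9]) + original[9:]
    print(find_short_inversion(read, reference))
```

The search may report a shorter segment than the one that was actually inverted when the flipped sequence is partly palindromic, which is one reason a real tool needs scoring rather than exact matching.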

Multivariate Welch t-test on distances
Motivation: Permutational non-Euclidean analysis of variance, PERMANOVA, is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. This method recognizes that the pairwise distance matrix between observations is sufficient to compute the within- and between-group sums of squares necessary to form the (pseudo) F statistic. Moreover, not only Euclidean, but arbitrary distances can be used. This method, however, suffers from loss of power and type I error inflation in the presence of heteroscedasticity and sample size imbalances. Re...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Alekseyenko, A. V. Tags: GENOME ANALYSIS Source Type: research
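
The claim that a pairwise distance matrix is sufficient to form the pseudo F statistic can be made concrete with a minimal sketch of the standard PERMANOVA pseudo-F. This shows the classical statistic the abstract refers to, not the Welch-type test the paper proposes (the abstract is truncated before that is described). The function name `pseudo_f` and its argument layout are assumptions for illustration.

```python
from itertools import combinations


def pseudo_f(dist, labels):
    """Classical PERMANOVA pseudo-F from a symmetric distance matrix.

    dist   : square list-of-lists, dist[i][j] is the distance between i and j
    labels : group label for each observation, same order as dist
    """
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    a = len(groups)
    if a < 2 or n <= a:
        raise ValueError("need at least 2 groups and more samples than groups")

    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    # Within-group sum of squares: the same quantity computed per group.
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )

    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


if __name__ == "__main__":
    # Four points on a line, two per group, with Euclidean distances.
    points = [0.0, 1.0, 10.0, 11.0]
    dist = [[abs(p - q) for q in points] for p in points]
    print(pseudo_f(dist, ["A", "A", "B", "B"]))  # 200.0
```

With Euclidean distances on one variable, as in the usage example, the value coincides with the ordinary one-way ANOVA F. In practice significance is assessed by recomputing the statistic under random permutations of the labels.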

EnhancerAtlas: a resource for enhancer annotation and analysis in 105 human cell/tissue types
Motivation: Multiple high-throughput approaches have recently been developed and allowed the discovery of enhancers on a genome scale in a single experiment. However, the datasets generated from these approaches are not fully utilized by the research community due to technical challenges such as lack of consensus enhancer annotation and integrative analytic tools. Results: We developed an interactive database, EnhancerAtlas, which contains an atlas of 2,534,123 enhancers for 105 cell/tissue types. A consensus enhancer annotation was obtained for each cell by summation of independent experimental datasets with the relative ...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Gao, T., He, B., Liu, S., Zhu, H., Tan, K., Qian, J. Tags: GENOME ANALYSIS Source Type: research

LCA*: an entropy-based measure for taxonomic assignment within assembled metagenomes
Motivation: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events. Results: Here, we present a statistic called LCA* inspired by Information and Vo...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Hanson, N. W., Konwar, K. M., Hallam, S. J. Tags: GENOME ANALYSIS Source Type: research

Cyclo-lib: a database of computational molecular dynamics simulations of cyclodextrins
Motivation: Cyclodextrins (CDs) are amongst the most versatile/multi-functional molecules used in molecular research and chemical applications. They are natural cyclic oligosaccharides typically employed to encapsulate hydrophobic groups in their central cavity. This allows solubilizing, protecting or reducing the toxicity of a large variety of different molecules including drugs, dyes and surfactant agents. In spite of their great potential, atomic level information of these molecules, which is key for their function, is really scarce. Computational Molecular Dynamics (MD) simulations have the potential to efficiently fil...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Mixcoha, E., Rosende, R., Garcia-Fandino, R., Pineiro, A. Tags: DATABASES AND ONTOLOGIES Source Type: research

MIMEAnTo: profiling functional RNA in mutational interference mapping experiments
Summary: The mutational interference mapping experiment (MIME) is a powerful method that, coupled to a bioinformatics analysis pipeline, allows the identification of domains and structures in RNA that are important for its function. In MIME, target RNAs are randomly mutated, selected by function, physically separated and sequenced using next-generation sequencing (NGS). Quantitative effects of each mutation at each position in the RNA can be recovered with statistical certainty using the herein developed user-friendly, cross-platform software MIMEAnTo (MIME Analysis Tool). Availability and implementation: MIMEAnTo is imple...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Smith, M. R., Smyth, R. P., Marquet, R., von Kleist, M. Tags: SYSTEMS BIOLOGY Source Type: research

BioNetGen 2.2: advances in rule-based modeling
Summary: BioNetGen is an open-source software package for rule-based modeling of complex biochemical systems. Version 2.2 of the software introduces numerous new features for both model specification and simulation. Here, we report on these additions, discussing how they facilitate the construction, simulation and analysis of larger and more complex models than previously possible. Availability and Implementation: Stable BioNetGen releases (Linux, Mac OS/X and Windows), with documentation, are available at http://bionetgen.org. Source code is available at http://github.com/RuleWorld/bionetgen. Contact: bionetgen.help@gmail...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Harris, L. A., Hogg, J. S., Tapia, J.-J., Sekar, J. A. P., Gupta, S., Korsunsky, I., Arora, A., Barua, D., Sheehan, R. P., Faeder, J. R. Tags: SYSTEMS BIOLOGY Source Type: research

PyPanda: a Python package for gene regulatory network reconstruction
Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C++ version and includes additional features for network analysis. Availability and implementation: The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc....
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: van IJzendoorn, D. G. P., Glass, K., Quackenbush, J., Kuijjer, M. L. Tags: SYSTEMS BIOLOGY Source Type: research

SYNBADm: a tool for optimization-based automated design of synthetic gene circuits
Motivation: The design of de novo circuits with predefined performance specifications is a challenging problem in Synthetic Biology. Computational models and tools have proved to be crucial for a successful wet lab implementation. Natural gene circuits are complex, subject to evolutionary tradeoffs and play multiple roles. However, most synthetic designs implemented to date are simple and perform a single task. As the field progresses, advanced computational tools are needed to handle greater levels of circuit complexity in a more flexible way while considering multiple design criteria. Results: This work presen...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Otero-Muras, I., Henriques, D., Banga, J. R. Tags: SYSTEMS BIOLOGY Source Type: research

AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology
Motivation: Many problems of interest in dynamic modeling and control of biological systems can be posed as non-linear optimization problems subject to algebraic and dynamic constraints. In the context of modeling, this is the case of, e.g. parameter estimation, optimal experimental design and dynamic flux balance analysis. In the context of control, model-based metabolic engineering or drug dose optimization problems can be formulated as (multi-objective) optimal control problems. Finding a solution to those problems is a very challenging task which requires advanced numerical methods. Results: This work presents the AMIG...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Balsa-Canto, E., Henriques, D., Gabor, A., Banga, J. R. Tags: SYSTEMS BIOLOGY Source Type: research

Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data
Summary: Better protocols and decreasing costs have made high-throughput sequencing experiments accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments chromatin immuno-precipitation follow...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Devailly, G., Mantsoki, A., Joshi, A. Tags: SYSTEMS BIOLOGY Source Type: research