BioPartsDB: a synthetic biology workflow web-application for education and research
Summary: Synthetic biology has become a widely used technology, and expanding applications in research, education and industry require progress tracking for team-based DNA synthesis projects. Although some vendors are beginning to supply multi-kilobase sequence-verified constructs, synthesis workflows starting with short oligos remain important for cost savings and pedagogical benefit. We developed BioPartsDB as an open source, extendable workflow management system for synthetic biology projects with entry points for oligos and larger DNA constructs and ending with sequence-verified clones. Availability and Implementation:...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Stracquadanio, G., Yang, K., Boeke, J. D., Bader, J. S. Tags: SYSTEMS BIOLOGY Source Type: research

PReFerSim: fast simulation of demography and selection under the Poisson Random Field model
Summary: The Poisson Random Field (PRF) model has become an important tool in population genetics to study weakly deleterious genetic variation under complicated demographic scenarios. Currently, there are no freely available software applications that allow simulation of genetic variation data under this model. Here we present PReFerSim, an ANSI C program that performs forward simulations under the PRF model. PReFerSim models changes in population size, arbitrary amounts of inbreeding, dominance and distributions of selective effects. Users can track summaries of genetic variation over time and output trajectories of sele...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Ortega-Del Vecchyo, D., Marsden, C. D., Lohmueller, K. E. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research
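
The forward-simulation idea behind the PRF model can be illustrated with a short Python sketch (PReFerSim itself is an ANSI C program; this is not its code). The sketch assumes independent sites, a Poisson number of new mutations per generation, genotype fitnesses 1, 1 + hs and 1 + s, and binomial Wright–Fisher resampling. Inbreeding and distributions of selective effects are left out (a single fixed s is used), and the names `simulate_prf` and `new_per_gen` are hypothetical.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def select(p: float, s: float, h: float) -> float:
    """Derived-allele frequency after selection; genotype fitnesses 1, 1 + h*s, 1 + s."""
    q = 1.0 - p
    w_bar = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / w_bar


def simulate_prf(sizes, new_per_gen, s, h, seed=1):
    """Forward-simulate independent sites under a Wright-Fisher/PRF-style model.

    sizes[t] is the diploid population size N in generation t (hypothetical input);
    new_per_gen is the assumed Poisson mean of new mutations per generation.
    Returns the derived-allele frequencies of sites still segregating at the end.
    """
    rng = random.Random(seed)
    freqs = []
    for n in sizes:
        two_n = 2 * n
        # Each new mutation enters as a single copy at a previously unmutated site.
        freqs.extend([1.0 / two_n] * poisson(new_per_gen, rng))
        survivors = []
        for p in freqs:
            p_sel = select(p, s, h)
            # Binomial drift: resample 2N gene copies from the post-selection frequency.
            count = sum(rng.random() < p_sel for _ in range(two_n))
            if 0 < count < two_n:  # drop sites that were lost or fixed
                survivors.append(count / two_n)
        freqs = survivors
    return freqs
```

For example, `simulate_prf([100] * 80 + [20] * 20, new_per_gen=2.0, s=-0.01, h=0.5)` would model a hypothetical constant-size phase followed by a bottleneck; binning the returned frequencies gives a site frequency spectrum.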

LAMPLINK: detection of statistically significant SNP combinations from GWAS data
Summary: One of the major issues in genome-wide association studies is the missing heritability problem. Although considering epistatic interactions among multiple SNPs may help solve this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate statistically significant SNP combinations from genome-wide case–control data. LAMPLINK is implemented as a set of additional functions to PLINK, so existing PLINK procedures remain applicable. Applied to the 1000 Genomes Project...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Terada, A., Yamada, R., Tsuda, K., Sese, J. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research
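
The abstract does not name the enumeration method, so the following is only an assumed illustration of one well-known way to keep such enumeration statistically sound: Tarone's correction, which uses the minimum p-value each SNP combination could possibly achieve. A combination carried by few individuals can never reach significance, so it need not count toward the multiple-testing factor. Here "support" means the number of individuals carrying the combination; the supports and function names are hypothetical, and the candidate combinations are taken as given rather than mined.

```python
from math import comb


def min_pvalue(x: int, n_total: int, n_cases: int) -> float:
    """Smallest one-sided Fisher exact p-value reachable by a SNP combination carried by
    x of n_total individuals (the most extreme table puts as many carriers as possible
    among the n_cases cases)."""
    if x <= n_cases:
        return comb(n_cases, x) / comb(n_total, x)
    return comb(n_total - n_cases, x - n_cases) / comb(n_total, x)


def tarone_factor(supports, n_total: int, n_cases: int, alpha: float = 0.05) -> int:
    """Tarone's correction: smallest k such that at most k combinations can reach p <= alpha/k.
    Combinations that can never reach that level are 'untestable' and do not inflate k."""
    pmins = [min_pvalue(x, n_total, n_cases) for x in supports]
    k = 1
    while sum(p <= alpha / k for p in pmins) > k:
        k += 1
    return k
```

With 20 individuals (10 cases) and hypothetical supports `[1, 1, 1, 5, 5, 5]`, the factor is 3 instead of the Bonferroni factor 6, because combinations with support 1 can reach at best p = 0.5.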

Online interactive analysis of protein structure ensembles with Bio3D-web
Summary: Bio3D-web is an online application for analyzing the sequence, structure and conformational heterogeneity of protein families. Major functionality is provided for identifying protein structure sets for analysis, their alignment and refined structure superposition, sequence and structure conservation analysis, mapping and clustering of conformations and the quantitative comparison of their predicted structural dynamics. Availability: Bio3D-web is based on the Bio3D and Shiny R packages. All major browsers are supported and full source code is available under a GPL2 license from http://thegrantlab.org/bio3d-web. Con...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Skjaerven, L., Jariwala, S., Yao, X.-Q., Grant, B. J. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

WALT: fast and accurate read mapping for bisulfite sequencing
We present the WALT tool for mapping whole-genome bisulfite sequencing (WGBS) reads. WALT uses a strategy of hashing periodic spaced seeds, which yields a significant speedup compared with the most efficient methods currently available. Whereas many existing WGBS mappers slow down as read length increases, WALT becomes faster. Importantly, these speed gains do not sacrifice accuracy. Availability and implementation: WALT is available under the GPL v3 license and can be downloaded from https://github.com/smithlabcode/walt. Contact: andrewds@usc.edu or tingchen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Chen, H., Smith, A. D., Chen, T. Tags: SEQUENCE ANALYSIS Source Type: research
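
A toy Python sketch of hashing periodic spaced seeds on a bisulfite-converted (C→T) three-letter alphabet follows. The period-3 pattern `"110"`, the 12-base span and all function names are hypothetical choices, only forward-strand reads are handled, and WALT's actual seed design and index structures are not reproduced.

```python
from collections import defaultdict

PATTERN = "110"  # hypothetical periodic seed: '1' = compared position, '0' = ignored
SPAN = 12        # hypothetical seed span in bases


def bisulfite_convert(seq: str) -> str:
    """Collapse to a three-letter alphabet (C -> T), as for forward-strand bisulfite reads."""
    return seq.upper().replace("C", "T")


def seed_key(seq: str, start: int) -> str:
    """Characters at the '1' positions of the periodic pattern, starting at `start`."""
    return "".join(seq[start + i] for i in range(SPAN) if PATTERN[i % len(PATTERN)] == "1")


def build_index(reference: str) -> dict:
    """Hash every reference position by its spaced-seed key on the converted reference."""
    ref = bisulfite_convert(reference)
    index = defaultdict(list)
    for pos in range(len(ref) - SPAN + 1):
        index[seed_key(ref, pos)].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, max_mismatches: int = 2):
    """Return sorted (start, mismatches) candidates for a forward-strand read."""
    ref, r = bisulfite_convert(reference), bisulfite_convert(read)
    hits = set()
    # One seed per phase of the period. Any single mismatch is either outside a
    # phase's span or on one of its ignored positions for at least one phase.
    for offset in range(len(PATTERN)):
        if offset + SPAN > len(r):
            break
        for pos in index.get(seed_key(r, offset), []):
            start = pos - offset
            if start < 0 or start + len(r) > len(ref):
                continue
            mismatches = sum(a != b for a, b in zip(r, ref[start:start + len(r)]))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)
```

Converting both sides means an unmethylated C (read as T after bisulfite treatment) no longer counts as a mismatch, at the cost of a less informative three-letter alphabet.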

Motif comparison based on similarity of binding affinity profiles
Summary: Measuring motif similarity is essential for identifying functionally related transcription factors (TFs) and RNA-binding proteins, and for annotating de novo motifs. Here, we describe Motif Similarity Based on Affinity of Targets (MoSBAT), an approach for measuring the similarity of motifs by computing their affinity profiles across a large number of random sequences. We show that MoSBAT successfully associates de novo ChIP-seq motifs with their respective TFs, accurately identifies motifs that are obtained from the same TF in different in vitro assays, and quantitatively reflects the similarity of in vitro bindin...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Lambert, S. A., Albu, M., Hughes, T. R., Najafabadi, H. S. Tags: SEQUENCE ANALYSIS Source Type: research
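
The profile-comparison idea (score both motifs on the same random sequences, then correlate the two score vectors) can be sketched in Python. The affinity score used here, a sum over windows of products of position probabilities, is a simplifying assumption rather than MoSBAT's own scoring model, and `consensus_pwm` is a hypothetical helper for building test motifs.

```python
import math
import random


def consensus_pwm(consensus: str, p_match: float = 0.85):
    """Hypothetical helper: give p_match to the consensus base and split the rest evenly."""
    other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else other) for b in "ACGT"} for c in consensus]


def affinity(seq: str, pwm) -> float:
    """Sum over all windows of the product of position probabilities (occupancy-like score)."""
    w, total = len(pwm), 0.0
    for i in range(len(seq) - w + 1):
        prod = 1.0
        for j, base in enumerate(seq[i:i + w]):
            prod *= pwm[j][base]
        total += prod
    return total


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def motif_similarity(pwm_a, pwm_b, n_seqs=300, length=100, seed=0) -> float:
    """Correlate the two motifs' affinity profiles over the same random sequences."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n_seqs)]
    return pearson([affinity(s, pwm_a) for s in seqs], [affinity(s, pwm_b) for s in seqs])
```

Comparing profiles rather than matrix columns means two motifs of different widths or offsets can still be compared, since only their predicted preferences on the shared sequences matter.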

MSAViewer: interactive JavaScript visualization of multiple sequence alignments
Summary: The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is ‘web ready’: written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. Availability and Implementation: The MSAViewer is released as open source software under the Boost Software License 1.0. Documentat...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Yachdav, G., Wilzbach, S., Rauscher, B., Sheridan, R., Sillitoe, I., Procter, J., Lewis, S. E., Rost, B., Goldberg, T. Tags: SEQUENCE ANALYSIS Source Type: research

sBWT: memory efficient implementation of the hardware-acceleration-friendly Schindler transform for the fast biological sequence mapping
Motivation: The Full-text index in Minute space (FM-index) derived from the Burrows–Wheeler transform (BWT) is broadly used for fast string matching in large genomes or large collections of sequencing reads. Several graphics processing unit (GPU)-accelerated aligners based on the FM-index have been proposed recently; however, index construction is still handled by the central processing unit (CPU), parallelized only at the data level (e.g. by performing blockwise suffix sorting on the GPU), or not scalable to large genomes. Results: To fulfill the need for a more practical, hardware-parallelizable indexing and matching appro...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Chang, C.-H., Chou, M.-T., Wu, Y.-C., Hong, T.-W., Li, Y.-L., Yang, C.-H., Hung, J.-H. Tags: SEQUENCE ANALYSIS Source Type: research
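
For reference, the order-k Schindler transform differs from the BWT only in how far rotations are compared: they are sorted by their first k characters, with ties broken by start position in this sketch (one common convention). The naive Python below is for illustration only and does not reproduce sBWT's memory-efficient construction; `$` is assumed to be absent from the input.

```python
def schindler_transform(text: str, k: int) -> str:
    """Order-k Schindler transform: sort rotations of text + '$' by their first k
    characters only, break ties by rotation start position, and emit the character
    preceding each rotation (the last column)."""
    s = text + "$"  # '$' sorts before A-Z and a-z in ASCII
    n = len(s)
    order = sorted(range(n), key=lambda i: ((s[i:] + s[:i])[:k], i))
    return "".join(s[i - 1] for i in order)


def bwt(text: str) -> str:
    """Full BWT: the Schindler transform with k equal to the rotation length."""
    return schindler_transform(text, len(text) + 1)
```

For example, `bwt("banana")` gives `"annb$aa"`, while the order-1 transform gives `"abnn$aa"`. Because every sort key has the same bounded length k, construction reduces to sorting fixed-width keys, which plausibly explains why it maps more easily onto parallel hardware than a full suffix sort.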

TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization
We present TopPIC, a tool that efficiently identifies and characterizes complex proteoforms with unknown primary structure alterations, such as amino acid mutations and post-translational modifications, by searching top-down tandem mass spectra against a protein database. Availability and Implementation: http://proteomics.informatics.iupui.edu/software/toppic/ Contact: xwliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Kou, Q., Xun, L., Liu, X. Tags: SEQUENCE ANALYSIS Source Type: research
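
The notion of an "unknown primary structure alteration" can be made concrete by computing the mass shift between an observed proteoform mass and the unmodified sequence mass, then checking it against known modification masses. This Python sketch uses standard monoisotopic masses but only a tiny, assumed modification table and a whole-molecule shift; it is not TopPIC's spectral search, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini

# A few common modification mass shifts (Da); a real search would use a much larger table.
KNOWN_SHIFTS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def proteoform_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mass_shift(observed: float, seq: str) -> float:
    """Mass left unexplained by the database sequence."""
    return observed - proteoform_mass(seq)


def explain_shift(shift: float, tol: float = 0.02):
    """Names of single known modifications matching the shift within tol daltons."""
    return sorted(name for name, m in KNOWN_SHIFTS.items() if abs(shift - m) <= tol)
```

A shift that matches nothing in the table is exactly the kind of unexplained alteration a top-down search has to localize and characterize.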

ntHash: recursive nucleotide hashing
We present ntHash, a hashing algorithm tuned for processing DNA/RNA sequences. It performs best when calculating hash values for adjacent k-mers in an input sequence, operating an order of magnitude faster than the best-performing alternatives in typical use cases. Availability and implementation: ntHash is available online at http://www.bcgsc.ca/platform/bioinfo/software/nthash and is free for academic use. Contacts: hmohamadi@bcgsc.ca or ibirol@bcgsc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Mohamadi, H., Chu, J., Vandervalk, B. P., Birol, I. Tags: SEQUENCE ANALYSIS Source Type: research
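
The recursive idea can be sketched in Python: a k-mer hash is the XOR of per-base 64-bit seeds, each rotated by its distance from the k-mer end. The next k-mer's hash then needs only one rotation plus two XORs instead of k operations. The seed constants below are arbitrary illustrative values, not ntHash's, and canonical (reverse-complement-aware) hashing is omitted.

```python
MASK = (1 << 64) - 1
# Arbitrary 64-bit per-base seeds chosen for illustration (ntHash defines its own constants).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def direct_hash(kmer: str) -> int:
    """h = rol^(k-1)(s_1) xor rol^(k-2)(s_2) xor ... xor s_k."""
    k, h = len(kmer), 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


def rolling_hashes(seq: str, k: int) -> list:
    """Hashes of all k-mers, each derived from the previous one in O(1)."""
    if len(seq) < k:
        return []
    h = direct_hash(seq[:k])
    out = [h]
    for i in range(len(seq) - k):
        # Rotate the window by one, cancel the outgoing base's term, add the incoming base.
        h = rol(h, 1) ^ rol(SEED[seq[i]], k) ^ SEED[seq[i + k]]
        out.append(h)
    return out
```

Rotating the previous hash by one bit shifts every term one position further from the end; the outgoing base's term has then been rotated by k bits, so XOR-ing in `rol(SEED[out], k)` cancels it exactly.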

NET-GE: a web-server for NETwork-based human gene enrichment
Motivation: Gene enrichment is a requisite for the interpretation of biological complexity related to specific molecular pathways and biological processes. Furthermore, when interpreting NGS data and human variations, including those related to pathologies, gene enrichment allows the inclusion of other genes that, in the human interactome space, may also play key roles in the emergence of the phenotype. Here, we describe NET-GE, a web server for associating biological processes and pathways with sets of human proteins involved in the same phenotype. Results: NET-GE is based on protein–protein interaction network...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Bovo, S., Di Lena, P., Martelli, P. L., Fariselli, P., Casadio, R. Tags: GENOME ANALYSIS Source Type: research
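
The general pattern of network-aware enrichment (grow the input gene set through the interactome, then test pathways for over-representation) can be sketched in Python. NET-GE's own network procedure is truncated in this excerpt, so the one-step neighbour expansion below is a simplified stand-in, the hypergeometric test is a generic choice, and gene names are hypothetical.

```python
from math import comb


def expand_with_interactors(seeds, edges):
    """Add direct protein-protein interaction partners of the seed genes."""
    expanded = set(seeds)
    for a, b in edges:
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N in population, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enrichment_pvalue(genes, pathway, universe_size):
    """Over-representation p-value of a pathway within a gene set."""
    return hypergeom_sf(len(genes & pathway), universe_size, len(pathway), len(genes))
```

For a hypothetical seed `{"A"}` with interactors B and C, and a four-gene pathway `{"A", "B", "C", "D"}` in a universe of 20 genes, all three expanded genes hit the pathway, giving p = 4/1140; the seed alone would carry far less evidence.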

Joint sparse canonical correlation analysis for detecting differential imaging genetics modules
Motivation: Imaging genetics combines brain imaging and genetic information to identify the relationships between genetic variants and brain activities. When the data samples belong to different classes (e.g. disease status), the relationships may exhibit class-specific patterns that can be used to facilitate the understanding of a disease. Conventional approaches often analyze each class separately and report the differences, but ignore important shared patterns. Results: In this paper, we develop a multivariate method to analyze the differential dependency across multiple classes. We propose a joint sparse cano...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Fang, J., Lin, D., Schulz, S. C., Xu, Z., Calhoun, V. D., Wang, Y.-P. Tags: BIOIMAGE INFORMATICS Source Type: research
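
The single-class building block of sparse CCA can be sketched in Python as alternating soft-thresholded power iterations, in the style of penalized matrix decomposition. The joint multi-class penalty the paper proposes is not reproduced. `C` is assumed to be the cross-covariance matrix between column-standardized imaging features (rows) and genetic features (columns), and `lam_u`/`lam_v` are hypothetical fixed thresholds rather than tuned L1 bounds.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit L2 norm; an all-zero vector (lam too large) is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def sparse_cca(C, lam_u, lam_v, iters=100):
    """Leading sparse canonical weights u (imaging) and v (genetic) maximizing u^T C v."""
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        cv = [sum(C[i][j] * v[j] for j in range(q)) for i in range(p)]
        u = normalize(soft_threshold(cv, lam_u))
        ctu = [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)]
        v = normalize(soft_threshold(ctu, lam_v))
    return u, v
```

Nonzero entries of `u` and `v` mark the imaging and genetic features that form a module; larger thresholds give sparser, more interpretable modules.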

SSBD: a database of quantitative data of spatiotemporal dynamics of biological phenomena
Motivation: Rapid advances in live-cell imaging analysis and mathematical modeling have produced a large amount of quantitative data on spatiotemporal dynamics of biological objects ranging from molecules to organisms. There is now a crucial need to bring these large amounts of quantitative biological dynamics data together centrally in a coherent and systematic manner. This will facilitate the reuse of these data for further analysis. Results: We have developed the Systems Science of Biological Dynamics database (SSBD) to store and share quantitative biological dynamics data. SSBD currently provides 311 sets of quantitativ...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Tohsato, Y., Ho, K. H. L., Kyoda, K., Onami, S. Tags: DATABASES AND ONTOLOGIES Source Type: research

MetaCoMET: a web platform for discovery and visualization of the core microbiome
We present a web platform named MetaCoMET that enables the discovery and visualization of the core microbiome and provides a comparison of the relative abundance and diversity patterns between subsets of samples within a microbiome dataset. MetaCoMET provides an efficient and interactive graphical interface for analyzing each subset defined by the union or disjunction of groups within the Venn diagram, and includes a graphical taxonomy summary, alpha diversity metrics, Principal Coordinate analysis, abundance-based heatmaps, and a chart indicating the geographic distribution of each sample. Availability and Implementation:...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Wang, Y., Xu, L., Gu, Y. Q., Coleman-Derr, D. Tags: DATA AND TEXT MINING Source Type: research
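
The core-microbiome and Venn-region logic can be sketched in Python. MetaCoMET's own definitions are not given in this excerpt, so the prevalence rule in `core_taxa`, the function names and any genus data passed in are assumptions; Shannon entropy is shown as one standard alpha diversity metric.

```python
import math
from collections import Counter


def core_taxa(samples, group, prevalence=1.0):
    """Taxa with nonzero abundance in at least `prevalence` of the group's samples."""
    seen = Counter()
    for name in group:
        seen.update(t for t, a in samples[name].items() if a > 0)
    needed = prevalence * len(group)
    return {t for t, c in seen.items() if c >= needed}


def venn_regions(core_a, core_b):
    """Regions of a two-group Venn diagram over core taxa."""
    return {
        "only_a": core_a - core_b,
        "shared": core_a & core_b,
        "only_b": core_b - core_a,
        "union": core_a | core_b,
    }


def shannon(abundances):
    """Shannon alpha diversity H = -sum p_i ln p_i over taxa present."""
    total = sum(abundances.values())
    return -sum((a / total) * math.log(a / total) for a in abundances.values() if a > 0)
```

Lowering `prevalence` (for example to 0.8) relaxes the "present in every sample" requirement, which often matters for noisy amplicon data.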

An efficient method to estimate the optimum regularization parameter in RLDA
Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has been classically proposed, in which the sample covariance matrix is replaced by its ri...
Source: Bioinformatics - November 28, 2016 Category: Bioinformatics Authors: Bakir, D., James, A. P., Zollanvari, A. Tags: DATA AND TEXT MINING Source Type: research
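
For orientation, one common textbook form of two-class RLDA is written out below. The notation is generic and assumed here: a ridge-type estimate built from the pooled sample covariance Σ̂, class sample means μ̂0 and μ̂1, dimension p, and regularization parameter γ > 0. It does not reproduce the paper's estimator of the optimum γ.

```latex
\hat{\Sigma}_{\gamma} = \hat{\Sigma} + \gamma I_p,
\qquad
W(\mathbf{x}) = \Bigl(\mathbf{x} - \tfrac{1}{2}\bigl(\hat{\boldsymbol{\mu}}_0 + \hat{\boldsymbol{\mu}}_1\bigr)\Bigr)^{\top}
\hat{\Sigma}_{\gamma}^{-1}\bigl(\hat{\boldsymbol{\mu}}_0 - \hat{\boldsymbol{\mu}}_1\bigr)
```

Under equal class priors, x is assigned to class 0 when W(x) > 0 and to class 1 otherwise. Because Σ̂ is positive semidefinite, adding γI with γ > 0 makes the matrix invertible even when p exceeds the sample size; the practical question is how to choose γ.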