TarPmiR: a new approach for microRNA target site prediction
Motivation: The identification of microRNA (miRNA) target sites is fundamentally important for studying gene regulation. There are dozens of computational methods available for miRNA target site prediction. Despite their existence, we still cannot reliably identify miRNA target sites, partially due to our limited understanding of the characteristics of miRNA target sites. The recently published CLASH (crosslinking ligation and sequencing of hybrids) data provide an unprecedented opportunity to study the characteristics of miRNA target sites and improve miRNA target site prediction methods. Results: Applying four different ...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Ding, J., Li, X., Hu, H. Tags: SEQUENCE ANALYSIS Source Type: research

MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data
Motivation: High-throughput metagenomic sequencing has revolutionized our view of the structure and metabolic potential of microbial communities. However, analysis of metagenomic composition is often complicated by the high complexity of the community and the lack of related reference genomic sequences. As a starting point for comparative metagenomic analysis, researchers require efficient means for assessing pairwise similarity of metagenomes (beta-diversity). A number of approaches have been used to address this task; however, most of them have inherent disadvantages that limit their scope of applicability. For instance,...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Ulyantsev, V. I., Kazakov, S. V., Dubinkina, V. B., Tyakht, A. V., Alexeev, D. G. Tags: SEQUENCE ANALYSIS Source Type: research
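As a rough illustration of reference-free pairwise comparison, the sketch below computes a Bray-Curtis dissimilarity between the k-mer count profiles of two read sets. This is an assumption chosen for illustration only: the abstract is truncated and does not say which measure or features MetaFast uses, and the helper names `kmer_counts` and `bray_curtis` are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    keys = set(counts_a) | set(counts_b)
    # Counter returns 0 for missing keys, so absent k-mers count as zero.
    diff = sum(abs(counts_a[key] - counts_b[key]) for key in keys)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return diff / total if total else 0.0


# Toy "metagenomes": two tiny read sets compared on 2-mers.
sample_a = kmer_counts(["ACGT"], k=2)
sample_b = kmer_counts(["ACGA"], k=2)
dissimilarity = bray_curtis(sample_a, sample_b)
```

Applying such a function to every pair of samples yields a dissimilarity matrix, which is the usual input for clustering or ordination in beta-diversity analysis.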

Top-down analysis of protein samples by de novo sequencing techniques
We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on ...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Vyatkina, K., Wu, S., Dekker, L. J. M., VanDuijn, M. M., Liu, X., Tolic, N., Luider, T. M., Pasa-Tolic, L., Pevzner, P. A. Tags: SEQUENCE ANALYSIS Source Type: research

Bayesian nonparametrics in protein remote homology search
Motivation: Wide application of modeling of three-dimensional protein structures in biomedical research motivates developing protein sequence alignment computer tools featuring high alignment accuracy and sensitivity to remotely homologous proteins. In this paper, we aim at improving the quality of alignments between sequence profiles, encoded multiple sequence alignments. Modeling profile contexts, fixed-length profile fragments, is engaged to achieve this goal. Results: We develop a hierarchical Dirichlet process mixture model to describe the distribution of profile contexts, which is able to capture dependencies between...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Margelevicius, M. Tags: SEQUENCE ANALYSIS Source Type: research

Group-combined P-values with applications to genetic association studies
Motivation: In large-scale genetic association studies with tens of hundreds of single nucleotide polymorphisms (SNPs) genotyped, the traditional statistical framework of logistic regression using maximum likelihood estimator (MLE) to infer the odds ratios of SNPs may not work appropriately. This is because a large number of odds ratios need to be estimated, and the MLEs may not be stable when some of the SNPs are in high linkage disequilibrium. In this situation, the P-value combination procedures seem to provide good alternatives as they are constructed on the basis of single-marker analysis. Results: The commonly use...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Hu, X., Zhang, W., Zhang, S., Ma, S., Li, Q. Tags: GENOME ANALYSIS Source Type: research
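To make the idea of a P-value combination procedure concrete, here is a minimal sketch of Fisher's method, a classic way to combine independent single-marker P-values. It is not the group-combined procedure proposed in the paper (the abstract is truncated before describing it), and the function name `fisher_combined_pvalue` is hypothetical. The statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom its tail probability has a closed form.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's method."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one P-value is required")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    # half_stat is X / 2, where X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


combined = fisher_combined_pvalue([0.05, 0.05])
```

Fisher's method assumes the combined P-values are independent. That assumption is violated when SNPs are in high linkage disequilibrium, which is one reason alternative combination procedures are studied.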

Predicting regulatory variants with composite statistic
Motivation: Prediction and prioritization of human non-coding regulatory variants is critical for understanding the regulatory mechanisms of disease pathogenesis and promoting personalized medicine. Existing tools utilize functional genomics data and evolutionary information to evaluate the pathogenicity or regulatory functions of non-coding variants. However, different algorithms lead to inconsistent and even conflicting predictions. Combining multiple methods may increase accuracy in regulatory variant prediction. Results: Here, we compiled an integrative resource for predictions from eight different tools on functional ...
Source: Bioinformatics - September 10, 2016 Category: Bioinformatics Authors: Li, M. J., Pan, Z., Liu, Z., Wu, J., Wang, P., Zhu, Y., Xu, F., Xia, Z., Sham, P. C., Kocher, J.-P. A., Li, M., Liu, J. S., Wang, J. Tags: GENOME ANALYSIS Source Type: research

4DGenome: a comprehensive database of chromatin interactions
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Teng, L., He, B., Wang, J., Tan, K. Tags: CORRIGENDUM Source Type: research

TOPDOM: database of conservatively located domains and motifs in proteins
Summary: The TOPDOM database—originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins—has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Varga, J., Dobson, L., Tusnady, G. E. Tags: DATABASES AND ONTOLOGIES Source Type: research

Tools4miRs - one place to gather all the tools for miRNA analysis
Summary: MiRNAs are short, non-coding molecules that negatively regulate gene expression and thereby play several important roles in living organisms. Dozens of computational methods for miRNA-related research have been developed, which greatly differ in various aspects. The substantial availability of difficult-to-compare approaches makes it challenging for the user to select a proper tool and prompts the need for a solution that will collect and categorize all the methods. Here, we present tools4miRs, the first platform that gathers currently more than 160 methods for broadly defined miRNA analysis. The collected tools a...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Lukasik, A., Wojcikowski, M., Zielenkiewicz, P. Tags: DATABASES AND ONTOLOGIES Source Type: research

ProbOnto: ontology and knowledge base of probability distributions
Motivation: Probability distributions play a central role in mathematical and statistical modelling. The encoding, annotation and exchange of such models could be greatly simplified by a resource providing a common reference for the definition of probability distributions. Although some resources exist, no suitably detailed and complex ontology exists nor any database allowing programmatic access. Results: ProbOnto is an ontology-based knowledge base of probability distributions, featuring more than 80 uni- and multivariate distributions with their defining functions, characteristics, relationships and re-parameterization...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Swat, M. J., Grenon, P., Wimalaratne, S. Tags: DATABASES AND ONTOLOGIES Source Type: research

XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data
We present XLinkDB 2.0 which integrates tools for network analysis, Protein Databank queries, modeling of predicted protein structures and modeling of docked protein structures. The novel, integrated approach of XLinkDB 2.0 enables the holistic analysis of XL-MS protein interaction data without limitation to the cross-linker or analytical system used for the analysis. Availability and Implementation: XLinkDB 2.0 can be found here, including documentation and help: http://xlinkdb.gs.washington.edu/. Contact: jimbruce@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Schweppe, D. K., Zheng, C., Chavez, J. D., Navare, A. T., Wu, X., Eng, J. K., Bruce, J. E. Tags: SYSTEMS BIOLOGY Source Type: research

DyNet: visualization and analysis of dynamic molecular interaction networks
Summary: The ability to experimentally determine molecular interactions on an almost proteome-wide scale under different conditions is enabling researchers to move from static to dynamic network analysis, uncovering new insights into how interaction networks are physically rewired in response to different stimuli and in disease. Dynamic interaction data presents a special challenge in network biology. Here, we present DyNet, a Cytoscape application that provides a range of functionalities for the visualization, real-time synchronization and analysis of large multi-state dynamic molecular interaction networks enabling users...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Goenawan, I. H., Bryan, K., Lynn, D. J. Tags: SYSTEMS BIOLOGY Source Type: research

PRESS: PRotEin S-Sulfenylation server
Motivation: Transient S-sulfenylation of cysteine thiols mediated by reactive oxygen species plays a critical role in pathology, physiology and cell signaling. Therefore, discovery of new S-sulfenylated sites in proteins is of great importance towards understanding how protein function is regulated upon redox conditions. Results: We developed the PRESS (PRotEin S-Sulfenylation) web server, which can effectively predict the cysteine thiols of a protein that could undergo S-sulfenylation under redox conditions. We envisage that this server will boost and facilitate the discovery of new and currently unknown functions of...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Sakka, M., Tzortzis, G., Mantzaris, M. D., Bekas, N., Kellici, T. F., Likas, A., Galaris, D., Gerothanassis, I. P., Tzakos, A. G. Tags: STRUCTURAL BIOINFORMATICS Source Type: research

SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences
Summary: Simple Sequence Repeats (SSRs) are used to address a variety of research questions in a variety of fields (e.g. population genetics, phylogenetics, forensics), due to their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (e.g. speed, accuracy, ease-of-use). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available....
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Pickett, B. D., Karlinsey, S. M., Penrod, C. E., Cormier, M. J., Ebbert, M. T. W., Shiozawa, D. K., Whipple, C. J., Ridge, P. G. Tags: SEQUENCE ANALYSIS Source Type: research
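For readers unfamiliar with the two data structures named in the abstract, the sketch below builds a suffix array and its longest common prefix (LCP) array with deliberately naive code. It only illustrates what the arrays contain; it does not reproduce SA-SSR's repeat detection logic or its efficient construction, and the function names `suffix_array` and `lcp_array` are hypothetical.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[r] = length of the common prefix of suffixes sa[r - 1] and sa[r]; lcp[0] = 0."""
    if not sa:
        return []

    def common_prefix(i, j):
        # Walk both suffixes in parallel until they differ or one ends.
        n = 0
        while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
            n += 1
        return n

    return [0] + [common_prefix(sa[r - 1], sa[r]) for r in range(1, len(sa))]


sa = suffix_array("banana")
lcp = lcp_array("banana", sa)
```

Long LCP values between suffixes that start a short distance apart indicate that a short unit repeats in tandem, which is the kind of signal a suffix-array-based repeat finder can exploit. Production tools would use linear or near-linear construction algorithms rather than sorting whole suffixes.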

SimLoRD: Simulation of Long Read Data
Motivation: Third generation sequencing methods provide longer reads than second generation methods and have distinct error characteristics. While there exist many read simulators for second generation data, there is a very limited choice for third generation data. Results: We analyzed public data from Pacific Biosciences (PacBio) SMRT sequencing, developed an error model and implemented it in a new read simulator called SimLoRD. It offers options to choose the read length distribution and to model error probabilities depending on the number of passes through the sequencer. The new error model makes SimLoRD the most realis...
Source: Bioinformatics - August 31, 2016 Category: Bioinformatics Authors: Stöcker, B. K., Köster, J., Rahmann, S. Tags: SEQUENCE ANALYSIS Source Type: research