MetaCycle: an integrated R package to evaluate periodicity in large scale data
Summary: Detecting periodicity in large-scale data remains a challenge. While efforts have been made to identify best-of-breed algorithms, relatively little research has gone into integrating these methods into a generalizable method. Here, we present MetaCycle, an R package that incorporates ARSER, JTK_CYCLE and Lomb-Scargle to conveniently evaluate periodicity in time-series data. MetaCycle has two functions, meta2d and meta3d, designed to analyze two-dimensional and three-dimensional time-series datasets, respectively. Meta2d implements N-version programming concepts using a suite of algorithms and integrating their resul...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Wu, G., Anafi, R. C., Hughes, M. E., Kornacker, K., Hogenesch, J. B. Tags: SYSTEMS BIOLOGY Source Type: research

TwoPhaseInd: an R package for estimating gene-treatment interactions and discovering predictive markers in randomized clinical trials
Summary: In randomized clinical trials, identifying baseline genetic or genomic markers for predicting subgroup treatment effects is of rising interest. Outcome-dependent sampling is often employed for measuring markers. The R package TwoPhaseInd implements a number of efficient statistical methods we developed for estimating subgroup treatment effects and gene–treatment interactions, exploiting the gene–treatment independence dictated by randomization, including the case-only estimator, the maximum estimated likelihood estimator and the semiparametric maximum likelihood estimator for parameters in a logistic m...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Wang, X., Dai, J. Y. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

samExploreR: exploring reproducibility and robustness of RNA-seq results based on SAM files
Motivation: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is attributed in part to the two counteracting goals of (i) a cost-efficient and (ii) an optimal experimental design, leading to a compromise, e.g. in the sequencing depth of experiments. Results: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequ...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Stupnikov, A., Tripathi, S., de Matos Simoes, R., McArt, D., Salto-Tellez, M., Glazko, G., Dehmer, M., Emmert-Streib, F. Tags: GENE EXPRESSION Source Type: research
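The m-out-of-n subsampling idea named in the samExploreR abstract can be illustrated with a minimal Python sketch. This is not the package's R interface: the function name `subsample_reads`, its parameters and the toy read identifiers are all hypothetical, and reads are modeled as plain strings rather than SAM records. Whether the draw is made with or without replacement is left as an option here, since the abstract does not say.

```python
import random


def subsample_reads(reads, fraction, seed=None, replace=False):
    """Draw m = round(fraction * n) reads out of the n reads given.

    Hypothetical helper for illustration only; it is not part of samExploreR.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded generator for repeatable draws
    n = len(reads)
    m = max(1, round(fraction * n)) if n else 0
    if replace:
        # Draw with replacement: the same read may appear more than once.
        return [rng.choice(reads) for _ in range(m)]
    # Draw without replacement: every selected read is distinct.
    return rng.sample(reads, m)


# Toy stand-in for the read records of a SAM file.
reads = [f"read_{i}" for i in range(1000)]
subset = subsample_reads(reads, fraction=0.25, seed=42)
print(len(subset))
```

Repeating the draw with different seeds and rerunning the downstream analysis on each subset is one way to see how stable a result is at a reduced sequencing depth.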

Accounting for pairwise distance restraints in FFT-based protein-protein docking
Summary: ClusPro is a heavily used protein–protein docking server based on the fast Fourier transform (FFT) correlation approach. While FFT enables global docking, accounting for pairwise distance restraints using penalty terms in the scoring function is computationally expensive. We use a different approach and directly select low-energy solutions that also satisfy the given restraints. As expected, accounting for restraints generally improves the rank of near-native predictions, while retaining or even improving the numerical efficiency of FFT-based docking. Availability and Implementation: The software is freely a...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Xia, B., Vajda, S., Kozakov, D. Tags: STRUCTURAL BIOINFORMATICS Source Type: research
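The selection step described in the docking abstract, keeping low-energy solutions that also satisfy the restraints instead of adding penalty terms to the score, can be sketched as a post-filter. The sketch below is a generic illustration under stated assumptions, not ClusPro's implementation: the `Pose` and `Restraint` classes, the atom labels and the single upper-bound distance per restraint are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    energy: float  # docking score; lower is better
    coords: dict   # atom label -> (x, y, z), hypothetical layout


@dataclass
class Restraint:
    atom_a: str
    atom_b: str
    max_dist: float  # upper bound on the pair distance


def satisfies(pose, restraints):
    """Return True if every restrained atom pair lies within its bound."""
    return all(
        math.dist(pose.coords[r.atom_a], pose.coords[r.atom_b]) <= r.max_dist
        for r in restraints
    )


def select_poses(poses, restraints, top_k):
    """Keep restraint-satisfying poses and return the top_k lowest in energy."""
    kept = [p for p in poses if satisfies(p, restraints)]
    return sorted(kept, key=lambda p: p.energy)[:top_k]


# Toy data: three candidate poses and one restraint between two atoms.
poses = [
    Pose(-12.0, {"A:CA": (0.0, 0.0, 0.0), "B:CA": (20.0, 0.0, 0.0)}),
    Pose(-9.5, {"A:CA": (0.0, 0.0, 0.0), "B:CA": (3.0, 4.0, 0.0)}),
    Pose(-11.0, {"A:CA": (0.0, 0.0, 0.0), "B:CA": (0.0, 6.0, 0.0)}),
]
restraints = [Restraint("A:CA", "B:CA", 8.0)]
selected = select_poses(poses, restraints, top_k=2)
print([p.energy for p in selected])
```

In this toy case the pose with the lowest energy violates the restraint and is dropped, so the two remaining poses are returned in order of energy.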

MetaPred2CS: a sequence-based meta-predictor for protein-protein interactions of prokaryotic two-component system proteins
Motivation: Two-component systems (TCS) are the main signalling pathways of prokaryotes, and control a wide range of biological phenomena. Their functioning depends on interactions between TCS proteins, the specificity of which is poorly understood. Results: The MetaPred2CS web-server interfaces a sequence-based meta-predictor specifically designed to predict pairing of the histidine kinase and response-regulator proteins forming TCSs. MetaPred2CS integrates six sequence-based methods using a support vector machine classifier and has been intensively tested under different benchmarking conditions: (i) species specific gene...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Kara, A., Vickers, M., Swain, M., Whitworth, D. E., Fernandez-Fuentes, N. Tags: SEQUENCE ANALYSIS Source Type: research

CRISPR-DO for genome-wide CRISPR design and optimization
In this study, we propose a web application for the Design and Optimization (CRISPR-DO) of guide sequences that target both coding and non-coding regions in the spCas9 CRISPR system across human, mouse, zebrafish, fly and worm genomes. CRISPR-DO uses a computational sequence model to predict sgRNA efficiency, and employs a specificity scoring function to evaluate the potential for off-target effects. It also provides information on functional conservation of target sequences, as well as the overlaps with exons, putative regulatory sequences and single-nucleotide polymorphisms (SNPs). The web application has a user-friendly genom...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Ma, J., Köster, J., Qin, Q., Hu, S., Li, W., Chen, C., Cao, Q., Wang, J., Mei, S., Liu, Q., Xu, H., Liu, X. S. Tags: GENOME ANALYSIS Source Type: research

w4CSeq: software and web application to analyze 4C-seq data
Summary: Circularized Chromosome Conformation Capture followed by deep sequencing (4C-Seq) is a powerful technique to identify genome-wide partners interacting with a pre-specified genomic locus. Here, we present a computational and statistical approach to analyze 4C-Seq data generated from both enzyme digestion and sonication fragmentation-based methods. We implemented a command line software tool and a web interface called w4CSeq, which takes in the raw 4C sequencing data (FASTQ files) as input, performs automated statistical analysis and presents results in a user-friendly manner. Besides providing users with the list o...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Cai, M., Gao, F., Lu, W., Wang, K. Tags: GENOME ANALYSIS Source Type: research

The SMAL web server: global multiple network alignment from pairwise alignments
Motivation: Alignments of protein-protein interaction networks (PPIN) can be used to predict protein function, study conserved aspects of the interactome, and to establish evolutionary correspondences. Within this problem context, determining multiple network alignments (MNA) is a significant challenge that involves high computational complexity. A limited number of public MNA implementations are available currently and the majority of the pairwise network alignment (PNA) algorithms do not have MNA counterparts. Furthermore, current MNA algorithms do not allow choosing a specific PPIN relative to which an MNA could be cons...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Dohrmann, J., Singh, R. Tags: GENOME ANALYSIS Source Type: research

ConsPred: a rule-based (re-)annotation framework for prokaryotic genomes
Motivation: The rapidly growing number of available prokaryotic genome sequences requires fully automated and high-quality software solutions for their initial and re-annotation. Here we present ConsPred, a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, predictions of non-coding genes as well as CRISPR repeats and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation and avoids conflicting predictions. Parameters controlling the annotatio...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Weinmaier, T., Platzer, A., Frank, J., Hellinger, H.-J., Tischler, P., Rattei, T. Tags: GENOME ANALYSIS Source Type: research

ChAsE: chromatin analysis and exploration tool
We present ChAsE, a cross-platform desktop application developed for interactive visualization, exploration and clustering of epigenomic data such as ChIP-seq experiments. ChAsE is designed and developed in close collaboration with several groups of biologists and bioinformaticians with a focus on usability and interactivity. Data can be analyzed through k-means clustering, specifying presence or absence of signal in epigenetic data and performing set operations between clusters. Results can be explored in an interactive heat map and profile plot interface and exported for downstream analysis or as high quality figures sui...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Younesy, H., Nielsen, C. B., Lorincz, M. C., Jones, S. J. M., Karimi, M. M., Möller, T. Tags: GENOME ANALYSIS Source Type: research

Icarus: visualizer for de novo assembly evaluation
Summary: Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from the visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose. Here, we present Icarus—a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Mikheenko, A., Valin, G., Prjibelski, A., Saveliev, V., Gurevich, A. Tags: GENOME ANALYSIS Source Type: research

Unbiased classification of spatial strategies in the Barnes maze
Motivation: Spatial learning is one of the most widely studied cognitive domains in neuroscience. The Morris water maze and the Barnes maze are the most commonly used techniques to assess spatial learning and memory in rodents. Despite the fact that these tasks are well-validated paradigms for testing spatial learning abilities, manual categorization of performance into behavioral strategies is subject to individual interpretation, and thus to bias. We have previously described an unbiased machine-learning algorithm to classify spatial strategies in the Morris water maze. Results: Here, we offer a support vector machine...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Illouz, T., Madar, R., Clague, C., Griffioen, K. J., Louzoun, Y., Okun, E. Tags: DATA AND TEXT MINING Source Type: research

A subpopulation model to analyze heterogeneous cell differentiation dynamics
We present statistical methodology that can be used to quantify the effect of heterogeneity and to infer the subpopulation specific molecular interactions. After a proof of principle study with simulated data, we apply our methodology to analyze the differentiation of human Th17 cells using time-course RNA sequencing data. We construct putative molecular networks driving the T cell activation and Th17 differentiation and allow the cell populations to be split into two subpopulations in the case of heterogeneous samples. Our analysis shows that the heterogeneity indeed has a statistically significant effect on observed dyna...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Chan, Y. H., Intosalmi, J., Rautio, S., Lähdesmäki, H. Tags: SYSTEMS BIOLOGY Source Type: research

New quality measure for SNP array based CNV detection
Motivation: Only a few large systematic studies have evaluated the impact of copy number variants (CNVs) on common diseases. Several million individuals have been genotyped on single nucleotide variation arrays, which could be used for genome-wide CNVs association studies. However, CNV calls remain prone to false positives and only empirical filtering strategies exist in the literature. To overcome this issue, we defined a new quality score (QS) estimating the probability of a CNV called by PennCNV to be confirmed by other software. Results: Out-of-sample comparison showed that the correlation between the consensus CNV sta...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Mace, A., Tuke, M. A., Beckmann, J. S., Lin, L., Jacquemont, S., Weedon, M. N., Reymond, A., Kutalik, Z. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

cisASE: a likelihood-based method for detecting putative cis-regulated allele-specific expression in RNA sequencing data
Motivation: Allele-specific expression (ASE) is a useful way to identify cis-acting regulatory variation, which provides opportunities to develop new therapeutic strategies that activate beneficial alleles or silence mutated alleles at specific loci. However, multiple problems hinder the identification of ASE in next-generation sequencing (NGS) data. Results: We developed cisASE, a likelihood-based method for detecting ASE on single nucleotide variant (SNV), exon and gene levels from sequencing data without requiring phasing or parental information. cisASE uses matched DNA-seq data to control technical bias and copy number...
Source: Bioinformatics - October 24, 2016 Category: Bioinformatics Authors: Liu, Z., Gui, T., Wang, Z., Li, H., Fu, Y., Dong, X., Li, Y. Tags: GENE EXPRESSION Source Type: research