GORpipe: a query tool for working with sequence data based on a Genomic Ordered Relational (GOR) architecture
Motivation: Our aim was to create a general-purpose relational data format and analysis tools to provide an efficient and coherent framework for working with large volumes of DNA sequence data. Results: For this purpose we developed the GORpipe software system. It is based on a genomic ordered architecture and uses a declarative query language that combines features from SQL and shell pipe syntax in a novel manner. The system can, for instance, be used to annotate sequence variants, find genomic spatial overlap between various types of genomic features, and filter and aggregate them in various ways. Availability and Implementati...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Guthbjartsson, H., Georgsson, G. F., Guthjonsson, S. A., Valdimarsson, R. t., Sigurthsson, J. H., Stefansson, S. K., Masson, G., Magnusson, G., Palmason, V., Stefansson, K. Tags: SEQUENCE ANALYSIS Source Type: research

Vecuum: identification and filtration of false somatic variants caused by recombinant vector contamination
Motivation: Advances in sequencing technologies have remarkably lowered the detection limit of somatic variants to a low frequency. However, calling mutations at this range is still confounded by many factors including environmental contamination. Vector contamination is a continuously occurring issue and is especially problematic since vector inserts are hardly distinguishable from the sample sequences. Such inserts, which may harbor polymorphisms and engineered functional mutations, can result in calling false variants at corresponding sites. Numerous vector-screening methods have been developed, but none could handle co...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Kim, J., Maeng, J. H., Lim, J. S., Son, H., Lee, J., Lee, J. H., Kim, S. Tags: GENOME ANALYSIS Source Type: research

Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes
Conclusion: The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics. Availability and implementation: ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from (https://github.com/jacobhsu35/ISPP.git). Contact: mxli@hku.hk Supplementary information: Supplementary data are available at Bioinf...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Hsu, J. S., Kwan, J. S. H., Pan, Z., Garcia-Barcelo, M.-M., Sham, P. C., Li, M. Tags: GENOME ANALYSIS Source Type: research

Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences
Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-g...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Zhang, J., Kudrna, D., Mu, T., Li, W., Copetti, D., Yu, Y., Goicoechea, J. L., Lei, Y., Wing, R. A. Tags: GENOME ANALYSIS Source Type: research

Structural distinctions of fast and slow bacterial luciferases revealed by phylogenetic analysis
Motivation: Bacterial luciferases are heterodimeric enzymes that catalyze a chemical reaction, so-called bioluminescence, which causes light emission in bacteria. Bioluminescence is widely used as a reporter system in research tools and commercial developments. However, the details of the mechanisms that stabilize and transform the reaction intermediates as well as differences in the enzymatic kinetics amongst different bacterial luciferases remain to be elucidated. Results: Amino acid sequence alignments for 21 bacterial luciferases (both α- and β-subunits) were analyzed. For α-subunit, containing the en...
Source: Bioinformatics - October 2, 2016 Category: Bioinformatics Authors: Deeva, A. A., Temlyakova, E. A., Sorokin, A. A., Nemtseva, E. V., Kratasyuk, V. A. Tags: SEQUENCE ANALYSIS Source Type: research

What time is it? Deep learning approaches for circadian rhythms
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Agostinelli, F., Ceglia, N., Shahbaba, B., Sassone-Corsi, P., Baldi, P. Tags: CORRIGENDUM Source Type: research

bdvis: visualizing biodiversity data in R
Summary: Biodiversity studies are relying increasingly on primary biodiversity records (PBRs) for modelling and analysis. Because biodiversity data are frequently ‘harvested’—i.e. not collected by the researcher for that particular study, but obtained from data aggregators such as the Global Biodiversity Information Facility—researchers need to be aware of strengths and weaknesses of their data before they venture into further analysis. R is becoming a lingua franca of data exploration and analysis. Here, we describe an R package, bdvis, which facilitates efforts to understand the gaps and strengths...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Barve, V., Otegui, J. Tags: DATABASES AND ONTOLOGIES Source Type: research

MultiQC: summarize analysis results for multiple tools and samples in a single report
We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. Availability and implementation: MultiQC is available with a GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info Contact: phil.ewels@scilifelab.se
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Ewels, P., Magnusson, M., Lundin, S., Käller, M. Tags: DATA AND TEXT MINING Source Type: research

Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms
Summary: Medical Subject Headings (MeSH®) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed® for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occ...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Kim, S., Yeganova, L., Wilbur, W. J. Tags: DATA AND TEXT MINING Source Type: research

Web-based network analysis and visualization using CellMaps
Summary: CellMaps is an HTML5 open-source web tool that allows displaying, editing, exploring and analyzing biological networks as well as integrating metadata into them. Computations and analyses are remotely executed in high-end servers, and all the functionalities are available through RESTful web services. CellMaps can easily be integrated in any web page by using an available JavaScript API. Availability and Implementation: The application is available at: http://cellmaps.babelomics.org/ and the code can be found in: https://github.com/opencb/cell-maps. The client is implemented in JavaScript and the server in C and J...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Salavert, F., Garcia-Alonso, L., Sanchez, R., Alonso, R., Bleda, M., Medina, I., Dopazo, J. Tags: SYSTEMS BIOLOGY Source Type: research

scphaser: haplotype inference using single-cell RNA-seq data
Summary: Determination of haplotypes is important for modelling the phenotypic consequences of genetic variation in diploid organisms, including cis-regulatory control and compound heterozygosity. We realized that single-cell RNA-seq (scRNA-seq) data are well suited for phasing genetic variants, since both transcriptional bursts and technical bottlenecks cause pronounced allelic fluctuations in individual single cells. Here we present scphaser, an R package that phases alleles at heterozygous variants to reconstruct haplotypes within transcribed regions of the genome using scRNA-seq data. The devised method efficiently and...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Edsgärd, D., Reinius, B., Sandberg, R. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

SELAM: simulation of epistasis and local adaptation during admixture with mate choice
Summary: SELAM is a forward-time population genetic simulation program that provides a flexible framework for simulating admixture between any number of ancestral populations. The program can be used to simulate complex demographic and selection models, including dioecious or monoecious populations, autosomal or sex chromosomes, local adaptation, dominance, epistasis, and mate choice. Availability and Implementation: The SELAM package (C++ source code, examples and manuals) is available via GitHub at https://github.com/russcd/SELAM. This package is distributed under version 3 of the GNU General Public License. Contact: ru...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Corbett-Detig, R., Jones, M. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

ARGON: fast, whole-genome simulation of the discrete time Wright-Fisher process
We present a simulator (ARGON) for the discrete time Wright-Fisher (DTWF) process that scales up to hundreds of thousands of samples and whole-chromosome lengths, with a time/memory performance comparable or superior to currently available methods for coalescent simulation. The simulator supports arbitrary demographic history, migration, Newick tree output, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent sharing data. Availability: ARGON (version 0.1) is written in Java, open source, and freely available at https://github.com/pierpal/ARGON. Contact: ppalama@hsph.harvard.edu Supplementary ...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Palamara, P. F. Tags: GENETICS AND POPULATION ANALYSIS Source Type: research

GSA-Lightning: ultra-fast permutation-based gene set analysis
This article introduces GSA-Lightning, a fast implementation of permutation-based gene set analysis. GSA-Lightning achieves significant speedup compared with existing methods, particularly when the numbers of gene sets and permutations are large. Availability and implementation: The GSA-Lightning R package is available on GitHub at https://github.com/billyhw/GSALightning and on R Bioconductor. The package also contains a comprehensive user's guide with a step-by-step tutorial vignette. Contact: weidong.tian@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Chang, B. H. W., Tian, W. Tags: GENE EXPRESSION Source Type: research

RADIS: analysis of RAD-seq data for interspecific phylogeny
In an attempt to make the processing of RAD-seq data easier and allow rapid and automated exploration of parameters/data for phylogenetic inference, we introduce the Perl pipeline RADIS. Users of RADIS can let their raw Illumina data be processed up to phylogenetic tree inference, or stop (and restart) the process at some point. Different values for key parameters can be explored in a single analysis (e.g. loci building, sample/loci selection), making possible a thorough exploration of data. RADIS relies on Stacks for demultiplexing of data, removing PCR duplicates and building individual and catalog loci. Scripts have bee...
Source: Bioinformatics - September 26, 2016 Category: Bioinformatics Authors: Cruaud, A., Gautier, M., Rossi, J.-P., Rasplus, J.-Y., Gouzy, J. Tags: PHYLOGENETICS Source Type: research