sBWT: memory efficient implementation of the hardware-acceleration-friendly Schindler transform for the fast biological sequence mapping

Motivation: The Full-text index in Minute space (FM-index) derived from the Burrows–Wheeler transform (BWT) is broadly used for fast string matching in large genomes or a huge set of sequencing reads. Several graphic processing unit (GPU) accelerated aligners based on the FM-index have been proposed recently; however, the construction of the index is still handled by central processing unit (CPU), only parallelized in data level (e.g. by performing blockwise suffix sorting in GPU), or not scalable for large genomes. Results: To fulfill the need for a more practical, hardware-parallelizable indexing and matching approach, we herein propose sBWT based on a BWT variant (i.e. Schindler transform) that can be built with highly simplified hardware-acceleration-friendly algorithms and still suffices accurate and fast string matching in repetitive references. In our tests, the implementation achieves significant speedups in indexing and searching compared with other BWT-based tools and can be applied to a variety of domains. Availability and implementation: sBWT is implemented in C ++ with CPU-only and GPU-accelerated versions. sBWT is open-source software and is available at http://jhhung.github.io/sBWT/ Supplementary information: Supplementary data are available at Bioinformatics online. Contact: chyee@ntu.edu.tw or jhhung@nctu.edu.tw (also juihunghung@gmail.com)
Source: Bioinformatics - Category: Bioinformatics Authors: Tags: SEQUENCE ANALYSIS Source Type: research
More News: Bioinformatics