Combining classifiers generated by multi-gene genetic programming for protein fold recognition using genetic algorithm.
In this study, the problem of protein fold recognition, a classification task, is solved via a hybrid of two evolutionary algorithms, namely multi-gene Genetic Programming (GP) and a Genetic Algorithm (GA). Our proposed method consists of two main stages and is evaluated on three datasets taken from the literature. Each dataset contains different feature groups and classes. In the first stage, multi-gene GP is used to produce binary classifiers based on various feature groups for each class. Then, the different classifiers obtained for each class are combined via weighted voting, with the weights determined through GA....
Source: International Journal of Bioinformatics Research and Applications - March 21, 2015 Category: Bioinformatics Authors: Bardsiri MK, Eftekhari M, Mousavi R Tags: Int J Bioinform Res Appl Source Type: research
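
The weighted-voting stage described above can be sketched roughly as follows. This is a minimal illustration only: the function names, the 0/1 vote encoding and the example weights are assumptions, and the GA search that the abstract uses to find the weights is not shown (the weights are simply passed in).

```python
def weighted_vote(votes, weights):
    """Combine binary votes (0 or 1) from several classifiers into a score in [0, 1].

    votes and weights are hypothetical inputs: one vote and one weight per
    binary classifier (for example, one classifier per feature group).
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * v for w, v in zip(weights, votes)) / total


def predict_class(votes_per_class, weights_per_class):
    """Return the class whose weighted vote is highest."""
    scores = {
        label: weighted_vote(votes_per_class[label], weights_per_class[label])
        for label in votes_per_class
    }
    return max(scores, key=scores.get)
```

In a GA setting, each candidate weight vector would be scored by the accuracy that predict_class achieves on validation data; that fitness loop is left out here.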

To study percentage distribution of target genes encoding proteins of different classes in Helicobacter pylori strain J99 and identification of potential therapeutic targets to reduce its proliferation.
Abstract Helicobacter pylori is one of the most common bacterial pathogens in humans, and its seropositivity increases with age and low socio-economic status. Owing to its pathogenicity island, it causes chronic persistent and atrophic gastritis in adults and children, which often culminates in the development of gastric and duodenal ulcers. Studies indicate that infected individuals have a two- to sixfold increased risk of developing gastric cancer and mucosa-associated lymphoid tissue lymphoma compared with their uninfected counterparts. The complete genome sequences have provided a plethora of potential drug targ...
Source: International Journal of Bioinformatics Research and Applications - February 15, 2015 Category: Bioinformatics Authors: Vaidya M, Panchal H Tags: Int J Bioinform Res Appl Source Type: research

RadixHap: a radix tree-based heuristic for solving the single individual haplotyping problem.
In this study, we introduce a greedy approach, named RadixHap, to handle data sets with high error rates. The experimental results show that RadixHap can generate highly reliable results in most cases. Furthermore, the algorithm structure of RadixHap is particularly suitable for whole-genome scale data sets. PMID: 25667383 [PubMed - in process]
Source: International Journal of Bioinformatics Research and Applications - February 15, 2015 Category: Bioinformatics Authors: Wang TC, Taheri J, Zomaya AY Tags: Int J Bioinform Res Appl Source Type: research

Identifying protein complexes based on the integration of PPI network and gene expression data.
Abstract Identification of protein complexes is crucial for understanding the principles of cellular organisation and predicting protein functions. In this paper, a novel protein complex discovery algorithm, IPCIPG, is proposed based on the integration of a Protein-Protein Interaction network (PPI network) and gene expression data. IPCIPG is a local search algorithm with two versions: IPCIPG-n for identifying non-overlapping clusters and IPCIPG-o for detecting overlapping clusters. The experimental results on the yeast PPI network show that IPCIPG can identify protein complexes with specific biological meaning more ...
Source: International Journal of Bioinformatics Research and Applications - February 15, 2015 Category: Bioinformatics Authors: Chen W, Li M, Wu X, Wang J Tags: Int J Bioinform Res Appl Source Type: research

TDAC: co-expressed gene pattern finding using attribute clustering.
Abstract A number of clustering methods introduced for the analysis of gene expression data, with the aim of extracting potential relationships among genes, are studied and reported in this paper. An effective unsupervised method (TDAC) is proposed for the simultaneous detection of outliers and biologically relevant co-expressed patterns. The effectiveness of TDAC is established in comparison with competing algorithms over six publicly available benchmark gene expression datasets in terms of both internal and external validity measures. The main attractions of TDAC are: (a) it does not require discretisation, (b) it is capa...
Source: International Journal of Bioinformatics Research and Applications - February 15, 2015 Category: Bioinformatics Authors: Rahman TA, Bhattacharyya DK Tags: Int J Bioinform Res Appl Source Type: research

Two scenarios for overcoming drug resistance by co-targeting.
Abstract Removal of proteins on an essential pathway of a pathogen is expected to prevent the pathogen from performing a vital function. To disrupt these pathways, we consider a cut set S of a simple graph G, where G represents the PPI network of the pathogen. After removing S, if the difference between the sizes of the two partitions is large, the probability that a functioning pathway still exists increases. We therefore need to partition the graph into balanced partitions, which we approximate with spectral bipartitioning. We consider two scenarios: in the first, we do not have any information on drug targets; in the second, we con...
Source: International Journal of Bioinformatics Research and Applications - February 15, 2015 Category: Bioinformatics Authors: Taheri G, Ayati M, Wong L, Eslahchi C Tags: Int J Bioinform Res Appl Source Type: research
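
As a rough illustration of spectral bipartitioning in general (not the authors' implementation), the sketch below splits a graph by the sign of its Fiedler vector, the eigenvector of the second-smallest eigenvalue of the graph Laplacian. The function name, the adjacency-matrix input and the use of plain power iteration are all assumptions made to keep the example dependency-free; it also assumes a connected graph with at least two nodes.

```python
def spectral_bipartition(adj, iters=500):
    """Split a connected graph in two by the sign of an approximate Fiedler vector.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    Power iteration is run on (c*I - L), where L is the graph Laplacian,
    while projecting out the constant vector (the trivial eigenvector of L).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2 * max(deg) + 1  # eigenvalues of L lie in [0, 2*max_degree], so c*I - L is positive definite
    x = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant component
        y = [
            (c - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    mean = sum(x) / n
    x = [xi - mean for xi in x]
    left = {i for i in range(n) if x[i] < 0}
    return left, set(range(n)) - left
```

For two triangles joined by a single edge, the split separates the triangles, and the joining edge is the cut. For real PPI networks a sparse eigensolver would be the more practical choice.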

Editorial.
PMID: 25115022 [PubMed - in process]
Source: International Journal of Bioinformatics Research and Applications - August 17, 2014 Category: Bioinformatics Authors: Mujahid SN, Korenkevych D, Pardalos PM Tags: Int J Bioinform Res Appl Source Type: research

Pairwise sequence alignment for very long sequences on GPUs.
Abstract We develop novel single-GPU parallelisations of the Smith-Waterman algorithm for pairwise sequence alignment. Our algorithms, which are suitable for the alignment of a single pair of very long sequences, can be used to determine the alignment score as well as the actual alignment. Experimental results demonstrate an order of magnitude reduction in run time relative to competing GPU algorithms. PMID: 24989857 [PubMed - in process]
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Li J, Ranka S, Sahni S Tags: Int J Bioinform Res Appl Source Type: research
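
For orientation, a plain sequential version of the Smith-Waterman score computation is sketched below. It is a CPU reference only and says nothing about the GPU parallelisation the abstract describes; the linear gap penalty and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming table, so memory grows with len(b) rather than len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Recovering the actual alignment, which the abstract also covers, would need a traceback over the full table or a divide-and-conquer scheme; that part is omitted here.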

PMS6: a fast algorithm for motif discovery.
Abstract We propose a new algorithm, PMS6, for the (l,d)-motif discovery problem, in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. The run time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1.69 for the (17,6) challenge instances. Both PMS5 and PMS6 require some amount of pre-processing. Pre-processing for PMS6 is 34 times faster than for PMS5 on (23,9) instances. When pre-proce...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Bandyopadhyay S, Sahni S, Rajasekaran S Tags: Int J Bioinform Res Appl Source Type: research
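
To make the (l,d)-motif problem statement concrete, here is a naive brute-force sketch that enumerates every candidate string of length l. It is not PMS5 or PMS6 and is only practical for very small l; the function names and the default DNA alphabet are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some window of s with the motif's length is within d mismatches of it."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def ld_motifs(strings, l, d, alphabet="ACGT"):
    """Return all length-l strings that appear in every input string with at most d mismatches."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return sorted(m for m in candidates if all(occurs_within(m, s, d) for s in strings))
```

The search space here grows as 4^l, which is why challenge instances such as (21,8) call for the specialised algorithms the abstract compares.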

Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps.
Abstract Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weigh...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Spouge JL, Mariño-Ramírez L, Sheetlin SL Tags: Int J Bioinform Res Appl Source Type: research
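
The classic Ruzzo-Tompa procedure on a plain sequence of scores (not the graph generalisation introduced in the abstract) can be sketched as below. This is a simplified version: it searches the candidate list directly at each step, so it does not include the extra bookkeeping that gives the linear-time bound. The function name and the half-open (start, end) index convention are assumptions.

```python
def maximal_segments(scores):
    """Return all maximal scoring segments as half-open (start, end) index pairs.

    Each candidate segment is stored as [start, end, L, R], where L is the
    cumulative score before its first element and R is the cumulative score
    through its last element.
    """
    segs = []
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a new candidate
        cand = [i, i + 1, before, total]
        while True:
            # Find the rightmost stored segment whose L is below the candidate's L.
            j = len(segs) - 1
            while j >= 0 and segs[j][2] >= cand[2]:
                j -= 1
            if j < 0 or segs[j][3] >= cand[3]:
                segs.append(cand)
                break
            # Otherwise merge everything from segment j onwards into the candidate and retry.
            cand = [segs[j][0], cand[1], segs[j][2], cand[3]]
            del segs[j:]
    return [(start, end) for start, end, _, _ in segs]
```

For the score sequence (4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5), this yields the segments (4), (3) and (1, 2, -2, 2, -2, 1, 5).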

Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data.
Abstract Identification of closely related, ecologically distinct populations of bacteria would benefit microbiologists working in many fields including systematics, epidemiology and biotechnology. Several laboratories have recently developed algorithms aimed at demarcating such 'ecotypes'. We examine the ability of four of these algorithms to correctly identify ecotypes from sequence data. We tested the algorithms on synthetic sequences, with known history and habitat associations, generated under the stable ecotype model and on data from Bacillus strains isolated from Death Valley where previous work has...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Francisco JC, Cohan FM, Krizanc D Tags: Int J Bioinform Res Appl Source Type: research

Effects of rooting via out-groups on in-group topology in phylogeny.
Abstract Users of phylogenetic methods require rooted trees, because the direction of time depends on the placement of the root. While phylogenetic trees are typically rooted by using an out-group, this mechanism is inappropriate when the addition of an out-group changes the in-group topology. We perform a formal analysis of phylogenetic algorithms under the inclusion of distant out-groups. It turns out that linkage-based algorithms (including UPGMA) and a class of bisecting methods do not modify the topology of the in-group when an out-group is included. By contrast, the popular neighbour joining algorith...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Ackerman M, Brown DG, Loker D Tags: Int J Bioinform Res Appl Source Type: research
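
The claim about linkage-based methods can be illustrated with a small UPGMA sketch. This is an assumption-laden toy rather than the formal analysis in the paper: the function name, the distance-dictionary input and the nested-frozenset tree representation are all chosen here for brevity, and tie-breaking between equal distances is left arbitrary.

```python
def upgma(names, dist):
    """Build a UPGMA (average-linkage) tree as nested frozensets of leaf labels.

    names: list of leaf labels.
    dist: dict mapping a pair (a, b) of labels to their distance; every
    unordered pair must appear exactly once.
    """
    size = {name: 1 for name in names}  # current clusters and their leaf counts
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = pair
        merged = frozenset({a, b})
        na, nb = size.pop(a), size.pop(b)
        for c in size:
            dac = d.pop(frozenset({a, c}))
            dbc = d.pop(frozenset({b, c}))
            # Size-weighted average, which equals the mean over all leaf pairs.
            d[frozenset({merged, c})] = (na * dac + nb * dbc) / (na + nb)
        del d[pair]
        size[merged] = na + nb
    return next(iter(size))
```

With in-group distances d(A,B)=2, d(A,C)=6, d(B,C)=6 and a hypothetical out-group O at distance 20 from every in-group taxon, the tree built with O contains the in-group tree built without it as an unchanged subtree.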

Scaling up genome annotation using MAKER and work queue.
We present a modified annotation framework that achieves a speed-up of 45x with 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Combined, our proposed framework can be used, even duri...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Thrasher A, Musgrave Z, Kachmarck B, Thain D, Emrich S Tags: Int J Bioinform Res Appl Source Type: research
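
The reported figures (a 45x speed-up on 50 workers) can be put in context with two standard scaling measures. Neither calculation appears in the abstract; the function names are assumptions, and the Karp-Flatt metric is added here only as a common way to estimate the serial fraction implied by a measured speed-up.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / workers


def karp_flatt(speedup, workers):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if workers <= 1:
        raise ValueError("the metric needs more than one worker")
    return (1 / speedup - 1 / workers) / (1 - 1 / workers)


print(parallel_efficiency(45, 50))  # 0.9
```

For these numbers, karp_flatt(45, 50) comes out at roughly 0.002, that is, about 0.2% of the work behaving as if it were serial.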

Mapping genomic features to functional traits through microbial whole genome sequences.
Abstract Recently, the utility of trait-based approaches for microbial communities has been identified. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologous Groups (COGs) and were used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance a...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Zhang W, Zeng E, Liu D, Jones SE, Emrich S Tags: Int J Bioinform Res Appl Source Type: research
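
One common TF-IDF variant, applied to per-genome COG counts, is sketched below. The abstract does not say which weighting formula the authors used, so the formulas here (relative frequency for TF, natural log of N/df for IDF), the function name and the COG identifiers in the usage note are all assumptions.

```python
import math


def tf_idf(genomes):
    """Weight COG counts by TF-IDF.

    genomes: list of dicts, one per genome, mapping a COG identifier to the
    number of genes assigned to it. Returns a list of dicts with the same
    keys (for positive counts) mapped to TF-IDF weights.
    """
    n = len(genomes)
    doc_freq = {}  # number of genomes in which each COG occurs
    for counts in genomes:
        for cog, count in counts.items():
            if count > 0:
                doc_freq[cog] = doc_freq.get(cog, 0) + 1
    weighted = []
    for counts in genomes:
        total = sum(counts.values())
        weighted.append({
            cog: (count / total) * math.log(n / doc_freq[cog])
            for cog, count in counts.items()
            if count > 0
        })
    return weighted
```

With two genomes, {"COG0001": 2, "COG0002": 2} and {"COG0001": 1}, the COG shared by both gets weight 0 and the COG found in only one genome gets a positive weight, which is the down-weighting of ubiquitous features that TF-IDF is meant to provide.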

Discovering non-coding RNA elements in Drosophila 3' untranslated regions.
Abstract Non-Coding RNA (ncRNA) elements in the 3' Untranslated Regions (3'-UTRs) are known to participate in the post-transcriptional regulation of genes. Inferring co-expression patterns of genes by clustering these 3'-UTR ncRNA elements will provide invaluable insights for studying their biological functions. In this paper, we propose an improved RNA structural clustering pipeline. Benchmarking the new pipeline on Rfam data demonstrates a performance improvement of over 10% compared with the traditional hierarchical clustering pipeline. By applying the new clustering pipeline to 3'-UTRs of Drosoph...
Source: International Journal of Bioinformatics Research and Applications - July 11, 2014 Category: Bioinformatics Authors: Zhong C, Andrews J, Zhang S Tags: Int J Bioinform Res Appl Source Type: research