DTMiner: identification of potential disease targets through biomedical literature mining

In this study, we propose a reliable and efficient framework that takes large biomedical literature repositories as inputs, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. The framework incorporates name entity recognition (NER), which identifies occurrences of genes and diseases in texts, association detection whereby we extract and evaluate features from gene–disease pairs, and ranking algorithms that estimate how closely the pairs are related. The F1-score of the NER phase is 0.87, which is higher than existing studies. The association detection phase takes drastically less time than previous work while maintaining a comparable F1-score of 0.86. The end-to-end result achieves a 0.259 F1-score for the top 50 genes associated with a disease, which performs better than previous work. In addition, we released a web service for public use of the dataset. Availability and Implementation: The implementation of the proposed algorithms is publicly available at http://gdr-web.rwebox.com/public_html/index.php?page=download.php. The web service is available at http://gdr-web.rwebox.com/public_html/index.php. Contact: jenny.wei@astrazeneca.com or kzhu@cs.sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Source: Bioinformatics - Category: Bioinformatics Authors: Tags: DATA AND TEXT MINING Source Type: research