Improving the text classification using clustering and a novel HMM to reduce the dimensionality

• A dimensionality reduction method based on document content is proposed
• The technique uses document clustering to separate the data into groups
• It introduces a similarity-based document representation built on a Text HMM
• The model is tested with SVM and k-NN classifiers on two medical corpora
• Results show the method outperforms other dimensionality reduction approaches
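
The highlights above outline a pipeline: cluster the documents, describe each document by its similarity to every group, then classify in that low-dimensional space. The minimal sketch below illustrates only that general idea and is not the authors' method. The listing does not describe the Text HMM, so cosine similarity to a cluster centroid is swapped in as a stand-in for the HMM-based similarity. The toy documents, labels, and all function names are hypothetical, and only a k-NN classifier is shown.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Term-frequency vector for a whitespace-tokenized document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum of the member vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def cluster(vectors, k, iterations=10):
    """Simple k-means on cosine similarity, seeded with the first k documents."""
    centroids = [Counter(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda j: cosine(v, centroids[j]))
            groups[best].append(v)
        # Keep the previous centroid if a group ends up empty.
        centroids = [centroid(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids

def to_similarity_features(vector, centroids):
    """Reduced representation: one similarity score per cluster.
    ASSUMPTION: cosine similarity stands in for the paper's Text HMM score."""
    return [cosine(vector, c) for c in centroids]

def knn_predict(query, train_points, train_labels, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    ranked = sorted(range(len(train_points)),
                    key=lambda i: math.dist(query, train_points[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus; the two medical corpora from the paper are not used here.
docs = [
    "heart attack cardiac arrest patient",
    "lung infection pneumonia cough patient",
    "cardiac surgery heart valve",
    "cough lung asthma inhaler",
    "heart failure cardiac rhythm",
    "pneumonia lung fever cough",
]
labels = ["cardio", "pulmo", "cardio", "pulmo", "cardio", "pulmo"]

vectors = [bag_of_words(d) for d in docs]
centroids = cluster(vectors, k=2)
train_features = [to_similarity_features(v, centroids) for v in vectors]

query = to_similarity_features(bag_of_words("cardiac arrest heart"), centroids)
print(knn_predict(query, train_features, labels))  # prints "cardio"
```

Each document ends up with as many features as there are clusters (two here) instead of one feature per vocabulary term, which is the dimensionality reduction the highlights refer to. An SVM could be trained on the same reduced features in place of k-NN.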
Source: Computer Methods and Programs in Biomedicine - Category: Bioinformatics Authors: Source Type: research