Application of Efficient Data Cleaning Using Text Clustering for Semistructured Medical Reports to Large-Scale Stool Examination Reports: Methodology Study
Conclusions: Our data cleaning process, which combines key collision and nearest neighbor methods, cleans large-scale text data efficiently and thereby improves data accuracy.
Source: Journal of Medical Internet Research - Category: General Medicine - Authors: Hyunki Woo, Kyunga Kim, KyeongMin Cha, Jin-Young Lee, Hansong Mun, Soo Jin Cho, Ji In Chung, Jeung Hui Pyo, Kun-Chul Lee, Mira Kang - Source Type: research
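
The conclusion names two text clustering techniques, key collision and nearest neighbor, without showing how they fit together. The following is a minimal sketch of one common way to combine them, not the authors' implementation: the function names (`fingerprint`, `levenshtein`, `cluster_values`), the fingerprint normalization rules, the greedy merge strategy, the `max_distance` threshold, and the sample values are all assumptions made for illustration.

```python
import string
from collections import defaultdict


def fingerprint(text):
    """Key-collision key (assumed rules): lowercase, drop ASCII
    punctuation, then sort the unique whitespace-separated tokens."""
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cluster_values(values, max_distance=1):
    """Group raw strings in two passes.

    Pass 1 (key collision): values sharing a fingerprint are grouped.
    Pass 2 (nearest neighbor): a fingerprint within max_distance edits
    of an earlier cluster's representative key is merged into it.
    The threshold is an illustrative assumption, not a tuned value.
    """
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)

    clusters = []  # list of (representative key, member values)
    for key in sorted(groups):
        for rep, members in clusters:
            if levenshtein(key, rep) <= max_distance:
                members.extend(groups[key])
                break
        else:
            clusters.append((key, list(groups[key])))
    return [members for _, members in clusters]


# Hypothetical free-text result values, invented for this example.
raw = ["Not found", "not found.", "found not", "not foundd", "Positive"]
clusters = cluster_values(raw)
```

In this sketch the cheap key-collision pass absorbs differences in case, punctuation, and token order, so the more expensive pairwise distance comparison runs only over the distinct fingerprints rather than over every raw value. The greedy merge depends on processing order and is kept only for brevity.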