Mining for Gold: Identifying Content-Related MOOC Discussion Threads across Domains through Linguistic Modeling

This study addresses overload and chaos in MOOC discussion forums by developing a model to categorize threads based on whether they are substantially related to course content. A linguistic model was built from manually coded starting posts in threads from a statistics MOOC, and tested on the second offering of the course, another statistics MOOC, a psychology MOOC, a physiology MOOC and a test set of reply posts. Results showed that content-related starting posts had distinct linguistic features that appeared unrelated to the domain. The model demonstrated good reliability for all starting posts in statistics and psychology as well as for reply posts (accuracy ranged from .80 to .85). Reliability for starting posts in physiology was lower, but the model still provided reasonably good predictive ability (accuracy was .73). The classification model was useful across all time segments of the courses; the numbers of views and votes that threads received were not helpful.
Source: The Internet and Higher Education - Category: Information Technology - Source Type: research