Twitter trolls: a linguistic profile of anti-democratic discourse

This article focuses on anti-democratic discourse and investigates the linguistic profile of Twitter trolls. The troll data consist of some 3.5 million messages in English obtained through Twitter in late 2018. These data originate from potentially state-backed information operations aimed at sowing discord in Western societies. The baseline data, against which the troll data are compared, contain circa 4.4 million messages in English drawn from the Nordic Tweet Stream corpus. We prune the baseline with a machine learning application that selects genuine personal messages from this corpus. The empirical part investigates frequency-based characteristics of the two datasets, using automatically extracted word-list information and the observed frequencies of personal pronouns. Our findings show considerable quantitative differences: the troll messages are shorter, use fewer lexical types and tokens, and resemble more formal registers, whereas the personal messages are more spoken-like. The results could help improve automated systems for detecting troll accounts.
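The frequency-based measures named in the abstract (message length, lexical types and tokens, personal pronoun frequencies) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the regex tokenizer, the pronoun set, the per-1,000-token normalization, and the function name `profile` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a simple lowercase word tokenizer. The article does not
# describe its tokenization, and real tweets would first need URLs,
# hashtags, and @-mentions handled.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Assumption: an illustrative set of English personal pronouns.
PERSONAL_PRONOUNS = {
    "i", "me", "we", "us", "you", "he", "him",
    "she", "her", "it", "they", "them",
}


def profile(messages):
    """Return simple frequency-based measures for a list of messages."""
    tokens_per_message = [TOKEN_RE.findall(m.lower()) for m in messages]
    all_tokens = [t for toks in tokens_per_message for t in toks]
    counts = Counter(all_tokens)
    n_tokens = len(all_tokens)
    n_pronouns = sum(counts[p] for p in PERSONAL_PRONOUNS)
    return {
        "messages": len(messages),
        "tokens": n_tokens,        # running words
        "types": len(counts),      # distinct word forms
        "mean_tokens_per_message": n_tokens / len(messages) if messages else 0.0,
        "pronouns_per_1000_tokens": 1000 * n_pronouns / n_tokens if n_tokens else 0.0,
    }


# Invented sample messages, not drawn from either dataset.
sample = ["I think we should go", "They like it"]
stats = profile(sample)
```

Running `profile` separately on a troll sample and a baseline sample would give two dictionaries to compare side by side. Because raw type counts grow with corpus size, any real comparison would need samples of comparable size or a size-normalized measure.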
Source: Language Sciences - Category: Speech-Language Pathology - Source Type: research