CLASS BLENDING: Simpson's Paradox

For the past two days, we've been posting on Class Blending. Simpson's paradox is a special case that demonstrates what may happen when classes of information are blended. It is a well-known problem for statisticians: findings that apply to each of two data sets may be reversed when the two data sets are combined.

One of the most famous examples of Simpson's paradox was demonstrated in the 1973 Berkeley gender bias study. A preliminary review of admissions data indicated that women had a lower admissions rate than men:

            Number of applicants    Percent of applicants admitted
    Men     8,442                   44%
    Women   4,321                   35%

A difference of nearly 10 percentage points is highly significant, but what does it mean? Was the admissions office guilty of gender bias?

A closer look at admissions, department by department, told a very different story. Women were being admitted at higher rates than men in almost every department. The department-by-department data seemed incompatible with the combined data.

The explanation was simple. Women tended to apply to the most popular and oversubscribed departments, such as English and History, which had a high rate of admission denials. Men tended to apply to departments that the women of 1973 avoided, such as mathematics, engineering and physics, and tended not to apply to the oversubscribed departments that women preferred. Though women had an equal foot...
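The reversal described above is easy to reproduce with a few lines of Python. This is only a minimal sketch: the two departments and all applicant counts below are invented for illustration and are not the 1973 Berkeley figures. In this made-up data, women are admitted at a higher rate than men in each department (65% vs. 60%, and 25% vs. 20%), yet their combined rate is lower (29% vs. 52%), because most of the women applied to the department that denies most of its applicants.

```python
# Hypothetical data, invented for illustration only.
# These are NOT the 1973 Berkeley admissions figures.
# Layout: department -> group -> (applicants, admitted)
applications = {
    "low_denial_dept": {"men": (800, 480), "women": (100, 65)},
    "high_denial_dept": {"men": (200, 40), "women": (900, 225)},
}


def rate(applicants, admitted):
    """Admission rate for one group within one department."""
    return admitted / applicants


def combined_rate(group):
    """Admission rate for a group after blending all departments together."""
    total_applicants = sum(dept[group][0] for dept in applications.values())
    total_admitted = sum(dept[group][1] for dept in applications.values())
    return total_admitted / total_applicants


# Per-department rates: women are ahead in each department.
for name, dept in applications.items():
    print(name, {group: round(rate(*counts), 2) for group, counts in dept.items()})

# Blended rates: the ordering flips once the departments are combined.
print("combined", {g: round(combined_rate(g), 2) for g in ("men", "women")})
```

The flip comes entirely from how applicants are distributed across departments, not from any per-department difference in treatment, which is why the blended class gives a misleading answer.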
Source: Specified Life - Category: Information Technology Tags: classification classifications complexity data science data simplification irreproducible results ontologies ontology Simpson's paradox Source Type: blogs