Datasets for Statistical Analysis

Keywords:
smoking, survival
Categories:
health; generalized linear models
Description:
The data give the smoking status and twenty-year survival of 1314 women in Whickham, in the north of England. The original survey was conducted in 1972-1974; a follow-up survey twenty years later determined how many of the women from the original survey had died. (Of the women in the original survey, 180 have been excluded here: 18 whose smoking habits were not recorded, and 162 who were smokers before the first survey but were non-smokers at the time of the second survey.)
Variables:
Age: The age of the women in completed years, grouped into the categories 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75+
Smoking: Smoking status, either Smoker or NonSmoker
Alive: The number of women who were still alive twenty years after the original survey
Dead: The number of women who had died in the twenty years since the original survey
Data Quality:
There are no missing values.
Source:
From D R Appleton, J M French, and M P J Vanderpump. Ignoring a covariate: An example of Simpson's paradox. The American Statistician, 50:340-341, 1996.
Notes:
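The data should not be collapsed over Age. Doing so produces Simpson's paradox: the association between Smoking and survival in the aggregated counts differs from, and can reverse, the association seen within the individual age groups.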
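The effect of collapsing can be sketched in a few lines of Python. The counts below are invented purely for illustration and are not the Whickham counts; only the row layout (Age, Smoking, Alive, Dead) follows the variables described above. The names `rows` and `death_rates` are hypothetical helpers, not part of any data file or library.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only (NOT the Whickham data).
# Each row mirrors the dataset's layout: (Age, Smoking, Alive, Dead).
rows = [
    ("Younger", "Smoker", 90, 10),
    ("Younger", "NonSmoker", 47, 3),
    ("Older", "Smoker", 10, 10),
    ("Older", "NonSmoker", 60, 40),
]


def death_rates(rows, by_age=True):
    """Return the proportion dead in each group.

    Groups are (Age, Smoking) pairs when by_age is True, and
    Smoking alone (collapsed over Age) when by_age is False.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [dead, total]
    for age, smoking, alive, dead in rows:
        key = (age, smoking) if by_age else smoking
        totals[key][0] += dead
        totals[key][1] += alive + dead
    return {key: dead / total for key, (dead, total) in totals.items()}


stratified = death_rates(rows, by_age=True)   # rates within each age group
collapsed = death_rates(rows, by_age=False)   # rates ignoring Age

for key, rate in sorted(stratified.items()):
    print(key, round(rate, 3))
for key, rate in sorted(collapsed.items()):
    print(key, round(rate, 3))
```

In this made-up table, smokers have the higher death rate within each age group, yet the lower death rate once Age is ignored, because the hypothetical smokers are concentrated in the younger group. Keeping Age in the analysis, for example as a factor in a generalized linear model, avoids the misleading aggregate comparison.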
References:
The data also appear in Anthony C Davison. Statistical Models. Number 11 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, UK, 2003.