Datasets for Statistical Analysis

Participation of women in the paid workforce

Access data file | Access full data file
Keywords: social, finance

Categories: generalized linear models

Description:

The data indicate whether women participate in the paid workforce for a group of married white women between 30 and 60 years of age.

There are 753 observations on eight variables in the reduced data file. The full data file has 22 variables, but I don't know what they all are for; only the variables in the reduced data set are fully explained below, and some of the others are identified tentatively.

Variables:
  • Working: Whether the woman participates in the paid workforce; either Yes or No
  • Children.young: The number of children five years old or younger
  • Children.old: The number of children between 6 and 18 years of age inclusive
  • Age: The woman's age (in completed years)
  • College.wife: Whether the wife went to college; either Yes or No
  • College.husband: Whether the husband went to college; either Yes or No
  • WageRate: The log of the expected wage rate. For women in the paid workforce, this is the actual wage rate. For women not in the paid workforce, this is imputed from a regression (see Mroz (1987); the explanatory variables in this regression are in sets B, C, I and F3 in his Table V).
  • Income: The family income excluding the wife's income (in thousands of dollars)
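Because Working is a Yes/No response and the data are filed under generalized linear models, one natural use is a logistic regression of Working on the other seven variables. The formula below is only a sketch of such a model; it is not a model stated by the source, and the coefficient symbols and the choice to enter every variable linearly are assumptions made for illustration.

```latex
\operatorname{logit}\,\Pr(\text{Working} = \text{Yes})
  = \beta_0
  + \beta_1\,\text{Children.young}
  + \beta_2\,\text{Children.old}
  + \beta_3\,\text{Age}
  + \beta_4\,\mathbf{1}[\text{College.wife} = \text{Yes}]
  + \beta_5\,\mathbf{1}[\text{College.husband} = \text{Yes}]
  + \beta_6\,\text{WageRate}
  + \beta_7\,\text{Income},
\qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p}
```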
For the full data set, it appears (from an R file reading the data; see below) that:
  • hours is the number of hours worked by the woman
  • inlf is 0 (not in paid workforce) or 1 (in paid workforce)
  • unem is the unemployment rate in the county of residence
  • city is 1 if the woman lives in an SMSA (?)
  • exper is the actual labour market experience of the woman
  • educ is the number of years of education
  • nwifeinc is the non-wife income
  • age is the woman's age
  • kidslt6 is the number of children under 6
  • kidsge6 is the number of children between 6 and 18 inclusive
  • motheduc and fatheduc are the mother's and father's education (in years?)
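Several of the full-file variables appear to match variables in the reduced file. The Python sketch below renames one full-file record to the reduced-file names and recodes inlf from 0/1 to No/Yes. The pairing of names is an assumption based on the descriptions above, not something the source confirms; the spelling kidsge6 (no space) is also assumed, and the units of nwifeinc have not been checked against Income. The record used in the example is made up.

```python
# Assumed correspondence between full-file and reduced-file variable names.
FULL_TO_REDUCED = {
    "inlf": "Working",
    "kidslt6": "Children.young",
    "kidsge6": "Children.old",
    "age": "Age",
    "nwifeinc": "Income",
}


def to_reduced(row):
    """Map one full-file record (a dict of strings) onto reduced-file names."""
    reduced = {}
    for full_name, reduced_name in FULL_TO_REDUCED.items():
        reduced[reduced_name] = row[full_name]
    # inlf is coded 0/1 in the full file; Working is coded No/Yes.
    reduced["Working"] = "Yes" if int(row["inlf"]) == 1 else "No"
    return reduced


# Hypothetical record, for illustration only.
example = {
    "inlf": "1", "kidslt6": "1", "kidsge6": "0",
    "age": "32", "nwifeinc": "10.9", "hours": "1610",
}
print(to_reduced(example))
```

Variables with no obvious counterpart in the reduced file, such as hours, are simply dropped by this mapping.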
Data Quality:

There are no missing values.
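The counts quoted on this page (753 observations, eight variables, no missing values) can be checked after downloading the reduced file. The Python sketch below is one way to do that. It assumes a whitespace-delimited text file with a header row, no row-name column, columns in the order listed under Variables, and missing values coded as NA, NaN or a lone full stop; none of these details is confirmed by the source. The in-memory sample stands in for the real file and its values are made up.

```python
import io

# Column names as listed under Variables (order in the file is assumed).
EXPECTED_COLUMNS = [
    "Working", "Children.young", "Children.old", "Age",
    "College.wife", "College.husband", "WageRate", "Income",
]
# Assumed codes for a missing value.
MISSING_MARKERS = {"NA", "NaN", "."}


def check_reduced_file(handle, expected_rows=753):
    """Return a list of problems found in a whitespace-delimited data file."""
    problems = []
    header = handle.readline().split()
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    n_rows = 0
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        n_rows += 1
        if len(fields) != len(EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: {len(fields)} fields")
        elif any(field in MISSING_MARKERS for field in fields):
            problems.append(f"line {line_no}: missing value")
    if n_rows != expected_rows:
        problems.append(f"{n_rows} observations, expected {expected_rows}")
    return problems


# Tiny made-up sample; the second data row has a deliberate NA.
sample = io.StringIO(
    " ".join(EXPECTED_COLUMNS) + "\n"
    "Yes 1 0 32 No No 1.21 10.9\n"
    "No 0 2 45 Yes NA 0.33 21.0\n"
)
print(check_reduced_file(sample, expected_rows=2))
```

For the real file, open it with `open(path)` and pass the handle with the default `expected_rows=753`; an empty list means the file agrees with the counts stated here.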

Source:

  • Reduced data set: John Fox's course materials for the 2005 York Summer Programme in Data Analysis course Logistic Regression and Generalized Linear Models, accessed 1:30pm on 05 September 2007.
  • Full data set: A Google search found the data in Stata binary format at this location on 05 September 2007. I loaded this into R, then wrote it as a text file. Reading the R file that came with this file, I deduced the meaning of some of the other variables. I don't know who to credit for this data file and the R source file referenced above; I tried to find out. The files come from the Department of Economics at the University of Copenhagen; that's the best I could do.
Notes:

References: