It was looking so pretty! and other tales of woe

I thought I was being so straightforward and careful, going through the 2,700 records of the dataset for participants who were born in July 1963 or later (i.e., plausibly in 8th or 9th grade). I added variables for each year to recode attendance, grade, and graduation as 1s and 0s, and then added summary variables to indicate total years of observed attendance, regular high school graduation sometime in the first decade of the study, retentions in K-12 grades, and retentions specifically in 9th grade.

Then I filtered the set for only those participants who were in 9th grade for the first time in 1980 (i.e., in 8th in 1979 and 9th in 1980). And when I divided those participants into graduates vs. nongraduates, the average number of years with observed attendance fit roughly with what I expected: an average of 2.5 observed attendance years for nongraduates, an average of almost precisely 4 observed years of attendance for graduates, with the number of years distributed widely for nongraduates and the vast majority of graduates attending for 4 years. Wonderful! Great! Good step forward to the simulation process!

And then I thought I’d be clever and figure out how many participants had left school and then returned. So I calculated “return” variables for each year after 1979, summed them up, and then realized the problem: about 85% of participants “returned,” at least by this method.

No, that’s not what happened. I realized instantly that this was an artifact of a flawed assumption I had made a few steps above: recoding a year of survey nonparticipation as nonattendance. It’s easy for someone to skip a year of participation, be counted as nonattending, and then come back into participation and have an artifactual “return.” That sounds like a “so what? big deal” situation, except that it also screws up my estimates of years of attendance, because I am artificially deflating the number of years (or cycles) attending by counting survey nonparticipation as nonattendance. Shoot shoot shoot.
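To make the artifact concrete, here is a minimal sketch in Python. The coding scheme (1 for attending, 0 for interviewed but not attending, None for a skipped survey year) and all the function names are my own illustration, not NLSY79 variables. It compares the flawed recode against a version that leaves skipped years as unknown.

```python
from typing import List, Optional, Tuple

# Hypothetical coding, not the NLSY79 scheme:
# 1 = attended, 0 = interviewed but not attending, None = no interview that year.
Status = Optional[int]


def naive_recode(history: List[Status]) -> List[int]:
    """The flawed step: treat a skipped survey year as nonattendance."""
    return [0 if s is None else s for s in history]


def count_returns(history: List[int]) -> int:
    """Count 0 -> 1 transitions between consecutive years."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev == 0 and cur == 1)


def count_observed_returns(history: List[Status]) -> int:
    """Count a return only when the last *observed* status was nonattendance."""
    returns = 0
    last_seen: Status = None
    for s in history:
        if s is None:
            continue  # skipped interview: no information, so no transition
        if last_seen == 0 and s == 1:
            returns += 1
        last_seen = s
    return returns


def attendance_bounds(history: List[Status]) -> Tuple[int, int]:
    """Lower and upper bounds on years attended, given unknown years."""
    observed = sum(1 for s in history if s == 1)
    unknown = sum(1 for s in history if s is None)
    return observed, observed + unknown


history: List[Status] = [1, 1, None, 1, 1]  # one skipped interview in the middle
print(count_returns(naive_recode(history)))  # 1
print(count_observed_returns(history))       # 0
print(attendance_bounds(history))            # (4, 5)
```

The naive recode invents one "return" and pins the attendance count at 4, when all the data really say is that this participant attended somewhere between 4 and 5 years.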

This is a standard censoring problem arising from nonparticipation: right censoring (censoring at the right end of the study timeline) from study attrition, and middle censoring when participants who are present at the beginning of the survey and again in later years skip a year or more in between. But it creates some interesting problems and requires that I reread the NLSY79 interview protocols so I know how to interpret the vast majority of missing-variable codes (valid skips and skip-interview codes).
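A rough sketch of how the two kinds of missing years could be told apart, again in Python and again using a hypothetical coding where None means no interview that year. The function name and the three labels are my own; this is not how NLSY79 flags noninterviews.

```python
from typing import Dict, List, Optional


def classify_missing(history: List[Optional[int]]) -> Dict[str, int]:
    """Split skipped survey years into leading, gap (middle), and trailing.

    Trailing skips look like attrition (right censoring); gap skips sit
    between two observed years. A history with no observed years at all
    is counted entirely as trailing, which is an arbitrary choice here.
    """
    observed = [i for i, s in enumerate(history) if s is not None]
    if not observed:
        return {"leading": 0, "gap": 0, "trailing": len(history)}
    first, last = observed[0], observed[-1]
    gap = sum(1 for s in history[first:last + 1] if s is None)
    return {
        "leading": first,
        "gap": gap,
        "trailing": len(history) - 1 - last,
    }


# Skips one year mid-study, then drops out for the last two rounds.
print(classify_missing([1, None, 1, 0, None, None]))
# {'leading': 0, 'gap': 1, 'trailing': 2}
```

Only the gap years can produce an artifactual "return," since a trailing skip is never followed by an observed year of attendance.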
