Most likely, the least important result people will be looking for in Tuesday’s presidential election will be how accurately the surveys predicted the outcome. Professional pollsters, of course, will be studying the data carefully, but we retired statisticians with some background in polling will also be paying attention.
I would suggest that the general public take an interest as well. Studies show that survey results can influence whether and how people vote, and inaccurate or biased surveys can make a difference in a close election.
In the 2000 presidential and the 2002 midterm elections, the survey results were particularly inaccurate compared to previous years. In the presidential race and in many House and Senate races alike, surveys showed Democrats substantially ahead right up to voting time, yet the contests ended up going to the GOP. One of the most respected pollsters, Zogby Survey, conceded, “We blew it.”
The purpose of polling is to obtain an estimate of the opinions of a total population by randomly sampling only a small number of people in that population. To obtain a sample that truly represents the total, each person in the population must have an equal chance of being selected in the sample. The difficulty faced by survey organizations today is not how to make sure each person has an equal chance of being selected (well-validated random selection algorithms do that) but rather how to ensure that the population database from which the sample is drawn includes all the people in the population.
For many years, telephone listings were used as the database because almost everyone had a land-line phone. Today, cell phones come from several different providers with unrelated numbering sequences, and many cell phone users have given up their land-line telephones. As a result, making sure that important segments of the population are not underrepresented in the population database has become increasingly complex.
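To see why an incomplete database matters even when the random selection itself is flawless, consider a minimal sketch in Python. All of the numbers are hypothetical, chosen only for illustration: 70 percent of voters are assumed to be reachable by land line and 30 percent to be cell-only, with different levels of support for a candidate in each group.

```python
# Hypothetical illustration of coverage bias. None of these figures
# come from a real survey; they are assumptions for the sketch.

def population_support(groups):
    """Share-weighted average support across the given groups."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * support for share, support in groups.values()) / total_share

def frame_support(groups, covered):
    """Expected result of a perfectly random sample drawn only from
    the groups that appear in the sampling database."""
    return population_support({name: groups[name] for name in covered})

# group name -> (share of population, support for the candidate)
groups = {
    "landline": (0.70, 0.45),
    "cell_only": (0.30, 0.60),
}

true_value = population_support(groups)             # whole population
landline_only = frame_support(groups, ["landline"])  # database misses cell-only voters
coverage_bias = landline_only - true_value

print(f"true support:         {true_value:.3f}")
print(f"landline-only sample: {landline_only:.3f}")
print(f"coverage bias:        {coverage_bias:+.3f}")
```

Under these assumed numbers, the land-line database understates support by several points, and drawing a larger sample from the same database would not shrink that gap.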
Adding to the complexity is the increasing ethnic diversity of the population. Each ethnic group may have quite different voting patterns, and it is becoming harder to weight them properly in the sample. It is no longer sufficient to just get the black and white weighting right.
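One common way to weight groups is post-stratification: each group's answers are scaled so that the group counts in the sample in proportion to its share of the population. The sketch below is only an illustration of that idea, not a description of any particular pollster's method. The three groups, their sample counts, their support levels and their population shares are all hypothetical.

```python
# Hypothetical post-stratification sketch. Group labels and all
# numbers are assumptions made up for this example.

def poststratify(sample_counts, sample_support, population_shares):
    """Return (raw estimate, weighted estimate, per-group weights)."""
    n = sum(sample_counts.values())
    # Unweighted estimate: every respondent counts equally.
    raw = sum(sample_counts[g] * sample_support[g] for g in sample_counts) / n
    # Weighted estimate: each group counts by its population share.
    weighted = sum(population_shares[g] * sample_support[g] for g in sample_counts)
    # Weight applied to each respondent in a group:
    # population share divided by sample share.
    weights = {g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts}
    return raw, weighted, weights

sample_counts = {"A": 700, "B": 200, "C": 100}          # respondents per group
sample_support = {"A": 0.40, "B": 0.60, "C": 0.70}      # support within each group
population_shares = {"A": 0.60, "B": 0.25, "C": 0.15}   # assumed true shares

raw, weighted, weights = poststratify(sample_counts, sample_support, population_shares)
print(f"raw estimate:      {raw:.3f}")
print(f"weighted estimate: {weighted:.3f}")
print({g: round(w, 2) for g, w in weights.items()})
```

The correction is only as good as the assumed population shares. If those are wrong, or if a group is too small in the sample to measure reliably, the weighted figure inherits the error.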
Most difficult of all for survey organizations to correct is the increasing frequency of people refusing to participate. In order to retain the integrity of the sample, the data-collection process must contact (with repeated attempts if necessary) a high percentage of the people selected in the sample and get them to answer questions. If the non-response rate becomes large, the sample is compromised and the error estimate is no longer valid.
Twenty years ago, two-thirds or more of Americans were willing to accept calls from survey organizations and answer their questions. Today, that number often falls below 20 percent. Caller ID and voice mail have made it easier to avoid being contacted.
For every non-response, the pollster must substitute someone who can be reached and is willing to cooperate. But substitutes, even if randomly obtained, increase the possibility of bias entering the results. The people in the sample are no longer a random selection from the population, but rather a sample of those willing to be interviewed.
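The arithmetic behind this concern can be sketched briefly. Suppose, purely as an assumption for illustration, that the population splits evenly into people who tend to answer survey calls and people who tend to screen them, and that the two groups differ in their support for a candidate. The names and numbers below are hypothetical.

```python
# Hypothetical non-response sketch. Shares are assumed to sum to 1;
# every figure here is invented for illustration.

def true_support(groups):
    """Support in the whole population."""
    return sum(share * support for share, _, support in groups.values())

def overall_response_rate(groups):
    """Fraction of all contacted people who agree to be interviewed."""
    return sum(share * rate for share, rate, _ in groups.values())

def respondent_support(groups):
    """Support among the people who actually answer the survey."""
    responding = overall_response_rate(groups)
    answered_yes = sum(share * rate * support for share, rate, support in groups.values())
    return answered_yes / responding

# group name -> (population share, response rate, support for the candidate)
groups = {
    "answers_calls": (0.50, 0.30, 0.40),
    "screens_calls": (0.50, 0.10, 0.60),
}

print(f"overall response rate:    {overall_response_rate(groups):.2f}")
print(f"true support:             {true_support(groups):.2f}")
print(f"support among responders: {respondent_support(groups):.2f}")
```

With these assumed figures, the survey would settle on a number five points away from the truth no matter how many substitutes were called, because every substitute is drawn from the same pool of willing respondents.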
Finally, the media interpretation of polling results introduces another potential source of misinformation. TV commentators and journalists often report that X or Y is 1 or 2 percentage points ahead as though the difference is true of the total population, when the sampling error is 3 percentage points. The sampling error is a measure of how accurately the results from the survey sample can be generalized to the total population. Differences in opinion found in the sample that fall within the sampling error must be treated as random fluctuation when generalized to the total population.
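For readers who want to see where a figure such as 3 percentage points comes from, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. It assumes a 95 percent confidence level (z = 1.96) and the worst case of an evenly split population (p = 0.5). Real surveys that use weighting or more complex designs typically have somewhat larger margins than this formula gives.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n.

    p is the assumed true proportion (0.5 is the worst case) and
    z is the normal critical value (1.96 for 95 percent confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes chosen only for illustration.
for n in (400, 1000, 2500):
    print(f"n = {n:5d}: plus or minus {100 * margin_of_error(n):.1f} points")
```

Because the margin shrinks with the square root of the sample size, cutting it in half requires roughly four times as many interviews.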
I recently watched two well-respected TV commentators on the national news speculate about the reasons they believed Obama’s lead over Romney had increased from 1 to 3 percentage points, while never mentioning the 3.5 percentage point sampling error displayed at the bottom of the TV screen. They were telling their audience that there was a true increase of 2 percentage points in public support for Obama in the total population, when the survey was not accurate enough to detect differences smaller than plus or minus 3.5 percentage points.
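The commentators' example can be checked with a few lines of arithmetic. The sketch below assumes a simple random sample and a 95 percent confidence level, neither of which was stated on screen, and the function names are my own. It applies the simple rule used above, comparing the change directly with the published margin. Strictly speaking, the margin on a lead or on a change between two polls is larger than the margin on a single percentage, so this rule is if anything generous to the commentators.

```python
def implied_sample_size(margin, p=0.5, z=1.96):
    """Approximate simple-random-sample size that would produce the
    given margin of error (expressed as a fraction, e.g. 0.035)."""
    return round(p * (1 - p) * (z / margin) ** 2)

def within_margin(change_points, margin_points):
    """True when a change is no larger than the sampling error."""
    return abs(change_points) <= margin_points

lead_before = 1.0   # percentage points
lead_after = 3.0    # percentage points
margin = 3.5        # percentage points, as displayed on screen

change = lead_after - lead_before
indistinguishable = within_margin(change, margin)

print(f"change in lead: {change:.1f} points")
print(f"within the {margin}-point sampling error: {indistinguishable}")
print(f"implied sample size: about {implied_sample_size(margin / 100)}")
```

Under those assumptions, a 3.5-point margin corresponds to a sample of fewer than 800 people, which is a reminder of how little data stood behind the 2-point movement being discussed.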
Tuesday’s voting results will tell us how effectively survey organizations have adjusted their methods to accommodate the increasing complexities of obtaining a representative sample of the voting public. They know what the problems are and have spent considerable effort to make those adjustments. In contrast, I see no evidence that the media even understand they are often providing misleading interpretations of the results.
Garth Buchanan holds a doctorate in applied science and has 35 years of experience in operations research. Reach him at firstname.lastname@example.org.