Every other year, during election campaigns, the American public is polled, surveyed, and canvassed for their opinions, and the news media continuously inform us of the results. The media report polls in the same breathless way that race track announcers describe horse races: "As they round the corner of the convention, the Republican is pulling ahead on the right! Now, they're entering the home stretch and the Democrat is pulling up on the left!" Et cetera.
There is little drama in simply waiting until after the election to report the results. Instead, reporters use polls to add suspense to their coverage, with a leader and an underdog to root for. Moreover, every news outlet is trying to scoop the others by being the first to correctly predict the winner. Unfortunately, much of this coverage sacrifices accuracy for artificial excitement.
This article explains how to read a news report of a poll without being duped by the hype. You don't need to be a statistician to understand enough about polls to not be taken in, because the problems are often not with the polls themselves but with the way that they're reported.
First, please take the following unscientific poll:
Are you looking for the results of the poll you just took? Read on!
Opinion polls, like other surveys, are a way of inferring the characteristics of a large group—called "the population"—from a small sample of it. In order for this inference to be cogent, the sample must represent the population. Thus, the main error to avoid is an unrepresentative sample. For example, the most infamous polling fiasco in history is the Literary Digest poll in the 1936 presidential election1. The magazine surveyed over two million people, primarily chosen from subscribers, telephone books, and car registrations. Even though the sample was enormous, the poll showed Republican Alf Landon beating the actual winner, Democrat Franklin Roosevelt. Despite its size, the Digest's sample was unrepresentative of the population of voters because not everyone could afford a phone or car during the Depression year of 1936, and those who could tended to vote Republican in greater numbers than those who couldn't2.
So, the first question that you should ask of a poll report you read is: "Was the sample chosen scientifically?" If the poll is a scientific one, then an effort has been made to choose the sample randomly from the population. However, many polls are unscientific, such as most online polls, telephone surveys in which you must call a certain number, or mail-in questionnaires in magazines or sent to you by interest groups. Such surveys suffer from the fault that the sample is self-selected, that is, you decide whether you wish to participate. Self-selected samples are not likely to be representative of the population for various reasons:
For example, some media outlets sponsor scientific polls but, when the results are reported in their online edition, they are sometimes accompanied by an online poll using a self-selected sample and asking some of the same questions. It is instructive to compare the two, as the results are usually very different.
So, self-selected samples are likely to be biased and are, at best, a form of entertainment, as opposed to a source of information about the population as a whole.
Because polls question only a sample of the population, there is always a chance of sampling error, that is, drawing a sample that is unrepresentative of the population as a whole. For instance, in a political poll, it is possible that a random sample of voters would consist entirely of Democrats, though this is highly unlikely. However, less extreme errors of the same kind are not so unlikely, and this means that every poll has some degree of imprecision or fuzziness. Because the sample may not be precisely representative of the population as a whole, there is some chance that the poll results will be off by a certain amount. Statisticians measure the chance of this kind of error by the "margin of error", or "MoE" for short.
The MoE takes the form "±N percentage points4", where usually N=3 in national polls. This margin determines what is called a "confidence interval": for example, if the percentage of a sample that supports candidate R is 51%, and the MoE is ±3 percentage points, then the confidence interval is 48-54%. In turn, the confidence interval and the MoE are determined by the "level of confidence", which is usually set at 95% in national polls―for more on the confidence level, see the next section.
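As a minimal sketch of the arithmetic in the example above, the confidence interval is simply the reported percentage plus and minus the MoE. The function name `confidence_interval` is made up for this illustration.

```python
def confidence_interval(percent, moe_points):
    """Return (low, high) for a poll result, both in percent.

    percent    -- the share of the sample giving some answer, e.g. 51
    moe_points -- the margin of error in percentage points, e.g. 3
    """
    return (percent - moe_points, percent + moe_points)


low, high = confidence_interval(51, 3)
print(f"{low}-{high}%")  # prints "48-54%"
```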
The MoE is a common source of error in news reports of poll results. Most reputable news sources require their reporters to include the MoE in a report on a poll, at least in a note at the end. However, many reporters ignore the MoE in the body of their articles, perhaps because they don't understand what the number means.
Reporters often use polls for "horse race" reporting by comparing the poll numbers of candidates, or to compare current polls to past ones to see if the results are changing. The MoE needs to be factored into such comparisons. There are two kinds of errors about MoEs frequently committed in news reports of poll results:
So, when you read a news report of a poll, always look for the margin of error, then take that margin into account when evaluating the results. Don't expect the reporter to do this for you: some will but many won't.
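One cautious way to take the MoE into account when comparing two candidates is to ask whether their confidence intervals overlap. The sketch below uses that rule of thumb; it is an assumption made for illustration rather than a rule stated in this article, and both the function name `lead_is_clear` and the second pair of numbers are hypothetical.

```python
def lead_is_clear(a_percent, b_percent, moe_points):
    """True only if the two candidates' confidence intervals do not overlap.

    This is a deliberately conservative rule of thumb, not a formal test.
    """
    leader = max(a_percent, b_percent)
    trailer = min(a_percent, b_percent)
    # The leader's lowest plausible value must exceed the trailer's highest.
    return leader - moe_points > trailer + moe_points


print(lead_is_clear(46, 43, 3))  # False: 43-49% overlaps 40-46%
print(lead_is_clear(51, 44, 3))  # True: 48-54% is entirely above 41-47%
```

The same check can be applied to two polls taken at different times to see whether an apparent change is larger than the fuzziness of the polls themselves.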
You may well wonder why the MoE is around plus-or-minus three percentage points in most national polls. The reason is that most such polls have about a thousand respondents, and that is roughly the MoE for a random sample of that size at the usual 95% level of confidence.
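For the curious, here is a rough sketch of where the three points come from. It assumes the textbook formula for a simple random sample, the worst case of a 50/50 split, and a z-score of 1.96 for 95% confidence; real pollsters adjust for weighting and other design effects, so treat this as an approximation. The name `margin_of_error` is my own.

```python
import math

Z_95 = 1.96  # z-score corresponding to a 95% level of confidence


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate MoE, in percentage points, for a simple random sample.

    proportion=0.5 is the worst case and gives the largest MoE.
    """
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


print(round(margin_of_error(1000), 1))  # prints 3.1
```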
An important mathematical fact about polls is that the sample size and margin of error (MoE) are inversely related5. What that means is that the larger the sample, the smaller the MoE, and the smaller the sample, the larger the MoE. The MoE is a measure of the precision of a poll, which means that the smaller it is, the more precise the poll results are, and the larger it is, the less precise they are. So, if you want more precise poll results, you need a larger sample.
The problem is that the sample size and the cost of polling in both time and money are directly related, that is, the larger the sample, the more time it takes to do the poll and the more it costs. This is why you seldom see national polls with a sample size of much more than a thousand respondents. To reduce the usual MoE by a single percentage point requires more than doubling the sample size6, and thus probably doubling the time and money spent on the poll. So, most pollsters decide that it isn't worth it.
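To see why shaving a point off the MoE is so expensive, the sketch below inverts the usual textbook formula to get the sample size needed for a given MoE. It assumes a simple random sample, a 50/50 split, and 95% confidence; `sample_size_for` is a hypothetical helper name.

```python
import math


def sample_size_for(moe_points, z=1.96):
    """Smallest simple random sample giving at most the requested MoE.

    moe_points is in percentage points; a 50/50 split is assumed.
    """
    moe = moe_points / 100
    return math.ceil((z / (2 * moe)) ** 2)


n_three = sample_size_for(3)
n_two = sample_size_for(2)
print(n_three, n_two, round(n_two / n_three, 2))
```

Under these assumptions, a MoE of three points needs a little over a thousand respondents, while two points needs about 2,400, which is roughly two and a quarter times as many.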
What is true of an entire sample is equally true of its sub-samples: a "sub-sample" is simply a subset of the whole sample, and is itself a sample. Since sub-samples are smaller than the whole sample, they have larger MoEs. Journalists who write about polls and discuss the results for various subgroups seldom mention that the MoE is larger for such results, let alone supply that larger MoE7, so you need to be on the lookout for it.
So, keep this fact in mind when reading a report about poll results: if it reports on the opinions of female independents, or black Republicans, or whatever, these are almost certainly sub-samples that will have a larger MoE than that reported for the whole sample.
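Here is a small sketch of how quickly the MoE grows for sub-samples. The subgroup sizes are invented for illustration, and the formula is the textbook approximation for a simple random sample at 95% confidence, so the numbers are only rough guides.

```python
import math


def margin_of_error(sample_size, z=1.96):
    """Approximate MoE in percentage points, assuming a 50/50 split."""
    return 100 * z * math.sqrt(0.25 / sample_size)


# Hypothetical breakdown of a 1,000-person poll; the sizes are made up.
subgroups = {
    "whole sample": 1000,
    "independents": 300,
    "female independents": 150,
}

for name, size in subgroups.items():
    print(f"{name}: n={size}, MoE about ±{margin_of_error(size):.1f} points")
```

With these made-up sizes, the MoE goes from about ±3.1 points for the whole sample to about ±5.7 for the 300 independents and about ±8.0 for the 150 female independents.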
In the previous section, I mentioned the level of confidence, usually 95%, used to determine the MoE and, therefore, the confidence interval. The purpose of a survey is to measure some characteristic of a sample, such as support for a candidate, in order to infer its level in the whole population. A 95% confidence level means that in 19 out of 20 samples, the percentage of the sample with the characteristic should be within the MoE of the percentage of the population with the characteristic. However, we can also expect one out of twenty samples to differ from the population by more than the MoE.
95% confidence sounds pretty confident, and it is! However, there are a lot of polls done these days. In fact, there are hundreds of national polls conducted in the U.S. during a presidential election year. This means that, with a confidence level of 95%, we can expect about 5% of those polls to be off by more than the MoE as a result of sampling error alone.
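If you would like to see the one-in-twenty figure emerge for yourself, the following simulation is a rough sketch. It assumes an idealized world in which every poll is a perfectly random sample of 1,000 people from a population that is split exactly 50/50; the function name and its default values are my own choices.

```python
import math
import random


def simulate_miss_rate(true_share=0.5, sample_size=1000, polls=1000, seed=0):
    """Fraction of simulated polls that miss the truth by more than the MoE."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    moe = 1.96 * math.sqrt(true_share * (1 - true_share) / sample_size)
    misses = 0
    for _ in range(polls):
        # Each respondent supports the candidate with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        if abs(supporters / sample_size - true_share) > moe:
            misses += 1
    return misses / polls


print(f"Share of polls off by more than the MoE: {simulate_miss_rate():.1%}")
```

The printed share should land somewhere near 5%, though it will vary a little with the seed and the number of simulated polls.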
How can we tell when the results of a poll are off by more than the MoE? If a poll gives very different results from others taken around the same time, or shows a sudden and large change from previous polls, this suggests that the unusual result may be due to sampling error. No one can know for sure whether sampling error is responsible for polls with surprising results, but the fact that 1 in 20 polls can be expected to be significantly in error should encourage us to regard such poll results with skepticism. Moreover, it's important to pay attention to all of the polls taken on a given topic at a particular time, otherwise you'll have no way of knowing whether a poll you're looking at is giving wildly different results than comparable ones.
Here's another reason to pay attention to all the comparable polls, as opposed to concentrating on just one. Suppose that five polls are conducted at about the same time showing the following results with a MoE of ±3 percentage points:
Poll | Candidate D | Candidate R | Undecided |
---|---|---|---|
1 | 43% | 42% | 15% |
2 | 42% | 41% | 17% |
3 | 44% | 42% | 14% |
4 | 44% | 44% | 12% |
5 | 46% | 43% | 11% |
Each of these results is within the MoE, so, taken individually, you would have to conclude that neither candidate is really ahead. However, four of the five polls show candidate D with a lead, and the other shows a tie; no poll shows candidate R leading. Of course, it's highly improbable that both candidates have exactly the same level of support, but if they are within a percentage point of each other you would expect the polls showing one candidate ahead to be about evenly divided between the two. Instead, in this example, all of the polls showing one candidate ahead favor candidate D, which is unlikely unless D has a real, albeit small, lead. Thus, even when individual polls do not show a clear leader, the consensus of all polls may do so.
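The reasoning above can be sketched as a simple coin-flip calculation using the D and R figures from the table. It assumes that the polls are independent and unbiased, sets the tied poll aside, and treats each remaining poll as equally likely to favor either candidate if support were truly equal; it ignores the size of each lead, so it is only a rough illustration.

```python
from math import comb

# (Candidate D %, Candidate R %) for the five polls in the table above
polls = [(43, 42), (42, 41), (44, 42), (44, 44), (46, 43)]

d_leads = sum(d > r for d, r in polls)
r_leads = sum(r > d for d, r in polls)
decided = d_leads + r_leads  # the tied poll is set aside

# Chance of at least this many polls favoring D if each were a fair coin flip
p_value = sum(comb(decided, k) for k in range(d_leads, decided + 1)) / 2 ** decided

# Simple averages across all five polls
avg_d = sum(d for d, _ in polls) / len(polls)
avg_r = sum(r for _, r in polls) / len(polls)

print(d_leads, r_leads, p_value)  # prints "4 0 0.0625"
print(avg_d, avg_r)               # prints "43.8 42.4"
```

So, if the candidates were really tied, there would be only about a 6% chance of all four non-tied polls favoring D, and the averages show D ahead by a little over a point.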
Another possible explanation for the results in the table above is a systematic bias in favor of candidate D, which is a possibility that cannot be ruled out as we've seen examples in recent elections. Such biases may result from unrepresentative sampling due to some groups of people being less likely to respond to pollsters, which is the same thing that happened in the Literary Digest debacle, discussed above. Unfortunately, the source of such biases is not well understood, and there's little that the poll consumer can do to allow for it.
Unfortunately, news stories on polls usually concentrate on one poll at the expense of all others. Many polls are sponsored by news outlets, which get their money's worth by reporting only the results of their own polls, ignoring those sponsored by their competitors. Therefore, it's up to you to check to see whether there are other polls on the same topic, and to compare the results of any comparable polls. Thankfully, there is now more than one site that collects and aggregates the results of all recent polls, which makes the task of comparing poll results much easier than it used to be.
Even the best, least-biased polls are only a snapshot of public opinion during a brief period of time. Public opinion can, and does, change over the course of time, depending on what's happening, so it's important to know when a poll was conducted. Most polls take a few days to conduct, and the dates during which the survey was made should be indicated somewhere―most often at the end of a news article reporting on the poll. Unless there is some major event in the news, you should be skeptical of a poll or polls that seem to show a large change in public opinion over the course of a few days. If you encounter such polling, check the fine print for the dates the poll or polls were conducted and the MoEs7.
When elections are very close, as many presidential elections have been in the last few decades, polls cannot be expected to predict a winner. A close election is like a photo finish in a horse race when one candidate wins by a nose. Given its usual MoE, it's unrealistic to expect a poll to call such a race. Even poll aggregators often miscall such races, since the polls that are aggregated may be systematically biased.
People are often disappointed with polling because they expect too much. When a close election surprises people, as happened in 2016, many conclude that the polls were wrong. Yet, an average of polls in 2016 was only about one percentage point shy of the popular vote total8, which is better than we have any right to expect.
When you are confronted with a new poll, ask the following questions about it:
If the poll you are confronted with fails at any step of this checklist, or if you can't find the answer to these questions in the report, then your confidence in the poll should be much less than 95%.
If you haven't guessed by now, the online poll was bogus, but not much more bogus than most such polls. If you go back and retake the poll having read the entire article, I hope that you will agree to disagree with all of the questions!
Poll results are most often reported in terms of percentages; for instance, a news article may report that the president's job approval rating is 40%. The approval rating is based on a poll that asks respondents whether they approve of the president's job performance, and the report indicates that 40% of those who answered the poll did so. In a later poll, the approval rating might be 45%. Both of these approval ratings are percentages, and the difference between them is measured in percentage points: five percentage points, in this case. In other words, if you subtract 40% from 45%, you get five percentage points. So, percentage points are the way to measure the difference between two percentages.
Confusingly, the percentage notation "N%" is often used to mean N percentage points―I did it myself in earlier versions of this article. Thus, the percent sign is ambiguous between percentages and percentage points. This ambiguity can lead to confusion when percentage points are taken to be percentages, or vice versa9. To avoid such potential confusion, I refer to the MoE in terms of percentage points and not percentages throughout this article.
Keep in mind, however, that news articles usually report the MoE of a poll as "±3%", rather than "±3 percentage points". So, if the president's approval rating is reported to be "40% ±3%", that means that it is between 37% and 43%.
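To make the distinction concrete, here is a short sketch using the approval-rating numbers from this section. The function names are invented for the illustration, and the "misreading" shown is just one way the ambiguity could trip someone up.

```python
def point_difference(old_percent, new_percent):
    """Difference between two percentages, in percentage points."""
    return new_percent - old_percent


def relative_change(old_percent, new_percent):
    """Change from old to new, as a percentage of the old value."""
    return 100 * (new_percent - old_percent) / old_percent


print(point_difference(40, 45))  # prints 5    (percentage points)
print(relative_change(40, 45))   # prints 12.5 (percent)

# "40% ±3%" read correctly, as plus or minus three percentage points:
rating, moe = 40, 3
as_points = (rating - moe, rating + moe)
print(as_points)  # prints (37, 43)

# The same report misread as plus or minus 3 percent of the rating:
as_relative = (round(rating * (1 - moe / 100), 1),
               round(rating * (1 + moe / 100), 1))
print(as_relative)  # prints (38.8, 41.2)
```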
Notes:
Revised: 12/6/2022, 10/28/2024