The real margin of error for polls


In my daily scanning of the news, looking for stories that are both educational as well as entertaining, I came across this particular post: “WATCH – This Viral Video Perfectly Illustrates Why Americans Don’t Trust the Lamestream Media”.

The title is typical click-bait, hinting at something truly revealing that nine times out of ten turns out to be immensely disappointing. This time, however, I found that the post revealed a lie about political polls, almost as an aside, that is simply never noted.

The video itself is entertaining. It shows one particularly bad performance by an MSNBC political reporter, who in only about five minutes used NBC polls to make a string of predictions about the presidential election, every single one of which turned out to be spectacularly wrong. I’ve embedded the video below the fold for your enjoyment.

What I found revealing, however, was something else the post noted:

After all, these were NBC polls that Kornacki cited time and time again. Polls that showed Hillary Clinton leading Donald Trump in places like North Carolina, Georgia, and Ohio. The polls were terribly off-base. In some cases, the NBC numbers showed Clinton with a double-digit lead in states that she went on to lose. In other words, the polls were not by any means scientific, fair, or truthful.

Does the phrase “margin of error” ring a bell? Typically, it is between three and four percent, in order to be deemed usable, anyway. But NBC’s margin of error in Pennsylvania was 11 percent. [emphasis mine]

In the past decade or so political polls have routinely included what they call their “margin of error,” which for most polls ranges, as noted above, from about three to four percent. As the highlighted text shows, however, this number is, and has always been, a lie. The real margin of error is the difference between what the poll predicted and what the actual results were. And for all of these NBC polls, that difference was not 3 to 4 percent, but anywhere from 11 to 30 percent!

In other words, these polls were worthless. Worse, they suggest some intentional manipulation, as they all made their error in only one direction (against Trump and for Clinton), much like the tampered global temperature data that we see coming from NASA and NOAA. It could be that confirmation bias is at work here, producing the results these liberal news outlets wish for, but I do not think so. NBC, and its sister station MSNBC, have repeatedly in the past five years committed some egregious journalistic frauds, all of which were designed to make conservatives and Republicans look bad and to promote the interests of the Democratic Party. The network has made no moves to correct the problems. Nor has it fired anyone.

I think it very reasonable to suspect intentional fraud here, specifically aimed at helping the Democrats.

More important, this story illustrates why we should all laugh uproariously the next time a mainstream media journalist pompously notes that the poll he or she is citing has a margin of error of 3%. Either that journalist doesn’t know what he or she is talking about, or knows very well and thinks you are too stupid to notice.

16 comments

  • Margin of error is a statistical calculation which can be done prior to finding out what the reality is. Margin of error has to do with sample size. The larger your sample size the smaller your margin of error. It is a useful feature in statistics.

    But what the margin of error cannot tell you is whether there is a systematic bias that puts the statistical mean off base for some reason. So, for example, it may be that Trump supporters were less likely to participate in a poll conducted by the “lame stream media.” Or it could be that the preponderance of the late undecided voters were Republicans who didn’t want to vote for Trump but who, when it came down to the day, decided to risk Trump rather than vote for the known-corrupt Hillary. Or some other bias.

    I want the margin of error calculated because it gives an indication of how well the statistical method was applied. But you are correct that it gives the misleading impression that the calculated mean is the center of where the final outcome is likely to be. It often is not.
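
The textbook calculation DougSpace describes can be sketched in a few lines of Python (my own illustration, not from the post; it assumes a simple random sample, which is exactly the assumption real polls violate):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# The familiar ~3% figure comes from a sample of about 1,000 people:
print(round(margin_of_error(1000) * 100, 1))  # 3.1 percentage points
```

Note that the error shrinks only with the square root of the sample size: quadrupling the sample merely halves the margin of error.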

  • DougSpace wrote: “Margin of error is a statistical calculation which can be done prior to finding out what the reality is. Margin of error has to do with sample size. The larger your sample size the smaller your margin of error. It is a useful feature in statistics.”

    This only proves how incredibly misleading the term “margin of error” is as used by statisticians, pollsters, and liars (but I repeat myself). Say for example you purposely bias your poll so that it includes 25% more Democrats than Republicans. Your sample size could then be a million, with the so-called statistical margin of error reduced to a tiny number, and your real margin of error would be gigantic.

    I am a writer, and thus care very deeply about the meaning of words. The words “margin of error” mean a very specific thing to the general public. They suggest that the poll will be accurate to within that margin. This however is demonstrably false, and thus illustrates why these words are being misused every time a pollster uses them.
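
Robert’s hypothetical is easy to demonstrate numerically. The sketch below uses invented numbers (an electorate split 50/50, each side voting 90% for its own candidate) and reads his “25% more Democrats” as a 25-point gap in the sample; the stated margin of error shrinks to almost nothing while the real error stays huge:

```python
import math

# Invented electorate: 50/50 Democrat/Republican,
# each side voting 90% for its own party's candidate.
true_support = 0.5 * 0.9 + 0.5 * 0.1          # true Democratic vote share: 0.50

# Deliberately biased sample: 62.5% Democrats, 37.5% Republicans.
dem_share, rep_share = 0.625, 0.375
n = 1_000_000
polled_support = dem_share * 0.9 + rep_share * 0.1   # poll says 0.60

stated_moe = 1.96 * math.sqrt(polled_support * (1 - polled_support) / n)
real_error = abs(polled_support - true_support)

print(f"stated MOE: {stated_moe:.4f}")   # about 0.001, a tenth of a point
print(f"real error: {real_error:.2f}")   # 0.10, ten full points
```

The million-person sample makes the stated margin of error microscopic while doing nothing about the ten-point bias, which is exactly the point.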

  • There are three kinds of lies: Lies, damn lies, and statistics.

    I would say the problem is not so much confirmation bias as that the people running these things have no idea what a true, unbiased statistical poll would even look like. They have not been given the training to know what to do.

    I call that a problem with the college education system.

  • diane wilson

    Doug is correct that “margin of error” can be calculated based on sample size. What Robert is concerned about (correctly) is sampling methodology, which should also be published. That would reveal sources of error, such as over-reliance on land lines, geographical concentrations that don’t match distribution of party membership, etc. A serious pollster would analyze this and run test surveys simply to validate sampling methodology before running a real poll for publishable results.

    Two other useful statistical terms in this context: A study is “reliable” if it can be repeated, with the same or similar results. A study is “valid” if it is reliable, and if it also measures what it purports to measure.

    One of my college stat courses included a book called “Statistics, A Spectator Sport” which was a thorough survey of all the possible ways to screw up a study or poll, and how to spot these errors in publication. It was lots of fun.

  • wodun

    All good points above guys.

    The margin of error isn’t just about sample size but also depends on the attributes of the respondents. The more people who are included, the less the pollsters need to worry about weighting the pool of respondents. These are all voluntary polls, meaning that there is a self-selection bias. What Doug said about people not wanting to say they would vote for Trump could play a role, or maybe not.

    What I noticed was that the media no longer talked about the margin of error. In the video, at about the 1:15 mark, the poll guy is talking about Iowa. He never once mentions the margin of error, +/- 3.3%, but he does talk up Hillary being 4 points ahead. Taking the MOE into account, her lead was very slim, as it was in many of the states that Trump flipped.

    The poll in the video was from August, and Trump ended up winning by 10 points. The media always talks about polls as sure things, or as unchanging, but in reality they are just snapshots. It’s like Nate Silver’s predictions: he wasn’t forecasting the chances that a candidate would win on election day, but rather who could win on that day. And the numbers changed every day, which no one in the media took into account.

    I don’t know whether the media has just become unserious, whether they were trying to depress the vote, or whether they were trying to create a bandwagon effect, but they weren’t accurately describing how polls work.
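
wodun’s Iowa example is worth making concrete. The stated ±3.3% applies to each candidate’s share separately; a common rule of thumb (a rough approximation, not an exact result) is that the margin of error on the lead is about double that, so a 4-point lead was statistically indistinguishable from a tie:

```python
stated_moe = 3.3   # percentage points, per candidate's share
lead = 4.0         # Clinton ahead by 4 in the August Iowa poll

# Rough rule of thumb: the error on the gap (A - B) is about
# twice the per-candidate margin of error.
lead_moe = 2 * stated_moe
low, high = lead - lead_moe, lead + lead_moe
print(f"lead could plausibly be anywhere from {low:+.1f} to {high:+.1f} points")
```

Even before any systematic bias, this poll could not distinguish a narrow Trump lead from a double-digit Clinton lead.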

  • LocalFluff

    The only margin of error that can be calculated is based on the assumption that the sample is perfectly representative of the voting turnout. The systematic bias cannot be estimated, other than by using some history of past outcomes and making a rough adjustment for it, while still having to unrealistically assume that the biases average out. The reason polls rarely ask more than a thousand people or so is that the calculable error bar gets so low (around 4%) that it is overshadowed by the biases, which could easily amount to an error bar of upwards of 10%. Doubling the sample size would double the cost but hardly improve accuracy.

    Apart from some polls obviously being politically manipulated, there are also real problems growing with the advent of digital media and a real upset in political sympathies. Fixed phone lines and traditional demographics are giving more and more biased samples compared with how people actually vote.

    But it is easy to publish a single number, a point estimate. It’s like reporting the daily move of the stock market or the exchange rate: pretty useless to know it was down 0.3% or something today, since there’s nothing to analyze in that noise. But journalists make up stories about “profit taking,” for example, as if there were not a buyer for every seller. Completely senseless.
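
LocalFluff’s cost/accuracy trade-off can be illustrated with a toy model (my own simplification: random sampling error and a fixed systematic bias combined in quadrature, a common textbook assumption rather than anything from this thread):

```python
import math

def total_error(n, bias=0.05, p=0.5, z=1.96):
    """Toy model: random sampling error plus a fixed 5-point
    systematic bias, combined in quadrature."""
    sampling = z * math.sqrt(p * (1 - p) / n)
    return math.sqrt(sampling**2 + bias**2)

for n in (1000, 2000, 4000):
    print(n, round(total_error(n) * 100, 2))  # total error in points
```

Quadrupling the sample from 1,000 to 4,000 quadruples the cost but only trims the total error from about 5.9 to about 5.2 points, because the bias term never budges.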

  • Very well put, LocalFluff. Larger samples reduce the error bars (MOE) but do little to address the systematic bias. I’m just not sure there is a fail-safe way of determining who exactly is going to vote and then sampling from all of those people. Ultimately we should favor or disfavor candidates on their merits rather than be swayed by what other people think of them. But apparently poll numbers are considered very “newsworthy.”

    Statisticians have a pretty hard job dealing with spotty data. They have developed all sorts of techniques to try to deal with missing data and biased sampling. Perhaps a scoring system for how accurate their previous statistics were would be helpful in interpreting their latest poll results.

  • D.K. Williams

  • The overall Real Clear Politics national average of polls in the 2016 general election was quite accurate in predicting the total vote. However, in several key states, Trump’s narrow margins of victory added up to a big Electoral College majority. Thus, it is always better to consider polls state-by-state.

  • D.K. Williams wrote: “It is always better to consider polls state-by-state.”

    Yup, and that is exactly what the MSNBC political reporter did in the video in my post. He went state-by-state, and by using those state polls he proved beyond a shadow of a doubt that Hillary Clinton was going to win by a smashing landslide.

    The problem is that the polls were garbage. Which was the point of my post in the first place.

  • wayne

    Good stuff by all.

    Highly (highly) recommend this Econ Talk podcast:
    “Rivers on Polling”
    http://www.econtalk.org/archives/2008/07/rivers_on_polli.html
    “Doug Rivers of Stanford University and YouGov.com talks with EconTalk host Russ Roberts about the world of political polling.”

    He touches on all the points in this thread as they relate to political polling.

    One thing I would bring up with regard to Real Clear Politics: averaging polls is a good idea, but you can’t average polls that have differing methodologies.
    The errors within each poll are just magnified rather than balancing out over a large number of polls.
    And, with reference to presidential elections: we have the election on the same day, but it’s not a “national election”; the outcome is determined by 50 separate state elections.
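
wayne’s point about averaging can be checked with a quick simulation (all numbers invented for illustration): averaging many polls cancels the random noise, but a systematic error shared across pollsters survives the average untouched.

```python
import random

random.seed(1)
true_value = 0.48     # hypothetical true vote share
shared_bias = 0.03    # systematic error common to every pollster
noise = 0.02          # per-poll random sampling noise

polls = [true_value + shared_bias + random.gauss(0, noise) for _ in range(50)]
average = sum(polls) / len(polls)

print(f"average of 50 polls: {average:.3f}  (truth: {true_value})")
```

The average lands near 0.51, three points off, no matter how many polls share the bias.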

    LocalFluff — good stuff.
    For every seller there is a buyer. The financial media is perhaps even more disingenuous in its treatment of “statistics,” in that it presents spurious causation and correlation as if they were “scientific.” The average person is even less equipped to correctly evaluate all these financial “statistics.”

    It is interesting to note as well, they refer to “Political Science,” as a “science.”

    Future sampling is largely driven by past outcomes, and pollsters don’t strive for a strictly “random sample,” but rather for what they believe from their models to be “representative samples” of people who actually vote.

  • Sayomara

    wayne I remember that Podcast!

    The only thing I would keep an eye out for with that podcast: I believe the guy he talks to is from YouGov, which is a UK polling firm that also works in the US. Based on what I’ve seen, their approach is interesting and might be a good option in the future, but right now, judging from the Scottish vote, Brexit, and the 2016 US presidential election, they are having just as many problems getting good samples as everyone else.

    That said, I do think there are real issues in polling that need to be addressed. This isn’t just an issue in the US; it’s everywhere. And the press needs to stop treating polling that it paid for as news. Polling is fine, and it’s clearly not going away, but that we can spend day after day talking about polls months away from an election while not reporting the government’s failure at the VA hospitals is appalling.

  • wayne

    Sayomara–
    Yowza. Cool.
    > I knew I wasn’t the only one on Earth who listens to Russ Roberts! (I have literally every single episode. Prof. Mike Munger is my favorite guest.)

    Yes, one does need to be careful about the guests on EconTalk; they all have an agenda of some sort in play. I don’t necessarily agree with the guest’s specific solutions for polling, but they do a fairly good job, in general, of describing the problems.

    > Highly recommend EconTalk: a weekly one-hour episode, free, with 500+ archived shows.

  • PeterF

    “much like the tampered global temperature data that we see coming from NASA and NOAA.”

    Bob, Don’t forget to add USGS to this list now!

  • LocalFluff

    D.K. Williams
    I noticed a day (or two) before the election that the betting odds were paying up to 5 times the money on Trump, even while the polls were tight. The RCP average of the half dozen or so polls had all but one showing a 0-5% lead for Trump, and one poll showing +11% for Clinton, resulting in a 0.1% lead for Clinton in the average. Assuming that one poll was an outlier, Trump was winning in the polls on the eve of the election. Now, he DID lose by about 0.1%, so in that case the unlikely-looking average actually happened. Trying to analyze poll statistics with statistical criticism won’t make you much wiser either.

    It’s a bit like (ancient) astrology. Some guidelines are carefully followed by tradition, but no one really has any understanding of what is going on. They just make up whatever prophecies feel convenient at the moment. But maybe not so convenient anymore the day after.

  • LocalFluff

    In the third sentence above I meant the NEW HAMPSHIRE poll average by RCP, which had the several 0% to +5% polls versus the single -11% poll one or two days before election day. Not national polls, but polls for the state of NH. (Where that seemingly flawed poll average turned out to be spot on, thanks to that “obvious” outlier.)

  • Edward

    Statistics is a tricky science, because it is more art than science. Even when the data is not biased (by the researcher or the person being surveyed), the number used for the margin of error may be wrong 5% of the time.

    Statisticians can mess up, because statistics is so tricky. William Briggs is a statistician and global warming scientist who concentrates on how statistics can be misinterpreted or misused, even if accidentally. He is discussing other topics on his first page, today, but his discussion is ongoing. This page of his gives links to his posts by topic: http://wmbriggs.com/classic-posts/

    There is the classic book, “How To Lie With Statistics,” from decades ago. It shows purposeful and accidental ways to collect or present data that mislead the audience.

    Statistics is not even an exact science. Statisticians have an ongoing debate between three schools of thought on how to interpret probability: Bayesian, Frequentist, and Propensity. The first two seem to be the most debated.
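
Edward’s “wrong 5% of the time” remark refers to confidence-interval coverage, which is easy to check by simulation (frequentist framing, idealized honest random sampling, my own illustration):

```python
import math
import random

random.seed(0)
p, n, z = 0.5, 1000, 1.96   # true share, poll size, 95% confidence
trials = 2000

covered = 0
for _ in range(trials):
    # Simulate one honest poll of n randomly chosen respondents.
    votes = sum(random.random() < p for _ in range(n))
    p_hat = votes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if abs(p_hat - p) <= moe:
        covered += 1

print(f"the stated interval contained the truth in {covered / trials:.1%} of polls")
```

Even under these ideal conditions, roughly 1 poll in 20 misses; real polls add systematic bias on top of that.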

    That we have pollsters — so named many decades ago in order to make it sound like they are hucksters — whose polls were so wildly biased in one direction shows that the hucksters — er — pollsters either do not know what they are doing, or know very well and are purposely trying to influence the election, as though they were the Russian government or Sanders sympathizers (which is where Wikileaks admits the emails came from).

    And as Robert notes, people who do not understand the polls or the underlying science all too often use the data from these polls in ways that are even further biased.

    Just as Dan Rather’s premature call of Florida for Gore in the 2000 election led to great national distress, the abuse of poor polling in 2016 has led to great stress in international relations, as Obama falsely accused Russia of tampering with our election (as though he, Obama, did not tamper with Israel’s, but then, maybe his own tampering is why he thinks others would tamper).

    Poorly applied science in global warming (especially as misinterpreted by Al Gore and others) has led to hundreds of billions of dollars in wasted spending in Europe alone, and to lost productivity worldwide. Poorly applied science in polling (especially as misinterpreted by Dan Rather, Steve Kornacki, and others) has led to national and international tensions and distrust.
