(I promise this part is relevant)
Let’s say you live in a rural town somewhere in the middle of the Australian outback. You come across a random person ranting about how there’s a giant arctic wolf running around and eating people. Do you believe them? Do you take shelter immediately and start checking for weaponisable implements?
Probably not. You probably don’t even pay them any mind.
Well, how about if a mate of yours came up talking about the same thing? You know they’re generally sane, even if a little paranoid from their work on dangerous predators (they lost a good colleague to one, once, and it’s scarred them ever since).
You…may or may not believe them. But at least you might talk to them, ask a few questions to determine if their worry is legitimate or not. You probably wouldn’t dismiss them out of hand, like you did the random stranger on the street.
Alright, what if your town’s police chief said the same thing? They drop by your house, shotgun in hand, saying very seriously, “Sir/Ma’am, there’s an arctic wolf which got loose from containment as it was being transported through our area. Please stay indoors, draw the blinds, close all doors and remain silent.”
Well…you’d probably believe them. After all, it’s their job to protect the town from such threats, and what they’re saying is pretty plausible. You probably do as they say, peeking out your blinds with shovel/rake/kitchen knife in hand.
Congratulations! You have just successfully applied an intuitive understanding of Bayesian statistics.
The first part, common to all three situations, is what you believe before you’re presented with the new evidence, also known as a prior. As someone living in the Australian outback, you probably consider the chance of running across a wild arctic wolf very, very low (as close to 0 as possible, for practical purposes). In statistical lingo, you have a very strong prior that there are no wild arctic wolves in your area.
The second part is the new evidence you’re presented with, and how strong it is. You might hear this described as the likelihood or the likelihood function. To determine how strong the evidence for something is, we need a rough sense of how likely it is that we would see this evidence if the thing it points to were false. There are many ways of computing the likelihood function, but for the purposes of this piece we can just use broad categories.
For example, how likely would it be that the police chief is going door-to-door with a shotgun if there wasn’t really an arctic wolf wandering about? Probably pretty low; while there is a chance that the police chief has gone insane, that chance is very small compared to the chance that there really is an arctic wolf wandering about that they’ve been warned about.
On the other hand, the chance of a random stranger you don’t know ranting about something that turns out to be untrue is pretty high; there are definitely plenty of people who hallucinate or get intoxicated or just want to mess with people. Your friend is probably somewhere in between; while they are probably sane and not messing with you, you also know that they have a bit of a bias towards being worried about large predators so you will likely take a bit more persuading to actually consider this large-arctic-wolf threat plausible.
In other words, there’s a smaller chance that they’re wrong than the total stranger, but a higher chance that they’re just imagining it than the police chief going door-to-door with a shotgun.
Hence, if the alternative explanations for an observation are relatively less likely, that observation (the police chief with the shotgun) is stronger evidence.
Finally, the last part of Bayesian statistics is what you believe after you’ve updated your pre-existing beliefs with new evidence, also known as the posterior. Generally speaking, the more sure you are that the person who’s warning you about the arctic wolf is onto something, the more you’ll change your beliefs to match the evidence. You’ll also be less certain of your beliefs around the issue – for example, before any of the new evidence is presented to you, you might not have thought it likely that someone painted their large dog white, but you’re now willing to think it possible as a way of explaining whatever’s going on in your town.
You’ll also change your beliefs more if you don’t have a very strong belief that the warning is wrong (i.e. if your prior is relatively weak). For example, if you lived in Greenland or Alaska instead, you would be more willing to believe the random stranger ranting about arctic wolves – they’re pretty common around such parts, after all.
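The three scenarios can be put on a common footing using the odds form of Bayes’ theorem: posterior odds = prior odds × likelihood ratio. Here’s a minimal sketch; every number in it is invented purely for illustration, not taken from any real data:

```python
# Odds form of Bayes' theorem: posterior odds = prior odds x likelihood ratio.
# All probabilities below are made up for illustration only.

def posterior_prob(prior, p_report_if_wolf, p_report_if_no_wolf):
    """Update a prior P(wolf) after someone reports seeing a wolf."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_report_if_wolf / p_report_if_no_wolf
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Outback prior: essentially no chance of a wild arctic wolf.
prior = 1e-6

# P(they'd report a wolf | wolf) vs P(they'd report a wolf | no wolf):
stranger = posterior_prob(prior, 0.5, 0.05)        # weak evidence (LR = 10)
friend = posterior_prob(prior, 0.8, 0.008)         # stronger (LR = 100)
police_chief = posterior_prob(prior, 0.95, 0.0001)  # very strong (LR = 9500)
```

Note that even the police chief’s very strong evidence leaves the posterior small against a near-zero prior, while swapping in an Alaskan prior (say, 0.01) would make even the stranger worth listening to.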
So what does this have to do with polling?
Massive outliers in polling
Usually, I’m not in the business of criticising outliers in published opinion polls. I broadly agree with other psephologists who point out that polls are usually much closer together than they should be by chance (a phenomenon known as herding), and that worry about being the sole pollster to “get it wrong” can often lead pollsters to massage the data or even shelve polls which seem to say something different from everyone else.
(NOTE: I am not accusing Australian pollsters of data fraud (although I will say that given how little is usually published about their methods, we don’t know enough to conclusively rule it out). There are plenty of legitimate assumptions involved in producing a poll (e.g. how do you weight for certain things, which demographics do you need to account for, etc.); most of the problems arise when changes of assumptions are made not on the basis of legitimacy but out of concern about publishing an outlier.)
In this case, my concern with the 18/Feb/21 Newspoll of WA state voting intention isn’t really so much that it’s an outlier compared to other polls of a similar timeframe (although it is), but that it’s a massive outlier in historical context.
While the 2pp result found in this poll isn’t that big a stretch by historical standards, the primary vote results are. Ever since the 1980s and the rise of a strong minor-party presence (Democrats/Greens/One Nation), just one party/grouping has managed to win more than 50% of the first-preference vote (the Coalition, NSW 2011) in states which run single-winner elections (NSW, Vic, QLD, WA, SA). Even in the heyday of the two-party system, at the largest federal re-election win (1943), the highest primary vote share recorded was 50.2%.
Yet the Newspoll claims that 59% of Western Australians intend to vote for Labor. With another 8% intending to vote for the left-wing Greens.
What is one to do with such an outlier?
Bayesian statistics in vote modelling
Well, the first thing is to look at what other polls of the same place, conducted on similar timeframes, say. Both of the other polls we know of have Labor winning about 46% of the primary vote, which is more in line with other historical blowouts. Hence, these polls effectively form our prior – what we believe about Labor’s primary vote before knowing about the new Newspoll.
The next thing is to consider how strong evidence this Newspoll constitutes. There are actual mathematical functions to do this, but I’ll take you through the broad strokes:
- The average expected error on a Newspoll of the Labor primary vote, conducted right before the election, is about 3.3%. So, if Labor’s true primary vote were 46%, we would only see a poll off by this much (13 points) or more about 0.5% of the time, or once in every 200 elections. That sounds like some pretty strong evidence!
- However, we also expect Newspoll to overestimate Labor’s primary vote by about 0.3% based on historical skews. Additionally, thanks to incumbent skew, we also expect them to overestimate Labor by a further 0.6%. This means that based on historical trends (adjusted for how strong evidence they themselves are – it’s priors and posteriors all the way down), we would expect Labor to be at about 58% of the primary vote in an average election where Newspoll tells us they have 59%.
This takes us to a 0.64% chance of a poll being this wrong or more so, or about once in every 150 elections.
- Now, we also have to consider the time to the election. Generally speaking, polls conducted this far out are about an additional 1% off compared to polls conducted right before the election. In other words, when accounting for changes in voting intention from here to the election, the average expected “error” is about 4.3% instead of 3.3%.
This brings us to a 1.6% chance of a poll being this wrong or more so, or about once every 60 elections.
(For the statistically-inclined: if you’re unable to reproduce these figures, note that we model the Labor primary vote using a t-distribution with 4 degrees of freedom, not a normal distribution. The scale parameters for steps 1 and 3 are 2.3 and 3.3 respectively.)
So in other words, we expect a Newspoll to experience a > 13% error on Labor’s primary vote about once every 60 elections. That still sounds like some fairly strong evidence, doesn’t it?
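For those playing along at home, tail probabilities of roughly this size can be computed in a few lines, using the closed-form CDF of a t-distribution with 4 degrees of freedom (so no stats library is needed). The scale parameters and skew adjustments below are the ones given in the text; the results land close to, though not exactly on, the figures above, since not every adjustment the model makes is spelled out here:

```python
import math

def t4_sf(x):
    """Survival function P(T > x) for a t-distribution with 4 degrees of
    freedom, via its closed-form CDF: F(x) = 1/2 + (3/4)(s - s^3/3),
    where s = x / sqrt(x^2 + 4)."""
    s = x / math.sqrt(x * x + 4)
    return 0.5 - 0.75 * (s - s ** 3 / 3)

def two_sided(error, scale):
    """Chance of a poll error at least this large, in either direction."""
    return 2 * t4_sf(error / scale)

newspoll, prior_mean = 59.0, 46.0  # Newspoll figure vs other-poll prior
skew = 0.3 + 0.6                   # historical skew + incumbent skew

p1 = two_sided(newspoll - prior_mean, 2.3)         # step 1: about 0.5%
p2 = two_sided(newspoll - skew - prior_mean, 2.3)  # step 2: about 0.6%
p3 = two_sided(newspoll - skew - prior_mean, 3.3)  # step 3: about 2%
```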
Except for one little snag. While we have some pretty strong evidence, we also have a very strong prior that parties don’t win 59% of the vote. While we expect a Newspoll to be this wrong or worse once every 60 elections, historically, we see a party win 59% or more of the first-preference vote once in every…never.
Huh. Oh dear.
(that’s not to mention the fact that the model is trained on relatively recent elections, where polling had improved quite a bit from its strangers-randomly-doing-house-calls days. A model which included those older elections would be trained on things like the Gallup poll which was off by 20%, and would place the odds of Newspoll stuffing up much higher)
What our model ended up opting for
At the end of the day, when updating a strong prior (of hard-to-estimate strength: for those thinking about just fitting a distribution to pre-existing primary votes, how do you intend to account for changes in the minor-party environment at each election?) with strong evidence, our model simply throws its histogram bars up and tosses the Newspoll into the polling average.
Newspoll in this case is somewhere between the paranoid friend and the police chief; they’re a quirky zoologist who’s well-regarded in their field but has made some mistakes in the past and who doesn’t always tell you much about how they know it’s an arctic wolf.
Hearing from someone like that probably makes you less confident that there aren’t any arctic wolves in the area, but you’re probably not going to believe them right away without more proof. Plus, there are good theoretical reasons to believe that they’re wrong – arctic wolves aren’t adapted to survive in the heat of the Australian outback, and where would it even come from anyway?
(the theoretical reasons in the context of the 59% primary vote figure are that parties very rarely win such large vote shares in elections with a significant minor party presence, a lot of voters like checks and balances so we might expect this figure to decline come election day, and popular governments usually don’t win as large vote shares as oppositions)
At the same time, you’re less certain of your town’s safety-from-large-canines; it might not be a wolf, but that doesn’t mean it isn’t something that will tear you apart. In vote modelling terms, this means that the model opted to move the forecasted Labor vote up by a fair bit (moving towards the new evidence, but not all the way), and also to increase the uncertainty it expects in the Labor vote compared to before.
This is why our average expected Labor 2pp is about 63% instead of the 60% it was hovering around before, but the range of possible outcomes has widened from ±4% to about ±6% (if you look at the seat graphs, you’ll notice that most bars are a fair bit lower than they were before). As a result, based on historical data and Bayesian statistics, our model only narrowly includes the latest Newspoll in its 95% confidence interval (whose half-width is more commonly known as the margin of error), rather than making it the expected outcome.
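One way to see how the forecast mean can move towards the new evidence while the spread *widens* (a plain normal-to-normal Bayesian update would shrink it instead) is to treat the forecast as a mixture over two hypotheses: “the pre-Newspoll polls are about right” and “Newspoll is about right”. The weights, means and spreads below are invented for illustration and are not our model’s actual parameters:

```python
import math

# Two hypotheses about the true Labor 2pp, with hypothetical figures:
#   (weight, mean 2pp, standard deviation)
components = [
    (0.6, 60.0, 2.0),  # the pre-Newspoll polling average is about right
    (0.4, 68.0, 2.0),  # Newspoll's implied 2pp is about right
]

# Mean and standard deviation of the two-component mixture:
mixture_mean = sum(w * m for w, m, _ in components)
second_moment = sum(w * (sd ** 2 + m ** 2) for w, m, sd in components)
mixture_sd = math.sqrt(second_moment - mixture_mean ** 2)
```

With these made-up numbers, the mixture mean (63.2) lands between the two hypotheses while the mixture standard deviation (4.4) is more than double either component’s, qualitatively matching the model’s behaviour of shifting up while widening its range.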