
Approval Ratings Are Rubbish, But They Have Value

(My analysis in this piece builds on the excellent work done by Dr Kevin Bonham on the same topic, linked here; it is very much worth a read)

If you’ve read articles dissecting opinion polls in the media, you’ve almost certainly come across a statement of one of the following forms:

Alongside strong results on voting intention for the government, the Prime Minister continues to widen his/her margin over (Opposition Leader) as the preferred Prime Minister…

Despite weak voting intention figures for the government, the Prime Minister maintains a lead over (Opposition Leader) as respondents’ preferred Prime Minister…

With the Prime Minister having fallen behind (Opposition Leader) as the preferred PM, colleagues have started…

For a variety of reasons, media articles on polling tend to focus heavily on the Preferred Prime Minister/Better Prime Minister question, as well as approval ratings for the Prime Minister and Opposition Leader. This is despite the fact that, as Dr Bonham has demonstrated repeatedly using Newspoll’s historical figures, any model which uses Better PM or PM approval/net-satisfaction (netsat) is less accurate than one which simply uses voting-intention figures.

This isn’t just a Newspoll thing, either. An average of final Better PM polling (which should, in theory, be more accurate than simply using the figures from a single pollster) still explains less of the variance in election results than the final polling average:

Scatterplot of final average Better PM ratings vs incumbent two-party-preferred

(my thanks to Dr Kevin Bonham for having provided archives of old Morgan polls and Newspolls, as well as to William Bowe for having provided an archive of old AGB McNair/ACNielsen/Nielsen polls)

This is especially stark in contrast with a simple average of final pre-election polling, which can explain over half of the variance in election results. (For polls which did not provide a 2-party-preferred estimate, I calculated my own using last-election preference flows. Note that the graph below overwhelmingly relies on the 2pp estimates as published by pollsters; since some pollsters publish respondent-allocated 2-party-preferred estimates, using last-election preference flows for all polls, instead of just the ones which didn’t publish a 2pp estimate, would have produced more accurate 2-party-preferred estimates.)

Final polling average vs election results

More importantly, however, Better Prime Minister and the Prime Minister’s approval/netsat rating are both fairly weak predictors. I’ve used cross-one-out validation to estimate the prediction error of each model: I remove the data for one election (e.g. the 2019 election), run a regression on the remaining elections, and attempt to “predict” the 2-party-preferred for the removed election without having “seen” its result:

Models based on Better PM or PM approval scores are less accurate than simply using voting-intention polling as given


| Election | Polling average error (%) | Better PM model error (%) | PM approval model error (%) |
|---|---|---|---|
| 2019 | -2.9 | -2.2 | -1.4 |
| 2016 | 0.2 | -0.5 | -0.3 |
| 2013 | 0.3 | 2.9 | 2.7 |
| 2010 | 1.2 | -0.2 | 0.3 |
| 2007 | -1.1 | 1.5 | 3.4 |
| 2004 | -1.4 | -2.9 | -1.6 |
| 2001 | -2.2 | -0.8 | 1.0 |
| 1998 | -0.5 | 0.0 | 1.4 |
| 1996 | 2.1 | 3.3 | 2.8 |
| 1993 | -1.6 | -3.1 | -4.1 |
| 1990 | -0.5 | 2.6 | -0.7 |
| 1987 | 1.7 | 0.6 | 0.0 |
| Average error | 1.3% | 1.7% | 1.6% |
Figures are in two-party-preferred errors; positive values mean the model overestimated the incumbent's 2pp while negative values mean the model underestimated the incumbent's 2pp.

On average, a model using Better PM ratings would have a retrodiction error of 1.7%, correctly calling 6 of 12 elections (50%); while a model using Prime Minister approval/netsat would have an average error of 1.6% and a correct-call rate of 8 of 12 elections (67%). In contrast, a simple average of final voting-intention polling for the same elections would be off by just 1.3% and would have called 9 of 12 elections correctly (75%).
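To make the procedure concrete, here’s a minimal sketch of cross-one-out validation for a single-predictor regression. The Better PM leads and 2pp figures below are invented placeholders for illustration, not the article’s dataset:

```python
# Sketch of cross-one-out validation with a one-variable regression.
# All data below is hypothetical, purely to demonstrate the method.

def fit_line(xs, ys):
    # ordinary least squares for a single predictor
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

better_pm_lead = [10.0, 25.0, 5.0, -3.0, 18.0, 8.0]    # hypothetical
incumbent_2pp  = [49.0, 53.0, 50.5, 47.0, 52.0, 51.0]  # hypothetical

errors = []
for i in range(len(incumbent_2pp)):
    # drop election i, fit the regression on the remaining elections...
    xs = better_pm_lead[:i] + better_pm_lead[i + 1:]
    ys = incumbent_2pp[:i] + incumbent_2pp[i + 1:]
    slope, intercept = fit_line(xs, ys)
    # ...then "predict" the held-out election without having "seen" it
    retrodiction = slope * better_pm_lead[i] + intercept
    errors.append(retrodiction - incumbent_2pp[i])

mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(round(mean_abs_error, 2))
```

Averaging the absolute held-out errors in this way gives the per-model figures quoted above.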

We can improve on this somewhat by combining the Prime Minister’s approval/netsat score with the Opposition Leader’s approval/netsat score. Using multiple regression, I estimate that in terms of voting intention, a 1% gain in the PM’s approval rating is worth approximately as much as a 2% decline in the Opposition Leader’s approval rating. This means that, all else being equal, the two-party-preferred for a party whose PM holds a +2 approval rating and who faces an Opposition Leader with a +4 approval rating has historically been about the same as the two-party-preferred for a party whose PM has a -4 approval rating and is facing an Opposition Leader with a -8 approval rating.

(Interestingly, this seems to suggest two routes to winning government: “be a heck of a lot more popular than the PM” (Hawke 1983, Rudd 2007), or “hope the PM implodes, and get out of the way” (Howard 1996, Abbott 2013). Neither works consistently – in 1980 Bill Hayden held a stellar +31 rating against Prime Minister Fraser’s weak -4 netsat but narrowly lost in the popular vote, while in 1993 John Hewson held a weakly positive +2 rating against Keating’s horrendous -23 approval score but still lost by a decent margin anyway)

With this in mind, I’ve estimated a “Prime Minister’s approval margin”, defined as (PM netsat – (Opposition Leader netsat)/2). The approval margin correlates more strongly with the government’s 2-party-preferred result than Better PM does. (I’m still not sure what to make of the fact that a calculation combining PM netsat and OL netsat works better than Better PM; you would think that directly asking voters who they thought would do a better job as PM would work better.)

At the same time, the regression coefficients are fairly robust: when I go through the dataset and delete one election at a time, the 2:1 (PM netsat):(OL netsat) importance ratio stays fairly consistent. And the idea that the PM’s netsat is about twice as important as the Opposition Leader’s is something we would probably expect even before running any analysis.

Approval margin vs two-party-preferred result
Interestingly this comes very close to the R2 calculated by Dr Bonham for Newspoll’s 2pp estimates (0.4382).
Govt2pp = 50 + 0.086 * (PM approval) – 0.043 * (OL approval)
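As a sanity check, plugging the two scenarios from the example above (PM +2 / OL +4, and PM -4 / OL -8) into this equation shows that they do indeed produce the same predicted government 2pp. This is a minimal sketch using the published coefficients; the function names are mine, not the article’s:

```python
# Hypothetical helper names; the coefficients are the published ones.
def govt_2pp(pm_netsat, ol_netsat):
    # Govt2pp = 50 + 0.086 * (PM approval) - 0.043 * (OL approval)
    return 50 + 0.086 * pm_netsat - 0.043 * ol_netsat

def approval_margin(pm_netsat, ol_netsat):
    # PM netsat minus half the Opposition Leader's netsat
    return pm_netsat - ol_netsat / 2

# PM +2 vs OL +4, and PM -4 vs OL -8: same approval margin (0),
# and hence the same predicted government 2pp (~50)
print(govt_2pp(2, 4), govt_2pp(-4, -8))
print(approval_margin(2, 4), approval_margin(-4, -8))
```

Note that the 0.086 : 0.043 ratio of the coefficients is exactly the 2:1 weighting baked into the approval margin measure.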

Approval margin also does decently well as a predictor. Using the same cross-one-out validation method described above, a two-party-preferred model using approval margin would have an average error of 1.5%, comparable to individual polls taken in the week leading up to the election:

Retrodictions using approval margin have about the same average error as final-week voting-intention polls

Retrodicted govt 2pp using approval margin, vs actual govt 2pp result
The solid diagonal line represents where the points would be, if the retrodictions were perfect. The largest errors by this method were Keating 1993 (because of course) and the two times Fraser won re-election (1977, 1980).

However, the approval margin model is still worse than a simple average of all final voting-intention polls (1.5% average error vs 1.3% average error), despite being based on an average of all polls which asked about approval ratings. That does not mean that it has no predictive value; a predictor with worse error can still be useful if:

  1. The predictor corrects the bias of another predictor. For example, if voting-intention polls (polls that directly ask people who they want to vote for) consistently under-estimated the incumbent government, but a model based on Better PM scores tended to over-estimate the government’s vote, then it might make sense to add the Better PM model into the average with some weighting in order to correct the bias of voting-intention polls and produce a more accurate vote forecast.
  2. The predictor is independent of, or less correlated with, another predictor. This is particularly relevant after 2019, when every pollster over-estimated the Labor two-party-preferred by 2.5% – 3.5%. Basically, whenever a poll errs in one direction (e.g. over-estimating Labor), other polls tend to err in the same direction as well (another example is the 2018 Victorian state election, where every pollster under-estimated Labor). If a predictor – e.g. approval margin – is less likely to make the same error as voting-intention polling, then it can be useful to include that predictor in a combined forecast to increase the likelihood that errors “cancel out”. For example, let’s say that you have three polls: Coalition 51%, Coalition 52%, and Coalition 52%. The actual result of the election is going to be Coalition 48%; a simple polling average (Coalition 51.7%) would therefore experience a 3.7% error.

    However, let’s say that a model based off approval margin would output Coalition 42%. Even though the error on the approval-margin model is higher (a 6% error, compared to 3.7% for the polling average), if you added the approval-margin model output as a “poll”, then the average of polling + approval margin model would be 49.3%, for an error of just 1.3%.

    If the error on an approval-margin model (or some other predictor) is less correlated with polling error, then it can actually make sense to add the other predictor in (once you have multiple polls) so as to reduce the likelihood of correlated errors.
  3. The predictor is a leading indicator. Although some indicators are worse “on-the-day” predictors than final polling, they may be better indicators further out (the usual assumption is that these indicators point us to how people will vote once they tune into the election campaign). Annualised GDP growth from the start of a government’s term is a decent indicator for this purpose; I estimate that on average, a model using this variable as measured six months out would have been off by 2% on the election day two-party-preferred.

    In contrast, the polling average six months out is usually off by 2.7% compared to the election day vote. Hence, although the final polling is better than the final GDP-growth model, the GDP-growth model can be more accurate further out.


    For example, in early 2010, the Rudd government was still riding high with a 54-46 lead in the polls despite mediocre economic metrics which suggested Labor was due to lose 49-51. As the election approached, Labor’s voting-intention took a hit, with Labor winding up with just 50.12% of the two-party-preferred on election day. Although the economic model was “wrong” in this case, the fact that the government’s economic numbers were so much worse than their polling at the time strongly suggested that their polling would eventually take a hit once campaigning started in earnest; hence combining an economic model with polling could have “guided” expectations about how voting-intention was likely to change by election day.
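The error-cancellation arithmetic from point 2 can be checked directly; the figures below are the hypothetical ones from that example, not real polls:

```python
# Hypothetical figures from the worked example in point 2 above.
polls = [51.0, 52.0, 52.0]  # Coalition 2pp in three final polls
model = 42.0                # approval-margin model output, treated as a "poll"
actual = 48.0               # actual Coalition 2pp at the election

poll_avg = sum(polls) / len(polls)                      # ~51.7
combined_avg = sum(polls + [model]) / (len(polls) + 1)  # 49.25

print(abs(poll_avg - actual))      # ~3.7: error of the plain polling average
print(abs(combined_avg - actual))  # 1.25: error with the model included
```

Even though the model on its own is worse (a 6% error), pulling the average towards the opposite side of the true result cuts the combined error to roughly a third of the polling average’s.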

Going down the list:

There is minimal skew in the final polling average


| Election | Final polling average error (%) | Approval margin model error (%) |
|---|---|---|
| 2019 | -2.9 | -1.0 |
| 2016 | 0.2 | 0.1 |
| 2013 | 0.3 | 2.4 |
| 2010 | 1.2 | 0.6 |
| 2007 | -1.1 | 1.7 |
| 2004 | -1.4 | -2.0 |
| 2001 | -2.2 | 0.7 |
| 1998 | -0.5 | 0.6 |
| 1996 | 2.1 | 2.0 |
| 1993 | -1.6 | -3.9 |
| 1990 | -0.6 | 0.8 |
| 1987 | 1.7 | 0.6 |
| 1983 | -0.9 | 1.3 |
| 1980 | -1.4 | -2.3 |
| Average skew | -0.5 | 0.1 |
Figures are in two-party-preferred errors; positive values mean the model overestimated the incumbent's 2pp while negative values mean the model underestimated the incumbent's 2pp.

There’s pretty much no skew towards or against the incumbent in the final polling average, and as I’ve noted before, there hasn’t been any bias against the Coalition in final polls (nor any bias towards the Coalition, for that matter). This means that adding approval-based models into a poll average isn’t going to correct anything, especially when you consider that many of the incumbent under-estimates in the above table (especially 2001 and 2004) were primarily due to pollsters’ use of the less-accurate respondent-allocated method of estimating 2-party-preferred (which few pollsters use today).

The approval margin model is about as correlated with individual polls as the polls are with each other

I measure the correlation between polls using a modified form of the covariance formula (full formula and calculation details below), which I call correlated error. If correlated error is high, then when one poll errs in a certain direction, other polls conducted at the same election will likely err in the same direction. If correlated error is low, then polls are generally more independent; e.g. whether a Newspoll over-estimates the Coalition would tell me relatively little about whether a Morgan poll is likely to also over-estimate the Coalition. Correlated error can be negative at individual elections, where polling errors in opposite directions by different pollsters cancel out. For example, at the 1998 federal election (Coalition 49%), Newspoll under-estimated the Coalition (Coalition 47%) but Nielsen over-estimated the Coalition (Coalition 50%). When averaged, the over- and under-estimates roughly cancelled out (average: Coalition 48.5%), producing a fairly accurate average.

In 1998, the correlated error would indeed have been negative. However over lots of elections, the average correlated error will tend to be positive (or in other words, over many elections, polling errors will tend to be correlated).

Let CoE = correlated error

If I’m measuring correlated error between two randomly-selected polls, then I would create a vector containing each poll’s error for that election, and then generate every unique combination of two polls split into two vectors (x and y).

If I’m measuring correlated error between something else (e.g. approval margin model) and voting-intention polling, then I would create two vectors; the first (x) would contain the other thing (e.g. predicted 2pp for the relevant election, from the approval margin model), with as many duplicates as there are polls at the election. The second (y) would contain the polling errors for the election.

CoE = sum(x * y)/n

Then, I average the CoE for each election to estimate the overall correlated error.

While it’s not as intuitive as some other metrics for how-correlated-two-things-are (e.g. R2), it has the advantage of being able to be plugged into the variance formula to be used in error modelling.
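The calculation above can be sketched in a few lines. The per-election polling errors below are invented for demonstration (positive = poll over-estimated the Coalition):

```python
# Sketch of the correlated-error (CoE) calculation described above,
# for the pairs-of-polls case. All polling errors here are hypothetical.
from itertools import combinations

def correlated_error(poll_errors):
    # every unique combination of two polls at this election
    pairs = list(combinations(poll_errors, 2))
    # CoE = sum(x * y) / n over the paired errors
    return sum(x * y for x, y in pairs) / len(pairs)

elections = [
    [2.5, 3.0, 2.8],     # 2019-style: every poll errs the same way
    [-1.0, -1.5, -0.5],  # also correlated, in the opposite direction
    [-2.0, 1.0, 0.5],    # 1998-style: errors partly cancel out
]

per_election = [correlated_error(e) for e in elections]
overall = sum(per_election) / len(per_election)
print([round(c, 2) for c in per_election], round(overall, 2))
```

As described in the text, the 1998-style election produces a negative CoE, but the average across elections comes out positive, because errors are usually correlated.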


The correlated error between the 2pp predicted by the approval-margin model and individual polls is 1.18, whereas the correlated error between any two randomly-selected polls from the same election is 1.23. In other words, if the polls at a randomly-selected election are likely to over-estimate the Coalition, then the approval margin model is also likely to over-estimate the Coalition (though it may do so by a slightly smaller amount).

Another way to measure this is to treat the approval-margin model as a “poll” and add it into the polling average for each election. By this method, including the approval-margin model very slightly improves the accuracy of the polling average. (These polling averages were calculated using the 2pp published by each pollster; however, in some elections, most notably 2001 and 2004, many pollsters used the less-accurate respondent-allocated method of estimating two-party-preferred.)

If you look at the table below, those two elections are also among the ones which would have seen a large improvement from adding in the approval-margin model, which did make me concerned that the improvement might be solely due to cancelling out errors from respondent-allocated preferences (which few pollsters still use today).

However, if I calculate 2pp estimates for all polls using the more-accurate last-election preference-flow method, the average absolute error on the simple polling average decreases to 1.25%, while the average absolute error on the approval-margin-inclusive polling average sits at about 1.1%.

Including the approval-margin model as a “poll” has minimal effects on polling average accuracy


| Election | Final polling average error (%) | Incl. approval model error (%) |
|---|---|---|
| 2019 | -2.9 | -2.5 |
| 2016 | 0.2 | 0.1 |
| 2013 | 0.3 | 0.5 |
| 2010 | 1.2 | 1.1 |
| 2007 | -1.1 | -0.2 |
| 2004 | -1.4 | -1.5 |
| 2001 | -2.2 | -1.5 |
| 1998 | -0.5 | 0.0 |
| 1996 | 2.1 | 2.0 |
| 1993 | -1.6 | -2.1 |
| 1990 | -0.6 | 0.8 |
| 1987 | 1.7 | 1.3 |
| 1983 | -0.9 | -0.4 |
| 1980 | -1.4 | -1.8 |
| Average error | 1.3 | 1.1 |
Figures are in two-party-preferred errors; positive values mean the model overestimated the incumbent's 2pp while negative values mean the model underestimated the incumbent's 2pp.

The difference is fairly small, and a large chunk of it is probably due to the problems with respondent-allocated preferences in 2001 and 2004 (a method most pollsters have since stopped using to estimate 2pp). Given how small the shift in average error is, and the minuscule difference in correlated error, it’s probably fair to say that the approval-margin model neither significantly improves nor reduces the accuracy of the polling average; hence final approval polling probably has little, if any, predictive value when included in a polling average.

This shouldn’t be too surprising; the approval-margin model is still based on polls, after all, and there isn’t any reason why the same respondent samples and weighting methods would produce an error on one set of questions (the who-you-intend-to-vote-for ones) but no error on another (the are-you-satisfied-with-performance ones). This is also worth keeping in mind when dealing with issue polling, or with polls from pollsters who don’t release voting-intention figures – if there is a problem with voting-intention polls, the same issues almost certainly exist in other polls as well; there’s just no election against which to check those other polls’ figures.

Approval margin may work as a leading indicator for voting-intention

To measure whether approval margin can act as a leading indicator (i.e. predicts future movement in voting-intention), I calculated the approval margin using polls taken approximately 2 months and 6 months out from each election, then used cross-one-out validation to retrodict the government’s two-party-preferred for each set of data.

Before I go into that, however, it’s worth noting that voting-intention polls taken far out from an election are very poor predictors of the election result. This is not to say that polls taken further out from an election are “wrong”; it may be the case that they accurately capture voting intention at the time but voting intention rapidly shifts.

For example, here’s the polling average 2 months out from each election, compared with the actual 2-party-preferred won by each government at that election (I’ve outlined elections where either the PM or Opposition Leader was changed 4 months or less from election day):

Polling average 2 months out vs election result
Solid diagonal line represents where the points would land, if the polling average 2 months out was perfectly accurate.

If we go back a bit further, the polling average six months out has a higher error and tends to under-estimate the incumbent government (elections where the PM or Opposition Leader changed up to 9 months out are outlined):

Polling average 6 months out vs election result
Interestingly, the R2 is higher for the polling average six months out than it is for the polling average 2 months out. This is mostly because the polls 6 months out have a clear bias which a model could correct for (notice how most of the points are to the left of the diagonal line). However, if I use cross-one-out validation to build a model which “predicts” what the election result would be, given the polling average 6 months out, the average error would still be a very high 2.1%, while an adjustment to correct for the incumbent under-estimate would produce an average error of 2.3%.

In comparison, approval margin seems to be somewhat less affected by the passage of time than the polling average. While it is definitely less predictive 2 months out than it is on election day, approval margin tends to come closer to the election result than the polling average taken at the same point in time (elections where the PM or Opposition Leader changed up to 4 months out are outlined):

Retrodicted 2pp using approval margin 2 months out, vs govt 2pp result
Solid diagonal line represents where the points would be if the retrodictions were perfectly accurate. The biggest outlier in this case is the 1983 election, which shouldn’t be too surprising considering Fraser called that election early in an attempt to lock Labor into contesting with Hayden instead of Hawke.
If I exclude elections where the PM or Opposition Leader was changed less than 4 months out from the election, the average error declines to 1.48%, comparable to final-week voting-intention polling.

Even 6 months out, approval margin still has some predictive value (though it’s definitely weaker):

Retrodicted 2pp using approval margin 6 months out, vs govt 2pp result
Solid diagonal line represents where the points would be if the retrodictions were perfectly accurate. If I calculate the average error excluding elections where the PM or Opposition Leader was changed less than 9 months out from the election, there’s no change.

Although the correlation is fairly weak, the average error on a model using approval margin is still lower than that of the polling average six months out (1.7% vs 2.7%). It still remains lower even if I adjust the 6-months-out polling average for the tendency of voting-intention to move towards the government (average error 2.3%) or build a linear model using the 6-months-out polling average (average error 2.1%).

In summary:

  1. The final polling average is not skewed in either direction. Hence including approval margin modelling isn’t going to correct any pre-existing skew.
  2. Approval margin modelling using final polling acts a lot like a final voting-intention poll. Hence, including approval margin models in the final polling average is unlikely to improve accuracy over lots of elections (it may do so for some elections e.g. 2019, but it will hurt accuracy in others e.g. 1993).
  3. Approval margin models may act as a leading indicator. When we analyse polls taken several months out from the election, approval margin modelling seems to be closer to the final election result than polls taken at the same time. This advantage is somewhat reduced at 6 months out if I build a linear model using the polling average but still exists.

Hence, approval margin may have some value as a leading indicator, suggesting to us which direction the polls are likely to move as election day approaches. This might have some uses in an election forecast – e.g. including approval margin models, but giving them less weight as election day approaches. While approval ratings and the Better Prime Minister indicator are not very predictive on their own, it does appear that with some modelling, approval ratings do have some predictive value after all.

(For anyone who’s curious about what this might imply for the 2021/2022 federal election: currently (23/Aug/21), with PM Morrison at -2 and OL Albanese at -8 approval/netsat, the model for polls taken 6 months out expects election-day 2pp to be almost exactly 50-50. In other words this would seem to suggest that the current (23/Aug/21) Labor lead in the polls is likely to shrink to some extent. This isn’t exactly new information, however; historically, whoever’s leading 9 months out from an election usually loses about 60% of their 2pp lead by election day.)

