I'm focusing a lot on BC recently. It's the next scheduled election and it allows me to try new things. If you have read my recent posts, you already know that one improvement to the model is that it now accounts for which parties are gaining and which ones are losing compared to the last election. For instance, are the Liberals going down because the NDP is up, or because the Conservatives are increasing? Depending on the situation, the geographical swing will not be the same. In BC, if the Liberals lose most of their votes to the Conservatives, they will drop massively in the Interior. On the other hand, if they lose votes to the NDP, they will also be in danger in Vancouver and the Lower Mainland.
The next improvement is to add uncertainty to the model. I still mostly see my job as transposing percentages into seats. But obviously, in order to do so, we need the correct percentages. I use the average of the polls to estimate the percentages, but as we've seen in recent elections, it can be quite wrong. Of course, for the more advanced readers of this blog, I also include the simulator (not yet for BC though, it's coming soon) which allows you to enter your own percentages.
But in the end, a lot of readers, especially during a campaign, simply come here and look at the projections. If I only display seats, it doesn't tell them whether this outcome is likely or not. So let's look at the sources of uncertainty and how I try to take them into account.
1. The polls.
The polls give us a snapshot of current voting intentions, but that snapshot comes with uncertainty. First of all, there is the natural statistical variation. Even if two polls are conducted using the same methodology during the same period, they will not necessarily give us the same results. This is normal, especially with samples of only around 1000 respondents. The margins of error are there for that (well, 19 times out of 20). So how can I include that in the projections? Well, I first average the polls (with weights depending on how much I trust the pollster, how big the sample is, how recent the poll is, etc.). Coupled with the sample size of these polls, this gives me a multinomial distribution. I then simulate a large number of draws from this distribution. In other words, if I have the Conservatives at 35% on average, with a sample size of 1000, this party could plausibly be at 32% or 38%. The same goes for the other parties. So I randomly draw 1000 times from this distribution. On average, the Conservatives will still be at 35%, but they will sometimes be noticeably higher or lower. For every draw, I then run the projection model. To be clear: I now run 1000 projections instead of just one as before.
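A minimal Python sketch of this sampling step, for illustration only (it is not the actual model behind the projections). The function name `simulate_poll_draws` and the two parties other than the Conservatives, along with their shares, are hypothetical; only the 35% figure and the sample size of 1000 come from the example above.

```python
import random
from collections import Counter

def simulate_poll_draws(avg_shares, sample_size=1000, n_sims=1000, seed=None):
    """Draw n_sims simulated polls from a multinomial distribution.

    avg_shares maps each party to its share in the poll average (as a fraction).
    """
    rng = random.Random(seed)
    parties = list(avg_shares)
    weights = [avg_shares[p] for p in parties]
    draws = []
    for _ in range(n_sims):
        # One simulated poll: each of the sample_size respondents picks a party
        # with probability equal to that party's share in the poll average.
        counts = Counter(rng.choices(parties, weights=weights, k=sample_size))
        draws.append({p: counts[p] / sample_size for p in parties})
    return draws

# Hypothetical poll average: only the Conservative figure is from the text.
poll_average = {"Conservative": 0.35, "Party B": 0.40, "Party C": 0.25}
draws = simulate_poll_draws(poll_average, sample_size=1000, n_sims=1000, seed=1)

conservative = [d["Conservative"] for d in draws]
print("mean:", sum(conservative) / len(conservative))
print("lowest and highest draw:", min(conservative), max(conservative))
```

Each element of `draws` is one set of province-wide percentages that would then be fed to the projection model, once per draw.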
Notice that there is a second source of uncertainty with polls: the methodology and "house effects". Not all polls are equal: some use internet-based surveys, others use automated phone calls, some reach landlines only, etc. The various methodologies have an impact on the results. In order to correct for this, I use weights or adjust the polls directly. But for now, let's ignore this.
2. The distribution or efficiency of the vote.
Even if we had the exact and correct vote intentions (or percentages), we still couldn't predict all the ridings with 100% accuracy. Even when I use the actual results of an election, I can't get perfect projections (I can be mostly right on average, but I'll sometimes overestimate a party by 2 points, and sometimes underestimate it by the same margin). Even after taking the region or the incumbency effect into account, there is still some variation. A party can be lucky and get its vote out where it matters. During the last Quebec election, the Liberals not only got a result much higher than what the polls predicted, but their vote was also very efficient. Indeed, they won a large number of the close races they were involved in.
Therefore, in order to account for this source of uncertainty, I add a random term to each projection, for each party in each riding. This random term is drawn from a truncated Normal distribution, with a range of -3 to +3 points. Over the 85 ridings, the added random terms largely cancel each other out (for instance, in one simulation I could project the Liberals slightly higher than expected in one riding, but the party will then be projected lower than expected in another one). Because the added random terms have mean zero, they don't change the average, but they allow for additional outcomes. Again, think about the Liberals in Quebec earlier this year: they almost won the election because they received more votes than the polls expected, but also because their vote was efficient, in the sense that it got out where it counted.
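Here is a hedged sketch of this random term in Python. The truncation at plus or minus 3 points and the mean of zero come from the description above; the standard deviation of 1.5 points, the function names and the example riding are assumptions made for illustration. The truncation is done by simple rejection (redraw until the value falls inside the range).

```python
import random

def truncated_normal(rng, sd=1.5, bound=3.0):
    """Mean-zero normal draw, redrawn until it falls within [-bound, bound].

    sd=1.5 is an assumed value; the text only gives the range.
    """
    while True:
        x = rng.gauss(0.0, sd)
        if -bound <= x <= bound:
            return x

def add_riding_noise(projection, rng, sd=1.5, bound=3.0):
    """Add an independent random term for each party in each riding.

    projection maps riding -> {party: projected share in percentage points}.
    """
    return {
        riding: {
            party: share + truncated_normal(rng, sd, bound)
            for party, share in shares.items()
        }
        for riding, shares in projection.items()
    }

# Hypothetical riding-level projection, in percentage points.
base = {
    "Riding A": {"NDP": 45.0, "Liberal": 40.0, "Conservative": 15.0},
    "Riding B": {"NDP": 38.0, "Liberal": 42.0, "Conservative": 20.0},
}
rng = random.Random(0)
noisy = add_riding_noise(base, rng)
print(noisy)
```

This sketch does not renormalize the shares within a riding after adding the noise, and it does not guard against a very small party going below zero; whether the real model does either is not stated here.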
By running 1000 simulations (I could run many more, but for now 1000 seems to be enough), I actually have 1000 projections in each riding. The projections vary a lot, thanks to the variations in vote intentions (the percentages) and the distribution of the vote (the added random term). In the end, the most likely scenario is still the one I display on my site. This is the one where we simply average the polls. But we now have a full distribution of outcomes and we can calculate probabilities.
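To show how probabilities fall out of the simulations, here is a small Python sketch. It assumes each simulation is stored as a mapping from riding to party shares, and that the winner of a riding is simply the party with the highest share. The function names and the two toy simulations are hypothetical and are not real projections.

```python
from collections import Counter

def seat_counts(sim):
    """Count seats per party in one simulation (highest share wins the riding)."""
    return Counter(max(shares, key=shares.get) for shares in sim.values())

def riding_win_probabilities(simulations):
    """Share of simulations in which each party wins each riding."""
    wins = {}
    for sim in simulations:
        for riding, shares in sim.items():
            winner = max(shares, key=shares.get)
            wins.setdefault(riding, Counter())[winner] += 1
    n = len(simulations)
    return {
        riding: {party: count / n for party, count in counts.items()}
        for riding, counts in wins.items()
    }

def majority_probability(simulations, party, seats_needed):
    """Share of simulations in which `party` wins at least `seats_needed` seats."""
    hits = sum(1 for sim in simulations if seat_counts(sim)[party] >= seats_needed)
    return hits / len(simulations)

# Two toy simulations over two hypothetical ridings.
sims = [
    {"A": {"NDP": 50.0, "Liberal": 40.0}, "B": {"NDP": 45.0, "Liberal": 47.0}},
    {"A": {"NDP": 52.0, "Liberal": 38.0}, "B": {"NDP": 48.0, "Liberal": 44.0}},
]
probs = riding_win_probabilities(sims)
print(probs)
print(majority_probability(sims, "NDP", seats_needed=2))
```

With 85 ridings, a majority would require at least 43 seats, so `seats_needed=43` would be the natural value in a full run.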
Why is that useful? Well, let's look at the latest Ipsos poll for BC. They have the NDP at 48%, the Liberals at 35%, the Conservatives at 8% and the Greens at 7%. If I simply enter these percentages into the model, I get the following projections:
BC NDP: 61 seats
BC Liberals: 22
2 independents
So a comfortable majority for the NDP despite the Liberals coming back to the mid-30s. I could then show you the projections in each riding, like here. But it wouldn't give you any sense of the uncertainty. For instance, is there a chance that the Liberals could win? Or what are the chances the NDP will win a particular riding?
In order to answer these types of questions, we need the 1000 simulations. Here you can find the probability of winning a riding for each party. Please note that right now, for the simulations, I exclude the 2 potential independents. It'll be fixed soon. As you can see, most ridings are currently projected to go to one party 100% of the time. What this means is that even when the parties fluctuate within their margins of error, and even with the additional uncertainty from the efficiency of the vote, a lot of ridings just aren't currently competitive. This is because the NDP's lead is just too big at the moment (and the Liberals and Conservatives are splitting the vote). But if the Liberals were to climb back a couple more points, we could have a lot more close races.
Maybe a little bit surprising is the fact that the Conservatives aren't projected to win a single riding. Not even once out of 1000 simulations. This is because at only 9%, they are way too far behind the other two parties. Even in Kelowna, where they are projected at 24%, there isn't a single scenario where they'd win the riding. So even at the top of their margin of error (around 10-11% province-wide) and with a greater than expected swing there thanks to vote efficiency, they would still not win it. Please note that this is using this poll from Ipsos Reid only. And this is for now. A lot of things can happen between now and next May.
Nevertheless, despite the general lack of competitive ridings, out of 1000 simulations we got a couple of interesting scenarios. For instance, there is one scenario where the Liberals would get pretty close, with 38 seats. How is that possible? Well, in this simulation, they would beat the polls and receive 38.5% of the vote, while the NDP would do worse than the polls with 44.5% (i.e., both parties are at the very extreme of their margins of error, even slightly beyond). On top of that, the Liberals would be very efficient in their vote, allowing them to turn 38.5% into as many as 38 seats, while the NDP would only get 47. Still a majority, but a much closer election. It is a very unlikely scenario, one that happened once out of 1000 simulations, but one that can't be totally excluded given the current information.
With the percentages from the Ipsos Reid poll though, there isn't a scenario where the NDP doesn't win (and doesn't win a majority). One way to see this is that the NDP has enough ridings it is projected to win 100% of the time. However, if the NDP were to drop to 45%, for instance, then there would be outcomes among the 1000 simulations where the Liberals could win. It would be rare and thus unlikely, but not impossible.
It was a long post, but I needed it to show you how the added uncertainty works. During the election, from now on, instead of simply displaying the seats, I'll display the probability of winning the election (and possibly the probability of a majority government as well). It might not be very useful in BC if the NDP keeps its current lead, but it would be very useful for the next Quebec, Ontario or federal election.