Q: How does the model work?
A: The model uses past election results as well as current polls to project each party's level of support in each riding. Basically, it transposes province-level percentages into riding-level ones. To do so, we use statistical methods to estimate coefficients based on past elections. In particular, the models include regional as well as incumbency effects. In some cases, we also include other factors, such as being a party leader or a "star" candidate. A good example of this is the bonus given to Elizabeth May as leader of the Green Party.
Without going into the technical details, the idea is to compare a party's riding-level swing to its province-level swing. For instance, when the Conservative Party of Canada increased by 1 point provincially in Quebec, we observed that the party actually increased by much more in some regions (Quebec City, for instance) but by less than 1 point in others (Montreal, for instance). The transposition of the provincial swing into a riding-level one is usually influenced by the region, as well as by incumbency. We also account for where the swing comes from. For instance, if the Bloc Quebecois drops by 10 points in favor of the Conservatives, this is a different story than if the 10 points go to the NDP. The regional swings will not be the same. Our coefficients are estimated to take care of this.
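As a rough illustration, the transposition step can be sketched as follows. The coefficient values and the function below are made up for the example; they are not the model's actual estimates.

```python
# Hypothetical sketch of transposing a provincial swing into a riding-level
# one. The coefficients below are illustrative only, not the model's
# actual estimates.

def riding_swing(provincial_swing, region_coef, incumbent_coef, is_incumbent):
    """Scale the provincial swing by a regional coefficient estimated from
    past elections, then add an incumbency adjustment."""
    swing = provincial_swing * region_coef
    if is_incumbent:
        swing += incumbent_coef
    return swing

# A +1-point Conservative swing in Quebec could translate very differently
# depending on the region:
print(riding_swing(1.0, region_coef=2.5, incumbent_coef=0.5, is_incumbent=True))   # e.g. Quebec City
print(riding_swing(1.0, region_coef=0.4, incumbent_coef=0.5, is_incumbent=False))  # e.g. Montreal
```

The point of estimating separate coefficients, rather than assuming one value for all ridings, is precisely that a single provincial swing produces very different local swings.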
Q: Where do the probabilities come from?
A: We indeed also provide the probabilities for each party to win a specific riding as well as the election (i.e. getting the most seats). To do so, we run simulations (between 1,000 and 5,000, sometimes more). Specifically, here is the step-by-step process.
1) Using polls (one or many), we estimate the current voting intentions (usually province-wide). This is usually done by averaging the most recent polls. In some cases, we might adjust the percentages from the polls, for instance if we have reason to believe that one or more polls are biased. For instance, we usually adjust for the over-estimation of small parties.
2) Once we have these percentages, we need to acknowledge the first source of uncertainty with regard to the electoral outcome: polls aren't 100% accurate. Even without any methodological problems (which polls do have), polls still have margins of error. So when a party polls at 35% on average, statistically speaking, this party could be as high as 38% or as low as 32% (depending on the sample size, etc.). Basically, the percentages from step 1 can be seen as the parameters of a multinomial distribution. We thus sample from this distribution as many times as we run simulations. In our example, the party will on average receive 35% of the votes, but will vary above and below this average according to the distribution function. The nice thing about doing this systematically with a computer is that it automatically takes into account the fact that if one party is at the lower bound of its margin of error, then another party must be at its upper bound. In our example, if another party is polled at 30%, then it is possible for this party to actually finish above the party polled at 35%. It's unlikely and doesn't happen many times out of all the simulations, but it is possible. Currently, we aim for margins of error of around 5 points (for a party polled around 40%). This is much higher than the usual margin of error of any given poll; it's equivalent to a sample size of 400 respondents. Given that we average polls (and thus get a much bigger sample size), why do we allow so much variation? Because polls have tended to be wrong quite a lot in recent elections. Therefore, we'd rather have too much uncertainty than not enough. Alternatively, you can see our margins of error as accounting for both the normal statistical variation and the methodological biases of polls. As a reminder, the polls underestimated the Liberals in Quebec by a good 4 points in 2012. They also had the Conservatives a good 8-10 points too low in Alberta and the BC Liberals some 6-7 points below their electoral results.
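This sampling step can be sketched with NumPy. The party shares and the effective sample size of 400 follow the example above; the seed and the number of simulations are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Poll averages for three hypothetical parties (shares must sum to 1).
poll_shares = np.array([0.35, 0.30, 0.35])

# A deliberately small effective sample size: n = 400 yields margins of
# error of roughly +/- 5 points for a party near 40%, much wider than
# those of any single poll.
N_EFF = 400
N_SIMS = 5000

# One multinomial draw per simulation; dividing the counts by N_EFF
# recovers vote shares.
draws = rng.multinomial(N_EFF, poll_shares, size=N_SIMS) / N_EFF

# On average the first party stays near 35%, but in a minority of
# simulations the 30% party finishes ahead of it.
print(draws[:, 0].mean())
print((draws[:, 1] > draws[:, 0]).mean())
```

Because each draw allocates a fixed total across all parties, the "one party down means another party up" constraint comes for free from the multinomial distribution.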
3) Once we have the province-wide percentages for each party, we use them in our model. Basically, we run thousands of projections (the same number as the simulations) in each riding for each party. This is also where we account for the second source of uncertainty: the electoral system and the distribution of the votes. Even if we had the actual percentages of the election, we couldn't correctly predict every single riding and therefore the number of seats. Despite our best efforts to estimate coefficients that take into account the region, incumbency and other factors, there is still quite a lot of unexplained variation at the riding level.
Therefore, we sample a second time. In each riding, we sample from a multinomial distribution whose parameters are given by the projections from the model. In other words, if one party is projected at 45% in one riding, this party could actually be at 48% or only at 43%. As we said, this variation can be seen as the natural variation that occurs because of the distribution of the vote. Or, alternatively, it accounts for the fact that we can't be 100% sure of the riding-level outcome with only province-level information.
In this step, we have even more uncertainty than at the provincial level. We want to make sure that our ranges really account for all possible results. Based on past results, a party's swings within a region usually have ranges of as much as 12 points (meaning that within a region, one party could have experienced a swing of, say, -2 points in one riding while, in another riding, the swing would have been -14 points). So there again, we sample with a small sample size in order to have a lot of volatility.
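A minimal sketch of this second sampling step, again with NumPy. The 45% projection comes from the example above; the effective sample size of 150 is a made-up value chosen only to produce wide riding-level ranges.

```python
import numpy as np

rng = np.random.default_rng(7)

# Model projection for one riding: one party at 45% (hypothetical numbers).
riding_proj = np.array([0.45, 0.35, 0.20])

# A small effective sample size keeps the riding-level draws volatile.
N_EFF = 150
draws = rng.multinomial(N_EFF, riding_proj, size=5000)[:, 0] / N_EFF

# The party projected at 45% actually lands in a wide band of outcomes:
# the central 95% of simulations spans well over ten points.
lo, hi = np.percentile(draws, [2.5, 97.5])
print(round(lo * 100, 1), round(hi * 100, 1))
```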
4) The steps above give us thousands of simulations, for each party in each riding. These simulations show quite a lot of variation (thanks to our double sampling process and our coefficients). On average, the projections are the same as the ones we would get by simply entering the percentages from step 1 into the model (or simulator). However, we now have an idea of the possible variation. We can see, out of all the simulations, how many times a party is projected to win a given riding. If a party only wins, say, 10 times out of 1,000, it means that this party can technically win the riding, but it's unlikely.
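Putting the steps together, a toy end-to-end sketch might look like this. The poll shares, the fixed riding offsets standing in for the model's coefficients, and both effective sample sizes are invented for the example; the real model projects every riding, not just one.

```python
import numpy as np

rng = np.random.default_rng(1)

N_SIMS = 5000
poll_shares = np.array([0.35, 0.30, 0.35])  # step 1: poll average

# Step 2: provincial draws with a small effective sample size (n = 400)
# so the margins of error stay wide.
prov = rng.multinomial(400, poll_shares, size=N_SIMS) / 400

# Step 3: a toy "model" maps provincial shares to one riding's shares
# (a fixed offset here; the real model uses estimated coefficients),
# then samples again at the riding level (n = 150).
riding_mean = np.clip(prov + np.array([0.05, -0.02, -0.03]), 0.01, 0.97)
riding_mean /= riding_mean.sum(axis=1, keepdims=True)
riding = np.array([rng.multinomial(150, p) for p in riding_mean]) / 150

# Step 4: the win probability is simply the share of simulations won.
winners = riding.argmax(axis=1)
for party, label in enumerate(["Party A", "Party B", "Party C"]):
    print(label, (winners == party).mean())
```

The average of the riding-level draws matches what the model alone would project; the spread across simulations is what turns into win probabilities.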
Remember, though, that being polled at 35% and receiving 35% of the vote on election night are two different things. With the former, we still need to account for the possible mistakes of the polls due to the margins of error. With the latter, we would only need to account for the variation due to the distribution of the vote and the unexplained riding-level variation. So when we say that a party has a 10% chance of winning a riding, this includes the variation due to the polls. With the actual percentages from the election, the possible riding-level outcomes would show less variation.
Q: How is that different from other models?
A: Many models exist and they usually share the same core principle: the results in a specific riding this election are a function of the past election's results as well as the current level of support for each party. Simplistic models usually use a uniform or proportional swing. With the former, if a party's support increases by 2 percentage points in a province (between the last election and now), its level of support increases by 2 points in every riding. With the latter, the variation in every riding is proportionally the same as the provincial one (i.e. if a party's level of support increases by 5% (not percentage points!), then its share of the vote increases by 5% in every riding or, if you prefer, is multiplied by 1.05). These two models simply assume the form of the variation and do not allow any region-specific effects. Some regional considerations exist in some models, usually through the regional breakdowns of polls. That's a valid method, but a limited and uncertain one, as regional sample sizes are usually quite small.
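The difference between the two simplistic approaches is easy to see with made-up numbers: say a party moves from 20% to 25% province-wide (+5 points, i.e. a factor of 1.25) and had three past riding results of 10%, 30% and 50%.

```python
# Uniform vs. proportional swing on invented numbers.
last_prov, now_prov = 20.0, 25.0   # provincial result then and now
riding_last = [10.0, 30.0, 50.0]   # the party's past result in three ridings

uniform = [r + (now_prov - last_prov) for r in riding_last]
proportional = [r * (now_prov / last_prov) for r in riding_last]

print(uniform)       # [15.0, 35.0, 55.0] -> +5 points everywhere
print(proportional)  # [12.5, 37.5, 62.5] -> x1.25 everywhere
```

Both rules apply the same adjustment to every riding regardless of region or incumbency, which is exactly the restriction our estimated coefficients are meant to relax.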
To our knowledge, we are the only site offering such advanced projections. No other site or model includes regional, incumbency and other effects, as well as taking into account where the swing comes from. As for the probabilities, we are the only ones to derive them from a large number of simulations. That doesn't mean we'll necessarily be right or more accurate. But it does mean that, given the information available before the election, we are the ones who try the hardest to use it.
Q: Is it reliable?
A: Yes and no. It does work very well, but the accuracy is largely dependent on the reliability of the polls. Indeed, if the polls estimate the Liberals at 30% but on election day this party gets only 26% of the votes, the projections will be biased. On the other hand, thanks to the huge number of opinion polls available during a campaign (especially during a federal election, where we get around 4-6 polls every week), we usually have a good idea of where the parties stand.
This is also why we allow users to run the model themselves. For our own projections, we use an average of polls. But some people might trust one pollster more (Angus-Reid seems to have been quite close to the actual results during the last couple of elections, with Nanos a close second) or simply believe that the polls are wrong. Therefore, you are free to make your own projections using the percentages YOU believe are the most accurate.
With the simulations and the probabilities, we try our best to overcome the possible mistakes of the polls. When we sample from the distribution based on the average of the recent polls, we actually account for the possibility that the polls could be wrong. But no matter what we do, our projections are heavily influenced by the polls. That's natural, as they are the most up-to-date and readily available source of information during a campaign. And except in a few rare cases (like the Alberta election of 2012 or BC in 2013), the polls are not usually that wrong.
Q: Can I use the model to see what would happen if the Bloc didn't exist or if the NDP was at 55%?
A: The model can handle this, but you have to remember that it will be less reliable. This is pure extrapolation, and therefore the past observations are less useful for making predictions. The rule of thumb is to stay within plus or minus 10% of each party's highest and lowest scores observed during the last three elections. Beyond these margins, the model begins to extrapolate.
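That rule of thumb can be sketched as a simple check. The function name, the proportional reading of the 10% margin, and the historical numbers below are all assumptions made for the example.

```python
# Sketch of the rule of thumb: flag inputs outside +/- 10% of a party's
# highest and lowest scores over the last three elections. (The 10% is
# read proportionally here; the historical numbers are made up.)

def within_comfort_zone(pct, past_results, margin=0.10):
    """True if pct lies inside the range where the model interpolates
    rather than extrapolates."""
    lo = min(past_results) * (1 - margin)
    hi = max(past_results) * (1 + margin)
    return lo <= pct <= hi

ndp_last_three = [18.0, 30.0, 43.0]
print(within_comfort_zone(40.0, ndp_last_three))  # True: within the range
print(within_comfort_zone(55.0, ndp_last_three))  # False: extrapolation
```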