On June 7, five states — California, Montana, New Jersey, New Mexico, and South Dakota — will hold primary elections. It is the last major day of primaries of 2016, and with the Republican race already decided, almost all of the attention will be focused on the Democratic side, where 676 pledged (elected) delegates will be at stake in those five states.
As of now, Hillary Clinton holds a seemingly insurmountable lead of about 270 pledged delegates over Bernie Sanders, and a much larger lead when unpledged “superdelegates” who have endorsed her are added to her total. Nevertheless, Sanders hopes that winning decisive victories in these contests, especially in California with its 475 pledged delegates, will dramatically reduce his deficit in pledged delegates and help him to persuade a large number of superdelegates to switch sides.
So what does the Democratic primary forecasting model predict for the June 7 primaries? The model uses four variables to predict Clinton’s margin: the percentage of a state’s primary electorate made up of African Americans, the percentage of a state’s primary electorate made up of Democratic identifiers, whether a state is in the South, and the number of other states holding primaries on the same day.
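Written out as an equation, a model of this kind has roughly the following form. This is a sketch that assumes an ordinary linear regression with an intercept; the symbols are shorthand introduced here for illustration and do not come from the original tables. $M_i$ is Clinton's margin in state $i$ in percentage points, and the $\beta$ terms are the weights to be estimated.

```latex
\[
M_i = \beta_0
    + \beta_1\,\mathrm{AfricanAmerican}_i
    + \beta_2\,\mathrm{DemocraticID}_i
    + \beta_3\,\mathrm{South}_i
    + \beta_4\,\mathrm{OtherStates}_i
    + \varepsilon_i
\]
```

Here $\mathrm{South}_i$ is 1 for Southern states and 0 otherwise, and $\mathrm{OtherStates}_i$ counts the other states voting on the same day.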
Table 1 displays the estimated weights for these predictors based on data for states that have held primaries to date and for which exit poll data are available. Once again, Sanders’ home state of Vermont is excluded from the analysis because of the very large home state advantage that he enjoyed there.
Table 1: Regression results for Democratic primary forecasting model
Source: Exit polls and data compiled by author
The results in Table 1 show that all four predictors have substantial and statistically significant effects on Clinton’s margin in the Democratic primaries. The greater the African-American share of the electorate, the greater the proportion of Democratic identifiers, and the larger the number of states holding primaries on the same day, the better Clinton does. In addition, Clinton does substantially better in the South than in other regions of the country. The model explains an impressive 92% of the variance in Clinton’s vote margin.
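For readers who want to reproduce this kind of analysis, here is a minimal sketch of how such a model could be fit and how the share of variance explained (R-squared) is computed. It assumes ordinary least squares with an intercept, which the article does not state explicitly, and the function names and column order are hypothetical. No actual exit poll data or estimated weights are included.

```python
import numpy as np

def fit_margin_model(X, y):
    """Fit y = b0 + X @ b by ordinary least squares (an assumed method).

    X: (n_states, 4) array with columns, in this assumed order:
       pct African American, pct Democratic identifiers,
       South indicator (0 or 1), number of other states voting that day.
    y: (n_states,) array of Clinton's margin in percentage points.
    Returns (coefficients with the intercept first, R-squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    return coefs, 1.0 - ss_res / ss_tot

def predict_margin(coefs, pct_black, pct_dem, south, n_other_states):
    """Predicted Clinton margin for one state from fitted coefficients."""
    features = np.array([1.0, pct_black, pct_dem, float(south), n_other_states])
    return float(features @ coefs)
```

An R-squared of 0.92 from `fit_margin_model` would correspond to the 92% of variance reported above.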
Figure 1 displays a scatterplot of the relationship between Clinton’s predicted vote margin and her actual vote margin in these primary states. Each point in this graph represents a state, and most of the points are very close to the prediction line. However, there are a few fairly big misses, including Ohio, where the model underestimates Clinton’s margin, and North Carolina, where the model overestimates her margin.
Figure 1: Actual Clinton margin by predicted Clinton margin in Democratic primaries
Source: Exit polls and data compiled by author
Of course, in order to make a prediction in advance of a state’s primary election, we have to rely on estimates of the shares of Democratic primary voters who will be African American and who will identify as Democrats. For this, I use the results of exit polls from the most recent contested Democratic presidential primaries, those of 2008. I have used this technique to predict the outcomes of 12 primaries since March 15. The model correctly predicted the winner of nine of these 12 primaries with an average margin of error of 6.5 points. Two of the misses occurred in states where the outcome was extremely close: Kentucky and Missouri. And the model actually outperformed pre-primary polls in Pennsylvania, Rhode Island, and Oregon.
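The two summary figures in this track record, the number of winners called correctly and the average error, can be computed along the following lines. This sketch assumes that "average margin of error" means the mean absolute difference between predicted and actual margins; the function name is hypothetical, and no actual state results are included.

```python
def score_forecasts(predicted, actual):
    """Return (number of winners called correctly, mean absolute error).

    Both arguments are sequences of Clinton margins in percentage points,
    one entry per state, in the same state order.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # A call counts as correct when both margins fall on the same side of
    # zero; an exact tie (margin of 0) is treated as a non-Clinton result.
    correct = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
    return correct, mae
```

Note that a forecast can miss the winner with a small error, as in a very close state, or call the winner correctly while missing the margin badly, which is why the two measures are reported together.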
Table 2 displays predictions for the five Democratic primaries taking place on June 7. The model predicts a big win for Sanders in Montana, a state with a large share of independents and very few African Americans, and a fairly comfortable win for Clinton in New Jersey, a state with a much smaller share of independents and a substantial African-American electorate. Three states are expected to be close, including the one that is by far the biggest prize of all: California.
Table 2: Predictions for June 7 Democratic primaries
Source: Data compiled by author
How do these forecasts compare with recent polls? There have been no polls to date in Montana, New Mexico, or South Dakota. In California, though, the forecasting model is predicting a much closer race than the polls. According to RealClearPolitics, the current polling average as of Wednesday afternoon gives Clinton a nine-point margin over Sanders, while the model gives her just a two-point lead. The model predicts a closer race than the polls do in New Jersey as well, giving Clinton a 10-point margin while the RealClearPolitics average gives her a lead of 17 points.
The forecasting model is clearly more optimistic than recent polls about Sanders’ chances in the two most important states voting on June 7. But the model gives Sanders almost no chance of achieving the sorts of landslide victories that he needs on the last major day of voting to dramatically reduce Clinton’s lead in pledged delegates. Indeed, based on these forecasts, it seems almost certain that by the end of Tuesday’s voting Clinton will have won enough delegates to clinch the Democratic nomination.