A Defense of Election Forecasting Models

All models are wrong, some are useful, many will be misinterpreted -- and that's OK

KEY POINTS FROM THIS ARTICLE

— While there is a legitimate debate about the public utility of forecast models, these are tools people in politics, finance, and media use to understand the political environment and make informed decisions. These private goods have corresponding public ones.

— Campaign forecasting models, at least the Decision Desk HQ/Øptimus Analytics model, do more than provide horse race numbers. We aggregate important data such as polling, FEC reports, and historical results in one place for easy public consumption.

— The goal of a forecast model is not to replace someone’s thinking but to offer a tool to better inform their thought process and ability to reach a conclusion.

The value of modeling elections

It’s been a long four years for U.S. election pollsters and forecasters. I have had countless conversations with people about whether the 2016 polls were wrong, whether the FiveThirtyEight model was totally off, and how anyone can come into 2020 with any measure of certainty about what’s going to happen. Beyond my own interactions, the general public has expressed skepticism of the polls and of forecasting models too.

With the last four years hanging over our heads, the team at Decision Desk HQ and Øptimus and I are releasing our 2020 Presidential, Senate, and House forecasting models as a follow-up to our 2018 Senate and House forecasting models. We hope our perspective is useful, but we are aware of our models' shortcomings and prepared for the onslaught of criticism the models will inevitably attract.

We are also fully expecting our forecast to be misinterpreted.

The piece published in this outlet by Natalie Jackson correctly points out that election models fail to capture all the uncertainty in elections. Jackson’s article sparks a healthy discussion around whether forecasting models like ours do more harm than good for the public and the democratic process in general.

I think election forecasts like ours, despite being inevitably wrong and frequently misinterpreted, have a net positive impact on the public discourse surrounding elections. While the analysis provided by forecasting models may not be easy to interpret for readers without experience in statistics, I don't think that means the analysis should be withheld from the public. I also think Jackson's question about the utility of making these forecasts public is a good discussion to have, out in the open, and hopefully for many election cycles to come.

If all models are wrong, why release them?

I have a saying around the office that having blind spots in the data is OK as long as we know what the blind spot is and can properly account for it in our calculations or, at the very least, in our analysis. The same goes for our forecasting models. This cycle, we know our model will have a tough time accounting for the impacts of COVID-19. Namely, the economic impacts are at scales that previous elections do not cover, and turnout will be affected in different ways in each state based on how government officials decide to respond. Even without COVID-19, our models are heavily reliant on polling, which by its nature is based on a sample frame that we are modeling at best and guessing at worst.

Despite those blind spots, we believe our model is still informative to readers and does a good job of capturing uncertainty and providing insight into what is going on. Our confidence comes from our internal testing and public track record of forecasting. Our model in 2018 for the House and the Senate performed very well. Our mean prediction for the House was 233 Democratic seats, and Democrats ended up winning 235. In the Senate, our mean prediction had 52 Republican seats, and Republicans ultimately won 53. Overall, the 2018 model’s accuracy was 94% in the Senate and 97% in the House. And outside of toss-ups, the model “missed” only one individual Senate race and four House races.
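The seat-count comparison above is simple arithmetic, and a minimal Python sketch makes it explicit. The predicted and actual seat totals are the ones quoted in this article; the chamber sizes (435 House seats, 100 Senate seats) and the names `results_2018` and `seat_error` are additions for illustration only and are not part of the DDHQ/Øptimus model.

```python
# Each entry: (mean predicted seats, actual seats, total seats in the chamber).
# Predicted/actual figures come from the article; chamber sizes are added
# here as an assumption so the miss can be expressed as a share.
results_2018 = {
    "House, Democratic seats": (233, 235, 435),
    "Senate, Republican seats": (52, 53, 100),
}


def seat_error(predicted, actual):
    """Signed miss in seats: positive means the party won more than predicted."""
    return actual - predicted


errors = {}
for label, (predicted, actual, chamber_size) in results_2018.items():
    miss = seat_error(predicted, actual)
    errors[label] = miss
    share = abs(miss) / chamber_size
    print(
        f"{label}: predicted {predicted}, actual {actual}, "
        f"off by {miss:+d} seats ({share:.1%} of the chamber)"
    )
```

This only checks the top-line means. It says nothing about how the 94% and 97% race-level accuracy figures were computed, which the article does not spell out.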

In addition to these top-line predictions, our model’s web pages provide readers with other data and tools to become more informed. We consolidate campaign finance data, public polling, and race background information into our individual race views, and also include commentary from a wide variety of political observers ranging from “Election Twitter” to academia and beyond. Additionally, we strive to show readers the inner workings of the model, providing transparent methodological details and race-by-race summaries of which variables carry the most weight. I believe this approach epitomizes the goal of a model: to provide some quantitative insight that the reader can ingest and analyze alongside all of the other data and analysis available to the public.

Our job is to provide the analysis, and it's the public's job to interpret it. We try hard to do our job, and my hope (perhaps an overly optimistic one) is that the public does its job by interpreting the data critically and thoughtfully.

Can the public handle the truth?

For over a decade, I have worked to build models for political campaigns that want to win and financial organizations that want to quantify political risk. Through that experience, I have seen how models are effective at forecasting and predicting outcomes if interpreted and used correctly. If political models can provide this value to private interests, I believe they can be just as valuable to the public.

Critical thinking and analysis of information are as important as learning how to read or do basic arithmetic. They are especially important in the context of elections, where pre-existing biases are deeply embedded in what information we expose ourselves to, how that information is presented to us, and how receptive we are to it.

In 2016, some people who read the Huffington Post model, the FiveThirtyEight model, the RealClearPolitics polling average, or admittedly my own on-the-record comments, took those probabilistic assessments of Trump's chances of winning and misconstrued them as a binary assessment definitively proclaiming he would lose. My naive hope (dare I say, prediction) was that most of those people learned from their past mistakes and looked inward at their own understanding of what they were reading instead of blaming the people who worked on the analyses for being wrong.
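The difference between a probabilistic and a binary assessment can be sketched with a short simulation. The 30% figure below is a hypothetical number chosen purely for illustration. It is not the output of any model named in this article, and `simulate_win_rate` is an illustrative helper, not part of any forecaster's code.

```python
import random


def simulate_win_rate(win_probability, n_elections, seed=0):
    """Replay a hypothetical election many times and return the share of wins.

    Each replay is an independent draw in which the candidate wins with
    probability `win_probability`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = sum(rng.random() < win_probability for _ in range(n_elections))
    return wins / n_elections


# Hypothetical probability, for illustration only.
ILLUSTRATIVE_PROBABILITY = 0.30

rate = simulate_win_rate(ILLUSTRATIVE_PROBABILITY, n_elections=100_000)
print(f"Candidate given a {ILLUSTRATIVE_PROBABILITY:.0%} chance won "
      f"{rate:.1%} of simulated elections")
```

Under these assumptions, the candidate wins roughly three times in every ten replays. A forecast that gives a candidate a minority chance is therefore not refuted when that candidate wins once, and it never claimed the candidate would lose.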

So how is DDHQ/Øptimus responding to a post-2016 forecasting world? Our model strives to be methodologically transparent and paywall-free. This cycle, we are submitting our model to the Harvard Data Science Review, where it will be put up against the work of other modelers from across the spectrum. Accuracy will certainly be a focal point, but all of us were also asked to write lengthy pieces describing our methodologies and how we handle everything from uncertainty to public interpretation.

All modelers strive to be right, but ultimately, we hope the public will interpret our data with a skeptical eye and use it to refine — not define — their own viewpoints.

Scott Tranter is head of Data Science for Decision Desk HQ and founder of Øptimus Analytics, a Data Science firm. He was Data Science Director for Marco Rubio for President in 2016 and is currently an adjunct professor at the American University School of Public Affairs teaching quantitative research methods. Views expressed herein are his own and not representative of any employer, past or present.