Sunday, December 4, 2016

Technology & Election 2016 part 3 – The Failure of Data Science?

The first reaction on election night, November 8th, 2016, was: what just happened? The entire country, and in fact much of the world, was more or less shocked at the unexpected outcome. But why was it so unexpected? The top-level answer is simply that nearly every major poll or projection turned out to be wrong. What kind of numbers are we talking about? For example, Nate Silver’s FiveThirtyEight blog projected Clinton to win the popular vote by about 5% (with 71% certainty). If we drill down to the state polls, the projections showed all three of the key states that flipped (Wisconsin, Michigan, and Pennsylvania) going to Clinton by 3 to 5% in each contest. This is significant because many of these projections (and blogs like FiveThirtyEight) were aggregating dozens or even hundreds of polls, not just one or a handful, and the misses still exceeded the margin of error.
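To make the margin-of-error point concrete, here is a minimal sketch in Python of how a simple sample-size-weighted poll aggregate and its stated margin of error might be computed. The poll figures are hypothetical, invented purely for illustration, and real aggregators such as FiveThirtyEight use far more sophisticated models (house effects, trend lines, and correlated errors between polls):

    import math

    # Hypothetical state polls: (clinton_pct, trump_pct, sample_size).
    # Illustrative numbers only; these are not actual 2016 polls.
    polls = [
        (46.0, 42.0, 800),
        (47.0, 43.5, 1200),
        (45.5, 41.0, 600),
        (48.0, 44.0, 1000),
    ]

    def aggregate_margin(polls):
        """Sample-size-weighted average of the Clinton-minus-Trump margin."""
        total_n = sum(n for _, _, n in polls)
        return sum((c - t) * n for c, t, n in polls) / total_n

    def moe_95(p, n):
        """95% margin of error, in percentage points, for a single proportion."""
        return 1.96 * math.sqrt(p * (1 - p) / n) * 100

    pooled_n = sum(n for _, _, n in polls)
    print(f"Aggregate margin: Clinton +{aggregate_margin(polls):.1f} pts")
    print(f"Single-poll MoE (n=1000): +/-{moe_95(0.47, 1000):.1f} pts")
    print(f"Naive pooled MoE (n={pooled_n}): +/-{moe_95(0.47, pooled_n):.1f} pts")

Even this naive pooling (treating four polls as one big sample) cuts the stated margin of error roughly in half, from about +/-3.1 points to about +/-1.6, which is why misses of 3 to 5 points across heavily aggregated state polls were so striking.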
Let’s step back for a moment and talk about the typical role that Data Science plays in the election process today. This role encompasses several well-known and some lesser-known functions, including:
  • Predictive polling
  • Exit polling
  • Predictive modeling
  • Vote targeting (which facilitates a sort of CRM for campaign marketing as well as Get Out the Vote efforts)
There’s nothing new about polling; it’s been around for a long time. In fact, the last time there was a collective shock like this year’s outcome was in 1948, when the polls had predicted Dewey would win by roughly 50% to 45% (Truman won instead, by about 50% to 45%). Polls have improved since then, and we now have the benefit of the latest data technology as well as nearly 70 years of added experience. So how did nearly every major poll get it wrong this year? There are several theories, including the following:
  1. A lot of people changed their minds at the last moment and weren’t particularly firm in their previous opinions.
  2. The Russians did it.
  3. Many polls were not properly modeling who would actually turn out to vote, so their samples and weights misrepresented the electorate (a toy example of this effect follows the list).
  4. Many people who had said they were voting for Clinton didn’t turn out to vote at all (a lack of enthusiasm).
  5. The poll numbers for the third-party candidates may have been inflated; on election day those candidates received far fewer votes than had been predicted, the implication being that their supporters voted for one of the main candidates instead, with Trump the main beneficiary.
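To illustrate theory #3, here is a toy Python sketch showing how the same raw survey responses can produce very different toplines depending on the turnout model used to weight them. Every figure below (group support levels and electorate shares) is hypothetical, invented purely for illustration; none of it is real 2016 crosstab data:

    # Hypothetical support within each demographic group: (clinton, trump).
    support = {
        "college":     (0.55, 0.36),
        "non_college": (0.40, 0.52),
    }

    def topline(support, electorate):
        """Weight each group's support by its assumed share of the electorate."""
        clinton = sum(electorate[g] * c for g, (c, _) in support.items())
        trump = sum(electorate[g] * t for g, (_, t) in support.items())
        return clinton, trump

    # The pollster's turnout model vs. a hypothetical actual electorate in
    # which non-college voters turned out at a higher rate than modeled.
    assumed = {"college": 0.50, "non_college": 0.50}
    actual = {"college": 0.35, "non_college": 0.65}

    for label, mix in (("assumed", assumed), ("actual", actual)):
        c, t = topline(support, mix)
        print(f"{label} electorate: Clinton {c:.1%}, Trump {t:.1%}, margin {c - t:+.1%}")

With the assumed electorate, the poll shows Clinton up by about 3.5 points; with the (hypothetical) actual electorate, the identical responses show Trump narrowly ahead. Nothing about the interviews changed, only the turnout assumptions.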
To be honest, we may never have a fully satisfactory answer for what happened in the 2016 election. We have arguably never had a race where both of the main candidates were so historically unpopular, and that kind of situation might never happen again (there’s no telling how it may have affected the polling results). How do we move forward, then? Did technology, did Data Science, fail us in 2016? Maybe, but probably not. What we witnessed is that technology is only as good as our ability to apply it. When situations become more dynamic and complex within a relatively short window, do we stick with what we know, or do we adjust our models and practices?
I’d like to step back for a moment here and ask a larger question: do we really want ironclad predictions before elections in the first place? Before the big upset on election night, many pundits were recalling the lessons of the 1980 election, when early calls based on East Coast results encouraged West Coast voters to stay home, thinking their votes didn’t really count. In response, the networks agreed not to project a state’s winner before its own polls had closed. Don’t polls predicting a sure outcome before an election have a similar chilling effect? This year, for example, how many voters stayed home because, while they weren’t terribly enthused about Clinton, they assumed she would win anyway? That’s a hard question to answer, because people who don’t show up to vote can’t be interviewed in exit polls (or at least typically aren’t interviewed as part of the election post-mortem).
What can we do in the future to avoid getting surprised?
I think that in coming up with suggestions here, we need to weigh the value of a particular solution against the potential impacts of using it. In other words, if we view predictive polling as an activity in which the act of measuring can influence the outcome (a bit like the observer effect popularly associated with Heisenberg’s uncertainty principle), then we might conclude that the value of predictive polls, relative to their possible harm, runs out several weeks before the election. How then could we be assured that elections are honest, and that public sentiment is in fact aligned with election results? That assurance can still come through exit polling of actual voters on election day, supplemented by sentiment polls taken the day after the election of potential voters who didn’t vote (something that doesn’t typically happen now).
Suggestion: place a moratorium on predictive polls at least 2 weeks prior to election day, and preferably 4 weeks prior. Why would this work? Here are a few potential benefits:
  • This has the immediate effect of making the election less of a horse race and more of a contest of ideas.
  • It has at least the potential to drive up voter participation; a lot can happen in 4 weeks, and people can vote based on their own opinions with a little less hedging in their decisions.
  • It helps to combat groupthink (people swarming to the anticipated victor or becoming despondent about their own preferred candidates).
  • It certainly eliminates the main source of any potential surprise.
  • It may encourage candidates to take a more expansive view toward courting voters; recent trends have focused far too much attention on potential swing states and districts.
As you can tell, this suggestion (and potentially others) may have relatively little to do with Data Science itself, but everything to do with how we apply it to given situations. I don’t believe the technology failed us here; I think we failed to recognize how much it already influences election outcomes. In my next post in this series, I’ll talk a bit more about Get Out the Vote (GOTV), politics as demographics, and how technology has been and will be used to manage campaigns.

Copyright 2016, Stephen Lahanas 
