Don't Expect Good Decisions from Bad Data

Bad Data, Bad Decisions
The contrast between the poll numbers and the actual results of the 2016 US presidential election highlights the fact that no matter how good your visualizations are, or how precise and accurate your algorithms may be, bad data leads to incorrect outcomes. So long as pollsters continue to rely on self-reported, qualitative responses rather than on data collected indirectly from observed behavior and its relationships to other metrics, we will never have poll data we can depend on for sound assumptions. The core problem is one we've known about for decades, yet every election cycle we are lured away from considering it. We came to believe that improvements in technology, analytics, machine learning, and the sheer volume of data would lend accuracy to the analysis of the poll data we were subjected to over the last couple of years.

Known about for decades?

Yes. Since the 1980s.

The Bradley Effect
The Bradley Effect is a theory that addresses the discrepancies between voter opinion polls and actual electoral outcomes when white and non-white candidates square off against one another. It is named for former Los Angeles Mayor Tom Bradley and his failed 1982 bid for Governor of California. Despite leading in the polls throughout much of the campaign, Bradley, who was African American, lost to a white opponent. The effect suggests that people being polled tend to give an answer that is more politically correct than their true opinion or action: they are more likely to say they will vote, or have voted, for the minority candidate while actually intending to vote, or having voted, for the white or "more traditional" candidate. The Bradley Effect also posits that the pollster's race factors into the answer when the interview is conducted face to face.

A solid line can be drawn from race to gender in applying the Bradley Effect to the 2016 campaigns of Clinton and Trump. Under the Bradley Effect, one could say that polled citizens would tend to state their support for Clinton because they didn't want to be seen as opposing a woman for the United States Presidency. Likewise, a dashed line can be drawn from the 2016 race to the Bradley Effect based not on race or gender but on respondents' reluctance to appear to support an individual with the reported reputation and history of Donald J. Trump.

How Do We Fix This?
As a technical professional with 20 years of experience in data solutions, I'd say we can work toward a fix through technology, specifically IoT. I believe that if we remove emotion from opinion poll results and instead draw conclusions from actions, we can improve the validity of our forecasts. What do I mean by actions? I believe we can build a correlation between what people do and how they vote.

The difficulty lies in correlating the demographic data points associated with a specific voter or class of voters with the actions they perform, as captured through IoT, so that we can build algorithms relating citizens who do X1, X2, X3, ..., Xn to how they vote. If that can be solved, we could draw conclusions about voting outcomes for specific cohorts of the electorate based on their collective history of certain actions, rather than on answers given to feel better about themselves or to tell the pollster what they assume the pollster wants to hear. You could then combine demographic information about the polled population with demographic data for the candidates and use machine learning to draw more solid conclusions, replacing the poll with a query.
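To make the idea of replacing a poll with a query a little more concrete, here is a minimal Python sketch. It is an illustration under loudly stated assumptions, not a method from this post: it uses a plain frequency table (each candidate's vote share per observed behavior profile) as a simple stand-in for the machine learning described above, the behavior flags X1 and X2 and all of the data are hypothetical toy values, and `fit_profile_rates`, `overall_rates`, and `forecast` are hypothetical names. Profiles never seen in the history fall back to the overall vote share.

```python
from collections import Counter, defaultdict

def fit_profile_rates(records):
    """Estimate each candidate's vote share for every observed behavior profile."""
    counts = defaultdict(Counter)
    for profile, vote in records:
        counts[profile][vote] += 1
    return {
        profile: {cand: n / sum(votes.values()) for cand, n in votes.items()}
        for profile, votes in counts.items()
    }

def overall_rates(records):
    """Fallback: vote shares across all records, used for unseen profiles."""
    votes = Counter(vote for _, vote in records)
    total = sum(votes.values())
    return {cand: n / total for cand, n in votes.items()}

def forecast(rates, fallback, population):
    """Expected vote share for a population known only by behavior profiles."""
    expected = Counter()
    for profile in population:
        for cand, p in rates.get(profile, fallback).items():
            expected[cand] += p
    return {cand: total / len(population) for cand, total in expected.items()}

# Hypothetical toy data: each profile is (did_X1, did_X2); votes are "A" or "B".
history = [
    ((True, False), "A"), ((True, False), "A"), ((True, False), "B"),
    ((False, True), "B"), ((False, True), "B"), ((False, True), "A"),
]
# A new population described only by behavior; (True, True) was never observed.
population = [(True, False), (True, False), (False, True), (True, True)]

rates = fit_profile_rates(history)
share = forecast(rates, overall_rates(history), population)
print(share)  # A about 0.54, B about 0.46
```

Swapping the frequency table for a trained classifier would let such a model generalize to behavior profiles it has never seen, instead of leaning on the overall fallback.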

I'm obviously oversimplifying the solution. The real obstacle in such a model is protecting personal privacy, not building the algorithms or deriving results: how do we track and capture these actions, then tie them to specific individuals, without ever identifying those individuals? It will be interesting to see whether we move toward something like this as IoT, machine learning, and analytical software continue to evolve at the rapid pace they have over the last four years. I suspect we'll see dramatic changes in how we measure electorate sentiment and, more importantly, in how that information is collected.

Or in four years I'll be back here discussing how we tend to consistently forget the lessons from which we should be learning.
