- Data visualizations look good but may not represent sound science
- Big data can exacerbate errors and bias
- Skepticism and interrogation are key to identifying junk data
They track meteors, show you global temperature change, and discover the hardest working people in America. Sometimes they’re even interactive, letting you engage with and personalize your experience.
Well-designed data visualizations take complicated data and scientific findings and transform them into slick graphics that make complex topics easy to grasp in an instant.
But should we trust them?
Maybe not, says Jevin West, assistant professor in the Information School at the University of Washington. He and theoretical and evolutionary biologist Carl Bergstrom have put together a popular course for undergrads about ‘calling bullshit’ in the age of big data.
“Graphs and charts are everywhere,” says West. “Unfortunately, not all organizations that produce them conform to the same degree of quality — which is why we need to be vigilant.”
The big data movement has intensified the quantification of our world, from hiring new employees to predicting the next big box office hit. Numbers carry a veneer of authority and are questioned less than they should be.
“I believe in the power of big data,” says West. “But I think that it has been oversold the last several years.”
Seduced by the idea that bigger always equals better, we tend to assume that big data has fewer quality issues. In reality, some systematic errors, such as sampling bias, are magnified as the sample size grows.
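The point about sampling bias can be illustrated with a quick simulation. This is a toy sketch with invented numbers: imagine a poll whose sampling method over-represents supporters of a policy. More respondents shrink the random noise, but the systematic error never goes away — the estimate just converges, ever more precisely, to the wrong answer.

```python
import random

random.seed(42)

# Hypothetical setup: true support is 50%, but the sampling frame
# reaches supporters more easily, so sampled respondents say "yes"
# at a rate of 60%. These numbers are made up for illustration.
TRUE_RATE = 0.50
BIASED_RATE = 0.60

def biased_poll(n):
    """Estimate support from n respondents drawn with a biased method."""
    yes = sum(random.random() < BIASED_RATE for _ in range(n))
    return yes / n

# Growing the sample tightens the estimate around 0.60, not 0.50:
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  estimate={biased_poll(n):.3f}  (truth={TRUE_RATE})")
```

A bigger sample buys precision, not accuracy: the gap between 0.60 and 0.50 is baked into how the data were collected, and no amount of additional data from the same flawed process will close it.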
And big data’s massive nature actually encourages seeing patterns where none exist because the sheer number of data points offers the possibility of connection in nearly infinite directions. Analysts can sort through the pile and pick the statistics that confirm their beliefs and ignore the rest.
Technological advances and changes in publishing have made it possible for nearly everyone — not just scientists — to produce, organize, and share large quantities of data without the gates of peer review and editorial selection.
“Thousands of start-up companies want you to believe they can solve any question, at any scale, and in sub-zero time,” says West. “We want the public to be a little more skeptical.”
No one is immune
Like others in the scientific community, Bergstrom and West have developed their critical thinking about data through years of training. But even they can make mistakes.
“BS in the research community is a HUGE problem,” says West. “Researchers are humans and therefore governed by the same incentives and reward structures as the general public. We can all fall prey to confirmation bias, even well-trained scientists.”
They point to a 2016 study that claims to have developed algorithms that can distinguish criminals from non-criminals using only a headshot photograph.
In an analysis used in their course, Bergstrom and West point out several places where the study went astray, including a potentially biased training set for the neural network, derived from the sourcing of the photographs.
“You don’t need to know all the details of a fancy machine learning algorithm. Much of the time, all one needs is common sense to ascertain the plausibility of results,” says West.
Being our own gatekeepers
So how do we keep from being overwhelmed by numbers and bamboozled by fancy graphics?
West suggests that one way to keep data honest is to combine big data methods with qualitative work. Viewing the wealth of big data at our fingertips not as a substitute but as a supplement to traditional analysis is the first step in avoiding big data hubris.
The other is vigilance.
“The scientific literature is too vast for any one individual to be an expert in everything,” says West. “So we must continue to be skeptical of data, and try to break it with different interrogations.”
Remember, if something sounds too good to be true, it probably is.