
The wild west of big data

Speed read
  • Data visualizations look good but may not represent sound science
  • Big data can exacerbate errors and bias
  • Skepticism and interrogation are key to identifying junk data

You’ve seen them: Fancy, candy-colored visualizations that make you smart and knowledgeable in just fifteen seconds of watching data zigzag across a chart on your screen.

They track meteors, show you global temperature change, and discover the hardest working people in America. Sometimes they’re even interactive, letting you engage with and personalize your experience.

Well-designed data visualizations take complicated data and important science stuff and transform it into slick graphics that make complex topics easy to grasp in an instant.

But should we trust them?

Jevin West wants his students to be able to separate fact from fiction in our new big data era. Courtesy Jevin West.

Maybe not, says Jevin West, assistant professor in the Information School at the University of Washington. Together with theoretical and evolutionary biologist Carl Bergstrom, West has put together a popular course for undergrads about ‘calling bullshit’ in the age of big data.

“Graphs and charts are everywhere,” says West. “Unfortunately, not all organizations that produce them conform to the same degree of quality — which is why we need to be vigilant.”

Pick’n’mix

The big data movement has intensified the quantification of our world, from hiring new employees to predicting the next big box office hit. Numbers carry a veneer of authority and are questioned less than they should be.

“I believe in the power of big data,” says West. “But I think that it has been oversold the last several years.”

Seduced by the idea that bigger always equals better, we tend to assume that big data has fewer quality issues. In reality, some systematic errors, such as sampling bias, are magnified as the sample size grows.
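One way to read this claim is that a bigger sample makes a biased estimate look more precise without making it more accurate. The short Python simulation below is a sketch of that idea, not anything taken from West's course. The population (true mean 50), the selection rule, and the function names `biased_sample` and `summarize` are all hypothetical choices made for illustration.

```python
import random
import statistics

TRUE_MEAN = 50.0  # assumed population mean, chosen for illustration


def biased_sample(n, rng):
    """Collect n values, but keep low values only half the time.

    The selection rule is hypothetical: it stands in for any process
    that makes some parts of a population easier to sample than others.
    """
    sample = []
    while len(sample) < n:
        x = rng.gauss(TRUE_MEAN, 10.0)
        if x > TRUE_MEAN or rng.random() < 0.5:
            sample.append(x)
    return sample


def summarize(n, seed=0):
    """Return (sample mean, standard error) for a biased sample of size n."""
    rng = random.Random(seed)
    sample = biased_sample(n, rng)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / n ** 0.5
    return mean, stderr


if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        mean, stderr = summarize(n)
        print(f"n={n:>7}  mean={mean:.2f}  stderr={stderr:.3f}  "
              f"error={mean - TRUE_MEAN:+.2f}")
```

In this setup the standard error shrinks as the sample grows, while the gap between the sample mean and the true mean stays roughly the same. The large sample is confidently wrong rather than approximately right.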

And big data’s massive nature actually encourages seeing patterns where none exist because the sheer number of data points offers the possibility of connection in nearly infinite directions. Analysts can sort through the pile and pick the statistics that confirm their beliefs and ignore the rest.
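The pattern-hunting problem can also be sketched in a few lines of Python. In the example below every series is pure random noise, so any correlation found is an accident. The series length, the number of candidates, and the function names `pearson` and `strongest_chance_correlation` are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_candidates, n_points=20, seed=0):
    """Compare one random series against many unrelated random series
    and return the largest absolute correlation found."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    for n_candidates in (10, 100, 1000):
        best = strongest_chance_correlation(n_candidates)
        print(f"{n_candidates:>5} candidates: strongest |r| = {best:.2f}")
```

The more unrelated series an analyst is free to search, the stronger the best accidental match becomes. Reporting only that match, and not the hundreds that were discarded, is the cherry-picking described above.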

Technological advances and changes in publishing have made it possible for nearly everyone — not just scientists — to produce, organize, and share large quantities of data without the gates of peer review and editorial selection.

“Thousands of start-up companies want you to believe they can solve any question, at any scale, and in sub-zero time,” says West. “We want the public to be a little more skeptical.”

No one is immune

Like others in the scientific community, Bergstrom and West have developed their critical thinking about data through years of training. But even they can make mistakes.

Garbage-in garbage-out. Jevin West and Carl Bergstrom delve into a recent paper that purports to automatically infer criminality using face images. Based on an alternative explanation, they call BS! Courtesy West and Bergstrom.

“BS in the research community is a HUGE problem,” says West. “Researchers are humans and therefore governed by the same incentives and reward structures as the general public. We can all fall prey to confirmation bias, even well-trained scientists.”

They point to a 2016 study that claims to have developed algorithms that can distinguish criminals from non-criminals using only a headshot photograph.

In an analysis used in their course, Bergstrom and West point out several places where the study went astray, including a potentially biased training set for the neural network, derived from the sourcing of the photographs.

“You don’t need to know all the details of a fancy machine learning algorithm. Much of the time, all one needs is common sense to ascertain the plausibility of results,” says West.

Being our own gatekeepers

So how do we keep from being overwhelmed by numbers and bamboozled by fancy graphics?

Carl Bergstrom and fellow professor Jevin West teach a course that helps students identify spurious scientific claims. Courtesy Carl Bergstrom.

West suggests that one way to keep data honest is to combine big data methods with qualitative work. Viewing the wealth of big data at our fingertips not as a substitute for traditional analysis but as a supplement to it is the first step in avoiding big data hubris.

The other is vigilance.

“The scientific literature is too vast for any one individual to be an expert in everything,” says West. “So we must continue to be skeptical of data, and try to break it with different interrogations.”

Remember, if something sounds too good to be true, it probably is.


Copyright © 2017 Science Node ™  |  Privacy Notice  |  Sitemap
