From big data to big scandal

Speed read

  • Breaking news about Cambridge Analytica casts new light on its role in the 2016 election
  • CA built modeling software based on voter records, commercial and campaign data, and telephone polling
  • 13 terabytes of collected data modeled an alternate outcome, lighting a path to Trump's victory

In November 2016, Science Node interviewed Cambridge Analytica (CA) data scientist David Wilkinson about the company's role in helping Donald Trump win the 2016 presidential election.

Since then, the world has learned that CA mined Facebook for the data of 50 million users in an effort to sway the election. Additionally, the company was caught on tape by Britain's Channel 4 admitting that it uses bribes and prostitution to blackmail politicians. These events have prompted Elizabeth Denham, Britain's Information Commissioner, to seek a search warrant for the company's office and servers.

While the legality of CA's actions is not yet fully known, Facebook is also receiving criticism for its role in sharing user data. According to the Washington Post, two Federal Trade Commission officials believe the social media giant violated a landmark consent decree governing how Facebook handles user privacy. Each violation carries a penalty of $40,000. If each of the 50 million affected users counted as a separate violation, that would come to $2 trillion, meaning Facebook's potential exposure could reach trillions of dollars.

Only time will tell where this story will lead, but it's worth digging into the past to understand where it started. The following interview with Wilkinson is an inside look into the workings of a major player in the 2016 election, as well as a window on how technology is shaping the world around us in ways we are only just beginning to understand.

Two weeks before election day it appeared that Donald Trump was stuck.

Polling websites, ranging from the New York Times’ Upshot model to statistician Nate Silver’s FiveThirtyEight, showed Trump with a low chance of winning the election.

Yet one data organization was able to see a bigger picture. UK-based Cambridge Analytica (CA) culled comprehensive datasets and built custom computing clusters to model a far different electorate than the one other polling sites saw – models that showed Trump a path to victory.

CA, with senior Trump strategist Steve Bannon on its board of directors, was hired by the Trump campaign shortly after Trump won the Republican nomination.

Democratic prism. This interactive map offers a 3D view of voter preference while maintaining regional significance without map distortion. This view associates county height with county vote total. Courtesy Max Galka and Mark Kearney.

“What we found is that one of the strongest signals was an urban/rural split,” says David Wilkinson, lead data scientist at CA.

“Our models showed that if you had higher turnout among rural areas, and lower turnout among urban areas, particularly in some ethnic minorities or some higher incomes, then you saw some very different changes in the election.”

Source material

CA built modeling software based on four different types of data: voter records, commercial data, campaign data, and weekly surveys. The company used online and telephone polling to survey thousands of people every week in all 50 states. It eventually focused on about 17 battleground states that would be critical to winning the election.

Through this multi-tiered approach, CA scientists were able to model the electoral sentiments of roughly 100 million people, whose preference for Trump or Hillary Clinton was constantly being updated.

Trump's supporters, CA found, differed from the typical Republican electorate. In general, Republicans prefer American-made products, but for Trump voters this preference was especially important. CA scientists found that American-made cars, in particular, were a strong predictor of who supported Trump.

The public polls used by most campaigns can weight only one or two characteristics at a time – age or gender, for instance. When these polls are combined with other sources of data, they can provide a more realistic picture for political campaigns.

“When you plug those results [from public polls] into other sorts of data, when you have commercial data available to you, when you have other political sorts of data, and when you match those responses to a database of voters, you can use a lot more information. You can see which features are the most effective at weighting, and you can get a much more accurate picture,” says Wilkinson.
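To make the idea of weighting on several characteristics at once concrete, here is a minimal sketch of raking (iterative proportional fitting), a standard survey-weighting technique. It is not CA's method, which the article does not describe in detail. The respondents, the trait names (`area`, `age`), the candidate labels, and the population targets are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def rake(respondents, targets, iterations=50):
    """Adjust respondent weights until weighted margins match the targets.

    respondents: list of dicts, one value per trait (hypothetical schema).
    targets: {trait: {category: population share}}, shares sum to 1 per trait.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for trait, shares in targets.items():
            # Current weighted total for each category of this trait.
            totals = defaultdict(float)
            for person, w in zip(respondents, weights):
                totals[person[trait]] += w
            grand_total = sum(totals.values())
            # Scale each weight so the category hits its target share.
            for i, person in enumerate(respondents):
                category = person[trait]
                current_share = totals[category] / grand_total
                weights[i] *= shares[category] / current_share
    return weights


def weighted_share(respondents, weights, choice):
    """Weighted fraction of respondents who picked `choice`."""
    total = sum(weights)
    picked = sum(w for p, w in zip(respondents, weights) if p["choice"] == choice)
    return picked / total


# Toy sample that over-represents urban and younger respondents.
sample = [
    {"area": "urban", "age": "under_45", "choice": "A"},
    {"area": "urban", "age": "under_45", "choice": "A"},
    {"area": "urban", "age": "45_plus", "choice": "A"},
    {"area": "urban", "age": "45_plus", "choice": "B"},
    {"area": "rural", "age": "under_45", "choice": "B"},
    {"area": "rural", "age": "45_plus", "choice": "B"},
]
# Hypothetical population margins, not real census figures.
population = {
    "area": {"urban": 0.5, "rural": 0.5},
    "age": {"under_45": 0.4, "45_plus": 0.6},
}

raked = rake(sample, population)
print("unweighted B share:", weighted_share(sample, [1.0] * len(sample), "B"))
print("raked B share:     ", weighted_share(sample, raked, "B"))
```

Matching survey responses to a voter file, as Wilkinson describes, would in principle allow many more traits to be added to `targets` than the one or two a typical public poll can weight.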

CA also analyzed early voting returns from rural areas to see if they matched the firm’s modeling of a more active rural electorate. Wilkinson’s team saw the rising early voter turnout in the Rust Belt – and a historical correlation between early turnout and final election results – and alerted the Trump campaign to the shift that was occurring.

Science for the win

The CA team's discovery that early voting turnout was high in the Rust Belt, together with models that could weight more factors than those of other polling organizations, led Trump to return to states that hadn’t voted for Republicans since the 1980s.

“Re-calculating voter turnout and reweighting our models showed us the scenario in which Trump could win,” says Wilkinson. “So we presented this to the campaign, and I think they really took it to heart. That's when the campaign revisited areas like Michigan and Wisconsin and areas that really surprised people that Trump would even bother campaigning in.”
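The effect Wilkinson describes, where a change in turnout assumptions changes the projected winner, can be sketched with a few lines of arithmetic. The example below is only an illustration of the general idea, not CA's model. The two segments, their sizes, support rates, and turnout figures are hypothetical numbers invented for the sketch.

```python
def projected_share(segments, turnout):
    """Candidate's projected share of all votes cast.

    segments: {name: (registered_voters, support_rate)} (hypothetical inputs).
    turnout: {name: fraction of registered voters expected to vote}.
    """
    votes_cast = 0.0
    votes_for = 0.0
    for name, (registered, support) in segments.items():
        cast = registered * turnout[name]
        votes_cast += cast
        votes_for += cast * support
    return votes_for / votes_cast


# Hypothetical state: the candidate trails in urban areas and leads in rural ones.
segments = {
    "urban": (500_000, 0.40),
    "rural": (500_000, 0.60),
}

# Turnout assumption resembling a conventional likely-voter model (made up).
baseline = {"urban": 0.65, "rural": 0.50}
# Alternate scenario: lower urban turnout, higher rural turnout (made up).
scenario = {"urban": 0.55, "rural": 0.62}

print("baseline share:", round(projected_share(segments, baseline), 4))
print("scenario share:", round(projected_share(segments, scenario), 4))
```

Support within each segment is identical in both runs. Only the turnout mix changes, which is enough to move the projected share from below half to above it in this toy case.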

Blown into proportion. The map on the left shows voter preference across the US. The cartogram map on the right represents the nation in proportion to its population. Courtesy Mark Newman (http://www-personal.umich.edu/~mejn/election/2016/).

CA’s analysis relied on custom high-performance computing clusters – upwards of 560 processing cores and over 130 TB of data storage.

Total data analyzed during the campaign approached 13 TB, an analysis made possible by a data cloud accessed through Amazon Web Services.

While winning campaigns depend on more factors than just data, it is unmistakable that data can be a critically important factor – one that can put a candidate over the top in a close race. For all Trump’s populist, anti-intellectual appeals, he ultimately relied on computational power and scientific analysis to secure victory. This acceptance of science is a heartening, if ironic, signal from the incoming president.

“Data can certainly help make sure that the messages are getting to the right people, and make sure the strategy is being focused in the right way,” notes Wilkinson. “But it is a candidate that makes the decisions and wins the election, not the data.”

Copyright © 2018 Science Node ™  |  Privacy Notice  |  Sitemap
