- Breaking news about Cambridge Analytica casts new light on role in 2016 election
- CA built modeling software based on voter records, commercial and campaign data, and telephone polling
- 13 terabytes of collected data modeled an alternate outcome, lighting a path to Trump's victory
In November 2016, Science Node interviewed Cambridge Analytica (CA) data scientist David Wilkinson about the company's role in helping Donald Trump win the 2016 presidential election.
Since then, the world has learned that CA mined Facebook for the data of 50 million users in an effort to sway the election. Additionally, the company was caught on tape by Britain's Channel 4 admitting that it uses bribes and prostitution to blackmail politicians. These events have prompted Elizabeth Denham, Britain's Information Commissioner, to seek a search warrant for the company's office and servers.
While no one yet fully knows the legality of CA's actions, Facebook is also receiving criticism for its role in sharing user data. According to the Washington Post, two Federal Trade Commission officials believe the social media giant violated a landmark consent decree governing how Facebook handles user privacy. Each violation carries a penalty of $40,000, so if each of the 50 million affected users counts as a separate violation, Facebook's potential exposure could reach trillions of dollars.
Only time will tell where this story will lead, but it's worth digging into the past to understand where it started. The following interview with Wilkinson is an inside look into the workings of a major player in the 2016 election, as well as a window on how technology is shaping the world around us in ways we are only just beginning to understand.
Two weeks before election day it appeared that Donald Trump was stuck.
Yet one data organization was able to see a bigger picture. UK-based Cambridge Analytica (CA) culled comprehensive datasets and built custom computing clusters to model a far different electorate than other polling sites – models that showed Trump a path to victory.
CA, with senior Trump strategist Steve Bannon on its board of directors, was hired by the Trump campaign shortly after Trump won the Republican nomination.
“What we found is that one of the strongest signals was an urban/rural split,” says David Wilkinson, lead data scientist at CA.
“Our models showed that if you had higher turnout among rural areas, and lower turnout among urban areas, particularly in some ethnic minorities or some higher incomes, then you saw some very different changes in the election.”
CA built modeling software based on four different types of data: voter records, commercial data, campaign data, and weekly surveys. The company used online and telephone polling to survey thousands of people every week in all 50 states. It eventually focused on about 17 battleground states that would be critical to winning the election.
Through this multi-tiered approach, CA scientists were able to model the electoral sentiments of roughly 100 million people, whose preference for Trump or Hillary Clinton was constantly being updated.
Trump support, CA revealed, was different from a typical Republican electorate. In general, Republicans prefer American-made products. But with Trump voters, this was especially important. CA scientists found that American-made cars, in particular, were a strong predictor of who supported Trump.
The public polls used by most campaigns can weight only one or two characteristics at a time – age or gender, for instance. When these polls are combined with other sources of data, they can provide a more realistic picture for political campaigns.
“When you plug those results [from public polls] into other sorts of data, when you have commercial data available to you, when you have other political sorts of data, and when you match those responses to a database of voters, you can use a lot more information. You can see which features are the most effective at weighting, and you can get a much more accurate picture,” says Wilkinson.
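As a rough illustration of the kind of weighting Wilkinson describes, the Python sketch below matches survey respondents to cells defined by one or more features and weights each cell by its share of a voter database. This is a textbook post-stratification estimate, not CA's actual method. The respondents, the population shares, and the function name `weighted_support` are all hypothetical.

```python
from collections import defaultdict


def weighted_support(respondents, population_share, features):
    """Estimate candidate support by weighting survey cells to the electorate.

    respondents: list of dicts holding feature values and a boolean 'supports'.
    population_share: maps a tuple of feature values to that cell's share of
        the voter database (hypothetical numbers in this sketch).
    features: the feature names that define a cell, e.g. ["area"].
    """
    totals = defaultdict(int)
    supporters = defaultdict(int)
    for person in respondents:
        cell = tuple(person[name] for name in features)
        totals[cell] += 1
        supporters[cell] += 1 if person["supports"] else 0

    estimate = 0.0
    covered = 0.0
    for cell, share in population_share.items():
        if totals.get(cell, 0) == 0:
            continue  # no respondents in this cell, so it cannot be estimated
        estimate += share * supporters[cell] / totals[cell]
        covered += share
    # Renormalize in case some cells had no respondents.
    return estimate / covered


# Hypothetical survey: urban respondents are over-represented.
survey = (
    [{"area": "urban", "supports": True}] * 2
    + [{"area": "urban", "supports": False}] * 4
    + [{"area": "rural", "supports": True}] * 3
    + [{"area": "rural", "supports": False}] * 1
)
# Hypothetical shares of the electorate taken from a voter database.
shares = {("urban",): 0.4, ("rural",): 0.6}

raw = sum(p["supports"] for p in survey) / len(survey)
adjusted = weighted_support(survey, shares, ["area"])
print(f"unweighted: {raw:.3f}, weighted: {adjusted:.3f}")
```

Adding more feature names to `features`, with matching keys in `population_share`, weights on several characteristics at once, which is the advantage Wilkinson attributes to matching responses against richer data.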
CA also analyzed early voting returns from rural areas to see if they matched the firm’s modeling of a more active rural electorate. Wilkinson’s team saw the rising early voter turnout in the Rust Belt – and a historical correlation between early turnout and final election results – and alerted the Trump campaign to the shift that was occurring.
Science for the win
CA’s discovery that early voting turnout was high in the Rust Belt, together with models that could weigh more factors than those of other polling organizations, led Trump to return to states that hadn’t voted for Republicans since the 1980s.
“Re-calculating voter turnout and reweighting our models showed us the scenario in which Trump could win,” says Wilkinson. “So we presented this to the campaign, and I think they really took it to heart. That's when the campaign revisited areas like Michigan and Wisconsin and areas that really surprised people that Trump would even bother campaigning in.”
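To make the idea of recalculating turnout concrete, here is a minimal Python sketch of how a projected vote share can shift when turnout assumptions change while each group's preferences stay fixed. Every number, along with the function name `vote_share`, is hypothetical and chosen only for illustration. None of it comes from CA's models.

```python
def vote_share(groups, turnout):
    """Return a candidate's share of votes cast under a turnout scenario.

    groups: maps a group name to its number of registered voters and the
        fraction of that group supporting the candidate.
    turnout: maps a group name to the fraction of its voters who cast a ballot.
    """
    votes_for = 0.0
    votes_cast = 0.0
    for name, group in groups.items():
        cast = group["voters"] * turnout[name]
        votes_cast += cast
        votes_for += cast * group["support"]
    return votes_for / votes_cast


# Hypothetical state with an urban/rural split in support.
state = {
    "urban": {"voters": 600_000, "support": 0.40},
    "rural": {"voters": 400_000, "support": 0.65},
}

# Scenario A: turnout as a conventional poll might assume it.
baseline = vote_share(state, {"urban": 0.60, "rural": 0.55})
# Scenario B: lower urban turnout and higher rural turnout.
shifted = vote_share(state, {"urban": 0.55, "rural": 0.65})

print(f"baseline: {baseline:.1%}, shifted turnout: {shifted:.1%}")
```

In this made-up example the candidate trails under the baseline assumptions but edges past 50 percent once rural turnout rises and urban turnout falls, even though no voter changes preference.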
CA’s analysis relied on custom high-performance computing clusters – upwards of 560 processing cores and over 130 TB of data storage.
The total data analyzed during the campaign approached 13 TB, an analysis made possible by a data cloud accessed through Amazon Web Services.
While winning campaigns depend on more factors than just data, it is unmistakable that data can be a critically important factor – one that can put a candidate over the top in a close race. For all Trump’s populist, anti-intellectual appeals, he ultimately relied on computational power and scientific analysis to secure victory. This acceptance of science is a heartening, if ironic, signal from the incoming president.
“Data can certainly help make sure that the messages are getting to the right people, and make sure the strategy is being focused in the right way,” notes Wilkinson. “But it is a candidate that makes the decisions and wins the election, not the data.”