- Trump victory surprised many, but not the scientists at Cambridge Analytica
- 13 terabytes of big data and computer modeling proved decisive in election upset
- Trump's nostalgic ethos cloaked his high-tech modus operandi
Two weeks before election day it appeared that Donald Trump was stuck.
Polling websites, ranging from the New York Times’ Upshot model to statistician Nate Silver’s FiveThirtyEight, showed Trump with a low chance of winning the election.
Yet one data organization was able to see a bigger picture. UK-based Cambridge Analytica (CA) culled comprehensive datasets and built custom computing clusters to model a far different electorate than the polling sites did – models that showed Trump a path to victory.
CA, with senior Trump strategist Steve Bannon on its board of directors, was hired by the Trump campaign shortly after Trump won the Republican nomination.
“What we found is that one of the strongest signals was an urban/rural split,” says David Wilkinson, lead data scientist at CA.
“Our models showed that if you had higher turnout among rural areas, and lower turnout among urban areas, particularly in some ethnic minorities or some higher incomes, then you saw some very different changes in the election.”
Source material
CA built modeling software based on four different types of data: voter records, commercial data, campaign data, and weekly surveys. The company used online and telephone polling to survey thousands of people every week in all 50 states. It eventually focused on about 17 battleground states that would be critical to winning the election.
Through this multi-tiered approach, CA scientists were able to model the electoral sentiments of roughly 100 million people whose preference for Trump or Hillary Clinton was constantly shifting.
Trump's support, CA found, differed from that of a typical Republican electorate. In general, Republicans prefer American-made products, but among Trump voters this preference was especially pronounced. CA scientists found that American-made cars, in particular, were a strong predictor of who supported Trump.
The public polls used by most campaigns can weight only one or two characteristics at a time – age or gender, for instance. When these polls are combined with other sources of data, they can provide a more realistic picture for political campaigns.
“When you plug those results [from public polls] into other sorts of data, when you have commercial data available to you, when you have other political sorts of data, and when you match those responses to a database of voters, you can use a lot more information. You can see which features are the most effective at weighting, and you can get a much more accurate picture,” says Wilkinson.
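The multi-characteristic weighting Wilkinson describes can be illustrated with a minimal post-stratification sketch. This is not CA's model: the respondents, the two characteristics (area and age band), the population shares, and the function names are all hypothetical, chosen only to show how matching survey responses to a voter database can change an estimate.

```python
from collections import Counter

# Hypothetical survey responses: (area, age band, preferred candidate).
respondents = [
    {"area": "urban", "age": "under50", "choice": "B"},
    {"area": "urban", "age": "under50", "choice": "B"},
    {"area": "urban", "age": "under50", "choice": "A"},
    {"area": "urban", "age": "50plus", "choice": "B"},
    {"area": "urban", "age": "50plus", "choice": "A"},
    {"area": "rural", "age": "under50", "choice": "A"},
    {"area": "rural", "age": "50plus", "choice": "A"},
    {"area": "rural", "age": "50plus", "choice": "B"},
]

# Assumed share of the electorate in each (area, age) cell, e.g. taken
# from a voter file. These numbers are invented for illustration.
population_share = {
    ("urban", "under50"): 0.25,
    ("urban", "50plus"): 0.25,
    ("rural", "under50"): 0.20,
    ("rural", "50plus"): 0.30,
}

def poststratify(respondents, population_share, keys):
    """Weight each respondent by (population share / sample share) of their cell."""
    cells = [tuple(r[k] for k in keys) for r in respondents]
    counts = Counter(cells)
    n = len(respondents)
    return [population_share[c] / (counts[c] / n) for c in cells]

def support(respondents, weights, candidate):
    """Weighted share of respondents who prefer the given candidate."""
    backing = sum(w for r, w in zip(respondents, weights) if r["choice"] == candidate)
    return backing / sum(weights)

weights = poststratify(respondents, population_share, keys=("area", "age"))
support_unweighted = support(respondents, [1.0] * len(respondents), "A")
support_weighted = support(respondents, weights, "A")

print(f"unweighted support for A: {support_unweighted:.3f}")
print(f"weighted support for A:   {support_weighted:.3f}")
```

In this toy sample, candidate A has 50% support before weighting. Because rural respondents are under-represented relative to their assumed share of the electorate, weighting every cell to that share raises the estimate to roughly 55.8%.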
CA also analyzed early voting returns from rural areas to see if they matched the firm’s modeling of a more active rural electorate. Wilkinson’s team saw the rising early voter turnout in the Rust Belt – and a historical correlation between early turnout and final election results – and alerted the Trump campaign to the shift that was occurring.
Science for the win
CA’s discovery that early voting turnout was high in the Rust Belt, together with models that could weigh more factors than those of other polling organizations, led Trump to return to states that hadn’t voted for Republicans since the 1980s.
“Re-calculating voter turnout and reweighting our models showed us the scenario in which Trump could win,” says Wilkinson. “So we presented this to the campaign, and I think they really took it to heart. That's when the campaign revisited areas like Michigan and Wisconsin and areas that really surprised people that Trump would even bother campaigning in.”
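The re-weighting Wilkinson describes amounts to recomputing a projected vote share under different turnout assumptions. A minimal sketch, using invented numbers for two hypothetical groups of voters (nothing here comes from CA's data or software):

```python
# Hypothetical electorate: registered voters and the candidate's support
# rate in each group. All figures are made up for illustration.
groups = {
    "urban": {"voters": 600, "support": 0.40},
    "rural": {"voters": 400, "support": 0.65},
}

def vote_share(groups, turnout):
    """Projected vote share for the candidate given per-group turnout rates."""
    votes_cast = sum(g["voters"] * turnout[name] for name, g in groups.items())
    votes_for = sum(
        g["voters"] * turnout[name] * g["support"] for name, g in groups.items()
    )
    return votes_for / votes_cast

# Baseline turnout versus a scenario with lower urban and higher rural turnout.
baseline = vote_share(groups, {"urban": 0.60, "rural": 0.55})
shifted = vote_share(groups, {"urban": 0.55, "rural": 0.65})

print(f"baseline turnout: {baseline:.3f}")
print(f"shifted turnout:  {shifted:.3f}")
```

With these assumed figures the candidate sits just under 50% at baseline turnout and just over 51% once rural turnout rises and urban turnout falls, even though no individual voter has changed preference.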
CA’s analysis relied on custom high-performance computing clusters – upwards of 560 processing cores and over 130 TB of data storage.
The total data analyzed during the campaign approached 13 TB, an analysis made possible by a data cloud accessed through Amazon Web Services.
While winning campaigns depend on more than data alone, data can clearly be a critically important factor – one that can put a candidate over the top in a close race. For all of Trump’s populist, anti-intellectual appeals, he ultimately relied on computational power and scientific analysis to secure victory. This acceptance of science is a heartening, if ironic, signal from the incoming president.
“Data can certainly help make sure that the messages are getting to the right people, and make sure the strategy is being focused in the right way,” notes Wilkinson. “But it is a candidate that makes the decisions and wins the election, not the data.”