- Social media provides valuable data about political beliefs
- Machine learning can help sort and categorize these opinions
- We may one day be able to use an automated tool to see which politicians think most like us
Trying to nail down a politician’s beliefs is a bit like figuring out what’s wrong with a broken toilet. It requires hard work, dedication, and is impossible to do without being a little grossed out.
No matter how you look at it, politicians lie. Most of us know that, but simply don’t have the time or energy to dig through their record and find the truth.
But what if there were a tool that helped you choose which politician to vote for based solely on how well their beliefs align with the issues that matter to you?
Dr. Srijith Rajamohan, a computational scientist at Virginia Tech, thinks this could be within reach. In fact, he’s working on a deep-learning-based interactive visualization tool that understands and plots political ideologies from Twitter activity.
“Is there a way to extract and understand people’s ideologies from the things that they say?” asks Rajamohan. “I turned to natural language understanding to see if we can take text from social media, run it through a deep learning model, and find some way to quantify it.”
Someday soon, a tool like Rajamohan’s could have a huge impact on how people understand political ideologies. And if we’re lucky, it could make voting for the right candidate a lot easier.
Cleaning up the data
The end-goal of this work was to construct a visualization tool that could help identify political ideology. However, as many important endeavors do, this project began with an intriguing conversation.
“It all started over coffee when Alana Romanella and I were discussing white supremacy and hate speech,” says Rajamohan. “The conversation evolved, and we started talking about different political groups.”
Eventually, Romanella and Rajamohan decided they could use deep learning to better understand political ideology by investigating social media posts. They decided to focus on Twitter, as it is a data-rich environment with a free application programming interface (API). After collecting data for four months, the team had roughly 3 million tweets to work with.
“We pulled the tweets based on certain hashtags provided by our in-house political scientist.”
But, as Rajamohan explains, this approach has some drawbacks. “A particular hashtag can be used by people from widely varying beliefs and backgrounds, so that’s not necessarily going to tell you that they belong to a particular group or they have a certain ideology.”
For example, say you’re trying to figure out how groups of people feel about the Black Lives Matter movement. You can’t simply assume anyone tweeting out the #BlackLivesMatter hashtag is a sympathizer to the cause, as members of white supremacist groups might also use this hashtag in a derogatory context.
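In code, hashtag-based labeling is simple, which is exactly why it is noisy. The sketch below (the hashtag-to-group mapping is purely illustrative; the team’s actual hashtag list came from their in-house political scientist and is not public) shows how two tweets with opposite intent receive the same weak label:

```python
import re

# Hypothetical hashtag-to-group mapping, for illustration only.
HASHTAG_GROUPS = {
    "#blacklivesmatter": "progressive",
    "#maga": "conservative",
}

def weak_label(tweet_text):
    """Assign a weak (noisy) label based purely on hashtags in the tweet.

    Returns the set of candidate group labels -- possibly empty or
    conflicting, which is why hashtag-only labels are 'dirty data'.
    """
    tags = re.findall(r"#\w+", tweet_text.lower())
    return {HASHTAG_GROUPS[t] for t in tags if t in HASHTAG_GROUPS}

# The same hashtag can appear in supportive or derogatory contexts,
# so both of these tweets receive the identical weak label:
print(weak_label("Standing with the movement #BlackLivesMatter"))
print(weak_label("Mocking the protest crowd #BlackLivesMatter"))
```

Both calls return `{'progressive'}`, even though only one tweet is sympathetic to the cause.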
This kind of ambiguous information is called dirty data, and it can be a big problem in machine learning. It can prevent scientists from drawing any meaningful conclusions from a dataset, and it is one of the most common issues data scientists face.
For this project, Rajamohan turned to weakly supervised learning to understand intent. Contextual embeddings helped mitigate noise in the data: plots generated from the neural network’s embeddings guided a human researcher to the records that were likely mislabeled.
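One generic way to guide a human toward likely mislabeled records is to flag points whose nearest neighbors in embedding space mostly carry a different label. The sketch below is an assumption about how such a check could work, not the team’s exact procedure; the data, labels, and threshold are all illustrative:

```python
import numpy as np

def flag_suspect_labels(embeddings, labels, k=3):
    """Flag records whose k nearest neighbors (in embedding space)
    mostly carry a different label -- candidates for human review."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = (diffs ** 2).sum(-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbors
    suspects = []
    for i in range(len(labels)):
        neighbors = np.argsort(dists[i])[:k]
        disagree = np.mean(labels[neighbors] != labels[i])
        if disagree > 0.5:  # a majority of neighbors disagree
            suspects.append(i)
    return suspects

# Two tight clusters; index 2 sits in the left cluster but is labeled 1.
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 1, 1, 1, 1])
print(flag_suspect_labels(embeddings, labels, k=2))  # -> [2]
```

A reviewer would then inspect only the flagged records rather than the whole corpus.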
Once they had cleaner data, Rajamohan and his colleagues were able to visualize these belief structures. Although they experimented with various visualization techniques such as t-SNE, Isomap, and PCA, multidimensional scaling (MDS) turned out to be the most useful.
This model places liberal ideologies on the bottom left, while conservative opinions are placed on the top right. Although other techniques such as t-SNE separate the data more cleanly, MDS is better at exposing incorrect labels in the corpus.
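MDS is a standard technique: it places points in a low-dimensional plane so that their pairwise distances approximate the distances between the original high-dimensional embeddings. A textbook classical-MDS implementation (for illustration only, not the team’s code) fits in a few lines:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical multidimensional scaling: embed n points in k dimensions
    so pairwise Euclidean distances approximate the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]   # keep the top-k components
    scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * scale        # (n, k) plot coordinates

# Four points in the plane; MDS recovers a layout with the same distances.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
```

In the team’s tool, each point would be a tweet or account embedding, and the resulting plane is where the liberal and conservative regions emerge.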
Quantifying an opinion
To simplify the process, Rajamohan moved away from labeling specific political ideologies in favor of plotting a person’s political affiliation based on their relationship to an important public figure. For instance, your opinion of Elizabeth Warren or Donald Trump reveals a lot about your own beliefs.
"In an ideal world, I would have a tool like this before voting." says Rajamohan.
I would take all of the beliefs and opinions a politician ever expressed and project it on a screen. I would then take my own beliefs and opinions and put it on the screen and look who I’m closest to.
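That “see who I’m closest to” step reduces to a similarity search. A minimal sketch, assuming each person’s beliefs have been condensed into a vector (for example, an averaged tweet embedding; the names and numbers here are entirely hypothetical):

```python
import numpy as np

# Hypothetical belief vectors; a real tool would learn these from
# each person's Twitter history via the deep learning model.
politicians = {
    "Politician A": np.array([0.9, 0.1, 0.3]),
    "Politician B": np.array([0.2, 0.8, 0.5]),
}

def closest_politician(my_vector, candidates):
    """Return the candidate whose belief vector has the highest
    cosine similarity to mine."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda name: cos(my_vector, candidates[name]))

me = np.array([0.85, 0.15, 0.25])
print(closest_politician(me, politicians))  # -> Politician A
```

Cosine similarity is a natural choice here because it compares the direction of two belief vectors rather than their magnitude, so a prolific tweeter and an occasional one can still be matched on substance.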
While this tool has a long way to go before it becomes something the public can rely on to pick a candidate, simply pursuing this endeavor keeps Rajamohan interested.
“Being able to understand intent is hard,” says Rajamohan. “You have an entity, and it’s easy to say someone feels positively or negatively about something, but how do you quantify or extract someone’s intent? That is a really ill-defined concept, so if we can use deep learning or AI to extract it – I think that’s a pretty neat concept to explore.”