- Social networks are a fount of real-time information
- Detecting trends in social media is a big data problem
- EGI cloud computing platform provides early trend detection
Social networks are big data production engines.
Their analytics can produce insights on trending topics that can be used in many domains, from advertising to politics. Social media trends are also indicators for various phenomena, from public opinion shifts, to emergency situations, to disease outbreaks.
However, a topic only counts as a trend once the social network itself (e.g. Twitter, Facebook) has declared it one, so predicting which topics will become trends can be treated as a classification problem.
Managing massive data volumes to extract valuable information, and doing so in real time, are additional obstacles to predicting trending topics on social networks.
Athena Vakali and colleagues at the Aristotle University of Thessaloniki addressed these challenges by working on a new model for detecting social media trends. The team wanted to observe how effective some of the field's known techniques and algorithms are in a near-real-world context.
They started with actual large-scale Twitter data threads and ran trend prediction in real time within a framework built on the lambda architecture.
Lambda architecture is a data processing model capable of handling massive quantities of data using both batch-processing and stream-processing methods to provide views of online data.
The team chose to use this model because it tackles the manipulation problems of both the volume and the velocity of data.
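The split between batch and stream processing can be illustrated with a minimal Python sketch. This is not the team's implementation: the class `LambdaPipeline`, its method names, and the hashtag-counting task are all hypothetical, chosen only to show how a slow batch view and a fast incremental view combine to answer a query.

```python
from collections import Counter


class LambdaPipeline:
    """Toy lambda architecture (hypothetical names, illustrative only)."""

    def __init__(self):
        self.master_log = []         # append-only record of every event seen
        self.batch_view = Counter()  # rebuilt from scratch by the batch layer
        self.speed_view = Counter()  # updated per event by the speed layer

    def ingest(self, hashtag):
        # Every event is stored for later batch recomputation (volume)
        # and counted immediately so it is queryable right away (velocity).
        self.master_log.append(hashtag)
        self.speed_view[hashtag] += 1

    def run_batch(self):
        # Batch layer: recompute the view over the full log, then discard
        # the incremental counts that the new batch view now covers.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, hashtag):
        # Serving layer: merge the batch view with the recent increments.
        return self.batch_view[hashtag] + self.speed_view[hashtag]


pipeline = LambdaPipeline()
for tag in ["#flood", "#election", "#flood"]:
    pipeline.ingest(tag)
pipeline.run_batch()         # batch view now covers the first three events
pipeline.ingest("#flood")    # arrives after the batch run
print(pipeline.query("#flood"))
```

In a real deployment each layer would be a separate distributed system rather than an in-memory counter, which is why the architecture needs substantial infrastructure.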
Though it was relatively easy to decide on which model to use, Vakali's project lacked the necessary infrastructure resources upon which to build the whole architecture.
GRNET to the rescue
Vakali and her colleagues decided to contact the Greek Research and Technology Network (GRNET), a federated cloud provider at the European Grid Infrastructure (EGI), to help them with the much-needed cloud compute resources.
Vakali's team installed their model on GRNET’s Okeanos cluster and deployed the lambda architecture there in distributed form.
“Lambda architecture is, by its definition, a complex consisting of a couple of frameworks for distributed analysis and NoSQL databases,” says Vakali.
“It would be useless to execute our experiments in our lab’s standalone servers. Our need for infrastructure resources that would make the build of such architecture possible was accommodated by GRNET.”
In total, they used about 48 CPU cores, 46 GB of memory and 600 GB of disk storage available at Okeanos, and installed 14 virtual machines to help them run the experiments.
They found that almost 80 percent of the actual trending topics were classified as potential trending topics. The results, published in Advances in Big Data, validate the performance of the proposed research framework and emphasise its ability to detect trending topics early.
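The 80 percent figure can be read as a recall score: the share of topics that actually trended which the classifier had flagged beforehand. The article does not name the metric, so this reading is an assumption, and the function `recall` and the topic lists below are made-up illustrations rather than the team's data.

```python
def recall(actual_trends, predicted_trends):
    """Fraction of actual trending topics that were also predicted."""
    actual = set(actual_trends)
    if not actual:
        raise ValueError("actual_trends must not be empty")
    return len(actual & set(predicted_trends)) / len(actual)


# Hypothetical example: five topics really trended, four of them were flagged.
actual = ["#flood", "#election", "#derby", "#strike", "#eclipse"]
predicted = ["#flood", "#election", "#derby", "#strike", "#sale"]
print(recall(actual, predicted))  # 4 of 5 actual trends caught -> 0.8
```

Recall says nothing about false alarms such as `#sale` above; measuring those would require a precision score, which the article does not report.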