
Text mining strikes gold in political discourse

Words can flow from the pen and drip from tongues; now, a Columbia University text-mining analysis has located streams of discourse in the State of the Union address. Deep reading algorithms and evocative visualizations have breathed new life into this old document, finding a pivot point in American governance in the process.

The global network structure of the SoU, 1790–2014. A community detection algorithm reveals cohesive clusters or discursive categories from the semantic network built from the 1,000 × 1,000 terms matrix over the SoU’s history. Some terms lie between clusters and serve as bridges connecting otherwise disjointed discourses. Two clusters contain only a few linked terms: one indexes the set of concepts associated with immigration, the other those associated with crime. Courtesy Alix Rule.

The State of the Union (SoU) address is a staple of US political discourse. Launched by George Washington in 1790, the address has been delivered by presidents annually ever since. Despite the mediation of 45 presidents, 228 SoU addresses, and 1,763,622 words, you’d be forgiven for thinking a single author penned them all.

“We find that the degree of change from one president to another is small,” says Alix Rule, graduate researcher at the Interdisciplinary Center for Innovative Theory and Empirics (INCITE).

Individual presidents were not the focus of the research, however. The multicultural research team was after much bigger fish. “We were interested in how understandings of the tasks of government, on the part of presidents and their audiences, evolve,” says co-author Jean-Philippe Cointet of the Laboratoire Interdisciplinaire Sciences Innovations Sociétés of the French National Institute for Agricultural Research (INRA).

Beginning with the assumption that the meaning of a word depends on its relation to surrounding words, INCITE researchers tasked a computer with analyzing the accumulated words of the SoU. Looking across 225 years, they organized the speeches into 10 overlapping 40-year periods and created a 'semantic network' for each epoch based on the co-occurrence of frequently used terms. From the resulting clusters of nouns they inferred categories in political discourse as they would have appeared to contemporaries. An algorithm developed by Cointet helped find links between similar categories through time, and the result was the discovery of discourse streams.
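
For readers curious about the mechanics, the windowing and co-occurrence counting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' code: the function names, the evenly spaced window starts, the toy speeches and vocabulary, and the choice to count two terms as co-occurring whenever they appear in the same text are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def overlapping_periods(first_year, last_year, n_periods, length):
    """Return n_periods (start, end) windows of `length` years.

    Assumption: window starts are spaced evenly between the first year
    and the last possible start; the study's exact boundaries may differ.
    """
    last_start = last_year - length + 1
    step = (last_start - first_year) / (n_periods - 1)
    starts = [round(first_year + i * step) for i in range(n_periods)]
    return [(s, s + length - 1) for s in starts]


def cooccurrence_counts(documents, vocabulary):
    """Count, for each unordered pair of vocabulary terms, the number of
    documents in which both terms appear (an assumed co-occurrence unit)."""
    counts = Counter()
    for text in documents:
        present = sorted(set(text.lower().split()) & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def network_for_period(speeches, period, vocabulary):
    """Build the weighted edge list for the speeches that fall in one window."""
    start, end = period
    docs = [text for year, text in speeches.items() if start <= year <= end]
    return cooccurrence_counts(docs, vocabulary)


# Toy data, invented purely for illustration.
speeches = {
    1790: "treaty commerce nation",
    1800: "treaty nation war",
    1900: "commerce labor nation",
}
vocabulary = {"treaty", "commerce", "nation", "war", "labor"}

periods = overlapping_periods(1790, 2014, 10, 40)
network = network_for_period(speeches, periods[0], vocabulary)
```

Here `network` maps each pair of terms to a weight, so the pair that recurs across the early toy speeches ends up with the heaviest edge. A real analysis would first restrict the vocabulary to the most frequent terms in each period.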

A discourse stream allows readers to see what’s really being talked about, even as the topic seems to drift in the SoU. For instance, “political discussion might at one moment be about the 14th Amendment and individual rights, and then a conversation about gag rules and development aid, and later still, about health insurance and religious employers,” Rule points out. “But despite a changing set of objects of concern, we can recognize this as one continuous topic: namely, the ongoing conversation about abortion.”

A river network captures the flow of terms across the history of US political discourse. Clusters in the semantic networks of the 300 most frequent terms for each of 10 historical periods are displayed as vertical bars. Relations between clusters of adjacent periods are indexed by gray flows, whose density reflects their degree of connection. Streams that connect at any point in history may be considered part of the same system, indicated with a single color. Courtesy Alix Rule.
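
The linking described in the caption above can also be sketched in code. The example below is a rough illustration rather than Cointet's algorithm, which the article does not detail. It assumes that two clusters in adjacent periods are connected when their term sets overlap enough (Jaccard similarity above a hypothetical threshold), and it groups clusters into "systems" by following those connections. All names, the threshold, and the toy clusters are invented for the example.

```python
def jaccard(a, b):
    """Share of terms two clusters have in common (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_adjacent(period_clusters, threshold=0.2):
    """Connect clusters in consecutive periods whose overlap meets the threshold.

    period_clusters: one list of term sets per period, in chronological order.
    Returns flows as ((period, index), (period + 1, index), similarity).
    """
    flows = []
    for p in range(len(period_clusters) - 1):
        for i, left in enumerate(period_clusters[p]):
            for j, right in enumerate(period_clusters[p + 1]):
                sim = jaccard(left, right)
                if sim >= threshold:
                    flows.append(((p, i), (p + 1, j), sim))
    return flows


def stream_systems(period_clusters, flows):
    """Group clusters that are connected at any point, using union-find."""
    parent = {
        (p, i): (p, i)
        for p, clusters in enumerate(period_clusters)
        for i in range(len(clusters))
    }

    def find(node):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b, _ in flows:
        parent[find(a)] = find(b)

    systems = {}
    for node in parent:
        systems.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in systems.values())


# Toy clusters for three periods, invented purely for illustration.
period_clusters = [
    [{"treaty", "nation", "war"}, {"land", "limits"}],
    [{"treaty", "nation", "commerce"}, {"school", "help"}],
    [{"nation", "commerce", "labor"}, {"school", "help", "family"}],
]

flows = link_adjacent(period_clusters)
systems = stream_systems(period_clusters, flows)
```

In this toy run the 'land' and 'limits' cluster has no successor and so forms a system of its own, which mirrors how a topic can drop out of the discourse. The similarity value attached to each flow plays the role of the flow density in the figure.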

Whereas the discovery of discourse streams helped researchers identify stability in the SoU, the study’s chief insight is a significant diversion amid that perceived continuity. Using largely different methods, the study reveals how the understanding of US governance departed from preceding eras around the time of the First World War, though this fact may have escaped contemporaries.

“We are able to show that this transition from 19th century to modern political consciousness overall, as judged by what the objects of concern are, how quickly they change, and what we call 'master categories' of internal and international affairs, occurs despite what people could perceive in the political conversation of the time,” Rule says.

The study also called attention to a few outlying topics. Over time, some clusters dropped from the discourse stream (e.g., conversations about First Nations peoples noted by the cluster 'land' and 'limits'). Recent entries include the cluster 'school' and 'help,' terms indicative of the emerging conversation about privatized social policy.

Clocking in at almost two million words, the SoU may not be the biggest textual collection digital humanists have tackled, but this novel analysis does provide the first academic look at the discourse stream as an object of study. Using this text mining technique, we now have a way to visualize how terms relate across time and see the SoU as a web of intertwined terms.

“The data are not too 'big' for modern personal computers,” says Rule. “Nonetheless, our analysis gives us a synoptic picture that would be hard to get by reading all the documents one after the other.”

So what's the lesson for us today? Might there be shifts in the political discourse streams surrounding us that we have yet to perceive? "Standing on the eve of their revolution it would have been impossible for those in the immediate post-1917 period to know they had made such a break with their past," says INCITE director and study co-author Peter Bearman. "The same is true today. Understanding history helps us understand ourselves, but not our future selves."

--Thanks for reading iSGTW. Stay with us as we become The Science Node on 16 September.


Copyright © 2015 Science Node ™  |  Privacy Notice  |  Sitemap


