What's in a name? Supercomputers use lists to reveal group identities

Courtesy J. Andrew Harris.

J. Andrew Harris, assistant professor of political science at New York University-Abu Dhabi, has devised a supercomputing method to estimate the group composition of a list of names.

While deriving some identity characteristics from a name may seem intuitive, for many, ethnic identification raises serious concerns about labeling and the potential for discrimination. Political scientists, however, can use such statistics to help identify and address social problems.

"Anytime we're interested in questions of discrimination, bias, or inequality as a function of race, identity, or ethnicity, we need actual measures of identity to test whether or not such discrimination exists," Harris posits. "That's where my method becomes useful - it provides those estimates to address questions of legal and moral importance: do racial or ethnic groups have equal access to the ballot box?"

Inferring ethnicity from individual names (e.g., Wong, Sanchez, Smith) is error-prone. For instance, the name Lee could easily be a Caucasian, Asian, or African-American surname. So, instead of trying to categorize each name, Harris looks at the entire list of names simultaneously. The resulting demographics can then be used to pursue larger questions of cultural identity, entrenched biases, and barriers to inclusion.

"Instead of saying that person one is from ethnic group A, I use all of the information in a list of names to estimate the proportion of groups A, B, and C in that list," says Harris. "This method provides more efficient and lower-bias estimates of ethnic group proportions than you would get if you categorized each name and computed the proportions from those categorizations."
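The contrast Harris draws can be illustrated with a toy example. The sketch below is not his published estimator. It assumes a small, invented table of name frequencies within two hypothetical groups, A and B, and compares a per-name tally with a standard expectation-maximization estimate of mixture proportions computed from the whole list. All names, numbers, and function names here are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical P(name | group) table. The probabilities are invented
# for illustration and do not come from any real data set.
NAME_GIVEN_GROUP = {
    "A": {"Lee": 0.4, "Smith": 0.6, "Wong": 0.0},
    "B": {"Lee": 0.6, "Smith": 0.0, "Wong": 0.4},
}


def classify_and_count(names):
    """Assign each name to its single most likely group, then tally shares."""
    counts = Counter()
    for name in names:
        best = max(NAME_GIVEN_GROUP, key=lambda g: NAME_GIVEN_GROUP[g][name])
        counts[best] += 1
    return {g: counts[g] / len(names) for g in NAME_GIVEN_GROUP}


def estimate_proportions(names, iterations=200):
    """Estimate group shares from the whole list with a simple EM loop."""
    groups = list(NAME_GIVEN_GROUP)
    weights = {g: 1.0 / len(groups) for g in groups}  # start from equal shares
    name_counts = Counter(names)
    total = len(names)
    for _ in range(iterations):
        expected = {g: 0.0 for g in groups}
        for name, n in name_counts.items():
            # E-step: split each name's count across groups in proportion
            # to current share times P(name | group).
            joint = {g: weights[g] * NAME_GIVEN_GROUP[g][name] for g in groups}
            norm = sum(joint.values())
            for g in groups:
                expected[g] += n * joint[g] / norm
        # M-step: the new shares are the expected group counts over the total.
        weights = {g: expected[g] / total for g in groups}
    return weights


# A list built to match a 70/30 split between groups A and B
# under the invented table above.
names = ["Lee"] * 46 + ["Smith"] * 42 + ["Wong"] * 12

per_name = classify_and_count(names)
whole_list = estimate_proportions(names)
print("classify each name:", per_name)
print("whole-list estimate:", whole_list)
```

With these invented numbers, every "Lee" is assigned to group B by the per-name tally, which puts group A at 0.42. The whole-list estimate converges to about 0.70 for group A, the share used to build the list, because an ambiguous name is split across groups rather than forced into one.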

Harris's systematic method, recently published in Political Analysis, introduces a context-independent identity estimator that can work in any geographic region or nation. He has used his method to infer racial voting behavior in North Carolina, and to explain decreases in Kenyan ethnic groups in the voter register. What's more, Harris claims, the method could be used to estimate virtually any name-related characteristic, such as a person's religion, class, or ancestry.

To test the method, Harris relied heavily on the BuTinah supercomputer on the NYU Abu Dhabi campus. Named after a protected marine reserve off the coast of Abu Dhabi, BuTinah's 512 super-dense compute nodes are capable of around 70 teraFLOPS.

BuTinah helped him both manipulate the rosters and run a large number of simulations - on both real and simulated data - in order to develop and demonstrate proof of concept.

"If I had years to spare maybe I could have done the research without BuTinah. But the team at BuTinah made it do-able on a much more reasonable timetable," he says.

Harris's research was funded in part by a National Science Foundation (NSF) doctoral dissertation improvement grant.

Harris foresees using this approach to monitor how individual Twitter or Facebook networks evolve as important events approach. He wants to know if people connect more with certain ethnicities as that politically salient identity is activated.

"The approach will allow us to measure identity in a way that wasn't previously feasible. New ways of understanding inequality, social mobility, social networks - the substantive insights that new measures of identity could afford are huge."

--Lance Farrell

Copyright © 2015 Science Node ™  |  Privacy Notice  |  Sitemap