
Secrets from the world's biggest family tree

Speed read
  • Amateur genealogists contribute valuable data to scientists
  • Large-scale study shows that genes play only a minor role in longevity
  • Crowd-sourced data plays increasing role in scientific research

Before the Internet became so central to our lives, genealogy enthusiasts investigated their roots the old-fashioned way. They asked relatives about family names, dates of birth and death, marriages, occupations, and locations. The next step was to visit libraries, historical societies, and other repositories of public records.

Big branches. Using 86 million public profiles from Geni.com, researchers created a single family tree of 13 million people spanning 11 generations. Courtesy Columbia University.

Today, the digitization and commoditization of these records have allowed millions of people to research their ancestry without leaving home. As amateur genealogists search online databases, they also contribute information, records, and photos not previously available. A recent paper describes how a team of researchers used this crowd-generated data to facilitate scientific inquiry.

Yaniv Erlich, a computational geneticist at Columbia University, was curious about the large amount of family tree data collected by the ancestry-themed social media website, Geni.com.

After obtaining permission from Geni and its parent company MyHeritage, he downloaded 86 million user profiles. The data contributed by these users represents 13 million individuals spanning an average of 11 generations. That’s about five centuries of human history.

This immense family tree primarily includes people from North America, the British Isles, and Western Europe. Each profile describes one person and any presumed connections to other individuals in the data set. The data includes descriptors such as dates of birth, birth and death locations, names of parents, and gender.

Are you who you say you are?

The researchers faced several challenges during the six months it took to process this massive amount of user-generated data. According to Joanna Kaplanis, a member of Erlich’s research team, the greatest obstacle was verifying the family tree’s accuracy.

<strong>Before the internet.</strong> In the 19th century, families tracked their ancestry the old-fashioned way, with paper and pen. Courtesy Library of Congress.

One way the team checked accuracy was to compare the average lifespan of individuals in the Geni data with the lifespans of people in the Human Mortality Database (HMD). Populated with government-sourced demographic data, the HMD is considered to be accurate.

The researchers were reassured when the Geni data closely matched the HMD. The team also removed records that contained invalid relationships such as someone being listed as both a parent and child of the same individual.
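As a rough illustration of the kind of consistency check described above, the following Python sketch flags pairs of profiles that are listed as each other's parent. The input format (a list of `(parent_id, child_id)` tuples) and the function name `find_mutual_parent_pairs` are assumptions made for this example, not details of the team's actual pipeline.

```python
def find_mutual_parent_pairs(parent_links):
    """Return pairs of profiles that are each listed as the other's parent.

    parent_links: iterable of (parent_id, child_id) tuples.
    This record format is hypothetical, chosen only for illustration.
    """
    links = set(parent_links)
    # A link (p, c) is invalid if the reverse link (c, p) is also present.
    # Sorting each pair means it is reported once, not twice.
    return {tuple(sorted((p, c))) for p, c in links if (c, p) in links}


# Example: "a" and "b" are listed as each other's parent, so they are flagged.
records = [("a", "b"), ("b", "a"), ("a", "c")]
print(find_mutual_parent_pairs(records))
```

A check like this only catches direct two-person contradictions. Longer loops (for example, a person recorded as their own grandparent) would need a more general cycle search.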

Team members Tal Shor and Omer Weissbrod then designed an algorithm to determine the expected amount of shared DNA between individuals found in the data. Manual and automatic curation tools, including Yahoo! Placemaker, clarified geographical information.
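The article does not describe how the team's algorithm works. To show what "expected shared DNA" means for a family tree, here is a minimal sketch that uses the textbook kinship-coefficient recursion, which may differ from the method used in the study. The `parents` dictionary format and the helper name `make_relatedness` are assumptions for this example. The sketch also assumes the tree contains no loops, such as the invalid parent-child records mentioned earlier.

```python
from functools import lru_cache


def make_relatedness(parents):
    """Build a function giving the expected fraction of DNA two people share.

    parents: dict mapping person -> (father, mother); people without recorded
    parents may be absent or map to (None, None). Hypothetical format.
    """

    @lru_cache(maxsize=None)
    def depth(person):
        # Number of generations below the earliest recorded ancestor.
        if person is None:
            return -1
        father, mother = parents.get(person, (None, None))
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            father, mother = parents.get(a, (None, None))
            return 0.5 * (1.0 + kinship(father, mother))
        # Always step up from the person further down the tree, so we never
        # step past a shared ancestor.
        if depth(a) < depth(b):
            a, b = b, a
        father, mother = parents.get(a, (None, None))
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    def relatedness(a, b):
        # For people whose own parents are unrelated, the expected shared
        # fraction is twice the kinship coefficient.
        return 2.0 * kinship(a, b)

    return relatedness


# Two siblings and their children (first cousins).
tree = {
    "p1": ("g1", "g2"), "p2": ("g1", "g2"),
    "c1": ("p1", "s1"), "c2": ("p2", "s2"),
}
relatedness = make_relatedness(tree)
print(relatedness("p1", "p2"))  # siblings: 0.5
print(relatedness("c1", "c2"))  # first cousins: 0.125
```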

Kissing cousins

Analysis of the Geni data led to some surprising findings about the evolution of marriage. Before the Industrial Revolution, most individuals in the data set married someone who was born within about six miles of where they themselves were born.

<strong>Closer, my love.</strong> New evidence from the Geni study suggests that people continued to marry close relatives even after long-distance travel became the norm.

After 1750, that distance steadily increased, reaching around 60 miles by 1950. Previously, it had been thought that marriages between cousins decreased because of this geographic distance. Instead, the Geni data shows that people continued to marry close relatives for 50 years after the advent of the railway and other improvements in transportation made it easier to move away from one’s place of birth. According to Kaplanis, this suggests that cultural rather than geographic changes caused the change in mating behavior.
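For readers curious how a distance between two birthplaces can be computed from coordinates, the sketch below uses the haversine great-circle formula. This is one common approach and an assumption for illustration. The article does not say how the researchers measured distance, and the function name `haversine_miles` is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# One degree of latitude is roughly 69 miles.
print(round(haversine_miles(0.0, 0.0, 1.0, 0.0), 1))
```

Applied to the birth coordinates of each married couple, a function like this would give one distance per couple, which could then be summarized by decade.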

‘Good’ genes

The study also found that genes explain only about 16% of the differences seen in human longevity. That means having the “right genes” adds only about five years to the average lifespan. That isn’t much of an advantage, considering that a habit like smoking reduces lifespan by 10 years.

<strong>Hedge your bets.</strong> So-called good genes account for only about 16% of differences in longevity, so it's probably smart to practice healthy habits like eating well and not smoking.

Kaplanis believes the discoveries made about genetics and longevity are among the most important. She advises, “Even if your parents lived very long lives, unfortunately we still have to eat healthfully and exercise to make sure that we do too!”

Crowd power

A data set of the magnitude used in this study would have been difficult and expensive to obtain without harnessing the power of crowd-sourcing and social media. According to Weissbrod, “The staggering amount of data makes it possible to perform analyses that are much wider in scope than any previous study of this kind.”

All the data used in the study is available to the public, and the branches of the global family tree are sure to yield many more secrets about human life and culture.


Copyright © 2023 Science Node ™  |  Privacy Notice  |  Sitemap
