The Internet Archive is your portal to history

Speed read
  • Internet Archive aims to create an accessible repository of all human knowledge
  • Comprehensive record of the media encourages wider perspectives
  • Public can participate in preservation efforts by recommending pages and uploading data

What if you couldn’t remember what you did yesterday? Or what happened the last time you touched a hot stove? Or who won the Civil War?

Memories remind us of what happened in the past and teach us to make better choices in the future. But what happens when that past becomes invisible, deluged by the ever-changing present?

“We’ve become such a media-soaked environment; information is coming from every which way,” says Brewster Kahle, founder and digital librarian of the Internet Archive (IA). “People are having a hard time understanding and distilling it all — they tend to get lost in the present.”

The average web page lasts only 100 days before it is changed or deleted. But the IA is preserving those pages, along with books, music, video, and software, and making them permanently accessible to anyone in the world who wants to use them.

Library of Alexandria 2.0

Kahle views the IA’s mission as akin to building the Library of Alexandria 2.0. The famed library flourished in ancient Egypt for several centuries before being devastated by fire.

Precious commodity. When the Library of Alexandria burned down, untold histories and knowledge were erased. The Internet Archive wants to make sure that doesn’t happen again.

“Libraries are usually destroyed by governments because the new guys don’t want the old stuff around,” says Kahle. “If there had been a backup copy of the original Library of Alexandria, we’d still have the plays of Euripides or the other works of Aristotle. But we don’t.”

The IA’s best-known service, the Wayback Machine, has collected 279 billion historical web pages and is growing at a rate of half a billion pages per week. One thousand librarians are helping to build the archive’s collections, which are currently estimated at over 30 petabytes of data and are used by hundreds of thousands of people a day. The IA maintains copies on servers around the world, in case of another fire like the one that wiped out the Library of Alexandria.

Securing Science

Scientists are anxious to collect and preserve government data. Vast amounts of digital information are at risk of vanishing when a presidential term ends and administrations change. For example, 83% of government PDFs disappeared between 2008 and 2012.

Working with the Library of Congress, the IA archives all .gov and .mil websites, selected FTP sites, and official social media accounts in the End of Term Archive.

“We’re also working with climate scientists at various universities who are concerned about what the change in administration might mean for their research,” says Kahle. Climate Mirror is a distributed effort with several universities to mirror and back up US federal climate data.

“People put reusable datasets into the IA as a kind of non-profit cloud service,” says Kahle.

Trump Archive

As part of another project, the IA archives 61 television channels from 25 countries, 24 hours a day, seven days a week. The archive is text-searchable based on transcripts.

Say what? The Trump Archive was launched in early 2017 to help bring his claims and promises to the surface. Courtesy Internet Archive.

Since the US presidential election in November 2016, the IA has created a special collection focused on the television appearances of Donald Trump.

“When Trump was elected, we pulled out his campaign appearances and interviews and made it into a special collection that can be searched to uncover what candidate Trump or president-elect Trump has said he’s going to do. It’s been fantastically useful,” says Kahle.

The collection launched on January 7, 2017. Its most popular video, of one of the presidential debates, has been seen or downloaded one million times.

“Television is a very pervasive and persuasive medium,” says Kahle. “Yet it just flows over us. Unlike text, it’s not easy to quote, compare, and contrast.”

Kahle asserts that collecting information and making it available isn’t a partisan activity.

“We’re trying to make it so people can refer to what has happened; so that if someone makes a claim about x or y, you can compare it against what has been published before,” says Kahle.

We, the people

Kahle believes that despite the challenges, now is the right moment for a concerted effort to put the best we have to offer within reach of anyone curious enough to want to access it.

Nothing new under the sun. First published in 1949, George Orwell’s classic novel 1984 has topped book sales charts since the election of Donald Trump. Courtesy Alisa Alering.

“We need more participation and better tools to make sense of it all, to help people understand what’s coming at them from many different angles, and not just recline back and watch one bubble,” says Kahle. “Right now it’s hard to get a broader view.”

You can contribute to the preservation of our digital history by suggesting websites to be archived, or by uploading your own data to be preserved or indexed by the IA.

“I think this is the opportunity of our generation,” says Kahle. “The last generation put a man on the moon. I think we can achieve universal access to all knowledge.”

Copyright © 2023 Science Node ™