Opinion: What would Linnaeus do?
Elizabeth Leake of TeraGrid - the high-performance, distributed computing network in the USA - was a guest at the EGEE User Forum, the recent conference in Europe on high-throughput computing. Here, she gives her impressions of a technology challenge faced by both types of computing: long-term, persistent storage.
One of the highlights of visiting the conference venue in Uppsala, Sweden, was learning about Carl Linnaeus.
Born in Råshult, in southern Sweden, in 1707, Linnaeus moved to the college town of Uppsala to study, and quickly became a cornerstone of the university. He died in 1778 and was laid to rest in Uppsala Cathedral - the largest in all of Scandinavia.
Linnaeus was famous for being the first scientist to consistently apply Latin binomial nomenclature to plants and animals, thereby initiating what has become a universally implemented method of species classification. Uppsala University has lovingly preserved some of Linnaeus' field research - specimens and documentation - for future generations to mine.
Not to be outdone, the Linnean Society of London keeps the world's largest collection of his original research materials, "where they are available for study by researchers and are stored in the best possible environment to preserve the materials intact."
I wonder what the situation will be for electronic media?
Imagine it is the year 2010 and you are a molecular biologist who is on the brink of finding a cure for a disease that has affected millions of people. You are leveraging technology in ways that make the best and brightest computational scientists take note. Everywhere you turn, funding agencies give money to support your research. You are awarded tenure at your institution. Your discoveries are published. You train hundreds of students in your techniques. You retire well, and live in the south of France.
Now flash forward 250 years. You are a nanomolecular biologist who is attempting to find a cure for a new disease. You would love to get your hands on the computational research of those who worked on a similar strain of the disease back in 2010, but that data is no longer available. Although the computational power that is available to you is far greater than that which was used in 2010, much of the effort expended over the lifetimes of those scientists who committed their research to digital files is lost.
Ironically, you can still read Linnaeus' handwritten field notes from nearly 500 years earlier.
Why is this?
Paper vs digital
The architecture of digital storage changes every 20 years or so. Storage costs money - power to cool the systems and media on which to store the data - and therefore requires a commitment by an institution, scientific domain, or government to protect and preserve the data for future generations to mine.
But most institutions struggle to meet their administrative data management needs. Many commit to storing academic data for the tenure of the professional, or a period of two years (maximum) beyond the termination of a contract.
Perhaps Linnaeus was lucky in that he made his contributions to society in an era before this situation rendered it impossible to preserve and protect academic research for generations to come.
In practice, preserving paper in a museum is much cheaper. Linnaeus would have used hand-laid paper or animal vellum. Writing would have been done with indigo ink. These materials would stand the test of time.
Today, if we do commit something to paper, it is most likely printed on über-ephemeral recycled paper - the fibers of which are short, put through a bleaching process, and bound together with an acidic medium. Toner and vegetable inks fade, and paper that is designed to self-destruct is not the best way to preserve anything.
During the EGEE User Forum, I had the opportunity to chat with those who operate many of the world's technology infrastructures, and it became apparent that we all share this problem.
Most technology has a five-year life span at best. Many infrastructures are built and sustained by cycles of four- or five-year funding streams. When you think about it, how could a long-term, international data repository be funded? Where would you house it? Who would you trust with the contents? Which set of security policies would you apply, and how would you keep pace with the natural evolution of technology? Who wants to be responsible for the world's data anyway?
Imagine that the year is 2030, and I have passed away. My grandchildren are rifling through a box which represents my life. Dozens of USB memory sticks, CDs, and floppy disks hold the writings, images, and evidence of my existence - but they don't know that. They have never seen a floppy disk, let alone a device that would read one. Therefore, these trinkets will most likely be turned into primitive jewelry, made into Christmas ornaments, or sent to the trash for an archaeologist to find.
-Elizabeth Leake, TeraGrid External Relations Coordinator