
When plants go digital

Speed read
  • Digitizing biological specimen collections is a daunting challenge
  • The Indiana University Herbarium created a streamlined digitization process
  • More than 150,000 specimens are now accessible through an online data portal

Computers and the internet have taken much of the grunt work out of science.

Despite this, capturing digital information from a large physical collection is an organizational challenge. Performed poorly, such a process can generate data of limited usefulness.

From dead plant to download. The IU Herbarium has a collection of over 150,000 unique plant specimens dating back to 1885. Director Eric Knox is engaged in a multi-year project to digitize the specimens and make them available online, along with their taxonomic and geographic information.

This presented a huge problem for the Indiana University (IU) Herbarium. Established in 1885, the institution has more than 150,000 plant specimens that represent the flora from Indiana and elsewhere. The collection spans the Herbarium’s entire existence, so many items are irreplaceable.

Someone had to step up to take on this challenge, and Dr. Eric Knox decided his team was right for the job. A senior scientist and director of the IU Herbarium, Knox trained more than 70 undergraduate curatorial assistants to process the specimens.

<strong>Digital pipeline.</strong> A curatorial assistant barcodes preserved specimens before uploading labeled images to the Imago digital repository, where they will be stored and made accessible to the public. Courtesy Alisa Alering.

By partnering with IU Libraries’ Imago digital repository and the Consortium of Midwest Herbaria’s shared data portal, the herbarium can share its digital collection of unique plants with scientists and the general public.

“The specimens that we have in the herbarium are all unique,” Knox says. “Each one has scientific significance in its own right.”

Digital assembly line

The digitization process starts with scientists going out into the field to collect plant specimens, which are then pressed and dried. While the herbarium already has a huge collection, scientists like associate curator Paul Rothrock are still adding to it.

It’s easy to label new specimens, but ensuring the validity of older ones is trickier. According to Knox, making sure older plants were labeled with their correct and current names took about three years. This process also created a digital inventory of the species in the collection, which was used in subsequent steps.

<strong>Scientific significance.</strong> Curator Paul Rothrock examines a few of the IU Herbarium’s collection of over 150,000 specimens dating back to 1885. Courtesy Emily Sterneman.

“Once we had the curated specimens, then we were able to image them all with a barcode on every single sheet,” says Knox.

“At the imaging stage, we wanted an efficient process, where students used barcode readers to rename the image file and to create a skeletal database record where the taxonomy and the geography – down to the level of county in the case of the United States – is selected from pre-populated, drop-down lists. This way, there’s no typing involved, which avoids introducing errors.”
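The imaging step Knox describes could be sketched roughly as follows. This is a minimal illustration, not the herbarium's actual software: the barcode format, the `TAXA` and `COUNTIES` lists, and every function and field name here are hypothetical stand-ins. The point is that a record is only ever built from values already present in a controlled list, so nothing is typed freehand.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical controlled vocabularies standing in for the drop-down lists.
TAXA = {"Carex pensylvanica", "Quercus alba"}
COUNTIES = {("Indiana", "Monroe"), ("Indiana", "Brown")}


@dataclass(frozen=True)
class SkeletalRecord:
    """Minimal database record created at imaging time (illustrative fields)."""
    barcode: str
    taxon: str
    state: str
    county: str
    image_name: str


def image_name_for(barcode: str, original_name: str) -> str:
    """Name the image after the scanned barcode, keeping the file extension."""
    return f"{barcode}{Path(original_name).suffix.lower()}"


def make_record(barcode: str, taxon: str, state: str, county: str,
                original_name: str) -> SkeletalRecord:
    """Build a record, rejecting any value that is not in a controlled list."""
    if taxon not in TAXA:
        raise ValueError(f"taxon not in list: {taxon!r}")
    if (state, county) not in COUNTIES:
        raise ValueError(f"county not in list: {county!r}, {state!r}")
    return SkeletalRecord(barcode, taxon, state, county,
                          image_name_for(barcode, original_name))


record = make_record("IND-0000001", "Quercus alba", "Indiana", "Monroe",
                     "DSC_0042.TIF")
print(record.image_name)
```

In a real interface the lists would populate the drop-down menus directly; the validation above simply mirrors that constraint in code.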

After this, Knox’s team channels the images through a cyberinfrastructure quality control pipeline that uses IU's research computing resources to copy the images.
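The article does not say how the pipeline checks its copies. One common approach, shown here purely as an assumption, is to copy each file and then compare checksums of the source and the copy. The function names are hypothetical, and the sketch uses only Python's standard library.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source: Path, destination: Path) -> str:
    """Copy a file, then confirm the copy is byte-identical to the source."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    expected = sha256_of(source)
    actual = sha256_of(destination)
    if expected != actual:
        raise IOError(f"checksum mismatch for {destination}")
    return actual
```

Storing the returned checksum alongside the image would also let later stages detect silent corruption.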

<strong>It takes a village.</strong> Scientists, librarians, and over 70 curatorial assistants have been working for five years to complete the digitization of the IU Herbarium collection. Courtesy Emily Sterneman.

The files are converted from TIF format to JPEG, with 60% compression but no loss of resolution or image quality, making the files faster to download. Finally, the images are uploaded onto IU’s Imago digital repository, and then linked to the Midwest Herbaria data portal for the whole world to access.

“If we compile specimen images and information once and make them electronically available and, furthermore, use a platform that aggregates resources from all herbaria, that product is available for all people to use,” Knox says. “It fundamentally transforms how people can do science. This is environmental big data finally coming to fruition, but it needs to be digitized one specimen at a time.”

The devil is in the details

Properly handling and storing images of more than 150,000 specimens is no small task. Knox states that each TIF file is about 175 megabytes, and a discussion about the matter with Indiana University’s IT team meant “preparing for petabytes.”
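A quick back-of-the-envelope calculation puts those figures in perspective. It uses only the two numbers quoted above and decimal units (1 TB = 1,000,000 MB). Because the collection holds "more than" 150,000 specimens and the file size is approximate, the result is a rough floor rather than an exact total.

```python
SPECIMENS = 150_000   # "more than 150,000" in the article, so a lower bound
TIF_MEGABYTES = 175   # approximate size of one TIF, per Knox


def total_terabytes(file_count: int, megabytes_per_file: float) -> float:
    """Total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return file_count * megabytes_per_file / 1_000_000


print(total_terabytes(SPECIMENS, TIF_MEGABYTES))  # 26.25
```

The master TIFs alone therefore come to roughly 26 terabytes. The article does not break down how the planning reached petabyte scale. Backup copies, derivative files, and future growth would all add to the total, but that is an assumption, not something the source states.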

Finally, the technology has caught up with the vision we had for how an information system like this should work.

“As I tell people, even though I’ve been trying to get this project underway for many, many years, I’m delighted that my early attempts were not successful,” says Knox. “What is easy now would have been difficult then.”

Knox and his team also had to think about what else has changed since specimen collection began in the 19th century.

<strong>For the ages.</strong> A specimen is prepared for preservation. Courtesy Emily Sterneman.

“If somebody says a plant was collected ‘10 miles north of Bloomington’, does that mean it’s more than nine miles but less than 11, or more than five miles but less than 15?” asks Knox. “Did the collector start from the center of Bloomington or from the edge of town at that time?”

Working with the developers of GEOLocate, the team can now overlay a large set of historical US Geological Survey maps on contemporary maps to help determine where plants were collected in the past.
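Knox's question about "10 miles north of Bloomington" can be made concrete with a little arithmetic. The sketch below is not how GEOLocate works. It is a simple illustration that assumes a spherical Earth, a single starting point, and an approximate latitude for central Bloomington; the function names and tolerance values are hypothetical. It converts each reading of the label into a band of latitudes.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius
KM_PER_MILE = 1.609344
BLOOMINGTON_LAT = 39.165      # approximate latitude of central Bloomington


def north_offset_deg(miles: float) -> float:
    """Degrees of latitude covered by travelling due north for `miles`."""
    return math.degrees(miles * KM_PER_MILE / EARTH_RADIUS_KM)


def latitude_band(start_lat: float, miles: float,
                  tolerance_miles: float) -> tuple[float, float]:
    """Latitude range implied by 'miles north', give or take a tolerance."""
    low = start_lat + north_offset_deg(miles - tolerance_miles)
    high = start_lat + north_offset_deg(miles + tolerance_miles)
    return low, high


# The two readings Knox contrasts: 9 to 11 miles versus 5 to 15 miles.
narrow = latitude_band(BLOOMINGTON_LAT, 10, 1)
wide = latitude_band(BLOOMINGTON_LAT, 10, 5)
print(narrow, wide)
```

The second reading yields a band five times as wide as the first, and that is before accounting for where in town the collector started measuring.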

Although there’s still a lot of work to do, this five-year project is finally drawing to a close. Much like all the preparations for a complex dinner, the hard work is about to pay off.

“It’s analogous to having Thanksgiving dinner,” says Knox. “You think about how good that food is, but what’s the first thing you do in the morning? You get up and you start chopping onions and celery to make the stuffing. Even after you pull the turkey out of the oven, you still have to make the gravy. We’re in the gravy stage right now. A year from now, the table will be set and any guest can access this information from any computer, anywhere in the world.”

Eric Knox wishes to thank Paul Rothrock, Tanner Mayfield, Daniel Layton, Maggie Vincent, Laura White, and a large army of student curatorial assistants for their excellent work during this 5-year project, which was undertaken with funding from the Indiana University Department of Biology, College of Arts and Sciences, Vice Provost for Research, Vice President for Research, IU Libraries, and University Information Technology Services.

Copyright © 2023 Science Node ™


