By Allison Campbell-Jensen
The Honeycrisp, the crunchy sweet-tart apple that is Minnesota’s state fruit, is a product of the University of Minnesota’s Horticultural Research Center. It has grown to rank fourth in apple production in the United States and among the top two in retail value.
For quite a while, however, its parentage was in question. And even after DNA sleuthing uncovered its parents, part of its history remained unclear — until librarians helped bring clarity.
The clues were in materials collected before the electronic age: analog data held in paper notebooks, ledgers, and maps. The amount of such data held at the University of Minnesota alone is potentially immense; a survey of just a few life sciences departments revealed holdings of analog data that would span more than four football fields.
“I worked with a faculty member who had shelves of data that were in written format, and I knew hadn’t been preserved in any other way,” says Shannon Farrell, Natural Resources Librarian. “We were worried that once these people retire, that data was going to be lost.” Farrell and Julie Kelly, Science Librarian for Applied Economics, Ecology, and Horticulture, each had experience working in research labs prior to becoming librarians, so they had an idea of the scope of the problem.
The issue has become more pressing in the last 15 years, Kelly says, as scientists who receive large federal grants typically have a mandate to make research data publicly available after a certain amount of time. That’s electronic data — but what about the rest, which may be useful in studies of biodiversity or climate change yet may not be findable, accessible, or even stored safely?
Apple of his eye
For University apple breeder Jim Luby, it was the impending visit of a colleague seeking information that pushed him to get files into better order. They weren’t just his files — they were more than 100 years of records of hybridizing apples to develop varieties that could withstand Minnesota winters and also please apple eaters, growers, and sellers.
In the Works Projects Administration-built Horticultural Research Center, these records were stored in a fireproof vault — but haphazardly. When apple historian Daniel Bussey asked to visit, Luby and fellow apple breeder David Bedford requested that librarians Kelly, Farrell, and Kathy Allen of the Andersen Horticultural Library help develop some order out of the chaos.
With graduate student Nick Howard, the librarians literally paged through the data to figure out what was in it, Farrell remembers. They created a spreadsheet with each document listed, as well as its format, such as spiral notebook, bound ledger, or flat map. “We would label the shelves, so you would have a geographic map, what were in the documents and where they were,” she says.
Then the librarians returned to the apple breeders to ask which was the most valuable information — the highest priority to digitize. Once they had a set digitized to prepare it for the University Digital Conservancy, they returned it to Luby to provide annotation and metadata about the data, the methods used to collect it, the time period it represents, and so forth.
“We tried to find other institutions who had done something like this so we could look at how they organized it. We couldn’t find anybody,” Kelly says. “We had to start from scratch: How will apple breeders approach it, how do we put it in categories and how do we describe it?”
At the start, it seemed that the records about Honeycrisp had been lost.
In the meantime, Nick Howard, now an apple breeder in the Netherlands, had traced the parentage of the popular Honeycrisp using DNA methods, but Luby and Bedford had uncovered no record of when the parents were crossed. However, in summer 2021, Luby found confirmation of the cross between Keepsake and Minnesota 1627 — in a newly digitized inventory of apples at the Horticultural Research Center that showed that Honeycrisp came from the first cross made in 1960.
“We knew it had been done,” Luby says. “It was good to get it confirmed and the year it was made.”
He adds: “I was just so grateful they took an interest in this. … I inherited all this stuff and I didn’t want to leave it in the same condition I inherited it. We decided to leave the original records in the vault, as a working collection. … I hope the digital collection will be an accessible resource that will be useful to people in the future.”
Future prospects?
Outside of the Horticultural Research Center, few researchers have a physical vault. Other analog data are kept in labs, file cabinets, and even in researchers’ homes. How can such data be stored and retrieved so that tracing the family tree of Honeycrisp becomes just one example among many of what analog data can accomplish?
Librarians, Farrell says, need the expertise of scientists to fully describe the data. Kelly notes that scientists re-use analog data a fair amount but in an informal way. They might re-use their own data or that from someone else. But the process needs improvement.
“We’re interested in helping that old historic data become more easily used, better described, and safe,” Kelly says. “There’s no nice way to search — oh, let’s look for older data. We consider it to be an institutional asset and I think scientists do, too.”
Yet digitizing everything is not a good solution, Farrell says, because of the great expense of time and money. “Is there some other way to describe this data and provide information to people so that they know it exists and can be re-used?” she asks.
The problem of preserving analog data is large but finite: people who gather information with pencil and paper today quickly convert it to electronic form.
“Maybe this won’t be a long-term problem but it’s certainly a risk right now for a large-scale loss,” Farrell says. While it is not yet clear what librarians’ role will be, Kelly believes they will have one. “Often what we try to do is help faculty solve problems and this is a perfect example,” she says.
Some of the issues around preserving analog data, Farrell says, are: “How do you do this sustainably? How do you teach others how to do it? We think it’s going to be a large-scale effort that involves researchers, archivists, data managers, and tech solutions.”
As stakeholders seek solutions, there is great potential, too. “Once all this data is surfaced,” she says, “people could see new and novel approaches, new research questions to answer, novel ways to use it.”