Data curation services ensure public access to research data
By Mark Engebretson
Paul Klockow created the data set that supported his master’s thesis in the Department of Forestry Resources. Now a research fellow in the department, he wants to ensure that his data can be used in the future by other researchers.
So he signed up for a University Libraries workshop on management of research data. Impressed by what he learned, he later invited Libraries staff to his department to share additional information with other researchers.
“While I was very familiar with this set of data, I might not be around later to explain what this variable is or what this data set actually means or how it was collected,” Klockow explained. “Having that data organized and having associated metadata, at the very least, allows the next person to jump right in and get their research analyses going.”
White House directive calls for greater access to data
Organized research data is not only efficient and helpful for researchers; it may soon be required by the federal government as part of a White House directive to make the published results of federally funded research – and its supporting data – available to the public. The Obama Administration issued the directive in February 2013 in the belief that greater public access will advance scientific discovery, bolster the economy, and maximize the impact of the federal research investment.
The directive, from the White House Office of Science and Technology Policy, requires that published research supported by federal grants be freely available within one year of publication. It will also require that researchers “better account for and manage the digital data resulting from federally funded scientific research.”
But – especially for the piece mandating management of digital data – getting there isn’t going to be easy.
When research projects are complete and journal articles published, it’s not unusual for researchers to leave the raw data on their own computers with no formal plan to preserve it or share it. A number of vendors provide repository services for those who decide or are required by funders to organize and preserve the data. But few systems are in place on university campuses.
The White House directive is forcing that to change – and it could have a big impact at the University of Minnesota, where two-thirds of the hundreds of millions of dollars in annual research grants come from federal agencies.
An extension of library services
“This is a service that the Libraries can provide and nobody else on campus is currently providing,” said Lisa Johnston, a University of Minnesota librarian, who also is Co-Director of the University Digital Conservancy. Johnston is working on a plan to meet the federal mandate.
“This is just a new type of resource that we will be providing,” she said. “It’s a natural extension of library services.”
Johnston led a pilot data curation project last year that involved faculty members, researchers, and students representing five different data sets. The project leveraged the Libraries’ existing infrastructure: the University Digital Conservancy, the institutional repository for the University of Minnesota (conservancy.umn.edu).
“Feedback from the faculty in the pilot was very positive and anticipated that this service might satisfy the upcoming requirements from federal funding agencies,” Johnston said. Now she’s working toward building a repository for the campus, which may be open for business later this fall.
“University libraries are the natural repository for research conducted at a particular university,” said David Levinson, professor in the Department of Civil Engineering. Levinson – who conducts research in the area of infrastructure, particularly transportation infrastructure – currently maintains some of his research data on his office desktop computer.
“I won’t be here in 20 years; I’ll be retired. What will happen to the data sets when I retire?” he asked. “What if someone forgets to migrate it?”
Levinson was involved in the pilot study. He called it a “step in the right direction, but it’s a baby step,” citing potential lack of resources and compliance as two challenges to a fully functioning data curation repository.
“You could probably have one librarian for every department at the University … who could have a full-time job collating and collecting the data for that department each year,” he said, noting that a funding model has not yet been established. He added, “[The funding] should come from the grants.”
The public good
So, why is it important for publicly funded research data to be preserved?
“First of all, the data is oftentimes unique, you could never recreate it,” Johnston said. “It’s also very expensive. And what do you get out of it? One, two, five papers? You could instead make that underlying research data available so that other researchers can take a look at the data, re-analyze it and come up with new results – perhaps competing results, perhaps validating results.”
Levinson agreed, saying that the Libraries already have the infrastructure, the resources and the tools not only to preserve the data but also to make it “findable” by the public.
“There’s 7 billion people in the world – most of whom don’t want to use my data – but a couple of whom might. And they might not know that the data exist if it’s just sitting on my computer,” he said. “Putting it out into a standardized, findable public forum makes it easier for them to: A) Know that the data exists; and B) Actually get at the data.”
Johnston said she believes that an institutional repository created and managed by the Libraries is the preferred method for ensuring public access to research data into the future.
“It may be in the best interests of academic libraries to provide our own brand of support,” she said, cautioning that expensive digital data assets may be forgotten on unreliable publisher web sites or start-up disciplinary repositories with no plan for sustainability.
“These are the University of Minnesota’s digital assets. We are the ones that receive the grants. We are the ones that are producing the research,” she said. “We’re the owners. We are the custodians of it.”
Research fellow Klockow, whose research is leading to new bioenergy harvest guidelines in Minnesota and Wisconsin for winter-harvested aspen trees, is now a big believer in University Libraries workshops, thanks to the workshop on managing research data.
“To anybody reading,” he said, “I would recommend taking a look at the Libraries website for information on data management workshops and resources.”