The University of Minnesota Libraries have been developing educational and outreach programs to support researchers throughout the data lifecycle.
Today’s research labs look much different than they did when the above photo was taken: no longer do researchers need wheels to move gigantic computing machines around the office. As the size of our computers shrinks, their storage capacity is increasing dramatically, and the creation of data to fill them is keeping pace. But as software and hardware are updated and upgraded, we risk leaving behind the content created on now-outdated technology. From the gigabytes of digital family photos and videos we collect to the terabytes of data generated by researchers across campus, how do we as individuals and scholars keep up with our growing data management needs? How do we ensure that what we’re creating today will be available to researchers tomorrow? These long-term considerations of preservation and access are at the heart of a new data management program in the University Libraries.
The program has its beginnings in a study of University of Minnesota faculty, graduate students, and other researchers in the sciences, conducted by the Libraries in 2007. Seeking to understand the unique information needs of scientists, and to improve the services and tools that support research in the sciences, the Libraries asked questions such as: How do scientists share their work with colleagues, both at the University and at other institutions? How do scientists collect, manipulate, mine, and preserve their data? How do scientists use libraries?
The original study and a follow-up survey in 2008 provided clear evidence of an education gap in the way researchers manage their data. For example, over a quarter of those surveyed had lost important data due to the lack of a backup plan, and nearly half used unsecured external hard drives instead of off-site servers for backing up data. But proper care of data is more than simply having a robust backup plan. To maximize the usefulness of data, researchers need to plan for its ongoing management, a process called “data curation.”
As information specialists, librarians have been dealing with issues of data curation, including preservation and perpetual access, for many decades and are well positioned to support researchers in this area. Taking the lead to provide this assistance is research services librarian and co-director of the University Digital Conservancy Lisa Johnston. With a bachelor’s degree in astrophysics, Johnston understands the needs of scientists looking to manage their research data. Collaborating with librarians Megan Lafferty and Amy West, Johnston has developed a program that includes an online overview of data management resources as well as workshops and one-on-one consulting.
Demand for these services has been strong, especially from faculty working to comply with funding agency requirements on data management planning and sharing. The National Institutes of Health (NIH) has had data sharing requirements for several years, but just this January the National Science Foundation (NSF) began requiring a data management plan as part of all new NSF proposals. As Johnston has met with faculty, they have expressed interest in making sure their NSF applications have robust data management plans, believing that will give them an edge in a very competitive grant process.
But not all faculty understand the value of creating a plan, or what makes a plan robust. The workshops that Johnston and Lafferty offer answer both the “why” and “how” of data management planning.
Saving Time, Increasing Impact
For those who are not under a mandate to create a plan, it can seem like a lot of extra work to do so. But Johnston explains how that effort early on can save time later. For example, complete documentation for a data set provides evidence for the published results of research and also makes it easy to field requests from funders or other researchers seeking information about the data. Further, studies have shown that researchers who post their data to a public space like a website or repository see an increase in citations to their work.
Having a plan to share data after publication of a researcher’s results can do more than stimulate citations of that publication. In fact, many data sets have value beyond their original research. Take the Human Genome Project, for example: in 1990, an international research team set out to sequence the thousands of genes that make up human DNA. Sharing their data throughout the project not only helped them finish two years ahead of schedule; open access to that data also continues to generate new research aimed at curing genetic diseases.
Convincing researchers of the value of planning for the ongoing management of their data is only the first step. To guide them through the steps of creating an actual plan, Johnston and her colleagues have created a checklist (see sidebar at right). These detailed questions make clear that researchers need to consider how they plan to use the data today, as well as how they or others might use it tomorrow. And because every researcher and every data set is different, Johnston and her colleagues stand ready to help them answer those questions.