By Allison Campbell-Jensen
“Data management is at the core of what researchers are doing,” says Melinda Kernik, a Spatial Data Analyst and one of the instructors of the recent Graduate Student Crash Course in Managing Data. “It can make or break your project.” Another instructor of the course, Valerie Collins, Digital Repositories and Records Archivist, says, “When it’s done with care, you can make sure your work is findable in the future.”
Kernik and Collins are members of the Research Data Services Team, and they led the first hour of an online 90-minute data management workshop attended by more than 100 early-stage graduate students. Their efforts were appreciated.
“I thought this was a fantastic learning opportunity,” said one student. “I appreciated just getting to hear all of the librarians’ thoughts.”
Kernik and Collins’s main message? Consistency is key.
Data management guidelines
Data management is a powerful practice at every stage of a research project. How to do it is up to the researchers, but they must settle on a system, starting with guidelines for naming folders and files. File and folder names ought to be both descriptive and concise, with no more than 30 characters. Moreover, researchers need to name files and folders so that a list of them displays in the order they expect. In their feedback, students indicated they especially appreciated this advice.
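As a rough illustration of the naming advice above, the following Python sketch checks a name against the 30-character limit and shows why zero-padded numbers help a file list display in the expected order. The function names and the example file names are hypothetical, and counting the file extension toward the limit is an assumption; the workshop did not prescribe any particular tool.

```python
MAX_NAME_LENGTH = 30  # limit mentioned in the workshop advice


def is_concise(name: str) -> bool:
    """Return True if the name fits the limit (extension included, by assumption)."""
    return len(name) <= MAX_NAME_LENGTH


def listing_order(names: list[str]) -> list[str]:
    """Return names in plain alphabetical order, as a simple file listing would."""
    return sorted(names)


# Hypothetical file names: without zero-padding, run 10 sorts before run 2.
unpadded = ["run_10.csv", "run_2.csv", "run_1.csv"]
padded = ["run_10.csv", "run_02.csv", "run_01.csv"]

print(listing_order(unpadded))  # ['run_1.csv', 'run_10.csv', 'run_2.csv']
print(listing_order(padded))    # ['run_01.csv', 'run_02.csv', 'run_10.csv']
print(is_concise("2023-10-05_interviews.csv"))  # True (25 characters)
```

Padding numbers with leading zeros is one common way to keep a plain alphabetical listing in sequence; other conventions can work as long as they are applied consistently.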
Then researchers need to keep track of where files should be saved — not in the downloads folder of your computer, Kernik and Collins advise. Those collaborating with others should create mailbox folders to share files. And don’t store files on your desktop.
Another student wrote: “I think I just need encouragement to get started on the organization process. It is overwhelming to start once things are messy, so it’s great to hear examples and be encouraged.”
There are different sorts of risks to data. Kernik and Collins presented many possible physical risks to files, including hard drive crashes, stolen laptops, department flooding, spilled coffee cups, tornadoes, and more. In addition, there are technical risks, as when software changes, passwords are lost, and permissions fail. Also, there may be intellectual risks, as when a collaborator changes institutions and questions of ownership or intellectual property arise, or when knowledge of how processes were carried out is lost.
The storage strategy that Collins and Kernik advocate to combat risk is the 3-2-1 Rule. Always have three copies of your work: one is the working copy, and the other two are backups. Those two should be held in different kinds of storage, with at least one copy off-site from the others.
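The rule can be written down as a simple check. The Python sketch below is only an illustration under stated assumptions: the `Copy` record, its fields, and the example storage names are hypothetical, and "off-site" is reduced to a yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str
    medium: str            # kind of storage, e.g. "laptop disk" or "cloud"
    offsite: bool          # True if kept away from the other copies
    working: bool = False  # True for the copy being actively edited


def follows_3_2_1(copies: list[Copy]) -> bool:
    """Check the rule as described: three copies, two backups on
    different kinds of storage, and at least one copy off-site."""
    backups = [c for c in copies if not c.working]
    enough_copies = len(copies) >= 3 and len(backups) >= 2
    varied_media = len({c.medium for c in backups}) >= 2
    one_offsite = any(c.offsite for c in copies)
    return enough_copies and varied_media and one_offsite


# Hypothetical storage plan for one project.
plan = [
    Copy("working copy", "laptop disk", offsite=False, working=True),
    Copy("backup A", "external drive", offsite=False),
    Copy("backup B", "cloud", offsite=True),
]
print(follows_3_2_1(plan))  # True
```

A plan that keeps both backups on the same kind of storage, or that keeps every copy in one place, would fail the check.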
Storage is a complicated matter. For instance, storing files in “the cloud” really means storing them on a third-party server — and terms can change. Kernik and Collins offered options from the U and also a storage selection tool to help the workshop participants.
Documenting a project
Along with decisions about file naming, researchers need to keep track of how they collect data, process it, and analyze it. They also need to record external sources of data and any permissions, the software used, and notes from meetings.
Comments in an Excel file, in a text file, or embedded in code can be helpful later. A key piece of advice from Kernik and Collins: take 15 minutes at the end of every work session to document your work. Quoting Rachael Ainsworth, they advised: “Your primary collaborator is yourself and your past self doesn’t answer emails.”
A student will be applying these principles in the near future: “Since I will be starting a project late this fall or early next spring, I need to see if my group has a data management system already and understand it, or develop one myself before the data collection starts and make sure I stick to it throughout.”
The course then moved into three breakout sessions on managing qualitative data, quantitative data, and human subjects data.
Human subjects research breakout session
The breakout session on human subjects research was led by Alicia Hofelich Mohr, Research Support Services Coordinator for Liberal Arts Technologies and Innovation Services (LATIS), and Shanda Hunt, Public Health Librarian and Data Curation Specialist. This research is often overseen by the Institutional Review Board (IRB) at the U, which requires submitting an IRB application, study protocol, and consent forms that detail how data will be handled during and after the study.
Hofelich Mohr and Hunt stress the importance of storing data in a secure way and de-identifying data sets before sharing them to reduce the potential for revealing the identity of or sensitive information about a participant. There was a high level of interest in the course among computer science students, which Hofelich Mohr and Hunt surmised had to do with a move toward human subjects research in that area, for which computer science students are less prepared.
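For readers new to the idea, a very small Python sketch of one de-identification step might look like the following. It is an assumption-laden illustration, not the instructors' method: the column names, the `P001`-style codes, and the `deidentify` function are all hypothetical, and it only removes direct identifiers. Indirect identifiers, such as a rare job title combined with a location, need separate review.

```python
# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(records: list[dict]) -> tuple[list[dict], dict]:
    """Replace direct identifiers with a participant code.

    Returns the shareable rows and a separate key linking codes back to
    identities; the key would be stored securely and never shared.
    """
    shared = []
    key = {}
    for number, record in enumerate(records, start=1):
        code = f"P{number:03d}"
        key[code] = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
        row = {"participant": code}
        row.update({k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS})
        shared.append(row)
    return shared, key


# Made-up example record.
responses = [{"name": "Example Person", "email": "person@example.org", "score": 4}]
shared_rows, linking_key = deidentify(responses)
print(shared_rows)  # [{'participant': 'P001', 'score': 4}]
```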
The two instructors had a detailed handout to go over — which was larger than usual to address COVID-related changes — and less time than usual, so it was “a whirlwind,” Hunt said. Still, they conveyed the principal ideas.
“How to balance openness and protecting a participant’s confidentiality is something we talk about a lot,” Hofelich Mohr says.