By Allison Campbell-Jensen
In April 2010, the University of Minnesota Libraries began a major project with the goal of sending 1 million items from our collection to Google’s book digitization project with academic libraries. This year we celebrate that effort, which continues 10 years later. Alongside the Google Project, the Libraries helped form a collaboration, initiated by the Committee on Institutional Cooperation (the Big Ten plus the University of Chicago), to create a Shared Digital Repository among academic and research libraries, now named the HathiTrust.
Research library leaders have long had a vision of a large-scale digital library, made possible by mass digitization and high-speed networks. The benefit became clear to our users earlier this year, when the Libraries’ buildings closed, curtailing access to physical collections. We received the Emergency Temporary Access Service, which enabled full view of our in-copyright holdings in HathiTrust and which just ended Aug. 13.
“It was there to help us in ways that we never would have received, if we hadn’t invested 10 years of staff time in this,” says Kirsten Clark, Director of Access & Information Services.
The pillars of our Google partnership
Yet it was the Google Project that provided the foundation for building the HathiTrust digital library. Once Google has transported the materials to its center and scanned them, it returns the print volumes to the contributing library along with a digital copy, which for Minnesota went to the HathiTrust. Certain staff members have been pillars in maintaining the partnership with Google.
At the beginning of the Google Project, there were concerns about a large academic institution sending materials to a private company, says Suzan Zniewski Hallgren, the initiative’s original and longstanding Project Manager. She and other members of the team talked to staff and stakeholders in departmental meetings about why they were doing it (to provide online access) and how they would carry it out without disrupting our patrons. It was both an exciting and a stressful time, she says. John Butler, Associate University Librarian for Data & Technology, says Hallgren provided “very creative, cohesive, and effective management of the project for the first 10 years.”
Clark was head of government publications when a preliminary project began before 2010. “We were in the process of cataloging our collections and opening access, so this was a great opportunity,” she says. For the initial stage of the Google partnership, the Libraries sent about 84,000 government documents, which are mostly in the public domain, to be scanned. Those documents were then shared with HathiTrust. Clark’s expertise with government documents led to her serving on several HathiTrust advisory committees over the last seven years. And the Libraries’ contributions of U.S. government documents, Butler says, have really “catapulted” HathiTrust’s holdings into a substantial collection.
At the start, pulling those documents was a job for Wendy Kieser, now in Collection Management & Preservation. As the Google Project at the U transitioned to its major effort of pulling books to be scanned, she became the Operations Manager, overseeing student workers. About six months ago, she became the Project Manager, working with the sponsors and serving as the contact with the Google team about shipments. (Currently, the Google Project is taking a break because of the coronavirus.)
Even before Kieser and her team can get to work, however, Data Analyst Chris Rose has an essential role. He provides Google with a list of the holdings in our catalog, as do other partner institutions.
“Once Google has decided what Minnesota is able to provide, versus all the other institutions who are partners,” Rose says, “they post candidate lists. I download the candidates list and supplement with data from our system.” The team that pulls materials also updates records in the Libraries’ Alma system, and Rose provides files, including metadata, for the shipments.
Google’s stringent criteria
Google has stringent criteria for materials to be scanned: They must be in good shape, with intact spines and paper that is not too brittle. And they must not be too large for Google’s scanners. Rose says that while preparing materials for Google, the Libraries evaluated more than 1 million items.
The number of carts moving to and from Google has ebbed and flowed over the years, says Bernadette Corley Troge, Director of Libraries Facilities Management. Her staff has coordinated the physical moving of book carts between the various campus libraries and the Google Project workspace. In addition, the library building managers had to find space for the project to set up on site. At the start, her team had to find a path from each library to a loading dock that avoided stairs and used elevators large enough to hold the carts. Members of the shipping unit often have to maneuver heavily laden book trucks onto loading docks.
“Sometimes, it just takes finesse and brute strength to get the book carts on the back of the library truck,” Troge says.
All those named, and many more who worked with them, have been key to our success. Yet Butler notes that it is conceivable that Google, as a for-profit company, might go away at some point. Research libraries, however, have been around for hundreds of years, and part of their mission is to provide enduring access to information over time. Given this mission, should research libraries entrust all this material solely to a commercial entity? The community’s response was the HathiTrust digital library, a member-governed access and preservation organization committed to a long-term digital future for these materials.
Google Project to HathiTrust
The Libraries entered the Google Project through our membership in the former Committee on Institutional Cooperation, or CIC (the Big Ten plus the University of Chicago). Initial partners in HathiTrust also included the CIC (now the Big Ten Academic Alliance).
“Then, this thing exploded,” Butler says.
Now there are more than 150 member institutions, with an international reach. The vast majority of the HathiTrust’s holdings derive from Google Project materials, with some contributions from the Internet Archive and members’ digitization projects.
In 2017, HathiTrust averaged 22 million hits a month. Yet the digital library was also encouraging its members to retain print monographs that mirror the digital collection, in order to ensure preservation of both versions, to catalyze collective management, and to maintain a lendable print collection.
Almost 40% of the items being preserved in HathiTrust are in the public domain (either as defined in the U.S. or worldwide), while a portion is in copyright. Every word on the 6.1 billion pages of digitized text is searchable. For in-copyright works, however, full view access generally is restricted. Yet an Accessible Text Request Service is available from HathiTrust for users affiliated with member institutions who are blind or print-disabled. They can request copies of copyrighted books that are listed as limited (“search-only”) in the collection.
Anyone, anywhere, can use HathiTrust collections to search the collection via the web, read public domain and open access works via the web, and build and share customized collections. Members, moreover, may download public domain and open access works and, in the U.S., gain replacement access for lost and damaged print copies.
Words in books as ‘big data’
Digitized books provide opportunities for computational analysis. In consumptive research, scholars obtain items in order to read them. In non-consumptive research, by contrast, scholars use large quantities of digitized text to do such things as look for patterns in a chosen language, seek to uncover the lexicon of a certain period, or make a literary, geographical or political analysis. Use of in-copyright works for these purposes is allowed, as long as information is not gathered to reassemble pages of in-copyright works. The HathiTrust Research Center supports these kinds of studies.
At 17 million items and counting, HathiTrust’s collections continue to increase and promise to serve patrons near and far.