The Value of Digitization for Libraries and Humanities Scholarship

John Unsworth

Dean, Graduate School of Library and Information Science
University of Illinois, Urbana-Champaign

The Newberry Library, May 17, 2004

An Innodata Isogen Symposium

[Image: from the Blake Archive]

Digitization and Libraries

"Digitization" implies the production of a digital surrogate for a physical object. Obviously, we don't speak of "digitizing" something that's already digital. And in the context of our discussion today, it is the digitized, not the born-digital, artifact that is most important, because the most common kind of digital artifact in library collections today is a digital surrogate for a physical artifact. For that reason, too, the most important questions about the value of digital artifacts, at the moment, are questions having to do with the artifact as surrogate. Chief among those questions are:

- When can a digital surrogate stand in for its source?
- When can a digital surrogate replace its source?
- When might a digital surrogate be superior to its source?
- What is the cost of producing and maintaining digital surrogates?
- What risks do digital surrogates pose?

Many of these questions are treated in some detail in Paul Conway's contribution to the Handbook for Digital Projects: A Management Tool for Preservation and Access:

"The Preservation Purposes of the Digital Product [include efforts to]. . . . Protect Originals. . . . Represent Originals. . . . [and] Transcend Originals. . . . In a very small but increasing number of applications, digital imaging holds the promise of generating a product that can be used for purposes that are impossible to achieve with the original sources. This category includes imaging that uses special lighting to draw out details obscured by age, use, and environmental damage; imaging that makes use of specialized photographic intermediates; or imaging of such high resolution that the study of artifactual characteristics is possible."

While Conway makes clear the promise of the digital surrogate, the risk posed by these surrogates is presented by Angelika Menne-Haritz and Nils Brübach, in "The Intrinsic Value of Archive and Library Material":

"the loss of testimony is endangered, not only through . . . physical degeneration . . . but also through the unconscious destruction of evidence as to the context and circumstances of their origin, which can occur during their conversion and must therefore be prevented by a previous analysis of . . . intrinsic value."

The problem to which Menne-Haritz and Brübach refer is not unique to digital surrogates, by any means: bad editions in printed form pose the same threat, and indeed the early history of printing is, in part, a history of the loss or destruction of manuscript materials "replaced" by printed versions—the sources for which are now both undocumented and unrecoverable. In any case, these German archivists present the most reductive view of the value of digital surrogates, saying

"The loss of evidential value and permanent accessibility inherent in digital forms and textual conversion [by OCR] exclude them as a preservation medium. They can only be employed in addition to preservation on film in order to increase the ease of use" ("The necessity of criteria for conversion procedures," in "The Intrinsic Value of Archive and Library Material"),
and, at another point, flatly stating that:
"digital imaging is not suitable for permanent storage" ("Imaging," in "The Intrinsic Value of Archive and Library Material").

A preservation program based entirely on film, with digital surrogates used only for distribution of photographic images, may not be practical in all cases, though—and it is at this point that we must confront the differing missions of libraries and archives. Archives may well decide that issues of evidential value rule out "digital forms and textual conversion," whereas libraries might reasonably feel, in certain cases, that their mission of preserving and providing access to (fungible) information is adequately served by providing digital surrogates.

In fact, it is probably impossible to give a single answer to the question "What is the value of a digital surrogate?" since the answer depends, to a large extent, on the nature of the original and the conditions of its use. Therefore, as a means of determining the value and appropriate use of digital surrogates for library holdings, it may be useful to divide the original materials into those that are rare and those that are not, and to divide them further into those that are frequently used and those that are infrequently used. There would be, then, four possible cases:

1. Materials that are not rare and that are frequently used:

In this case, we can assume that preservation of the original is not a particularly high priority (since the original is not rare); nevertheless, digital surrogates for such an object might be worth producing and providing, for several reasons:

The first two are obvious and uncontroversial benefits. The third is potentially problematic, even if the object in question is not rare, because it is not obvious that digital surrogates provide all the functionality, all the information, or all the aesthetic value of originals. Therefore, while it may be sensible to recommend that digital surrogates be used to reduce the cost and increase the availability of library holdings that circulate frequently, the decision to deaccession a physical object in library collections and replace it with a digital surrogate should be based on a careful assessment of the way in which the original object (or objects of its kind) is used by library patrons. It is not necessary that the digital surrogate possess all the qualities and perform all the functions of the (not rare) original, but it is necessary that the digital surrogate answer to the identifiable needs and expectations of those who frequently used the original.

2. Materials that are not rare and that are infrequently used:

Many libraries now store infrequently used books (and other materials) in long-term storage facilities. Those materials are retrievable and available to library patrons, but only after a wait of two or three days. With such materials, digital surrogates might:

Again, the first two are clear and uncontroversial benefits, and the third comes with the caveat, as in case 1, that the digital surrogate should answer to the identifiable needs and expectations of those who (in)frequently used the original. At some point, of course, especially with infrequently used materials that are not rare, libraries might reasonably be expected to evolve a calculus that balances functionality with actual use, in order to help decide when digital surrogates that provide most of the functionality of originals are acceptable.

There is one other point that needs to be raised, especially here, where we are discussing the component of library collections that has the least "market value." Libraries, as an institutional and cultural community, need to consider whether these infrequently used and commonly held materials are, in fact, being preserved in a concerted and deliberate way in their original form by any one (or more than one) library. If they are not, the sources for digital surrogates that are common today could easily become rare, or non-existent, tomorrow. This is the substance of Nicholson Baker's objection to libraries discarding their newspaper holdings. If fifty libraries hold the same issues of the same newspapers in original form, at great expense and with limited use, then it is difficult to make the case that all of them should pay to house, shelve, reshelve, and preserve the originals; but if forty-nine of those libraries, over time, have replaced their physical holdings with digital surrogates, one certainly hopes that the fiftieth library would be aware that its physical holdings were now rare, and therefore subject to the considerations outlined in cases 3 and 4 below.

3. Materials that are rare and are frequently used:

In this case, the principal (and very obvious) benefits of digital surrogates are:

Few would argue that truly rare materials should be replaced by digital surrogates: digital technology, and techniques of digitization, are so new, and are still developing so rapidly, that we can't have any confidence we've devised the best method for extracting and digitally representing information from any analog source (whether it is a printed page, an audio tape, or a film strip). Nonetheless, digital surrogates could, in many cases, stand in for rare and frequently used materials, and could thereby aid in the preservation of originals.

4. Materials that are rare and are infrequently used:

On the face of it, these materials seem the least likely to be represented with digital surrogates, if only because digitizing is expensive. On the other hand, if the cost of housing a rare but infrequently used object rises high enough, then digitizing and deaccessioning that object may become an attractive possibility. Here again, as in case 2 above, one hopes that libraries, as a community, are aware of the lastness, the actual or potential rarity, of even those materials used infrequently today. Tomorrow, those may very well be the most valuable of artifacts, perhaps for users, or uses, that one could not predict today.

Having considered these four alternative conditions, let us revisit the questions with which we opened this discussion of digital surrogates, and try now to provide some answers to those questions:

When can a digital surrogate stand in for its source?

When it answers to the needs of users.

When can a digital surrogate replace its source?

If the source is not rare.

When might a digital surrogate be superior to its source?

In cases where remote or simultaneous access to the object is required, or when software provides tools that allow something more or different than physical examination. And when the record of the digital surrogate finds its way into indexes and search engines that would never find the physical original.

What is the cost of producing and maintaining digital surrogates?

The cost of producing digital surrogates depends, among other things, on the uniformity, disposability, and legibility of the original. The cost of maintenance depends on frequency of use and the idiosyncrasy of format, but beyond that it depends on technological, social, and institutional factors that are difficult or impossible to predict. That unpredictability is an important reason for being cautious when one chooses to replace a physical object (the maintenance costs for which are known) with a digital surrogate (the maintenance costs for which are, to some extent, unknown).

What risks do digital surrogates pose?

The principal risk posed by digital surrogates is the risk of disposing of an imperfectly represented original because one believes the digital surrogate to be a perfect substitute for it. Digital surrogates also pose the risk of providing a partial view (of an object) that seems to be complete, and the risk of decontextualization—the possibility that the digital surrogate will become detached from some context that is important to understanding what it is, and will be received and understood in the absence of that context.

[Image: from The Rossetti Archive]

Digitization and Humanities Scholarship

Libraries collect and preserve (and winnow, and deaccession) materials in order to serve some other purpose: in the case of academic libraries, scholarship is one such purpose, and nowhere in scholarship are libraries more important than in the humanities, where for centuries the library has been the laboratory. The most obvious benefit of digitization, for the humanities, is access to primary source materials. The aggregation of these resources, in digital form, is bound to provide new sources for humanities scholarship. Less obvious, and further out, we can expect to see new computational methods and new tools for humanities scholarship—new tools for discovery and analysis, for finding and exploring patterns.

With respect to the humanities, objects of study can be images, texts, sounds, maps, performances, concepts, and three-dimensional objects. When we make a digital surrogate for any one of these, we always believe that our aim is to represent it as accurately, as faithfully as possible, with the least possible interference, or noise, in the process; but when, as scholars, we deal with these digital surrogates, or produce our own, we learn that there's no such thing as an innocent act of representation: every representation is an interpretation. Simple questions, like "is this poem a separate work, or is it part of a larger set of poems?" can be unavoidable (in markup, for example), and they can also raise issues that are critical to understanding the work in question. However we decide the question, we are both informed and constrained by our own decisions when subsequent and related issues arise. Likewise, with images, when we digitize, we choose file type, compression, color correction, and other settings based on what we consider valuable and significant in the image; and when our chosen strategy is applied across a large body of images, or when others come to our digital surrogate with purposes we hadn't shared or predicted, we are bound to confront the fact that our surrogate has been shaped by the perspective from which it was produced. In this sense, the real value of digitization for humanities scholarship is that it externalizes what we think we know about the materials we work with, and in so doing, it shows us where we have overlooked, or misunderstood, or misrepresented significant features of those materials. As an example, in a project on Victorian London at the Institute for Advanced Technology in the Humanities at the University of Virginia, we built a complete model of the Crystal Palace, down to the last pane of glass, last guy-wire, last stair-tread.
In the course of reconstructing the Crystal Palace virtually, I am sure we learned some things about the real building that no one, since the original builders, has known. In the same way, though in a different medium, in developing the SGML Document Type Definition for the Rossetti Archive, we went through an iterative process of modeling the components of Rossetti's paintings and poetry, an exercise that forced an explicit discussion of the nature of these materials, the relations between their parts, and the rules that could be deduced to govern the markup that would represent them. I guarantee that unless digitization of the materials had been involved, and unless the scholar-expert had been party to that digitization, these discussions would never have taken place, and this explicit specification of the scholar's understanding of the materials would never have emerged.

There's a great deal more that could be said on this subject, but I would like to pause on this point, to focus our attention on it. The value of digitization for humanities scholarship is that it externalizes interpretation, re-presents it to us in the form of the surrogate, and forces us, as humanities scholars, to confront and evaluate our beliefs and understandings concerning the object of digitization, as well as our perspectives and purposes with respect to it. Of course, it can only have this effect if the scholar is actually involved in the process of digitization at some level; otherwise, what would be self-criticism and self-understanding becomes simply the criticism of the shortcomings of a non-specialist. For that reason, I urge you, librarians and scholars alike, to collaborate in digitizing humanities materials. You can see very good examples of this kind of collaboration in projects like the Whitman Archive, and in particular in its sub-project to create a virtual finding aid and item-level representations of all of Whitman's manuscripts across a number of archives and libraries. Scholars can learn a great deal from the expertise of librarians in cataloging and classification, in information organization, and in preservation and access. By the same token, librarians can learn a great deal about the peculiar and idiosyncratic characteristics of individual works, or authors, or movements, or literatures, by working with specialists who know (or think they know) all the features and fine points of that material. Working together is perhaps the best way to find the proper balance point, in a project involving digital representation, between the abstract and the particular, between the collection and the item, between the librarian and the scholar.


Some of the foregoing was originally drafted by the author for The Evidence in Hand: Report of the Task Force on the Artifact in Library Collections, published in November 2001 by the Council on Library and Information Resources.

Menne-Haritz, Angelika, and Nils Brübach. "The Intrinsic Value of Archive and Library Material." Digitale Texte der Archivschule Marburg Nr. 5.

Sitts, Maxine K., ed. Handbook for Digital Projects: A Management Tool for Preservation and Access. First Edition. Andover, Massachusetts: Northeast Document Conservation Center, 2000.

