Supporting Digital Scholarship:

a project funded by the Andrew W. Mellon Foundation

 

John Unsworth

Institute for Advanced Technology in the Humanities

University of Virginia

November 1, 1999

 

Summary:

 

To date, digital library efforts have focused on library-based production of digital primary resources.  This project will, for the first time, address second-generation digital library problems, where the focus is on scholarly analysis, reprocessing, and creation of digital primary resources.  With $1M in support from the Andrew W. Mellon Foundation over three years (2000-2002), the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) and the University of Virginia Libraries’ Digital Library Research and Development Group will address three closely related problems:

 

1)      scholarly use of digital primary resources;

2)      library adoption of “born-digital” scholarly research; and

3)      co-creation of digital resources by scholars, publishers, and libraries. 

 

We predict that these problems will confront all research universities within the next decade and—because both faculty-driven humanities research computing and digital library activities have been underway at the University of Virginia for most of the past decade—we believe we are uniquely positioned to address these problems now.  The outcome of this project will be methods and guidelines for, and examples of, the management of digital objects across the scholarly information continuum, from creation (by libraries or scholars) to use in research, re-presentation in scholarship, and re-integration into library collections as scholarly publications and tools for research. 

 

Institutional Background:

 

Since its inception in 1992, the Institute has focused intensive support and advanced computer resources on long-term humanities research proposed by faculty at the University of Virginia and elsewhere. To date, the Institute has supported more than forty fellows in architecture, landscape architecture, architectural history, art history, religious studies, classics, anthropological linguistics, medieval and 19th-century British literature, 19th-century American literature, American history, classical history, history of science, archaeology, film, and music, among other disciplines.

 

The majority of this research (indeed, most of the Institute’s work) involves intensive collaboration among groups of scholars, and between scholars and the Institute’s technical experts.  The Pompeii Forum project, for example, sends an interdisciplinary group of researchers to Pompeii each summer to conduct a systematic survey of the Forum.  They use an extremely accurate surveying device known as a laser total station, which feeds its data into a laptop in the field.  These measurements are then brought back to the Institute, where they are processed into two-dimensional plans and three-dimensional CAD models.  Further field research provides an extensive photographic survey of the buildings at Pompeii, and these photographs are used in conjunction with advanced photogrammetric software to create accurate, photo-realistic surfaces for the three-dimensional CAD models.  Finally, using modeling tools custom-built at the Institute, the researchers are able to combine individual building models into a model of the entire site and even render the walls transparent, in order to see both sides at once, thus producing an analysis of the Forum more detailed, more accurate, and more flexible than any other to date. 

 

The University of Virginia Libraries have established a number of electronic data centers that work closely with the Institute’s staff and fellows: the Electronic Text Center, the Geospatial and Statistical Data Center, the Digital Media Center, and the Special Collections Digital Center.  Library digital centers have provided support to many of the same faculty involved in research with the Institute, and staff from these centers meet regularly with IATH staff and others in a digital library interest group.  Most recently, the Libraries have established a Digital Library Research and Development Group, charged with long-range planning of digital library architectures, systems, and procedures.  Having begun to assemble a broad digital collection, they recognize that no library management system yet exists to handle it and they have dedicated themselves to developing an appropriate solution to the problem.

 

Further information about library digital centers is available on the Web at: http://www.lib.virginia.edu/ecenters.html.  Information about Digital Library Research and Development is available at http://www.lib.virginia.edu/dl/intro/. 

 

Project Goals:

 

Much of what has taken place in digital library contexts to date has aimed at producing large collections of digital data, often—in fact usually—without the involvement of the intended audience for that data, scholars and researchers.  In this project, we aim to foreground the scholarly user—something we believe we are uniquely positioned to do—and from this perspective we will look at the issues of collections development, data management, metadata, and digital library systems.  We expect to complete a number of trials in these areas, and although we do not believe the scope of this project is sufficient to provide universal or definitive solutions, we do expect to arrive at a better understanding of the problems that will be involved in the next generation of digital library activities. 

 

So much hyperbole attends the current phase of digital library development that it may seem surprising to suggest there are things scholars need to do that digital libraries cannot support.  Three scenarios are presented here as examples of some of those unsolved, second-generation digital library problems:


Scenario 1: Scholarly use of digital primary resources   

 

A literary scholar researching the history of a particular poem knows that its author also painted the subject of the poem.  She can find information about the poem and the painting in the digital library, and can even retrieve a digital image of the painting.  The scholar knows that other dual-media works were produced by this author, and she suspects that the author’s arrangements of his paintings in exhibitions might well be significant in understanding the related literary works: therefore, the scholar would like to use the digital library to find out when the painting in question was exhibited and, for a given exhibition date, would like to know what painting was to its left and what painting was to its right—and then see those paintings together in a virtual reconstruction of the exhibit.

 

In this example, we consider the possibility that the scholar of the very near future will want to do something more than browse or perform keyword searches in the digital library.  The promise of the digital library is that it will enable scholars to frame questions that would have been inconceivable without this technology.  And yet, in practice, we find that digital libraries support only very narrowly defined investigative activities.  Partly this is because we tend to treat objects in the digital library as though they had no other temporal or spatial contexts—as though they had always and only existed, discrete and timeless, in our information systems.  Partly, too, these limitations are a sign that the digital library is mainly concerned, at this point, with providing simple access to the discrete digital object, rather than with supporting context, comparison, or analysis—the building blocks of scholarship. 

 

We could begin to grapple with this problem by producing several proof-of-concept example projects, in which data and metadata expressly support more complex kinds of “behaviors” in the digital library, and are associated with other objects in the digital library (e.g., Java applets) that actualize those behaviors on the end-user’s machine.  This follows the Fedora model that the library is already developing, specifically that aspect of Fedora that permits “client access to multiple views, or disseminations, of the object's data through the transparent activation of external mechanisms that execute these content type behaviors” (http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html).
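
As a minimal sketch of the kind of metadata Scenario 1 calls for, the Python fragment below records where each painting hung in each exhibition and answers the scholar's question about its left and right neighbours.  It is an illustration only: the record fields, function names, and sample data are all hypothetical, and it does not represent Fedora or any existing library system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hanging:
    """One painting's place in one exhibition (all fields are hypothetical)."""
    work_id: str      # identifier of the painting in the digital library
    exhibition: str   # name of the exhibition
    opened: date      # opening date of the exhibition
    wall: str         # wall or room label
    position: int     # left-to-right order on that wall, starting at 0


def neighbours(hangings, work_id, exhibition):
    """Return (left, right) work ids for a painting in a given exhibition.

    Either side is None when the painting hangs at the end of its wall.
    Raises LookupError when the painting was not in that exhibition.
    """
    target = next(
        (h for h in hangings
         if h.work_id == work_id and h.exhibition == exhibition),
        None,
    )
    if target is None:
        raise LookupError(f"{work_id} was not hung in {exhibition}")
    # Index every work on the same wall of the same exhibition by position.
    same_wall = {
        h.position: h.work_id
        for h in hangings
        if h.exhibition == exhibition and h.wall == target.wall
    }
    return same_wall.get(target.position - 1), same_wall.get(target.position + 1)


def exhibitions_of(hangings, work_id):
    """List (opening date, exhibition name) pairs for a work, oldest first."""
    return sorted((h.opened, h.exhibition) for h in hangings if h.work_id == work_id)


# Invented sample data, for illustration only.
records = [
    Hanging("painting-A", "Spring Show", date(1850, 4, 1), "north", 0),
    Hanging("painting-B", "Spring Show", date(1850, 4, 1), "north", 1),
    Hanging("painting-C", "Spring Show", date(1850, 4, 1), "north", 2),
    Hanging("painting-B", "Winter Show", date(1852, 12, 1), "east", 0),
]

left, right = neighbours(records, "painting-B", "Spring Show")
```

The point of the sketch is that the question becomes answerable only once position and date are captured as metadata; a virtual reconstruction of the exhibit would draw on the same records.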

 

Scenario 2: Library adoption of “born-digital” scholarly research 

 

An archaeologist spends decades producing detailed digital records of an important classical archaeological site.  The records include CAD reconstructions of individual buildings, topographical maps, photographs, maps locating particular artifacts in areas and layers of excavation, and large-scale computer models of the entire site.  Upon retirement, the archaeologist offers his entire collection of digital records to the library (since no publisher has ever known what to do with them), but on the condition that the library treat these records as a special collection, catalogue them, and make them available through the web to other researchers and students of archaeology. 

 

This example makes plain the problems that libraries will inevitably face as they come to collect digital resources produced by scholars outside of library (and quite possibly, publishing) frameworks.   The problem is likely to be especially acute in the areas of architecture and archaeology, where data is likely to have been produced by researchers in digital form, and where we have few (if any) established conventions for collecting, normalizing, cataloguing, providing, or preserving such data.  A single map or CAD drawing could represent hundreds of hours of research, data gathering, and expert analysis—as valuable, in principle, as a monograph or a journal—and yet libraries might well be unable to accept it, for lack of appropriate systems and procedures.

 

As a pilot project in this area, we can recruit large existing collections of digital architectural and archaeological data (from The Pompeii Forum, Victorian London, The Waters of the City of Rome, Jefferson’s Architecture, and other IATH projects), and use that data to explore the cataloguing, collection, and preservation issues raised in such contexts.  At the end of three years, we would expect to have brought several such collections into the library.

 

Scenario 3: Co-creation of digital resources by scholars, publishers, and libraries  

 

A historian, working together with technical experts in the library’s Geospatial and Statistical Data Center, uses census data, eyewitness accounts, military records and contemporary GIS information to generate a time-indexed, geo-referenced reconstruction of troop movements in a famous Civil War battle.  The research is going to be published by a university press, and the press has contributed original vector data for the underlying map.  At different points in this process, the press, the historian, the historian’s graduate research assistants, and library experts all need to share editorial control of the evolving data set.  At the end of the process, the data set needs to be published by the press, collected in the library, and connected to textual records of the event. 

 

Increasingly, we believe, scholars and libraries and publishers will enter into collaborative arrangements involving the production of digital primary resources by the library, a scholarly treatment of those resources, and electronic publication of the result.  We have already seen many instances of this pattern in IATH research projects.   In retrospect, it seems perfectly reasonable that the institution owning the primary resources (a rare book, a painting, a statue, a map) would want to produce its initial digital representation; once that digital representation exists, it seems inevitable that scholars will want to do what they have always done—edit, contextualize, re-present, and analyze the (now digital) object.  And, if not inevitable, it seems at least likely that the result of this scholarly engagement with digital primary resources will be the stuff of scholarly publishing.  There are many unanswered questions, though, behind these three reasonable assumptions: should it be a goal to have a single authoritative version of the digital object?  If so, how might scholars and/or publishers register corrections or revisions to the original, if the original is produced (and presumably owned) by a library or museum?  If several scholars disagree on the verisimilitude of the digital representation, how will their range of opinions be recorded and connected to that representation?  If electronic editions of the artifact become the norm, instead of an authoritative version with apparatus, then how should those editions be derived and denoted? 

 

At IATH, we already have several projects that raise this sort of problem—the Valley of the Shadow, the Walt Whitman Archive, the Victorian London project, and others.  We have a document management system (Astoria) that will help to address some of the practical procedural issues involved in managing multiple authorship; we will experiment with integrating that system into the library’s production strategies, to address those situations in which a single authoritative version is necessary or desirable, but we would also expect to experiment with managing and coordinating multiple divergent editions of a single base object, or multiple perspectives on an object. 
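
One way to picture the multiple-edition case is as a tree of editions, each derived from an earlier one and none overwriting its source.  The Python sketch below is a minimal illustration under that assumption; the class, its fields, and the sample identifiers are hypothetical, and it makes no claim about how Astoria or any library system actually manages versions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Edition:
    """A version of a digital object; all field names are hypothetical."""
    edition_id: str
    editor: str
    note: str
    parent: Optional["Edition"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def derive(self, edition_id, editor, note):
        """Record a new edition based on this one without altering it."""
        child = Edition(edition_id, editor, note, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Edition ids from the base object down to this edition."""
        chain, node = [], self
        while node is not None:
            chain.append(node.edition_id)
            node = node.parent
        return list(reversed(chain))


def heads(edition):
    """Ids of editions nothing has been derived from yet, depth first."""
    if not edition.children:
        return [edition.edition_id]
    found = []
    for child in edition.children:
        found.extend(heads(child))
    return found


# Invented sample data: a library scan and three scholarly responses to it.
base = Edition("scan-1", "library", "initial digital representation")
corrected = base.derive("scan-1-corr", "scholar-a", "corrected transcription")
annotated = corrected.derive("scan-1-annot", "scholar-b", "adds commentary")
alternative = base.derive("scan-1-alt", "scholar-c", "disputes colour balance")
```

Here `heads(base)` lists the two divergent editions that currently coexist, and `annotated.lineage()` traces one of them back to the library's original.  A single-authoritative-version policy would correspond to allowing only one such head at a time.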

 

In order to address the many problems—some technical, some social, some intellectual—raised in these three scenarios, we need to move beyond the simple production and cataloguing of digital collections, and begin to recognize that, in the library of the future as in libraries of the past and present, most materials will be produced by many hands, not few; most materials will incorporate many perspectives, not one; and most materials will need to support specialized and pointed research as well as general, blunt queries. 

 

Recognizing these things, we will undertake a collaborative investigation of advanced digital library problems, including library absorption of scholar-produced digital resources, library/scholar co-creation of such resources, and analytical use of digital humanities data.  Within this investigation, our emphasis will be on metadata practices, library systems, and production protocols that support scholarly use.  And though we don't promise to solve all the problems that might be raised in this area, we will establish guidelines that will be useful to others, produce examples that others can imitate, and learn which problems are easy to solve and which are difficult. 

 

Content:

 

We will focus in particular on visual and spatial data, with an emphasis on architecture and archaeology, but also considering visual arts, especially in complex spatial and temporal contexts.  There are a number of research projects already underway in scholarly contexts that are producing and freely distributing digital data in architecture and archaeology.  The problem these disciplines face is that there is no well-established institutional mechanism for collecting, preserving, or publishing digital objects of this sort (CAD drawings, digital topo-maps, 3D models, even digitized photo or slide collections).  Moreover, the strategies for cataloguing and describing art objects don’t work very well with the more hierarchical and complex information structures that characterize architectural and archaeological data.  With respect to visual arts, we are particularly interested in developing and applying metadata structures that would support comparison, contextualizing, and analysis of art works, and in producing some sample applications that would demonstrate to other libraries and scholars the value of spatial and temporal metadata. 

 

Part of the budget for this project will go, in small one-year awards, directly to ongoing faculty research.  A library/IATH committee will administer these funds, which will support experimentation within faculty research whose results would generalize readily to other contexts.  Appropriate activities for this committee to fund include normalizing data, standardizing metadata, and capturing new data in accordance with recently specified best practices.

 

Intellectual Property:

 

The University of Virginia will grant to The Andrew W. Mellon Foundation a non-exclusive, royalty-free right to access, use, and distribute for educational, social and/or charitable purposes, the software technologies, tools, and related documents developed as a result of this project and to incorporate such software technologies, tools, and related documents in other projects supported by The Andrew W. Mellon Foundation.

 

Systems, Procedures, and Standards:

 

This project raises technical challenges at the level of information systems design and at the level of standards design and implementation, and it requires a coordinated investigation of these issues by IATH and the Library’s Digital Library Research and Development Group.  Fairly high-level staff will be needed for this: at IATH, we need to support cutting-edge technical work in architecture, archaeology, mapping, and other visual-data fields.  We also need to hire a second person at IATH, to concentrate on database and document management systems, as the production end of a continuum that delivers data to library systems. In the Library, we would like to add a position to the Digital Library Research and Development Group to implement systems and standards for producing, managing, and disseminating visual and spatial data in library contexts, to ensure that those library systems respond appropriately to the needs of research users, and to work with IATH and others on the difficult issues of adoption and co-creation, mentioned above. 

 

Software tools and environments for producing, managing, and publishing large image collections are of interest to IATH, and even more so to the Library, inasmuch as many of our research projects involve the creation and use of extensive image collections.  IATH’s principal interests in this area would be workflow and data management; on the library side, Thornton Staples has been working with the Cornell Digital Library Group (Carl Lagoze) on technical issues involved in the creation of digital repositories, and on the implementation of Lagoze’s Flexible and Extensible Digital Object and Repository Architecture (Fedora).

 

Both IATH and the Library are very interested in applications of SGML, XML, and HyTime to the problem of describing art, architecture, and archaeological sites.  In particular, we believe there is significant work yet to be done in the description of art collections, the treatment of three-dimensional objects as information structures, and capturing the passage of time as an element of these collections and structures. Thornton Staples has been working in this area, developing something he calls the General Descriptive Modeling Scheme (GDMS), an Extensible Markup Language (XML) document type definition (DTD) that is intended to be used to create textual models describing real-world phenomena (such as creations, events, places and people) and giving a context for describing the content of, and relations among, digital objects. 
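
To make the idea of a textual model concrete, the Python sketch below builds a small nested XML description of a site, attaches date ranges to its structures, and points each structure at the digital objects that depict it.  The element and attribute names (`entity`, `resource`, `from`, `to`, `ref`) and the sample site are invented for illustration; they are assumptions, not the vocabulary of the GDMS DTD.

```python
import xml.etree.ElementTree as ET


def describe_site(name, structures):
    """Build a nested description of a site.

    `structures` is a list of (label, start_year, end_year, object_ids).
    Element and attribute names are hypothetical, not taken from GDMS.
    """
    site = ET.Element("entity", {"type": "place", "label": name})
    for label, start, end, object_ids in structures:
        part = ET.SubElement(
            site, "entity",
            {"type": "structure", "label": label,
             "from": str(start), "to": str(end)},
        )
        for object_id in object_ids:
            # Point at a digital object (image, CAD model) held elsewhere.
            ET.SubElement(part, "resource", {"ref": object_id})
    return site


def standing_in(site, year):
    """Labels of structures whose date range includes the given year."""
    return [
        part.get("label")
        for part in site.findall("entity")
        if int(part.get("from")) <= year <= int(part.get("to"))
    ]


# Invented sample data; negative years stand for dates BCE.
site = describe_site("Example Forum", [
    ("Temple", -100, 79, ["cad-001", "photo-014"]),
    ("Market", 20, 79, ["cad-002"]),
])
xml_text = ET.tostring(site, encoding="unicode")
```

Calling `standing_in(site, 50)` returns the structures present in that year, which is the sort of question about change over time that a flat catalogue record cannot answer.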

 

All of these interests could have a direct relevance to Mellon’s projected work in the areas of art, architecture, and archaeology, in ARTSTOR.  In order for collections of two- and three-dimensional image data to be useful for teaching and research, the ARTSTOR collections will need to be embedded in data structures that can support annotation, multiple spatial and temporal arrangements of works and sites, and the representation of change over time. 

 

Personnel:

 

Library and Institute staff directly involved in design aspects of this project throughout the three years, as part of their regular duties, include:

 

Worthy Martin, Technical Director, IATH and Associate Professor, Computer Science

Daniel Pitti, Project Director, IATH

Thornton Staples, Director, Digital Library Research and Development

John Unsworth, Director, IATH, and Associate Professor, English

 

Other Library personnel who would contribute some part of their time to implementation, as part of their regular library employment, would include:

 

Edward Gaynor (Special Collections)

Rick Provine (Digital Media Center)

David Seaman (Electronic Text Center)

Ross Wayland (Digital Library Research and Development)

Patrick Yott (Geospatial and Statistical Data Center)

 

IATH fellows whose ongoing research will be directly involved in this project include:

 

Ed Ayers et al., Valley of the Shadow

David Blair, WaxWeb

John Dobbins, Kirk Martini et al., The Pompeii Forum Project

Morris Eaves, Robert Essick, Joseph Viscomi, The Blake Archive

Lavahn Hoh, The Circus in Europe and America 

Jerome McGann, The Rossetti Archive

Michael Levenson et al., Monuments and Dust (Victorian London)

Kathy Poole, Boston Back Bay Fens

Ken Price et al., Walt Whitman Archive

Ben Ray, The Salem Witch Trials

Katherine Rinne, Waters of the City of Rome

Marion Roberts, Salisbury Cathedral

Ken Schwartz, Charlottesville Urban Design

Richard Guy Wilson, Jefferson’s Architecture

 

Management Plan:

 

This project will be jointly managed by John Unsworth and Thornton Staples, with close cooperation among IATH personnel, faculty fellows, and library staff.  Fellows will provide digital objects (maps, photographs, models, etc.) and the metadata to accompany those objects, as well as some functional specifications for scholarly use of those objects.  The Digital Library Research and Development Group will work with IATH and its fellows to establish guidelines for the production of digital data and metadata to be collected and disseminated by library systems, and they will advise IATH and its fellows on the systems design and development issues that attend the adoption of information produced by IATH fellows.  IATH staff will support data production to agreed-upon standards, will consult with fellows and library staff on the specification of those standards, and will work with library staff to prototype the functionality requested and specified by the scholars who produce (and intend to use) the data. 

 

Work Plan:

 

Year One:            Primary objectives in the first half of this year will be hiring, training, and information-gathering (which would include external consultation as well as a thorough analysis of our own data and systems).  In the second half of the year, we will finalize a first version of the General Descriptive Modeling Scheme, while working with individual projects to establish and document standard procedures for producing descriptive, structural, and administrative metadata.

                       

Year Two:            In the second year, we will attempt to deposit information from the Waters of Rome, Boston Back Bay Fens, and the Pompeii Forum projects into the Digital Library, and we will experiment with Java applets provided via Fedora as disseminators for the comparison and analysis of visual art objects based on metadata, probably using projects on Blake, Rossetti, and Salisbury Cathedral.

 

Year Three:            In the third year, we will focus on the difficult issues involved in co-creation of scholarly resources, both technical and social.  We will experiment with multi-author/single version solutions (in the Valley project, Jefferson’s Architecture, and Victorian London), and we will also look at multi-author/multiple edition solutions (with some of the same projects, plus Whitman, Salem, and others).

Dissemination:

 

Information about the problems encountered and lessons learned in the experiments described here will be reported at the conferences that project participants normally attend: annual meetings of the Association of Research Libraries, the Research Libraries Group, the Digital Library Federation, the American Association of University Presses, the Modern Language Association, the Association for Computers and the Humanities, and the Association for Literary and Linguistic Computing, as well as the annual XML/SGML conference and the Markup Technologies conference.  These results would also be appropriate to publish in the journals associated with some of these professional associations.  Our presentations at these and other conferences would be supported in many cases by the travel portion of the budget.

 

In addition to these venues, the Web itself is obviously an important medium in which to publish project results and documentation—for example, Document Type Definitions, production manuals, best practices, and reports on our failures and successes.  Reusable technical products of the research such as DTDs or software will be freely distributed, updated, and documented through the Web.  Finally, the web-based content that is produced in the different scholarly projects that participate in this research can provide links to “how-to” information.