Electronic Textual Editing and the TEI

at the first of the two sessions sponsored by the Committee on Scholarly Editions at the 2002 Annual Convention of the Modern Language Association
by John Unsworth

Electronic Textual Editing

The Committee on Scholarly Editions welcomes you to its two back-to-back sessions on Scholarly Editing, and as co-chair of the Committee, I'm pleased to say that the focus of these two sessions will be the forthcoming Electronic Textual Editing, a volume of about 120,000 words, consisting of 25 essays from practicing textual editors with experience working with electronic text encoding, sponsored by the Modern Language Association's Committee on Scholarly Editions and the Text Encoding Initiative Consortium, with funding from the Andrew W. Mellon Foundation. This volume will include major new revisions of the guidelines produced by both the sponsoring organizations. Revisions to the CSE guidelines will be covered by Katherine O'Brien O'Keeffe in the next session; in this session, I'll give a little bit of background on the TEI and its guidelines, and an overview of what's new in the recently released fourth edition of those guidelines (all 1088 pages of which will be included on CD-ROM, in the back of the Electronic Textual Editing volume--but if you want them in print, you can order the two-volume set from the University of Virginia Press, http://www.upress.virginia.edu/books/tei.html).

The purpose of the Electronic Textual Editing volume, and of these sessions, is to bring together two scholarly standards efforts--the Committee on Scholarly Editions' and the TEI's--and to make both efforts more readily accessible and intelligible to scholarly editors by providing both sets of guidelines in the context of a methodological overview, specific cases and examples, and detailed discussion of particular procedures. Beyond that, this book project originated in the observation that textual editing was slowly dying out, just at the time that the Web had provided the occasion for making available electronic editions of thousands of literary, historical, philosophical, religious, and other texts. A rising generation of academics are responsible for many of these texts; libraries serving those academics are responsible for many more. If the production of these texts is to be considered a scholarly activity on the part of these academics, or if the texts provided by these libraries are to be useful for scholarship, then those who evaluate such activities, or select such texts, must have a way of distinguishing between good and bad, better and best, not only on intellectual grounds, but also on technical ones. The standards used in this evaluation and selection should represent a broad, community-based consensus, should be based on solid principles as well as informed by experience, and should be developed, disseminated, and maintained by credible institutions.

While it would be foolish to assert that the CSE and the TEI are without critics, skeptics, and detractors, they do in fact represent a broad, community-based consensus, and they are, in their respective arenas, the only credible institutions attempting to develop, disseminate and maintain general (rather than project-specific) guidelines. Both organizations have been accused, at various points in the past, of promoting a monologic orthodoxy, but in fact each organization has devoted significant time and effort to accommodating difference--the CSE in the evolution of its guidelines over the last decade to accommodate a greater variety of editorial methods and a broader range of materials and periods, as well as editions in electronic media, and the TEI, most importantly, in its extension mechanism, as well as in its consistent insistence, over its fifteen-year history, on international and interdisciplinary representation in its governing bodies, its workgroups, its funding sources, and its membership. The complementary nature of the guidelines produced by these two organizations, the fact that both sets of guidelines have recently undergone major revisions, and the need, described earlier, for guidance in producing and evaluating electronic scholarly editions, make this an opportune moment for these two sessions and, in the longer run, for Electronic Textual Editing.

Allow me to briefly summarize the contents of that volume, as a way of providing context for what you'll hear in these two sessions. The volume begins with a foreword by G. Thomas Tanselle, and an introduction by the editors (Lou Burnard, editor of the TEI, Katherine O’Brien O’Keeffe, co-chair of the CSE, and John Unsworth, chair of the TEI Consortium and TEI Council, and co-chair of the CSE). Next, the volume will include the CSE's newly revised Guidelines for Editors of Scholarly Editions, including

Then there are a dozen case studies:

These are followed by a section on practices and procedures, with contributions on

And finally, the editors' conclusion, works cited, and, in the back of the book, a CD-ROM with the full text of the TEI Guidelines (P4) and example texts.

TEI and the TEI guidelines

The 15-year anniversary of TEI has just passed, in November of 2002. According to the TEI Guidelines,

The Text Encoding Initiative grew out of a planning conference sponsored by the Association for Computers and the Humanities (ACH) and funded by the U.S. National Endowment for the Humanities (NEH), which was held at Vassar College in November 1987. At this conference some thirty representatives of text archives, scholarly societies, and research projects met to discuss the feasibility of a standard encoding scheme and to make recommendations for its scope, structure, content, and drafting. During the conference, the Association for Computational Linguistics and the Association for Literary and Linguistic Computing agreed to join ACH as sponsors of a project to develop the Guidelines. [from http://www.tei-c.org/P4X/AB.html#ABTEI]

The reason for this 1987 conference, and indeed for the TEI itself, was the need--felt by these "representatives of text archives, scholarly societies, and research projects"--for a means of encoding electronic texts that was not tied to specific hardware or software, and thus might survive longer than a particular model of computer, a particular release of a word-processing or hypertext package, or even a particular operating system. This need, in turn, arises from the fact that, while hardware and software change very rapidly--now on about six-month cycles--scholarly projects often take years, sometimes decades, to develop. TEI was, in short, an attempt to bridge that gap.

The design principles of the TEI are explained, in the Guidelines, as follows:

the TEI scheme is driven by its original goal of serving the needs of research, and is therefore committed to providing a maximum of comprehensibility, flexibility, and extensibility. More specific design goals of the TEI have been that the Guidelines should:

This has led to a number of important design decisions, such as:

[from http://www.tei-c.org/P4X/AB.html#ABDPIU]

It is a testament to the soundness of these principles and decisions that, fifteen years later, the TEI is still a viable text-encoding scheme, and further, that some of the earliest scholarly projects to adopt it (for example the Perseus Project) have managed to migrate their content over the many generations of hardware and software that have come and gone since 1987. In its most recent revision, the TEI has even proven capable of surviving across major shifts in the syntax used to express it, in this case, the shift from SGML to XML.

In a paper delivered at this convention in 1994 (a sort of half-way point between the inception of the TEI and the present moment) Michael Sperberg-McQueen discussed "Textual Criticism and The Text Encoding Initiative", in terms quite relevant to the present occasion. In this talk, he proposes the appropriateness of TEI encoding for the purposes of producing electronic scholarly editions, a proposition he bases on assumptions enumerated as follows:

  1. Electronic scholarly editions are worth having. And therefore it is worth thinking about the form they should take. . . .
  2. Electronic scholarly editions should be accessible to the broadest audience possible. They should not require a particular type of computer, or a particular piece of software: unnecessary technical barriers to their use should be avoided.
  3. Electronic scholarly editions should have relatively long lives: at least as long as printed editions. They should not become technically obsolete before they are intellectually obsolete.
  4. Printed scholarly editions have developed their current forms in order to meet both intellectual requirements and to adapt to the characteristics of print publication. Electronic editions must meet the same intellectual needs. There is no reason to abandon traditional intellectual requirements merely because we are using a different medium to publish them.
  5. On the other hand, many conventions or requirements of traditional print editions reflect not the demands of readers or scholarship, but the difficulties of conveying complex information on printed pages without confusing or fatiguing the reader, or the financial exigencies of modern scholarly publishing. Such requirements need not be taken over at all, and must not be taken over thoughtlessly, into electronic editions.
  6. Electronic publications can, if suitably encoded and suitably supported by software, present the same text in many forms: as clear text, as diplomatic transcript of one witness or another, as critical reconstruction of an authorial text, with or without critical apparatus of variants, and with or without annotations aimed at the textual scholar, the historian, the literary scholar, the linguist, the graduate student, or the undergraduate. They can provide many more types of index than printed editions typically do. And so electronic editions can, in principle, address a larger audience than single print editions. In this respect, they may face even higher intellectual requirements than print editions, which typically need not attempt to provide annotations for such diverse readers.
  7. Print editions without apparatus, without documentation of editorial principles, and without decent typesetting are not acceptable substitutes for scholarly editions. Electronic editions without apparatus, without documentation of editorial principles, and without decent provision for suitable display are equally unacceptable for serious scholarly work.
  8. As a consequence, we must reject out of hand proposals to create electronic scholarly editions in the style of Project Gutenberg, which objects in principle to the provision of apparatus, and almost never indicates the sources, let alone the principles which have governed the transcription, of its texts.
In sum: I believe electronic scholarly editions must meet three fundamental requirements: accessibility without needless technical barriers to use; longevity; and intellectual integrity. [from http://www.tei-c.org/Vault/XX/mla94.html]

Michael goes on to assert that the TEI meets the first two of those requirements--removing needless technical barriers by freeing the text of dependence on particular hardware and proprietary software, and ensuring longevity in the same way--but does nothing in particular to ensure intellectual integrity:

With the TEI, as without it, integrity remains inescapably the responsibility of the creator of an edition; all that the TEI can do is to provide the mechanisms needed to allow textual critics to create intellectually serious electronic editions using the TEI encoding scheme. [from http://www.tei-c.org/Vault/XX/mla94.html]

I would go a little further, and say that any attempt to systematically express one's understanding of a text in the kind of internally consistent, explicit and unambiguous encoding that is required in the creation of a computable SGML or XML edition will produce some intellectual benefit and will ensure some degree of integrity. Beyond that, the exercise of considering, applying, modifying, or rejecting the taxonomy of literary and linguistic texts that is proposed by the TEI provides a kind of rigor, even if the end result is to develop a non-TEI encoding scheme. But now that the CSE guidelines have been revised, they provide a useful tool for assessing the intellectual integrity of an electronic edition, and a set of guidelines that goes beyond the discipline of text encoding in addressing the principles, problems, and practices of scholarly editing, across materials and media. And now that the TEI guidelines have been revised into compliance with XML, the technical barriers to their use are significantly lower than they were in P3 and previous (SGML-only) editions, since XML has become the lingua franca of the Web and there is now so much new, free or inexpensive software for use with XML resources.
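To make the point about explicit, computable encoding a little more concrete, here is a minimal sketch of how a single encoded line can yield the reading of one witness or another, which is one of the possibilities named in Sperberg-McQueen's sixth assumption. It is an illustration only: the fragment is written in the style of the TEI critical-apparatus elements (app, lem, rdg, with a wit attribute) but is not a complete or validated TEI document, the sample line and the witness sigla "A" and "B" are invented, and the helper text_of_witness is a hypothetical name. The only software assumed is the XML parser in the Python standard library.

```python
import xml.etree.ElementTree as ET

# Invented two-witness line, marked up in the style of the TEI
# critical-apparatus elements. Sigla "A" and "B" are assumptions.
SAMPLE = """
<l n="1">The <app><lem wit="A">quick</lem><rdg wit="B">swift</rdg></app> fox
<app><lem wit="A">jumps</lem><rdg wit="B">leaps</rdg></app> over the dog.</l>
"""


def text_of_witness(line, siglum):
    """Return the line as it reads in the witness named by `siglum`."""
    parts = [line.text or ""]
    for child in line:
        if child.tag == "app":
            # Keep only the reading attested by the requested witness.
            for reading in child:
                if siglum in (reading.get("wit") or "").split():
                    parts.append("".join(reading.itertext()))
                    break
        else:
            # Any other element contributes its text unchanged.
            parts.append("".join(child.itertext()))
        # Text that follows the child element inside the line.
        parts.append(child.tail or "")
    # Collapse whitespace introduced by line breaks in the source.
    return " ".join("".join(parts).split())


line = ET.fromstring(SAMPLE)
reading_a = text_of_witness(line, "A")
reading_b = text_of_witness(line, "B")
print(reading_a)
print(reading_b)
```

The same encoded line serves both presentations because the variation is recorded explicitly in the markup rather than left to a typographic convention; that explicitness is what makes the text computable, whatever software happens to read it.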

There are still challenges for TEI, and for the CSE, and there will be future revisions and editions of both sets of guidelines, without doubt--but that's a necessary feature of a living standard, that it should change and develop to keep up with changes in the rest of the world. A standard that stands still is dead. And that's why, in turn, it is necessary for these standards to be community-based, and to be maintained by an institution that has a basis in a community, and an organizational mechanism for renewing and perpetuating itself. In the academy, especially in the humanities, we like to think of ourselves as antinomians, rebels, individualists--but in fact, we do participate in communities, and there is some benefit in expressing the consensus of those communities in things like standards and guidelines, if only to lay our cards on the table, and articulate in public the rules that we will, in any case, apply in private, in hiring and tenure and promotion, in publishing, and in library adoption.