Adriaan van der Weel, Digital Text and the Gutenberg Heritage

Ch. 3: The Concept of Markup

1. Implicit Markup

1a. Homo Typographicus

Chapter 1 looked at some of the changes a digital order might bring to textual transmission. One of the more revolutionary developments suggested was the notion of intelligent text, or text that knows about itself. Unfolding a vista of what this might entail, it was suggested that, given a sufficient aggregation of sufficiently structured data in a given subject area, one could imagine a database capable of dynamically generating web pages which answer a particular information need with great precision. In this chapter we will look at one of the main methods by which textual data can be structured in such a way that they can be processed intelligently by computers: by applying a markup language. The concept of applying explicit markup to digital text requires an awareness of the nature and function of various aspects of text that in a conventional paper-based environment remain largely implicit, and so we should begin by examining these.

Besides the verbal messages they bear, texts have a logical structure. The verbal message is usually read in a linear fashion, from beginning to end. The logical structure, by contrast, takes a bird's-eye view. Taking the book you are reading as an example, the verbal message may be read linearly: from the first to the last chapter. But the logical structure cuts the book into chapters, each chapter into sections, and each section into paragraphs.

[illus: tree diagram]

The structure of a text distinguishes logical elements which, through the way they hang together, decide the nature of the text, and thus make a distinct contribution to the message of the text as a whole. We read texts that are structured as poems differently from the way we read texts that are structured as letters. As readers, we only need to glance at a printed page to recognise segments of text as footnotes, quotations, marginal glosses and so on. Without reading a letter of the text we are able to identify title pages, chapter openings and other major divisions within a book. And so we don't have to read an entire text before coming to the conclusion that we are dealing with a particular type of text, such as a poem or a letter. We take in that information about the text at a single glance at its visual appearance.

The logical structure of the elements that make up a particular type of text is usually straightforwardly interpreted from the text's typography. Logical elements are rendered distinct by a variety of typographic means, such as type size and the use of bold, italics, typefaces and white space. However, the connection between typographic form and the place of elements in the logical structure is arbitrary and is based entirely upon convention.

From the earliest times, writing has been governed by conventions. Conventions rule, for example, the direction in which we write (other writing systems run from right to left, from top to bottom, or even boustrophedonically, which is to say the way the ox turns when ploughing: from left to right and from right to left in turn); the way we end one sentence and begin another; and the meaning we attribute to punctuation marks and the white space surrounding characters.
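Looking ahead to the markup languages discussed later in this chapter, the chapter-section-paragraph tree sketched above could be made fully explicit with SGML-style tags. This is only a minimal sketch: the element names (book, chapter, section, para) are illustrative, not a prescribed vocabulary.

    <book>
      <chapter>
        <section>
          <para>First paragraph of the first section.</para>
          <para>Second paragraph of the first section.</para>
        </section>
        <section>
          <para>First paragraph of the second section.</para>
        </section>
      </chapter>
    </book>

Each element nests inside its parent, so the hierarchy that a reader infers at a glance from typography is here stated directly, in a form a computer can process.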
Although the early mediaeval scribes, writing on vellum, identified the first letter of a paragraph by highlighting it, sometimes simply in red, often more ornately, with artistic flamboyance, they lacked many of the other punctuation and spacing conventions to which we are used. The text proceeded in a virtually unbroken line, with the exception of closely spaced full stops, down two columns on each page, until the next paragraph. Even the convention of separating words was relatively new to them. In chirographic practice (the practice of writing by hand) it had taken a long time before word spaces were standardly used as a structuring element. In Roman times a raised dot was often used before the Greek custom of scripta continua (running words together) was adopted. It was only when Irish Christian scribes reinvented the word space around the fifth century AD that it came to stay.

Word breaks are a convenience that we have since learned to depend upon as readers. But it was not at all obvious that word breaks should have been adopted as a convention. In oral conversation the flow of words is after all uninterrupted, though sentences are usually punctuated. (Segmenting speech into meaningful units is therefore one of the major challenges confronting anyone who sets out to learn a new language.)

As more people learned to read and the practice of silent reading developed, the demand for structuring conventions such as word spacing and clearer rubrication for punctuation and section headings increased. The invention of printing in the middle of the fifteenth century called for a further elaboration of typographic conventions. Rubrication, after all, was scribal handwork which in the long run went against the nature of printing. Print can exploit the fact that it is more precise than handwriting in rendering subtle spatial variations, differences in type size and weight, and so on.1 In this way print both made possible and demanded alternatives to rubrication, and it led to the invention of a whole new array of typographical structuring devices. By its nature it fostered codification: the development of standards and conventions through which the structure of a text, and thus a central part of its meaning, could be transferred more faithfully.

For these structuring conventions we use the term markup. The term markup originates in the world of print, where it denotes the instructions for the typesetter written in the margin of the text being prepared for printing. These instructions represent in effect a conscious interpretation of the logical structure of a text. The term was then adopted for the interpretation of the logical structure of texts being prepared for digital transmission by means of markup languages such as the Standard Generalised Markup Language, which will be discussed below. It has since become accepted to apply the term to any means by which the distinct elements and their place in the logical structure of the text may be identified, in chirographic, typographic or electronic practice.2 Markup thus refers to all structuring devices used to identify the distinct elements and their place in the logical structure of the text.
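The shift described here, from marginal instruction to embedded code, can be made concrete. A hypothetical sketch, contrasting a typesetter's margin note with an SGML-style tag; the wording of the instruction and the element name are invented for illustration:

    <!-- A typesetter's margin note, addressed to a human:
         "set in 14 pt bold, centred, with extra space above" -->

    <!-- The same interpretation stated as markup: -->
    <heading>The Concept of Markup</heading>

The margin note fixes one particular typographic rendering; the tag records only the interpretation itself, namely that these words constitute a heading.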
Markup variously covers: the explicit descriptions by means of a digital markup language in the case of a digital text; the explicit instructions towards a typographic representation of a text in the case of printer's copy; and the implicit typographic representation of a text's structure in the case of manuscript and print.3

The last of these, markup in manuscript and print, is of course most familiar. Examples include, as we have seen, the use of white space (e.g. space between paragraphs and words, the space surrounding titles, etc.), but also punctuation, or the form, size or type of the letters themselves (e.g. bold lettering, italics, etc.), and so on.

1 {Ong, Orality and Literacy, p. 128.}
2 {Following a proposal by James H. Coombs, Allen H. Renear, and Steven J. DeRose, in an article entitled 'Markup Systems and the Future of Scholarly Text Processing' (Communications of the ACM, November 1987).}
3 {We have been concentrating here on the representation of a text's structure; in Chapter 5 we will look at the use of markup languages to analyse the text's content.}

[Sidebar] Major categories of structuring devices:

I. General
- Illustrations
- Ornaments (e.g., fleurons, rules, boxes, shading)
- Running headers or footers
- Folios (page numbers)
- Special signs (e.g., paragraph signs; brackets)

II. Type
- Capitals vs lower case
- Punctuation (originally designed to render aspects of speech; Gumbert 1993, p. 11)
- Typeface
- Type size
- Ornamented or dropped capitals
- Justification (left, right, centred, fully justified)
- Bold, italic, small caps, underlining

III. The arrangement of blank space for
- Word spacing
- Leading (interlinear spacing)
- Margins (size and proportions of the type page compared to the page as a whole)
- Indents
- Columns
- Tables
- Letter spacing

IV. Colour (including black, white and grey)
- Rubrication (mainly in MS and early printed book practice)
- Background highlighting (e.g., tables, boxes, sidebars, etc.)

[End sidebar]

There appears at first sight to be no very good reason why these conventions should change in a digital environment. They are well established, and appear to work rather well for most communication purposes. Indeed, as we have seen in Chapter 2, computers have been harnessed to reproduce to perfection all of the forms of typographic markup mentioned so far, and more. However, there are two main reasons to reconsider the serviceability of typographic appearance in a digital environment.

There is, first, the problem of digital typographic instability. The way texts are displayed on the screen depends on a large number of variables, such as the user's hardware and software: the operating system, default settings for the OS and applications, installed typefaces, etc. Notoriously, there is the problem of the many different schemes of proprietary markup, involving binary codes, used by word processors and layout programs to encode typographical markup (such as italics, bold, new page, indents, columns, etc.). Documents created in one program cannot usually be read by another, at least not without the aid of a conversion filter, as most people will have had ample occasion to lament. And even if the document can be read, its graphic representation may differ from one computer to another owing to variations in personal preferences. As long as a text is created and displayed on a single machine, these variables are known, but in the case of computers in a network they become unpredictable.
Proprietary markup codes differ from one program to another, and even from one version of the same program to another, so that sender and receiver need to have matching programs (even matching versions of programs) to write and read a file. This problem is exacerbated when documents are in addition transferred from one software platform to another: from MS Windows to Macintosh, from Macintosh to Linux, from Linux to another Unix. In this chaos de facto standards spring up only to perish again, making electronic textual transmission a shaky affair, with further-reaching implications as the internet's grasp on human communication gets firmer.

The second reason is a still stronger imperative to rethink markup in a digital environment: typographic markup is ill suited to the nature of the computer, ignoring its particular potential. As we have seen in the previous chapter, computers have been successfully taught to deal with typographic appearance. This may be very clever, but it is not nearly clever enough. For a start, it is a one-way process, in which the computer creates the typographic appearance of a text but cannot, conversely, interpret the typographic appearance of a text in order to understand and represent its structure. This one-way process offers satisfactory solutions so long as the typographic appearance of the text is geared towards eventual visual reproduction, i.e. on paper or on screen. However, computers are capable of processing texts in much more interesting ways, provided, that is, that they can be taught about the structure and contents of the texts they are asked to process.

And there lies the difficulty. Because computers are incapable of processing in a controlled fashion anything which has not been explicitly defined, we are forced, when dealing with them, to be very clear about our definitions and usage of text and markup. But in typographic markup there is no one-to-one relationship between form and function. This makes it extremely difficult, if not impossible, to formulate rules that are sufficiently clear for a computer. There are two major problems in the relationship between typographic form and function: redundancy and ambiguity.

Redundancy

Humans are quite used to a certain redundancy in the typographic signposting of structure. One structural function may be indicated by more than one typographic code. In Western usage, a new sentence, for example, is indicated by three means: a full stop, a white space, and a capital letter. In Caroline script, by contrast, there was less such redundancy. Word spaces were not consistently used, and a medial dot served variously as our modern full stop when it was followed by a capital letter, or as our comma when it was followed by a lower case letter.4 Similarly, a paragraph in English is usually indicated, in addition to the new-sentence indicators, by the start of a new line and an indent. Increasingly, international practice shows that a paragraph may also start on a new line following a line of blank space, dispensing with the indent. This practice may cause ambiguity when the blank space coincides with the end of a page. Again, there are many ways to indicate a long quotation, i.e. one not run on within the text in prose: it may be indented left and/or right; it might be set with less leading; it might be set in a smaller font; or any combination of these devices.

4 {Kendrick, p. 126.}
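Descriptive markup, discussed below, sidesteps this redundancy, because the structural fact is stated once rather than signalled by several converging typographic cues. A minimal sketch, again with illustrative element names:

    <!-- Typography marks a paragraph boundary three ways at once:
         full stop, new line, indent. Markup states it once: -->
    <para>The last sentence of one paragraph.</para>
    <para>The first sentence of the next.</para>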
Ambiguity

Besides the redundancy of several typographical devices being employed to indicate one structural item, we also see the reverse: one typographical device may represent various structural items. We may use italics, for example, to indicate emphasis, words from a foreign language, a book title, structural hierarchy (e.g., in headings) and so on. A space may be used variously to divide words, sentences or thousands (the thin space in large numbers). A full stop may represent a decimal divider, a numerical divider (as in chapter numbering: '1. The economic view' or '1.2'), an end-of-sentence indicator, part of the mark indicating elision (...), a file name-extension divider (markup.html), etcetera.
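Named descriptively, the ambiguity dissolves: each function receives its own code, even if all three might still be rendered in italics. A sketch with illustrative element names:

    <!-- One typographic form, three distinct functions: -->
    <emph>really</emph>
    <foreign>Weltanschauung</foreign>
    <title>Orality and Literacy</title>

A computer instructed to extract every book title can now do so reliably; given bare italics, it could not.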
Coding and decoding

Both redundancy and ambiguity result from the implicit nature of all of the examples of markup discussed so far. That is to say, the markup never states explicitly what it means; rather, we rely on unspoken conventions for the use we make of it. Partly there is of course simply no need in written and printed communication to be explicit: human readers are in most cases capable of interpreting the form of the text (decoding the markup) correctly. Partly it is not feasible. Thus, an objective analysis of the relation between form and function is complicated by the subjective and implicit nature of markup in written or printed texts.

Such an analysis would be complicated by numerous further factors. For example, if a text is very short, readers may not be able to decode the underlying matrix: the system of typographic codes used in the text. Then again, there does not necessarily have to be a link between typographic markup and structuring the text for the reader's convenience at all. Typographic markup may, for example, be closely intertwined with an aesthetic purpose for its own sake. Or again, more subtle semiotic intentions may be present, i.e. for the general appearance of the text to carry a message about the text's nature, or about the reader's, owner's or user's social position.5 Again, authors or designers may choose to deviate from or modify the existing typographical conventions and thwart the reader's expectations. Confusion may also be caused by the fact that typographical practices tend to vary according to the circumstances of place and time. In a trendy youth magazine the conventions will be different from those observed in a staid scholarly journal, not to mention national cultural differences, or indeed historical ones.

5 {Cf., for example, J.P. Gumbert, 'The Typography of the Manuscript Book', p. 6.}

Not only is the analysis of typographical encoding and decoding fraught with the difficulties presented by its object, but the terminology we have available to perform the analysis with is also defective:

Despite a tradition of book design going back centuries, and despite the efforts of many devoted critics, no one could claim that there is a consensus on how to describe a printed page in detail. Some critics speak of the bibliographic codes that form part of the publication of any work. But the term is, for now, still more a metaphor than a sober description. The characteristic of any code is that it is made up of a finite set of signs, which as Saussure teaches us are arbitrary linkages of signifier and signified. For artificial languages, the sets of signifiers and their meanings are given by the creator of the artificial language. For natural languages, dictionaries attempt to catalog the signifiers and their significance; grammars attempt to explain the rules for combining signs into utterances. We have nothing equivalent for the physical appearance of texts in books. Any serious attempt to record the bibliographic codes built into the book design and typography of a literary work must begin by specifying the set of signs to be distinguished. Is 24-point type different from 10-point type as a bibliographic code? In most circumstances, yes. Is 10-point type from one type foundry different from 10-point type in the same face, produced by a different foundry? In most circumstances, no. What about 10- and 11-point type? Ten and 12? To specify a formal language for expressing significant differences of typographic treatment, we need to reach some agreement about what constitutes a significant difference: what the minimal pairs are. (Michael Sperberg-McQueen, 'Textual Criticism and the Text Encoding Initiative', p. 54)

We lack, in other words, both an explicit, universal typographic code and a terminology which would allow us to study and describe the meaning of such a typographical code in an objective way. This combination of factors makes conventional typographic markup unsuitable in an electronic environment: its implicit nature makes decoding unreliable. So if we still want computers to be able to perform more advanced text processing than simply reproducing the typographic form of a text, we need to think of a different solution for encoding information about the logical structure of the text. This solution lies in the application of explicit descriptive markup.

2. Explicit Markup

Markup through markup language

To achieve the purpose of the interchange of texts between people and the hardware and software they use without communication breaking down, the concept of descriptive markup was invented. A markup language is a language that can describe explicitly any features that may be in danger of not being understood, or of being misunderstood, both by computers and by human beings, such as in particular the logical or structural units which occur in the text. These explicit descriptions take the form of codes embedded in the text and clearly marked as codes.6
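What such embedded, clearly marked codes might look like follows from the sketches given earlier. A hypothetical fragment; the angle-bracket delimiters follow SGML practice, and the element names remain illustrative:

    <letter>
      <salutation>Dear Sir,</salutation>
      <para>Thank you for your letter of <date>12 March</date>.</para>
      <closing>Yours faithfully,</closing>
    </letter>

The delimiters < and > mark the codes off unambiguously from the text itself, so that neither a human reader nor a program can mistake the one for the other.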
Incidentally, note that the terms descriptive and explicit are ambiguous. Neither term actually specifies what it is that is thus being qualified. What is being made explicit? What is being described? Though we use the terms primarily to refer to the structural function of textual elements, they could equally describe typographic features. And indeed, as we shall see in Chapter 5, Markup Continued, scholarly use of descriptive markup often involves describing typographic features.

The history of explicit descriptive markup, like that of word processing and page layout programs, goes back to typesetting: '[That g]eneric markup can help us to reintroduce the important separation between structure and appearance ... was realized at the time of the confusion over specific markup with photo-typesetting systems. A movement was started to create a standard markup language, whi