Text Processing with HTML

In the article <x7iuo2dm9h.fsf@rocza.krall.org>, Douglas G. Henke asks the following questions:

Do you know what the difference is between layout and markup? Do you know that abusing a markup language to specify what font or color is used to render something is a grave conceptual error?

Do you know why it's foolish to use page layout when you don't know anything at all about the output device, including whether or not it resembles a page or has anything like physical layout?

Can you explain the relationship between contrast and luminance? If not, then what makes you think you have any business setting the default color of ANYTHING?

Do you have any excuse at all for not using 7-bit flat ASCII?

His tone is a little harsh, yet effective and inspiring. These four questions bring out some of the most important aspects of the Hyper-Text Markup Language (HTML). Please keep them in mind while you are reading this document. Before turning back to HTML, however, I will discuss the differences between word processing and text processing.

Word Processing versus Text Processing

There are many word processors out in the computing world. There are simple ones, like the AppleWorks word processor I had on my old Apple IIc. There are slightly more powerful ones, like WordPerfect 5.x for DOS. Now there are the very graphical ones, like WordPerfect, ClarisWorks, Ami Pro, and more, for the Macintosh and for the Macintosh clone windowing system which users of Intel-based PCs often run. You can even find word processors on many UNIX-style OS's now, most notably (and commercially) WordPerfect 6.0/7.0 for the X Window System. Personally, I like LyX, a word processing front end to LaTeX for UNIX-style OS's under the X Window System, although as a front end to a text processor it retains many features of a text processing language.

There are many text processing languages out there as well: TeX and all of its derivatives; the {n,t,g}roff clan; Scribe, an oldie but a goodie; and the various markup languages based on SGML, such as HTML. The content-oriented idea of text processing has been around for a long time, and it has tended to thrive mostly on centralized mainframes and at publishing firms for books and magazines.

Back in the day, when computer life was text based and the command line still ruled the world, a printer could do a nicer looking job of typesetting a document than the console could. Different printers also had very different capabilities, and this is still true today. The problem was to create a nice looking, presentable document which could be entered in a command line world. And thus were born typesetting languages. Today most terminals are graphical, to the point where I have a friend who will not use Linux because he cannot get anything better than 800x600 resolution out of his video card using XFree86. So, for most users who only use computers because they have to, the graphical word processors may be better than the text processing languages. Do not count the text processing languages out, though; they have many advantages for those willing to learn them.

There is something beautiful about a text processing language. It adds an unmatchable structure to your document. The reader may not know, but they do not have to. You can pick your own interface to your document writing. The choice of a text editor is the second most important choice in the usage of a computer (second only to the choice of a shell). Hopefully the author found vile as their text editor, but your garden variety vi or even joe will do. After this choice is made, the author can write the document. They can write it how they want to, in the structure which is conducive to themselves, not conducive to some micro$oft technician sitting behind a computer all day pretending to contribute to the computing community. The format is theirs and theirs alone. Once the document is written, the author can compile their code. Proper footnotes and line spacing are not a concern; let the compiler take care of these things. If the code is written well, then the author does not worry about which compiler (or interpreter) works on their document. It will look fine. It may be formatted for a laser printer or for a dumb terminal; these things are not the concern of the author if the document is written well. Because of this, the author can put more effort into the content of a document and worry less about its appearance. And, as an added bonus, text processing is fun.

The story is very different for those who use word processors. On-screen appearance is very important. The platform the document will be formatted for is of great importance as well. In some of the large commercial word processors, the document will often contain the name of the printer it is formatted for. Changing the device can sometimes produce very bad results. If you wanted to use a document from an old word processor, you would have to spend hours reformatting it to take advantage of a modern printer's enhancements. An old, well written troff file could be compiled for modern PostScript printers and look great with no modifications, taking advantage of new features in the printer without any effort from the author.
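To make the "one source, many devices" idea concrete, here is a minimal sketch in Python. It is not how troff or any real text processor works internally; the document format, the DOCUMENT list, and the two function names are all made up for illustration. The point is only that the source records what each piece of text is, and each formatter decides how that should look on its own device.

```python
import html
import textwrap

# A hypothetical document: each element says what the text IS,
# not what it should look like.
DOCUMENT = [
    ("title", "Text Processing"),
    ("para", "The author describes the structure of the document; "
             "the formatter decides how it looks on a given device."),
]


def render_terminal(doc, width=40):
    """Format for a dumb terminal: underline titles, wrap paragraphs."""
    lines = []
    for kind, text in doc:
        if kind == "title":
            lines.append(text)
            lines.append("=" * len(text))
        else:
            lines.append(textwrap.fill(text, width=width))
        lines.append("")  # blank line between elements
    return "\n".join(lines).rstrip() + "\n"


def render_html(doc):
    """Format the same source as HTML: map each kind to a structural tag."""
    tags = {"title": "h1", "para": "p"}
    parts = []
    for kind, text in doc:
        tag = tags[kind]
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(render_terminal(DOCUMENT, width=40))
    print(render_html(DOCUMENT))
```

Supporting a new output device means writing one more formatter; DOCUMENT itself never changes, which is the same reason the old troff file needs no modifications.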

The prime example of why not to use a word processor is in the writing of resumes. I was a site consultant at a fairly large university, and many people came through with questions about how to write their resumes. I helped them (gave them the answers they wanted to hear, that is) and told them the hack to get around this or that bug in the word processor which caused the screen to look different from the printout. These word processor people spend hours changing fonts, margins, and line spacing. When they are done, they have a picture perfect document with no real content.

William Totten (totten@pobox.com)

Copyleft: (C) 1996, 1997, William Totten