Have you ever looked at the source code of these pages and wondered which content management system would generate such clean XHTML, and why anyone would use it? Here is the answer.
Even though these pages are relatively simple, writing them directly in HTML has a number of substantial drawbacks.
Using HTML as a structural markup language has significant costs. It is possible to separate presentation and content using CSS, but only by sacrificing readability of the source code. For example, instead of
<COMMAND>/sbin/reboot</COMMAND>
you have to write the following:
<SPAN CLASS="command">/sbin/reboot</SPAN>
As a result, your source code will be cluttered with SPAN (or DIV) tags.
You have to validate your documents against custom DTDs to catch certain types of interoperability problems (for example, the infamous nested tables misfeature in old but still widely used Netscape versions).
You have to repeat the same information in multiple places, risking inconsistencies. For example, the headline of a page should be shown both in the browser window and in its title bar, so it has to be entered in, say, an H1 tag and a TITLE tag. When adding cross-references, you have to enter document paths and, in most cases, the document titles. As a result, renaming documents becomes extremely painful.
If you include images, you must remember to provide the correct WIDTH and HEIGHT attributes to speed up loading, and you must not forget to update them whenever an image's size changes, or browsers will rescale the image and substantially degrade its quality.
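The last two drawbacks can be illustrated with a small fragment (the file name and headline are invented for the example):

```html
<HTML>
<HEAD>
  <!-- The headline has to be entered twice: once for the title bar ... -->
  <TITLE>Rebooting the Server</TITLE>
</HEAD>
<BODY>
  <!-- ... and once for the page body. -->
  <H1>Rebooting the Server</H1>
  <!-- WIDTH and HEIGHT must match the actual file,
       or browsers rescale the image. -->
  <IMG SRC="reboot-diagram.png" WIDTH="320" HEIGHT="200" ALT="Reboot sequence">
</BODY>
</HTML>
```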
Some of these problems can be solved by investing work in HTML parsing and validation tools, which essentially amounts to XML processing. But tackling the fundamental problems requires a different source representation: HTML is not a convenient language for authoring documents.
Input files are validated by xmllint (part of libxml2) against a custom DTD (modeled after the ideas in Notes on XML DTD Design). After that, a special-purpose Perl script (implemented using the XML::DOM Perl module, see XML Processing with DOM and Perl) reads all documents, processes them, and writes all the XHTML documents which are part of the website. To catch errors in the processor, the generated files are validated using xmllint again, this time against the official XHTML 1.0 Strict DTD.
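The two validation stages can be sketched with xmllint. The DTD and file names below are invented for the example; the site's real DTD is not published here.

```shell
# A minimal custom DTD, standing in for the site's real one.
cat > page.dtd <<'EOF'
<!ELEMENT page (title, p+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT p (#PCDATA)>
EOF

# A source document conforming to that DTD.
cat > index.xml <<'EOF'
<?xml version="1.0"?>
<page><title>Example</title><p>Hello, world.</p></page>
EOF

# Stage 1: validate the source document against the custom DTD.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout --dtdvalid page.dtd index.xml && echo "index.xml: valid"
fi

# Stage 2 would validate the generated output against the XHTML 1.0
# Strict DTD (the document's own DOCTYPE), along the lines of:
#   xmllint --noout --valid index.html
```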
The internal hyperlinks of this website are generated automatically. Each page (and each file which is part of the website) is assigned a unique name by the author, and other pages can reference it by that name. Through a special declaration in the document header, the author can add a document to a list. Such lists can be included in other documents and are, of course, updated automatically when the HTML pages are generated.
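The actual element and attribute names are not documented on this page; a name-based cross-reference mechanism of this kind might look roughly like the following hypothetical source fragment:

```xml
<!-- Hypothetical markup: "page", "ref", and "list-entry" are invented
     names, not the site's real DTD vocabulary. -->
<page name="reboot-howto">
  <title>Rebooting the Server</title>
  <!-- Other pages refer to the unique name, not the file path,
       so documents can be renamed without breaking links: -->
  <p>See also <ref name="shutdown-howto"/>.</p>
  <!-- A declaration in the header might add this page to a named list: -->
  <list-entry list="howtos"/>
</page>
```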
At the moment, the results of the conversion are not cached, so the whole website has to be regenerated after each change. This would need optimization for larger websites.
Since April 2007, version control for the document source files and the programs is provided by Git. Until February 2004, GNU arch was used, and Subversion after that.
Currently, there is only one development branch. After a change has been made and previewed, the rendered documents are transferred to the public web server using rsync. However, it would be straightforward to establish multiple branches with parallel development (if there were anybody else working on these pages).
This document describes some of the design decisions underlying the XML processing tools for this web site, and their influence on DTD design.
XML Processing with DOM and Perl
Using hashes of (mostly) anonymous subroutines, it is surprisingly easy to write XML translators with the Perl DOM interface.
2003-07-31: published
2011-10-09: Note that Git has been used as a version control system for a while