Subject: Re: bin/7249
To: Mike Cheponis <mac@Wireless.Com>
From: None <dsr@mail.lns.cornell.edu>
List: tech-kern
Date: 07/18/2000 17:19:10
Mike Cheponis <mac@Wireless.Com> writes:
> Why html?  Because it's everywhere and it has the required
> properties, and because it allows updates over the net, or indeed,
> the entire doc tree to be off your machine in an easy-to-access way.

I occasionally have to write a bit of documentation, and IMHO HTML
doesn't have the required properties for a documentation project the
size of the NetBSD man pages.  I'll address just a few points here:

1) One of the great strengths of the Unix man pages is the conventional
structure--NAME, SYNOPSIS, PARAMETERS, DESCRIPTION, RETURN VALUES, etc.
Seriously, I'm not kidding--consistency in technical documentation is
*highly* desirable, and the man pages have a structure that, for the
most part, works well.  If we're going to consider replacing them, the
replacement ought to directly support, even enforce, the conventional
structure.  HTML doesn't, and switching to HTML rather than tmac.man
would most likely encourage people to depart from the documentation
conventions.

2) HTML as it is today--a poorly structured, undisciplined presentation
language--makes extracting information very difficult.  Several
people have already pointed out that HTML doesn't print as nicely as
troff.  More to the point, it can be very hard to extract meaning from
HTML documents (one reason why search engines are so hard to do well).
The current man page conventions offer some implicit support for
representing and extracting meaning, via the structuring conventions,
the '-' convention used to construct the 'whatis' index, the
conventions used in the 'RELATED INFORMATION' section, etc.  Any
replacement ought to make this information explicit rather than
conventional.
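
To make the '-' convention concrete, here is a rough Python sketch of
how a 'whatis'-style entry can be pulled out of a NAME line.  The
function name whatis_entry is made up for illustration, and this is not
how makewhatis is actually implemented; it only shows that the meaning
lives in a formatting convention rather than in explicit markup:

```python
def whatis_entry(name_line):
    """Split a NAME-section line 'names - description' into its parts.

    Returns (list_of_names, description), or None if the line does not
    follow the convention.  Hypothetical helper, for illustration only.
    """
    # Man page source usually writes the separator as '\-'; normalize it.
    normalized = name_line.replace("\\-", "-")
    left, sep, right = normalized.partition(" - ")
    if not sep:
        # No ' - ' separator: the convention was not followed, so there
        # is nothing reliable to extract.
        return None
    names = [n.strip() for n in left.split(",") if n.strip()]
    return names, right.strip()
```

A page that phrases its NAME line differently simply drops out of the
index, which is the weakness of relying on convention alone.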

3) HTML tomorrow won't be the same as it is today.  Any change in
format must take that into consideration, most plausibly by rewriting
the man pages into a format that captures the structure of the
documents, and which can then be automatically converted into
semantically poorer presentation formats like roff and HTML.

In short, I think any format conversion ought to take into account
long term maintenance, conversion into varying presentation formats,
and explicitly representing the semantic content of the document
structure.  HTML is horrible on all counts.  The LinuxDoc folks
realized all this, which is why they're using SGML rather than HTML.

If I were planning a commercial NetBSD (or had a horde of volunteers)
and wanted to be trendy, I'd be looking at converting the man pages
into a man page XML DTD that captures the conventional structure of
man pages, and which could then be automatically converted into HTML,
WAP, etc., or even back into man format.  However, that's a lot of
work, and there are probably more pressing needs than converting all
the man pages into XML.
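
As a rough sketch of what I mean, assuming a made-up markup (the
element names manpage, purpose, synopsis and description below are
invented for this example and do not come from any existing DTD), a
single structured source can be turned into both man and HTML output
with very little code.  Escaping of roff control characters in the
text is deliberately left out to keep it short:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured man page; element names are illustrative only.
SAMPLE = """<manpage name="frob" section="1">
  <purpose>frobnicate files</purpose>
  <synopsis>frob [-v] file ...</synopsis>
  <description>Frobnicates each file in turn.</description>
</manpage>"""

# Conventional sections, in the order they should appear in the output.
SECTIONS = [("synopsis", "SYNOPSIS"), ("description", "DESCRIPTION")]


def to_man(root):
    """Render the structured page as man(7)-style roff source."""
    name, sec = root.get("name"), root.get("section")
    lines = [
        f".TH {name.upper()} {sec}",
        ".SH NAME",
        f"{name} \\- {root.findtext('purpose', '')}",
    ]
    for tag, title in SECTIONS:
        text = root.findtext(tag)
        if text is not None:
            lines += [f".SH {title}", text.strip()]
    return "\n".join(lines) + "\n"


def to_html(root):
    """Render the same structured page as a simple HTML fragment."""
    name, sec = root.get("name"), root.get("section")
    parts = [
        f"<h1>{escape(name)}({escape(sec)})</h1>",
        "<h2>NAME</h2>",
        f"<p>{escape(name)} - {escape(root.findtext('purpose', ''))}</p>",
    ]
    for tag, title in SECTIONS:
        text = root.findtext(tag)
        if text is not None:
            parts += [f"<h2>{title}</h2>", f"<p>{escape(text.strip())}</p>"]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    page = ET.fromstring(SAMPLE)
    print(to_man(page))
    print(to_html(page))
```

The point is that the section order and the NAME line format are
decided once, in the converter, instead of by each page author.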

The only lesser step that makes any sense to me would be better
support for automatic conversion of the existing man pages into HTML.
Switching to HTML as the native format looks like a big step
backwards.

With regard to info pages, info and .texinfo are ok for what they were
designed for--info pages--but info pages are not man pages, nor are
they an adequate replacement for man pages (and yes, I think it's a
shame that a lot of the GNU utilities do not have supported man
pages).
-- 
Dan Riley                                         dsr@mail.lns.cornell.edu
Wilson Lab, Cornell University      <URL:http://www.lns.cornell.edu/~dsr/>
    "History teaches us that days like this are best spent in bed"