Subject: Re: misc/32221 (NetBSD's web documentation is not valid HTML)
To: None <,,>
From: None <>
List: netbsd-bugs
Date: 03/22/2006 15:47:22
Synopsis: NetBSD's web documentation is not valid HTML

State-Changed-From-To: open->analyzed
State-Changed-When: Wed, 22 Mar 2006 15:47:21 +0000
Working on this PR, I found that the DocBook and Website XSLT stylesheets
are very likely to produce valid HTML output. Analysing our HTML files, I
found that most pages are invalid for the following reasons:

1) "Stray" XML namespace declarations. Please note that these
   declarations also incorrectly affect some other tag constructs,
   such as <br></br> (must be just <br>).
2) Absent DOCTYPE declarations.
3) Possibly an incorrect HTML schema.
4) Possibly some other reasons.
5) CSS validity.

I hope I've fixed (1) and (2), and now most of the XML files can be used
as sources for valid HTML pages. For example, the NetBSD Guide is now a
valid HTML 4.01 Transitional document.

Regarding (3) and (4) IMHO we should:

1) have a way to check the validity of all HTML pages (some sort of
   "make htmllint"). The validator engine used by is available
   for download from their site, and I wonder whether it is packaged in
   pkgsrc, so that we can include it in our toolchain.
2) because valid XML DocBook/Website documents should result in valid
   HTML, have a way to validate our XML pages (just as with HTML, some
   sort of "make xmllint"). Currently you can try to validate your own
   XML files as follows:
   a) set {XML,SGML}_CATALOG_FILES to "$HTDOCS/share/xml/catalog-common.xml
      $HTDOCS/share/xml/catalog.xml $LOCALBASE/share/xml/catalog" (space
      separated list).
   b) run xmllint(1) as follows:
      xmllint --noout --nonet --xinclude --catalogs --valid FILE_NAME
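
Steps (a) and (b) could be wrapped in a small script, roughly like the
sketch below. It is only an illustration, not an existing part of the
toolchain: the function names catalog_files and lint_xml are made up
here, and the defaults for HTDOCS and LOCALBASE are assumptions that
should be adjusted to the local checkout and pkgsrc prefix.

```shell
#!/bin/sh
# Sketch only: HTDOCS and LOCALBASE must point at the htdocs checkout and
# the pkgsrc prefix. The defaults below are assumptions, not project settings.
: "${HTDOCS:=.}"
: "${LOCALBASE:=/usr/pkg}"

# Build the space-separated catalog list from step (a).
# (catalog_files is a hypothetical helper name.)
catalog_files() {
    printf '%s %s %s\n' \
        "$HTDOCS/share/xml/catalog-common.xml" \
        "$HTDOCS/share/xml/catalog.xml" \
        "$LOCALBASE/share/xml/catalog"
}

# Validate every file given as an argument with the command from step (b).
# Returns non-zero if at least one file fails to validate.
# (lint_xml is a hypothetical helper name.)
lint_xml() {
    XML_CATALOG_FILES=$(catalog_files)
    SGML_CATALOG_FILES=$XML_CATALOG_FILES
    export XML_CATALOG_FILES SGML_CATALOG_FILES
    status=0
    for f in "$@"; do
        xmllint --noout --nonet --xinclude --catalogs --valid "$f" || status=1
    done
    return $status
}
```

After sourcing the script, something like "lint_xml FILE_NAME" would
check one file. The exit status could be used by a future "make xmllint"
target.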

The second part is really broken because we're using Simplified DocBook
as the backend for Website. It doesn't have <sect[1-6]> and many other
widely used things (I don't know exactly which, but I'm sure :-). Because
all this will add weight to our toolchain and bind us even more to
XML/DocBook, we must talk with <hrs> about our website again.

(5) can be eliminated completely very easily. All problems are shown here:

<grant> and <keihan> are responsible for these errors.