Subject: Re: Which HTML standard to follow?
To: None <netbsd-docs@NetBSD.org>
From: Hiroki Sato <hrs@NetBSD.org>
List: netbsd-docs
Date: 07/19/2005 11:05:56

Klaus Heinz <k.heinz.jul.fuenf@onlinehome.de> wrote
  in <20050718215705.GA457@silence.homedns.org>:

k.> as PR admin/30763 points out (I had meant to start validating again, but
k.> did not follow up :-/) we have a bad mix of DOCTYPE declaration and
k.> actual content in our pages at the moment.
k.> 
k.> My opinion regarding the DOCTYPE is that we should try to stay with
k.> "HTML 4.01 Transitional" and not go the XHTML 1.0 route yet.
k.> 
k.> The only way I would reconsider this would be if it turns out too much
k.> hassle to keep HTML 4.01 and become standards compliant, i.e. if our
k.> tools prevent us from producing valid HTML 4.01.

 I have no objection to staying with HTML 4.01.

 BTW, I think using a postprocessor is a practical way to generate
 valid HTML.  XSLT does not guarantee such conformance; it only
 guarantees conformance to the XML standard.

 Some years ago I added a postprocessing rule using tidy to the Guide
 for this purpose, but someone disabled it, probably because it broke
 some entity references and documents in some multibyte encodings.
 There is a patch for tidy that solves these problems (the original
 author did not accept it, though).  If we can get it into pkgsrc,
 the rule should work for documents in various languages.

 The necessary patches for valid HTML 4.01 are attached.  Comments?

-- 
| Hiroki SATO

[Attachment: tidy.diff]

Index: Makefile
===================================================================
RCS file: /cvsroot/pkgsrc/www/tidy/Makefile,v
retrieving revision 1.19
diff -d -u -I\$FreeBSD:.*\$ -I\$NetBSD:.*\$ -I\$OpenBSD:.*\$ -I\$DragonFly:.*\$ -I\$Id:.*\$ -I\$hrs:.*\$ -r1.19 Makefile
--- Makefile	22 May 2005 20:08:46 -0000	1.19
+++ Makefile	18 Jul 2005 23:08:13 -0000
@@ -3,7 +3,7 @@
 
 DISTNAME=	tidy_src_040811
 PKGNAME=	tidy-20040811
-PKGREVISION=	1
+PKGREVISION=	2
 CATEGORIES=	www
 MASTER_SITES=	http://tidy.sourceforge.net/src/ \
 		http://tidy.sourceforge.net/docs/ \
Index: distinfo
===================================================================
RCS file: /cvsroot/pkgsrc/www/tidy/distinfo,v
retrieving revision 1.9
diff -d -u -I\$FreeBSD:.*\$ -I\$NetBSD:.*\$ -I\$OpenBSD:.*\$ -I\$DragonFly:.*\$ -I\$Id:.*\$ -I\$hrs:.*\$ -r1.9 distinfo
--- distinfo	24 Feb 2005 14:08:39 -0000	1.9
+++ distinfo	10 Apr 2005 18:02:18 -0000
@@ -8,3 +8,4 @@
 Size (tidy_docs_040810.tgz) = 153044 bytes
 SHA1 (patch-ab) = 6d44dbc78bed849997108fe6bb5ff41088f653a3
 SHA1 (patch-ac) = 93e397595652a697bb705444f2923453818d1dde
+SHA1 (patch-ad) = 2ed61eca11628280d0353345d9683688edeb2878
Index: patches/patch-ad
===================================================================
RCS file: patches/patch-ad
diff -N patches/patch-ad
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-ad	10 Apr 2005 18:02:17 -0000
@@ -0,0 +1,114 @@
+diff -duN -x CVS -ruN ../tidy.orig/console/tidy.c ./console/tidy.c
+--- ../tidy.orig/console/tidy.c	2005-04-11 02:14:56.000000000 +0900
++++ ./console/tidy.c	2005-04-11 02:53:13.000000000 +0900
+@@ -58,6 +58,7 @@
+     printf( "  -clean   or -c    replace FONT, NOBR and CENTER tags by CSS\n");
+     printf( "  -bare    or -b    strip out smart quotes and em dashes, etc.\n");
+     printf( "  -numeric or -n    output numeric rather than named entities\n");
++    printf( "  -preserve         preserve entities from source file\n");
+     printf( "  -errors  or -e    only show errors\n");
+     printf( "  -quiet   or -q    suppress nonessential output\n");
+     printf( "  -omit             omit optional end tags\n");
+@@ -491,6 +492,9 @@
+             else if ( strcasecmp(arg, "numeric") == 0 )
+                 tidyOptSetBool( tdoc, TidyNumEntities, yes );
+ 
++            else if ( strcasecmp(arg, "preserve") == 0 )
++                tidyOptSetBool( tdoc, TidyPreserveEntities, yes );
++
+             else if ( strcasecmp(arg, "modify") == 0 ||
+                       strcasecmp(arg, "change") == 0 ||  /* obsolete */
+                       strcasecmp(arg, "update") == 0 )   /* obsolete */
+@@ -555,6 +559,15 @@
+                 }
+             }
+ #endif
++            else if ( strcasecmp(arg, "doctype") == 0 )
++            {
++                if ( argc >= 3 )
++                {
++                    tidyOptSetValue( tdoc, TidyDoctype, argv[2] );
++                    --argc;
++                    ++argv;
++                }
++            }
+ 
+             else if ( strcasecmp(arg, "output") == 0 ||
+                       strcasecmp(arg, "-output-file") == 0 ||
+diff -duN -x CVS -ruN ../tidy.orig/htmldoc/man_page.txt ./htmldoc/man_page.txt
+--- ../tidy.orig/htmldoc/man_page.txt	2005-04-11 02:15:01.000000000 +0900
++++ ./htmldoc/man_page.txt	2005-04-11 03:02:30.000000000 +0900
+@@ -43,6 +43,13 @@
+ .B -numeric or -n   
+ to output numeric rather than named entities
+ .TP 15
++.B -preserve
++to preserve source file entities as is
++.TP 15
++.B -doctype <type>
++to specify doctype declaration (\<type\> = auto, omit, strict,
++loose, transitional, user specified fpi)
++.TP 15
+ .B -errors  or -e   
+ to only show errors
+ .TP 15
+diff -duN -x CVS -ruN ../tidy.orig/include/tidyenum.h ./include/tidyenum.h
+--- ../tidy.orig/include/tidyenum.h	2005-04-11 02:14:56.000000000 +0900
++++ ./include/tidyenum.h	2005-04-11 02:24:19.000000000 +0900
+@@ -136,6 +136,7 @@
+   TidyBurstSlides,     /**< Create slides on each h2 element */
+ 
+   TidyNumEntities,     /**< Use numeric entities */
++  TidyPreserveEntities,/**< Do not parse entities */
+   TidyQuoteMarks,      /**< Output " marks as &quot; */
+   TidyQuoteNbsp,       /**< Output non-breaking space as entity */
+   TidyQuoteAmpersand,  /**< Output naked ampersand as &amp; */
+diff -duN -x CVS -ruN ../tidy.orig/src/config.c ./src/config.c
+--- ../tidy.orig/src/config.c	2005-04-11 02:14:57.000000000 +0900
++++ ./src/config.c	2005-04-11 02:36:05.000000000 +0900
+@@ -203,6 +203,7 @@
+   { TidyBurstSlides,             PP, "split",                       BL, no,              ParseBool,         boolPicks       },
+ 
+   { TidyNumEntities,             MU, "numeric-entities",            BL, no,              ParseBool,         boolPicks       },
++  { TidyPreserveEntities,        MU, "preserve",                    BL, no,              ParseBool,         boolPicks       },
+   { TidyQuoteMarks,              MU, "quote-marks",                 BL, no,              ParseBool,         boolPicks       },
+   { TidyQuoteNbsp,               MU, "quote-nbsp",                  BL, yes,             ParseBool,         boolPicks       },
+   { TidyQuoteAmpersand,          MU, "quote-ampersand",             BL, yes,             ParseBool,         boolPicks       },
+@@ -909,6 +910,13 @@
+         SetOptionBool( doc, TidyQuoteAmpersand, yes );
+         SetOptionBool( doc, TidyHideEndTags, no );
+     }
++
++    /* Avoid &amp;copy; in preserve-entities case */
++    if ( cfgBool(doc, TidyPreserveEntities) )
++    {
++        SetOptionBool( doc, TidyQuoteAmpersand, no );
++    }
++
+ }
+ 
+ /* unsigned integers */
+diff -duN -x CVS -ruN ../tidy.orig/src/lexer.c ./src/lexer.c
+--- ../tidy.orig/src/lexer.c	2005-04-11 02:14:57.000000000 +0900
++++ ./src/lexer.c	2005-04-11 02:14:27.000000000 +0900
+@@ -2004,8 +2004,10 @@
+ 
+                     continue;
+                 }
+-                else if (c == '&' && mode != IgnoreMarkup)
++                else if (c == '&' && mode != IgnoreMarkup
++				  && !cfgBool(doc, TidyPreserveEntities) ) {
+                     ParseEntity( doc, mode );
++		}
+ 
+                 /* this is needed to avoid trimming trailing whitespace */
+                 if (mode == IgnoreWhitespace)
+@@ -2242,7 +2244,7 @@
+                 }
+ 
+                 /* fix for bug 762102 */
+-                if (c == '&')
++                if (c == '&') /* XXX: need preserve support? */
+                 {
+                     UngetChar(c, doc->docIn);
+                     --(lexer->lexsize);

[Attachment: htdocs.diff]

Index: share/mk/web.site.mk
===================================================================
RCS file: /cvsroot/htdocs/share/mk/web.site.mk,v
retrieving revision 1.39
diff -d -u -I\$FreeBSD:.*\$ -I\$NetBSD:.*\$ -I\$OpenBSD:.*\$ -I\$DragonFly:.*\$ -I\$Id:.*\$ -I\$hrs:.*\$ -r1.39 web.site.mk
--- share/mk/web.site.mk	16 Jul 2005 19:48:24 -0000	1.39
+++ share/mk/web.site.mk	19 Jul 2005 01:44:09 -0000
@@ -54,6 +54,8 @@
 HTML2TXTOPTS?=	-force_html -dump -nolist ${HTML2TXTFLAGS}
 ISPELL?=	ispell
 ISPELLOPTS?=	-l -p ${WEB_PREFIX}/en/share/dict/words ${ISPELLFLAGS}
+TIDY?=		${PREFIX}/bin/tidy
+TIDYOPTS?=	-wrap 120 -raw -m -preserve -f /dev/null ${TIDYFLAGS}
 
 # Handle old .LIST files.
 .if exists(${WEB_PREFIX}/${DOCLANG}/list2html.pl)
@@ -189,9 +191,9 @@
 		-o ${.TARGET} ${PARAMS.${_ID}} \
 		${XSLT.${_ID}} ${XML.${_ID}} )
 .endif
-#. if !defined(NO_TIDY)
-#	-${TIDY} ${TIDYOPTS} ${.TARGET}
-#. endif
+. if !defined(NO_TIDY)
+	-${TIDY} ${TIDYOPTS} ${.TARGET}
+. endif
 .endfor
 .endfor
 
@@ -252,6 +254,9 @@
 .xml.html: ${XMLDEPS}
 	@${ECHO} "[xsltproc] ${.IMPSRC} -> ${.TARGET}"
 	@(ulimit -d 800000 && ${XSLTPROC} ${XSLTPROCOPTS} --stringparam autolayout-file ${AUTOLAYOUTFILE} -o ${.TARGET} ${XSLFILE} ${.IMPSRC})
+. if !defined(NO_TIDY)
+	-${TIDY} ${TIDYOPTS} ${.TARGET}
+. endif
 
 .html.txt:
 	@${ECHO} "[html2txt] ${.IMPSRC} -> ${.TARGET}"
Index: share/xsl/netbsd-webpage-en.xsl
===================================================================
RCS file: /cvsroot/htdocs/share/xsl/netbsd-webpage-en.xsl,v
retrieving revision 1.3
diff -d -u -I\$FreeBSD:.*\$ -I\$NetBSD:.*\$ -I\$OpenBSD:.*\$ -I\$DragonFly:.*\$ -I\$Id:.*\$ -I\$hrs:.*\$ -r1.3 netbsd-webpage-en.xsl
--- share/xsl/netbsd-webpage-en.xsl	14 Mar 2004 20:36:17 -0000	1.3
+++ share/xsl/netbsd-webpage-en.xsl	19 Jul 2005 01:40:57 -0000
@@ -4,13 +4,12 @@
 
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 		xmlns:html="http://www.w3.org/1999/xhtml"
+		exclude-result-prefixes="html"
 		version="1.0">
 
   <xsl:param name="locale.backto">Back to </xsl:param>
 
-  <xsl:output method="html" encoding="ISO-8859-1"
-	indent="yes"
-	doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"/>
+  <xsl:output encoding="ISO-8859-1" />
 
   <xsl:include href="netbsd-webpage.xsl" />
 
