Subject: Re: CVS commit: htdocs/guide/en
To: None <rpaulo@netbsd-pt.org>
From: Hiroki Sato <hrs@NetBSD.org>
List: www-changes
Date: 05/06/2005 02:28:45
----Security_Multipart(Fri_May__6_02_28_45_2005_869)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Rui Paulo <rpaulo@netbsd-pt.org> wrote
  in <20050505123405.GA181@proton.fnop.net>:

rp> Regarding to <para> I thought the proper way was:
rp>   <para>
rp>     paragraph here
rp>   </para>
rp> 
rp> And that's how I indented chap-boot.xml, as you saw.

 Okay, let me explain the normalization rule I applied.
 I think this <para> template is not consistent as an indent/style
 rule, because elements that can directly contain #PCDATA (text
 that may consist of entity references and any characters that
 are legal in the document character set), such as <para>foo</para>
 and <title>foo</title>, usually do not contain a newline character
 unless the line would exceed 70 characters or so.

 <para> should not be a special case because, if it were, the other
 DocBook elements would also need some special, non-intuitive rule.
 If an element contains only characters (in other words, it is at
 the lowest level and holds raw characters only), you should not put
 a newline character at the head or tail of its content.  If an
 element contains only elements (it is a "container" for child
 elements), each child element should go on its own line and be
 indented.  The reason is that wherever the DTD permits raw
 characters, whatever you put there, including whitespace, can
 appear in the formatted output.

 Admittedly, these rules are a bit controversial, but I think they
 are the safest way to avoid inconsistency.  For example,
 <programlisting> is likely to be written in a block style like this:

 <programlisting>
   foo
   bar
 </programlisting>

 However, the correct markup is the following:

 <programlisting>  foo
   bar</programlisting>

 because the two newline characters (the one after the start tag and
 the one before the end tag) are not ignored in this case.
 With this sort of inconsistency, line numbers shift in an
 unexpected way when the listing is referred to by line number.
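
 As a rough illustration of the point above, the following Python
 sketch parses both forms with the standard xml.etree.ElementTree
 module and prints the character data each one carries.  The variable
 names are made up for this example, and ElementTree only stands in
 for whatever XML toolchain actually formats the guide.

```python
import xml.etree.ElementTree as ET

# Block style: a newline follows the start tag and precedes the end tag.
block_style = "<programlisting>\n  foo\n  bar\n</programlisting>"
# Tight style: the content starts and ends right at the tags.
inline_style = "<programlisting>  foo\n  bar</programlisting>"

block_text = ET.fromstring(block_style).text
inline_text = ET.fromstring(inline_style).text

print(repr(block_text))   # '\n  foo\n  bar\n'
print(repr(inline_text))  # '  foo\n  bar'

# Splitting on newlines shows the extra empty first and last lines.
print(len(block_text.split("\n")), len(inline_text.split("\n")))  # 4 2
```

 In the block style, "foo" ends up on the second line of the element
 content rather than the first, which is the line-number shift
 described above.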

 Of course, <programlisting> is somewhat different from <para>,
 and a newline character in <para> is actually interpreted as
 whitespace, but I think the general principle of "Don't include
 content that you would not like to see in the formatted output"
 still holds there.

 In short, mixed-content elements (any elements that can contain
 both elements and #PCDATA, such as <para>) should not include extra
 newline or whitespace characters that you do not expect to see in
 the output.  Element-content elements can include them safely, but
 their children should be indented according to their level.
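
 The rule summarized above could be checked mechanically.  The
 sketch below is only an assumption about how such a check might
 look: the helper names has_character_data and find_padded_elements
 are hypothetical, and because it does not read the DTD it treats any
 element that actually holds non-whitespace text as text-bearing.
 It reports elements whose content begins or ends with a newline.

```python
import xml.etree.ElementTree as ET


def has_character_data(elem):
    """True if the element directly holds any non-whitespace text."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def find_padded_elements(xml_source):
    """Return tags of text-bearing elements padded with a newline."""
    flagged = []
    for elem in ET.fromstring(xml_source).iter():
        if not has_character_data(elem):
            continue  # container element: indentation is harmless here
        children = list(elem)
        first = elem.text or ""
        last = (children[-1].tail if children else elem.text) or ""
        # Leading and trailing whitespace runs of the content.
        lead = first[:len(first) - len(first.lstrip())]
        trail = last[len(last.rstrip()):]
        if "\n" in lead or "\n" in trail:
            flagged.append(elem.tag)
    return flagged


sample = """<sect1>
  <title>Booting</title>
  <para>
    paragraph here
  </para>
  <para>A paragraph with <emphasis>inline</emphasis> markup.</para>
</sect1>"""

print(find_padded_elements(sample))  # ['para']
```

 Only the first <para> is reported.  The <sect1> container is
 skipped because it holds nothing but indentation, and newlines in
 the middle of a long paragraph are not flagged, which matches the
 allowance for wrapping lines at around 70 characters.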

 Does this explanation make sense?  If you have any questions or
 notice any inconsistency, please let me know.  Any suggestions are
 welcome.

rp> IMO, we should change that too and if possible provide some guidelines
rp> as your commit message says, what do you think ?

 Hmm, would it be better to put an XML markup guideline in
 htdocs/developers/htdocs.xml?  I think I will try to write an
 article as a basis for discussion.

--
| Hiroki SATO

----Security_Multipart(Fri_May__6_02_28_45_2005_869)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (FreeBSD)

iD8DBQBCelfNTyzT2CeTzy0RAin8AKDP/TK8DSt//KwYDbApmmCbwReEUACcCeJD
xLxJENPJN7ZH5G5FE4oI8Cs=
=sIVo
-----END PGP SIGNATURE-----

----Security_Multipart(Fri_May__6_02_28_45_2005_869)----