Subject: Re: CVS commit: htdocs/guide/en
To: None <>
From: Hiroki Sato <>
List: www-changes
Date: 05/06/2005 02:28:45
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Rui Paulo <> wrote
  in <>:

rp> Regarding to <para> I thought the proper way was:
rp>   <para>
rp>     paragraph here
rp>   </para>
rp> And that's how I indented chap-boot.xml, as you saw.

 Okay, let me explain the normalization rule I applied.
 I think this <para> template is not consistent as an indent/style
 rule, because elements which can directly contain #PCDATA (text
 which may consist of entity references and any characters that
 are legal in the document character set), such as <para>foo</para>
 and <title>foo</title>, usually do not contain a newline character
 unless the line exceeds 70 characters or so.

 <para> should not be a special case because, if it were, the other
 DocBook elements would also need special, non-intuitive rules.
 If an element contains only characters (in other words, it is at
 the lowest level and holds raw characters only), you should not put
 a newline character at the head/tail of the content.  If an element
 contains only elements (it is a "container" which has child-level
 elements), the child elements should each be on a separate line
 and indented.  The reason is that wherever the DTD permits raw
 characters, they can appear in the formatted output.
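 As a rough illustration of the first rule, here is a minimal Python
 sketch that flags character-only (leaf) elements whose content is
 padded with a newline at the head or tail.  The helper name
 find_padded_leaves and the sample <sect1> fragments are made up for
 this example; it uses only the standard xml.etree.ElementTree module
 and is not part of any existing htdocs tooling.

```python
import xml.etree.ElementTree as ET


def find_padded_leaves(xml_text):
    """Return tags of leaf elements whose text has a newline in its
    leading or trailing whitespace (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    padded = []
    for elem in root.iter():
        if len(elem) != 0:
            continue  # container element: children may sit on their own lines
        text = elem.text or ""
        leading = text[:len(text) - len(text.lstrip())]
        trailing = text[len(text.rstrip()):]
        if "\n" in leading or "\n" in trailing:
            padded.append(elem.tag)
    return padded


# Sample fragments, invented for illustration only.
block_style = (
    "<sect1>\n"
    "  <title>Boot</title>\n"
    "  <para>\n"
    "    paragraph here\n"
    "  </para>\n"
    "</sect1>"
)
inline_style = (
    "<sect1>\n"
    "  <title>Boot</title>\n"
    "  <para>paragraph here</para>\n"
    "</sect1>"
)

print(find_padded_leaves(block_style))   # ['para']
print(find_padded_leaves(inline_style))  # []
```

 The container <sect1> is skipped on purpose, since its children are
 expected to be on separate, indented lines.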

 Anyway, these rules are a bit controversial, but I think they are
 the safest principle to avoid inconsistency.  For example,
 <programlisting> is likely to be written in a block style like this:

 <programlisting>
   foo
 </programlisting>

 However, the correct markup is the following:

 <programlisting>  foo</programlisting>

 because the two newline characters are not ignored in this case.
 If there is this sort of inconsistency, line numbers will be
 shifted in an unexpected way when they are referred to.
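 The shift can be checked with a small Python sketch.  It assumes a
 parser that keeps the whitespace inside <programlisting> as-is (the
 standard xml.etree.ElementTree does); listing_lines is a hypothetical
 helper, and a real DocBook stylesheet may render the result
 differently.

```python
import xml.etree.ElementTree as ET


def listing_lines(xml_text):
    """Split the character content of a <programlisting> into lines,
    exactly as the parser hands it over (hypothetical helper)."""
    text = ET.fromstring(xml_text).text or ""
    return text.split("\n")


block_style = "<programlisting>\n  foo\n</programlisting>"
inline_style = "<programlisting>  foo</programlisting>"

block_lines = listing_lines(block_style)
inline_lines = listing_lines(inline_style)

print(block_lines)    # ['', '  foo', '']
print(inline_lines)   # ['  foo']

# 1-based line number of "  foo" inside each listing
print(block_lines.index("  foo") + 1)    # 2
print(inline_lines.index("  foo") + 1)   # 1
```

 In the block style the two newlines become an empty first and last
 line, so "foo" ends up on line 2 of the listing instead of line 1.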

 Of course, <programlisting> is somewhat different from <para>,
 and actually a newline character in <para> is interpreted as
 whitespace, but the general principle of "Don't include content
 that you would not like to see in the formatted output" holds there,
 I think.

 In short, mixed-content elements (any elements that can contain
 elements and #PCDATA, such as <para>) should not include extra
 newline or whitespace characters you do not expect to see in
 the output.  Element-content elements can include them safely, but
 they should be indented according to their level.

 Does this explanation make sense?  If you have any questions or
 notice any inconsistency, please let me know.  Any suggestions are welcome.

rp> IMO, we should change that too and if possible provide some guidelines
rp> as your commit message says, what do you think ?

 Hmm, is putting an XML markup guideline to htdocs/developers/htdocs.xml
 better?  I think I will try to write an article as a basis for

| Hiroki SATO
