Subject: Re: CVS commit: htdocs/guide/en
To: None <rpaulo@netbsd-pt.org>
From: Hiroki Sato <hrs@NetBSD.org>
List: www-changes
Date: 05/06/2005 02:28:45
Rui Paulo <rpaulo@netbsd-pt.org> wrote
in <20050505123405.GA181@proton.fnop.net>:
rp> Regarding to <para> I thought the proper way was:
rp> <para>
rp> paragraph here
rp> </para>
rp>
rp> And that's how I indented chap-boot.xml, as you saw.
Okay, let me explain the normalization rule I applied.
I think this <para> template is not consistent as an indent/style
rule. Elements that can directly contain #PCDATA (text that may
consist of entity references and any characters legal in the
document character set), such as <para>foo</para> and
<title>foo</title>, usually do not contain a newline character
unless the line would exceed 70 characters or so.
<para> should not be a special case because, if it were, the other
DocBook elements would also need some special, non-intuitive rule.
If an element contains only characters (in other words, it is at
the lowest level and holds raw characters only), you should not put
a newline character at the head or tail of its content. If an
element contains only elements (it is a "container" for child
elements), each child element should be on its own line and
indented. The reason is that wherever the DTD permits raw
characters, those characters can appear in the formatted output.
Anyway, these rules are a bit controversial, but I think they are
the safest principle to avoid inconsistency. For example,
<programlisting> is likely to be written in a block style like this:
<programlisting>
foo
bar
</programlisting>
However, the correct markup is the following:
<programlisting>foo
bar</programlisting>
because the two extra newline characters (one after the start tag and
one before the end tag) are not ignored in this case. With this sort
of inconsistency, line numbers shift in an unexpected way when the
listing is referred to by line.
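A minimal sketch of the difference, assuming Python's standard
xml.etree.ElementTree as the parser (my choice for illustration; any
conforming XML parser should hand over the same character data):

```python
import xml.etree.ElementTree as ET

# Block style: a newline after the start tag and before the end tag.
block_style = "<programlisting>\nfoo\nbar\n</programlisting>"
# Tight style: content starts and ends right at the tags.
tight_style = "<programlisting>foo\nbar</programlisting>"

block_text = ET.fromstring(block_style).text
tight_text = ET.fromstring(tight_style).text

print(repr(block_text))  # '\nfoo\nbar\n'
print(repr(tight_text))  # 'foo\nbar'
```

The parser keeps both extra newlines, so whether they show up in the
formatted output is left entirely to the stylesheet.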
Of course, <programlisting> is somewhat different from <para>,
and a newline character in <para> is actually interpreted as
whitespace, but I think the general principle of "Don't include
content that you would not like to see in the formatted output"
still holds there.
In short, mixed-content elements (any elements that can contain
both elements and #PCDATA, such as <para>) should not include extra
newline or whitespace characters you do not expect to see in
the output. Element-content elements can include them safely, but
their child elements should be indented according to their level.
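The rule for text-bearing elements could be checked mechanically. The
following is only a rough sketch: the function name find_padded and
the TEXT_ELEMENTS set are hypothetical, and a real check would take
the list of mixed-content elements from the DTD instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-picked set of text-bearing elements for this sketch.
TEXT_ELEMENTS = {"para", "title", "programlisting"}

def find_padded(xml_source):
    """Return tags of text-bearing elements whose content starts or ends with whitespace."""
    padded = []
    for elem in ET.fromstring(xml_source).iter():
        if elem.tag not in TEXT_ELEMENTS:
            continue
        children = list(elem)
        # Leading text is in elem.text; trailing text is in the last
        # child's tail, or in elem.text when there are no children.
        first = elem.text or ""
        last = (children[-1].tail or "") if children else first
        if first[:1].isspace() or last[-1:].isspace():
            padded.append(elem.tag)
    return padded

sample = """<section>
  <title>Booting</title>
  <para>
    paragraph here
  </para>
  <para>See <filename>/boot</filename> for details.</para>
</section>"""

print(find_padded(sample))  # ['para']
```

Only the block-style <para> is reported. The whitespace between the
children of <section> is not flagged, because <section> is treated
here as an element-content container.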
Does this explanation make sense? If you have any questions or
find any inconsistency, please let me know. Any suggestions are welcome.
rp> IMO, we should change that too and if possible provide some guidelines
rp> as your commit message says, what do you think ?
Hmm, would it be better to put an XML markup guideline in
htdocs/developers/htdocs.xml? I think I will try to write an article
as a basis for discussion.
--
| Hiroki SATO