Subject: Re: XML config file
To: None <tech-userlevel@NetBSD.org>
From: Terry Moore <tmm@mcci.com>
List: tech-userlevel
Date: 07/05/2006 00:55:05
On Tue, 4 Jul 2006, Magnus Eriksson wrote:
 >   In that case, go with XML all the way.  Convert all config files in the
 > whole system to XML, have command line tools that manipulate XML the way we
 > now use grep, awk, etc to manipulate text, include an XML parser library in
 > the base system that anyone can use in their programs, include good
 > documentation on all the above, etc.

My experience has been that UNIX config files are textual 
representations of relational database tables.  As such, the UNIX 
text tools are very good at doing the normal kinds of database 
operations that one needs to do.
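
To make the table-as-relation view concrete, here is a minimal sketch in Python. It assumes a passwd-style, colon-separated file; the sample rows, the field names in FIELDS, and the helper names rows, select and project are made up for illustration and are not taken from any existing tool.

```python
# Hypothetical passwd-style table: one tuple per line, fields separated by ':'.
PASSWD = """\
root:*:0:0:Charlie &:/root:/bin/sh
daemon:*:1:1:The devil himself:/:/sbin/nologin
alice:*:1000:100:Alice Example:/home/alice:/bin/ksh
"""

# Assumed column names; the "schema" of this relation.
FIELDS = ("name", "passwd", "uid", "gid", "gecos", "home", "shell")


def rows(text, sep=":"):
    """Turn each non-empty, non-comment line into a dict (one tuple)."""
    for line in text.splitlines():
        if line and not line.startswith("#"):
            yield dict(zip(FIELDS, line.split(sep)))


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return (row for row in relation if predicate(row))


def project(relation, *columns):
    """Relational projection: keep only the named columns."""
    return [tuple(row[c] for c in columns) for row in relation]


# Roughly what  awk -F: '$3 >= 1000 { print $1, $7 }'  does on the command line.
result = project(
    select(rows(PASSWD), lambda r: int(r["uid"]) >= 1000),
    "name", "shell",
)
print(result)
```

The point of the sketch is only that selection and projection fall out of "split each line on a separator"; the awk one-liner in the comment does the same job with far less ceremony.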

XML files, by contrast, are typically textual representations of 
trees, not of n-place relations.  As such, the UNIX text tools are 
fairly weak at processing them directly for doing database-like operations.

I've used things like xml_grep to work around these limitations.  For 
me, they still don't work very well for this purpose.  They're big, 
slow, require perl, and they still want to produce XML wrappings 
which are NOT fine when one is trying to present information to 
people, or transform into C code.  I know about the various 
transformation tools like Saxon etc -- I use them when transformation 
work needs to be done repetitively in production.  Then the 
notational weight doesn't matter as much.

But a LOT of sysadmin, and a LOT of managing large projects, involves 
answering quick questions or performing quick transformations that 
are one-off (possibly replicated across a cloud of similar systems in 
a network).

The ability to apply massive regular transformations to the sysadmin 
files is one of the reasons I find Unix a lot easier to administer 
than Windows.  There are tools/methodologies based on xmlpath, e.g., 
various perl-ish things, that do the same thing; but these are very 
heavy, and not suited for quick command line 
"calculus".  Furthermore, there is no accepted standard for which 
tool to use for these kinds of jobs in XML, so it seems to me that 
doing this will in fact lead to further fragmentation within the Unix world.

If one wants to use XML and yet not abandon one of the 
great strengths of Unix, one needs a tool to convert arbitrary text 
databases (with their schema -- different from the XML DTD) to and 
from the XML representation.
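
A rough sketch of what such a converter might look like, using Python's standard xml.etree.ElementTree module. The schema here is just an assumed tuple of column names, and the "table" and "row" element names are an arbitrary choice for illustration, not any existing format.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the column names of a colon-separated table.
SCHEMA = ("name", "uid", "shell")


def table_to_xml(text, schema=SCHEMA, sep=":"):
    """Build <table><row><name>..</name>..</row>..</table> from flat text."""
    root = ET.Element("table")
    for line in text.splitlines():
        if not line:
            continue
        row = ET.SubElement(root, "row")
        for column, value in zip(schema, line.split(sep)):
            ET.SubElement(row, column).text = value
    return root


def xml_to_table(root, schema=SCHEMA, sep=":"):
    """Flatten the XML back into one line per row, columns in schema order."""
    lines = [
        sep.join(row.findtext(column, default="") for column in schema)
        for row in root.findall("row")
    ]
    return "\n".join(lines) + "\n"


TEXT = "root:0:/bin/sh\nalice:1000:/bin/ksh\n"

tree = table_to_xml(TEXT)
xml_text = ET.tostring(tree, encoding="unicode")

# Parse the serialized XML again and convert it back to flat text.
round_trip = xml_to_table(ET.fromstring(xml_text))
print(xml_text)
print(round_trip, end="")
```

The schema lives outside the data, so the flat file stays grep- and awk-friendly while the XML form can be generated on demand. A real tool would also need rules for quoting the separator and for comments, which this sketch ignores.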

If one does that, I suggest that the arbitrary text database should 
actually be the normative form in most cases, and the XML 
transformation is then useful for people and/or apps that need to 
work with XML.

None of this is to say that XML proplists are not fine for their 
purpose.  But I am arguing that they may not fit all needs, since 
proplists are representing trees of information, possibly embedded in 
regular table-like iterations.

I might even speculate that one of the reasons that Unix shell (or 
awk or ..) plus the line-oriented text tools are so efficient is that 
they combine an adequate procedural framework with a conceptually 
adequate relational database framework.  (One can argue about 
notation in any of the procedural languages.  Of course a flat text 
file is not an efficient database representation for huge 
databases.  I'm talking about notation as a tool of thought, not 
about the most efficient use of compute cycles.)

To the extent that XML gets in the way of thinking about the problem 
(because one has to deal with all the introduced notation, and one 
loses the tools one is used to), use of XML will not increase productivity.

To summarize my experience:  Tables still have their uses, and they 
are different from XML files.  Tables represent sequences of tuples, 
each with identical structure.  XML represents trees.  Flat text is a 
good way to represent tables, both as a notation and for performing 
quick transformations. Flat text is not a great way to represent trees.

This leads to the rules that I currently follow:

If a config file is representing a table, represent it as such.

If a config file is representing a tree, consider XML.

If a config file is a table of mostly identical tuples, some needing 
to contain trees, then consider a table, with embedded (one-line) XML 
[or references to XML stored separately] to represent the tree part; 
or use XML throughout, but then be prepared to build special 
extraction tools so one can do the table-like operations that are 
likely when the top-level data structure is fundamentally a table.
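
As a sketch of the first option in that last rule, the following Python fragment treats a tab-separated table whose last column holds a one-line XML fragment. The service names, the "opts" and "tls" elements, and the layout itself are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: name, port, and a one-line XML fragment for the tree part.
SERVICES = """\
www\t80\t<opts><tls enabled="no"/><vhost>example.org</vhost></opts>
imap\t993\t<opts><tls enabled="yes"/></opts>
"""


def parse(text):
    """Yield (name, port, parsed XML element) for each row of the table."""
    for line in text.splitlines():
        # Split on the first two tabs only; the XML stays in one piece.
        name, port, xml_field = line.split("\t", 2)
        yield name, int(port), ET.fromstring(xml_field)


# Table-like query on the flat columns, tree-like query on the embedded XML.
tls_services = [
    name
    for name, port, opts in parse(SERVICES)
    if opts.find("tls").get("enabled") == "yes"
]
print(tls_services)
```

The flat columns remain usable with cut, sort and awk, and only a consumer that cares about the tree part needs an XML parser at all.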

--Terry