Subject: Re: xmlcatalog and bad xml/catalog
To: Julio M. Merino Vidal <jmmv@menta.net>
From: Jeremy C. Reed <reed@reedmedia.net>
List: tech-pkg
Date: 10/09/2003 12:58:17
On Thu, 9 Oct 2003, Julio M. Merino Vidal wrote:

> > Your quotemeta() function is for preparing the backslashes for use with
> > GNU gawk.
> >
> > Can these backslashes be removed and still work with gawk?
>
> Nope.  I just tried and gawk fails without them.
>
> Do you have any idea on how to fix this?  (or why they expose different
> behavior wrt quoted characters?)

I will look a little more. Anyway, I installed gawk and built your package
with "bmake AWK=gawk"; the build said:

  gawk: -:2: warning: escape sequence `\/' treated as plain `/'

After installing it, I removed the related line from share/xml/catalog
and reran:
$ /var/db/pkg/libglade2-2.0.1nb10/+INSTALL libglade2 POST-INSTALL
here -c /usr/share/xml/catalog add system http://glade.gnome.org/glade-2.0.dtd /usr/X11R6/share/xml/libglade/glade-2.0.dtd
gawk: -:2: warning: escape sequence `\/' treated as plain `/'
gawk: -:2: warning: escape sequence `\.' treated as plain `.'

(The "here" part was just me outputting the arguments to your command
first.)

These gawk warnings make me think that the escapes (at least for those
characters) are not really needed.

Do you get those warnings?

This is GNU Awk 3.1.3 as installed via pkgsrc (under Linux, but I don't
think that matters).
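Here is a minimal check of the point that these escapes are not needed in a printed string. It assumes only a POSIX sh and any awk, and it uses the URL from the +INSTALL run above. Printed as a plain awk string, the slashes and dots come through with no escaping at all.

```sh
#!/bin/sh
# The URL from the +INSTALL run above, with no escaping applied.
url='http://glade.gnome.org/glade-2.0.dtd'

# The shell expands $url inside the double-quoted program, so awk sees
#   BEGIN { print "http://glade.gnome.org/glade-2.0.dtd" }
printed=$(awk "BEGIN { print \"$url\" }")
printf '%s\n' "$printed"
```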

Looking at your awk ...

$value is used as a regular expression, so the escapes are correct there.

But in the other awk:

    gawk -f - $catalog > $catalog.tmp <<EOF
/<\/catalog>/ {
    print "  $entry"


I don't think $entry needs everything escaped. (When I removed your
quotemeta call for entry, my share/xml/catalog became 0 bytes because of
gawk syntax errors.)

My awk pocket reference says that \/ (literal slash) is only needed in
regular expressions, so it is not needed when printing strings.

For strings, \\ (literal backslash) and \" (literal double quote) should
be enough.
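Here is a small sketch of the two contexts side by side. The sample text is invented for illustration. Inside an awk /.../ regex literal, the slash has to be written \/. In an awk string literal, only \\ and \" are needed.

```sh
#!/bin/sh
# Regex context: inside /.../ the slash must be written \/ because an
# unescaped "/" would end the pattern right after "<".
matched=$(printf '%s\n' '</catalog>' | awk '/<\/catalog>/ { print "found" }')
printf '%s\n' "$matched"    # found

# String context: only \\ (backslash) and \" (double quote) are needed.
# Single quotes keep the shell from touching the backslashes.
out=$(awk 'BEGIN { print "and\\this \"quoted\"" }')
printf '%s\n' "$out"        # and\this "quoted"
```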

So I believe the fix is to have two different quotemeta() functions: one
for regex and one for printing.
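One possible shape for the regex-side function is sketched below. Several things in it are assumptions for illustration and do not come from the package: the name quotemeta_re, the exact set of escaped characters, and the sample input line. It follows the same eval-by-variable-name style as quotemeta2.

```sh
#!/bin/sh
# Hypothetical regex-side counterpart to quotemeta2: put a backslash in
# front of each ERE metacharacter and of "/", the awk regex delimiter.
# "#" is the sed delimiter so that "/" can appear in the bracket list.
quotemeta_re() {
    qr_var="$1"
    eval qr_value=\"\$$qr_var\"
    qr_value=$(printf '%s\n' "$qr_value" | sed 's#[][\\/.*?+^$(){}|]#\\&#g')
    eval $qr_var=\"\$qr_value\"
}

value='http://glade.gnome.org/glade-2.0.dtd'
quotemeta_re value
printf '%s\n' "$value"    # http:\/\/glade\.gnome\.org\/glade-2\.0\.dtd

# The escaped value can now be pasted into a /.../ literal.
hit=$(printf '%s\n' '<system systemId="http://glade.gnome.org/glade-2.0.dtd" />' |
    awk "/$value/ { print \"match\" }")
printf '%s\n' "$hit"      # match
```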

The following worked for me using both gawk and mawk.

-    quotemeta entry
+    quotemeta2 entry

...

+# for printing strings only
+quotemeta2() {
+    qm_var="$1"
+    eval qm_value=\"\$$qm_var\"
+
+    qm_char='-e s|\\|\\\\|g'
+    qm_char="$qm_char -e s/\\\"/\\\\\"/g"
+#    qm_char="$qm_char -e s/\?/\\\?/g"
+#    qm_char="$qm_char -e s/\*/\\\*/g"
+
+    qm_value="`echo $qm_value | sed $qm_char`"
+    eval $qm_var=\"\$qm_value\"
+}

It worked fine for the +INSTALL script, and I also tested it like this:

$ xmlcatmgr -c /usr/share/xml/catalog add system "http://\backslash\"\"" and\\this
entry is now <system systemId=\"http://\\backslash\"\"\" uri=\"and\\this\" />

Afterwards my catalog contained:
  <system systemId="http://\backslash""" uri="and\this" />

I tested the above example with both gawk and mawk.

So it appears the backslashes are properly quoted too (for the awk
expression).
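The quoting path exercised in that test can be sketched end to end. This is a self-contained sketch under a few assumptions:

- It needs a POSIX sh with mktemp.
- It restates quotemeta2, but passes the two sed expressions directly instead of through $qm_char, and uses printf rather than echo so that backslashes survive any shell's echo.
- The awk program finishes the excerpted +INSTALL fragment with an assumed { print } rule.
- The scratch catalog and file names are made up.

```sh
#!/bin/sh
# Sketch of quotemeta2: escape only what an awk string literal needs,
# backslashes first so the ones added before quotes are not doubled.
quotemeta2() {
    qm_var="$1"
    eval qm_value=\"\$$qm_var\"
    qm_value=$(printf '%s\n' "$qm_value" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    eval $qm_var=\"\$qm_value\"
}

# Hypothetical scratch catalog standing in for share/xml/catalog.
tmpdir=$(mktemp -d)
catalog="$tmpdir/catalog"
printf '%s\n' '<catalog>' '</catalog>' > "$catalog"

# The awkward entry from the test above, before any escaping.
wanted='<system systemId="http://\backslash""" uri="and\this" />'
entry=$wanted
quotemeta2 entry

# Unquoted here-document: the shell substitutes $entry, while "\/" is
# passed through unchanged for awk's regex.
cat > "$tmpdir/add.awk" <<EOF
/<\/catalog>/ {
    print "  $entry"
}
{ print }
EOF

# Only replace the catalog if awk succeeds, so a syntax error in the
# generated program cannot leave an empty catalog behind.
awk -f "$tmpdir/add.awk" "$catalog" > "$catalog.tmp" &&
    mv "$catalog.tmp" "$catalog"

result=$(cat "$catalog")
rm -r "$tmpdir"
printf '%s\n' "$result"
```

Writing to a temporary file and moving it into place only on success is one way to avoid the 0-byte catalog seen earlier. The entry line printed by awk should be identical to $wanted.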

Hope the idea above works for you.

   Jeremy C. Reed
   http://bsd.reedmedia.net/