Subject: Re: WWW query engine bug (was Query-PR)
To: None <current-users@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: current-users
Date: 02/21/1996 17:23:45
>>> 	(3) if the user does a 'long-range' <html>, perhaps one which
>>> 	    is never closed, how does the scanner deal with that?
>> I don't see why there's any need to.  Your scanner just has to keep
>> a bit saying whether it's inside an unclosed <HTML>...</HTML>, and
>> if it's not, just do mindless mapping of < to &lt;, etc.
> ...  How do you check if the <html> is closed, without parsing the
> entire file?

By "inside an unclosed <HTML>...</HTML>" I mean that the scanner has
seen an <HTML> but not yet seen its matching </HTML>.  I don't see why
you would care whether a matching </HTML> exists somewhere out there in
the as-yet-unread text, only whether you've seen it yet for the bits
you're gobbling now.  My unintentional <i> was never matched by a </i>,
but that didn't seem to bother anyone....
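
To make the one-bit idea concrete, here is a rough sketch of such a
scanner.  It is only an illustration, not the actual query engine code:
the names (scan, _escape, _TAG) are made up, and it assumes the input
arrives a line at a time, that an <HTML> or </HTML> tag never straddles
a line break, and that the tags themselves are passed through.

```python
import re

# Hypothetical names throughout; an illustration, not the real engine.
_TAG = re.compile(r"</?html>", re.IGNORECASE)
_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def _escape(text):
    """Mindless mapping of < to &lt;, etc."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def scan(lines):
    """Yield each line as it is read, never looking ahead.

    The only state is one bit: have we seen an <HTML> whose matching
    </HTML> has not yet shown up in the text read so far?
    """
    inside = False
    for line in lines:
        out = []
        pos = 0
        for m in _TAG.finditer(line):
            text = line[pos:m.start()]
            out.append(text if inside else _escape(text))
            tag = m.group()
            was_inside = inside
            # A closing tag clears the bit, an opening tag sets it.
            inside = not tag.startswith("</")
            # Tags that open or close a region pass through untouched;
            # a stray </HTML> outside any region is escaped like text.
            out.append(tag if (was_inside or inside) else _escape(tag))
            pos = m.end()
        tail = line[pos:]
        out.append(tail if inside else _escape(tail))
        yield "".join(out)
```

Fed the single line "a <i> b" this gives "a &lt;i&gt; b".  Fed "<HTML><i>"
with no closing tag anywhere, it passes every later line through
untouched, and it never had to know whether a </HTML> was coming.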

But in any case, doing this is not a fix.  The text in my PR could just
as well have been <HTML><i> instead of <i>, and then the <i> would have
gone through unescaped anyway.

Besides, it doesn't really matter.  Anything, anything at all, that
tries to be smart about the contents of the PR is going to lose
information sometimes.  For human browsing it generally doesn't matter,
but what I care about is mechanical retrieval.

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu