Subject: Re: WWW query engine bug (was Query-PR)
To: None <leonard@dstc.edu.au>
From: None <Chris_G_Demetriou@NIAGARA.NECTAR.CS.CMU.EDU>
List: current-users
Date: 02/21/1996 03:02:11
> > But the question is: how can you tell 'intentional' html from
> > something that just looks like HTML?  (and, what impact does that have
> > on the software used to spit out PRs?)
> 
> Force people to insert <HTML>...</HTML> around their text if they
> don't want tags to be converted to &amp;, &lt; and &gt;.

You didn't answer the second question: what impact does that have?

I don't think that's workable, for several reasons:
	(1) PRs are sent as e-mail messages, and for the most part
	    look like e-mail messages.  How can you put the <HTML>
	    tag before your headers, so it will do the right thing
	    with HTML in the headers?  (e.g. an X-Organization:
	    header...)

	(2) the PR machinery appears to mangle some submissions
	    in ways that are not obvious to me, e.g. reordering
	    some headers, etc.  How are people supposed to set
	    things up so that they work right?

	(3) if the user does a 'long-range' <html>, perhaps one
	    which is never closed, how does the scanner deal with
	    that?  Some of the PRs are gigantic, and I think it's
	    unreasonable to require the scanner to parse a PR
	    completely before it outputs any of it.  I wanted to
	    write it as a filter, which more or less rules out
	    dealing with this.

	(4) this still doesn't solve the problem!  The user can
	    _still_ supply bad HTML!  (It's for this reason that
	    I do basic sanity checks on the PRs.  However,
	    I can't catch things like hanging italicization, because
	    of (3).)
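
For illustration, here is a minimal sketch of the filter approach
mentioned in (3): each line is escaped and written out as soon as it is
read, so nothing has to be buffered no matter how large the PR is.  This
is not the actual query engine code; the names escape_lines and
run_filter are made up for the example, and it assumes every line of the
PR is treated as plain text with no 'intentional' HTML.

```python
import html
import io


def escape_lines(lines):
    """Yield each input line with &, < and > replaced by HTML entities."""
    for line in lines:
        # quote=False leaves " and ' untouched; only &, < and > change.
        yield html.escape(line, quote=False)


def run_filter(src, dst):
    """Copy src to dst one line at a time, escaping as we go.

    Hypothetical helper: src and dst are any text file objects,
    e.g. run_filter(sys.stdin, sys.stdout) for use in a pipeline.
    """
    for escaped in escape_lines(src):
        dst.write(escaped)


# Small demonstration on an in-memory "PR" with an unclosed tag.
demo_in = io.StringIO("X-Organization: <i>Foo & Bar\nbody text\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
```

Because the unclosed <i> comes out as literal text, it cannot leave the
rest of the page italicized.  The cost is the one under discussion: a
filter like this has no way to let intentional HTML through.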


cgd