Subject: Re: squid now reveals a new kernel problem.
To: Charles M. Hannum <root@ihack.net>
From: Darren Reed <darrenr@reed.wattle.id.au>
List: tech-kern
Date: 10/29/1999 09:53:59
In some email I received from Charles M. Hannum, sie wrote:
> 
> 
> woods@most.weird.com (Greg A. Woods) writes:
> 
> > [ On , October 28, 1999 at 16:44:16 (-0400), Charles M. Hannum wrote: ]
> > > Subject: Re: squid now reveals a new kernel problem.
> > >
> > > I don't see what's surprising about this.  fdalloc() put the struct
> > > file (`fp') into the process's file descriptor table, but your ffree()
> > > call did not remove it.  Any future operation on that file descriptor
> > > -- including cleanup during process exit -- will thus walk a stale
> > > pointer.
> > > 
> > > You should have used fdrelease() instead.
> > 
> > There's no call to fdalloc() -- only falloc().
> 
> That was a typo.  The point stands; you *must* remove the pointer from
> the file descriptor table, which you're not doing.
> 
> > Indeed my idea to mimic the other routines and use ffree() survived
> > testing twice as long in a relatively heavy production environment
> > without even exhibiting any of the previous symptoms of the problem I
> > was trying to solve in the first place, and only failed in a new
> > scenario.
> 
> Note that sys_socket(), sys_socketpair(), sys_pipe(), sys_open() and
> sys_fhopen() -- which are all the routines that use ffree() -- do this
> explicitly.
> 
> In short, both your change and Darren's are clearly
> in{sufficient,correct}.

Perhaps you'd like to add some man pages to section 9 which document
the required behaviour here ?  I notice that fdalloc(9), ffree(9), etc,
all seem to be missing.  Knowing which one(s) needs to be called when
clearly seems to be the problem here.  I must apologize for not knowing
the kernel backwards and not having read all the relevant source.

btw, whilst our changes are clearly (to you) insufficient/incorrect,
they did survive testing (to some extent) by both of us.  Given that
I discussed this problem before patching (I'm not sure if I posted
the patch to tech-kern), the correct calling of routines can't be
that obvious if you didn't pickup the problem then and neither did
anyone else.  (Had I the time right now, I'd verify what I actually
sent to tech-kern regarding this matter.)  Personally, I'd like to
thnk Greg for testing it in a scenario where correct operation is
required, leading to the discovery of the problem.

Darren