Subject: Re: i386 1.4Q hangs nonrandomly?
To: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
From: Ethan Solomita <ethan@geocast.com>
List: current-users
Date: 01/27/2000 10:49:49
	Yes, that's the fix. The three buffers that are continuously cycling
all have unresolved "soft updates" dependencies on them, so none of them
can be truly freed. When they finish going to disk, soft updates
immediately calls bdirty() on them, which queues them for writing again,
and this disksort bug lets them jump ahead of the other requests, so the
cycle never ends.
	-- Ethan

Juergen Hannken-Illjes wrote:
> 
> I've been working on this problem for the last two weeks (PR kern/9197). Some buffers
> are continuously cycling through the b_actf queue. All have the same cylinder;
> only the block number varies. Because of a problem in the (old, pre-B_ORDERED)
> sys/kern/subr_disk.c, they are sorted into the first part of the list and block the
> remaining buffers. This leads to empty buffer freelists and a total lockup.
> 
> sys/kern/subr_disk::disksort
> 
>         /*
>          * If we lie after the first (currently active) request, then we
>          * must locate the second request list and add ourselves to it.
>          */
>         bq = ap->b_actf;
>         if (bp->b_cylinder < bq->b_cylinder) {
> 
> must read (to be exact):
> 
>         bq = ap->b_actf;
>         if (bp->b_cylinder < bq->b_cylinder ||
>            (bp->b_cylinder == bq->b_cylinder && bp->b_blkno < bq->b_blkno)) {
> 
> or (lazier, with fewer comparisons):
> 
>         bq = ap->b_actf;
>         if (bp->b_cylinder <= bq->b_cylinder) {
> 
> Is this a correct fix? Comments?
> 
> --
> Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
> 
> > maximum entropy wrote:
> > >
> > > The system happily chugged away for about 3 hours like this, then
> > > locked up solid.
> > >
> > > I'm totally out of ideas now...
> > >
> >       I haven't been following all of this conversation, but there's
> > something I'm working on which is probably worth mentioning. There is a
> > soft updates "livelock" under heavy use for which I'll be submitting a
> > fix soon. The main symptom is that the disk light will stay on, since
> > the disk is being continuously written to, yet no forward progress is
> > being made and the livelock never ends.
> >
> >       I realize that isn't the explanation for all of this, but it seems like
> > this discussion has encompassed more than one bug, and I expect that I'm
> > not the only one who has suffered with the livelock bug.
> >       -- Ethan