Subject: Re: Recent fs instability
To: Chris G. Demetriou <cgd@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 10/27/1999 09:52:35
On Mon, Oct 25, 1999 at 06:03:39PM -0700, Chris G. Demetriou wrote:
> Manuel Bouyer <bouyer@antioche.lip6.fr> writes:
> > For the record, I've had problems with my ncr board recently. The hardware
> > was stable before and I didn't change anything. The problem is an 'assertion
> > failed' in the ncr driver, which turns out to be a null pointer. I haven't
> > looked closely yet, but it seems this null pointer can't be caused by a
> > hardware problem.
> 
> what line was it from?

Sorry for the delay; the machine is at home.
It's line 6731 in ncr.c (it seems to be line 6733 now):
        default:
                /*
                **      lookup the ccb   
                */
                dsa = INL (nc_dsa);
                cp = np->ccb;
                while (cp && (CCB_PHYS (cp, phys) != dsa))
                        cp = cp->link_ccb;

                assert (cp);    
                if (!cp)
                        goto out;
                assert (cp == np->ncb_dma->header.cp);
                if (cp != np->ncb_dma->header.cp)
                        goto out;
        }
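To make the control flow of that fragment easier to follow, here is a minimal, self-contained sketch. The code walks the CCB list and compares each entry's physical address (CCB_PHYS) with the value read from the DSA register. The first assertion fires when no CCB matches, and the second fires when the match is not the CCB recorded in the DMA header. struct ccb, lookup_ccb(), resolve_ccb() and WARN_ASSERT are hypothetical simplifications, not the real ncr.c definitions. The `current` parameter stands in for np->ncb_dma->header.cp. The warn-only assertion is an assumption based on the behaviour described here: a message is printed and the system keeps running.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical warn-only assertion: print a message and carry on,
 * as the driver appears to do (no panic after the message). */
#define WARN_ASSERT(expr)                                              \
    do {                                                               \
        if (!(expr))                                                   \
            printf("assertion \"%s\" failed: file \"%s\", line %d\n",  \
                   #expr, __FILE__, __LINE__);                         \
    } while (0)

/* Hypothetical, simplified CCB; 'phys' stands in for CCB_PHYS(cp, phys). */
struct ccb {
    uint32_t phys;          /* bus/physical address of this CCB */
    struct ccb *link_ccb;   /* next CCB in the controller's list */
};

/* Return the CCB whose physical address equals 'dsa', or NULL if none. */
static struct ccb *
lookup_ccb(struct ccb *head, uint32_t dsa)
{
    struct ccb *cp = head;

    while (cp != NULL && cp->phys != dsa)
        cp = cp->link_ccb;
    return cp;
}

/* Mirror both checks of the fragment; NULL plays the role of 'goto out'. */
static struct ccb *
resolve_ccb(struct ccb *head, uint32_t dsa, struct ccb *current)
{
    struct ccb *cp = lookup_ccb(head, dsa);

    WARN_ASSERT(cp);
    if (cp == NULL)
        return NULL;        /* no CCB matches the DSA value */
    WARN_ASSERT(cp == current);
    if (cp != current)
        return NULL;        /* match is not the CCB in the DMA header */
    return cp;
}
```

In this shape the caller only sees NULL after the message is printed. Whether the pending transfer is then completed or reported as failed depends on what the code at 'out' does, and the fragment doesn't show that.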

This seems to happen only with multiple concurrent transfers: I can dd from
the raw device or work a bit on a mounted partition without much trouble. But
once there are some dirty buffers, a 'sync' will reliably trigger the problem.

The odd part is that this failed transfer is not noticed by the upper layers:
if I keep working on this fs after this message, I'm sure to get a panic from
the filesystem within a few minutes at most, and fsck shows duplicate blocks.

If I unmount the fs just after the 'assertion failed' message, fsck finds
errors in the allocated inode/block maps.


--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--