Subject: Re: LFS related crash
To: None <current-users@NetBSD.org>
From: Paul Ripke <stix@stix.id.au>
List: current-users
Date: 12/07/2006 13:28:15
On Wed, Dec 06, 2006 at 04:32:11PM -0700, Sverre Froyen wrote:
> Hi,
> 
> I just had what I'm guessing is an LFS related crash.  I've attached a gdb 
> backtrace, below.  The system is current i386 with sources from this morning.  
> I was doing a sandbox compilation of several pkgsrc modules when it crashed.  
> Is there any additional information that would be useful?

I'm glad someone else has seen this!

I've been looking into an LFS system hang and an LFS crash, which
appear to be related. From my investigation, it appears that the
lfs_pchain tailqueue is getting corrupted somehow. The hang I'm
seeing has lfs_pchain with one or more inodes on the chain (usually
just one), but none have IN_PAGING set, and it spins on the goto in
lfs_flush_pchain().

The crash, which I'm assuming is the same as yours, happens in
lfs_putpages(): after clearing IN_PAGING, it tries to remove an
inode from lfs_pchain while the chain is already empty.

Since all lfs_pchain operations are protected by lfs_interlock, I
can't see how this could happen. But I have 12+ dumps to prove that
it does. It appears to be much easier to reproduce if the system is
starved for RAM... the ancient system I've been testing this on only
has 128 MB RAM, and even then, it's far easier to reproduce if I
fill up RAM before torturing LFS. Most recently, I've reproduced
this with a stock GENERIC kernel, too, ruling out my custom MP
kernel config.

I'll raise a PR if I get a chance tonight.

-- 
stix