Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Greg Oster <oster@cs.usask.ca>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: port-i386
Date: 04/08/2003 00:17:09
	Hello folks.  I find it very curious that no one else is seeing the
same problems I am with NetBSD-1.6X.  I've narrowed things down to two
problems:

1.  Periodically, usually during heavy i/o activity, the machine panics
with a uvm_fault indicating an invalid page table.  

2.  The machine hangs with one or more processes in "flt_pmfail[12]".

	In response to Greg's discovery that paging to a raid5 swap area
causes hangs, I changed my configuration to only swap and page to a single
disk.  This change does not change the behavior of my machine at all.  Once
it begins using the paging area on the disk, it won't be long until a hang
occurs.

	I've captured several core files from this hanging process, and would
be happy to provide details for anyone who might be able to help shed light
on the problem.

	I've captured core files which demonstrate both problems, and am
willing to troubleshoot this problem further if anyone can provide
guidance.  I have 1.6.1 sources as of April 4, 2003 and I have a full
symbol table copy of the kernel ready to trace with gdb, ps, vmstat, or
whatever.

	Right now, the machine will not stay up more than 24 hours, and,
usually, it crashes due to one of the problems within twelve hours of a
restart.

	I'm usually pretty good at tracking down problems, but this one seems
pretty thorny, and, I confess, I'm getting pretty frustrated.  Would
someone be willing to help me troubleshoot this problem further?  I'm happy
to provide any details, images, moral support, free beer, whatever.  I've
been using NetBSD for 10 years, and, in fact, this machine is supposed to
be replacing my ancient NetBSD 0.9A system, but so far, that server is
still more reliable than this shiny new 1.6.1 system.

	Please, any help would be gratefully appreciated.
-Brian