Current-Users archive


netbsd-6 instability - vmem



I am having a stability problem which is hard to figure out.  I recently
updated a machine from netbsd-5 to netbsd-6 (i386).  It's a pretty
normal box:

NetBSD 6.0_STABLE (GENERIC) #22: Wed Jan 23 18:08:47 EST 2013
        gdt%ir.bbn.com@localhost:/u0/n1/obj/gdt-6/i386/sys/arch/i386/compile/GENERIC
total memory = 3569 MB
avail memory = 3497 MB
cpu0 at mainbus0 apid 0: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz, id 0x206a7
cpu1 at mainbus0 apid 2: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz, id 0x206a7
cpu2 at mainbus0 apid 4: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz, id 0x206a7
cpu3 at mainbus0 apid 6: Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz, id 0x206a7
acpi0: X/RSDT: OemId <INTEL ,DH67CL  ,01072009>, AslId <AMI ,00010013>

I have 16G of RAM, but haven't switched to amd64 mode yet.

Just before the upgrade, I had a problem where the machine would reboot
just as amanda did estimates.  I had just upgraded to 3.3, and it was
using snapshots to do estimates.  Even though our dump supports estimate
only mode, amanda was killing dump after it printed the estimate (as it
does for older dumps).  My theory was that killing dump as it was
tearing down a snapshot was bad.  I rebuilt amanda w/o snapshot support
(so it didn't give dump the snapshot flags), and then things were mostly
ok.

Even after that, I still had crashes and hangs.

The main symptom is processes waiting on vmem.  While in this state, the
machine is up, but all processes and networking are hosed.  ps from ddb
shows many processes in vmem and tstile.  Hitting return gets a login:
prompt but then ^T shows vmem.  Looking at pools in ddb, I see failed
requests for mbufs and clusters, but I think this is a symptom, not the
cause.

Sometimes, swwdog causes a reboot.  Sometimes it doesn't (perhaps
because as long as the tickling process doesn't need memory it doesn't
get blocked; I keep meaning to make it do something else, like read a
random file in between tickles).
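
The "read a random file in between tickles" idea could look roughly like
the sketch below.  It is in Python purely for illustration; `tickle` is a
hypothetical callback standing in for whatever actually pats swwdog, and
the directory, read size and period are assumptions.  The point is only
that the tickle is not reached if the read blocks first.

```python
import os
import random
import time


def read_random_file(directory, nbytes=4096):
    """Read up to nbytes from a randomly chosen regular file in directory.

    Returns the number of bytes read, or 0 if there are no regular files.
    """
    names = [n for n in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, n))]
    if not names:
        return 0
    path = os.path.join(directory, random.choice(names))
    with open(path, "rb") as f:
        return len(f.read(nbytes))


def tickle_loop(tickle, directory, period=10.0, rounds=None):
    """Do a file read, then call tickle(), once per period.

    tickle is a hypothetical callable supplied by the caller; this sketch
    does not talk to any watchdog device itself.  If the read blocks, the
    tickle is never reached and the watchdog is free to expire.
    rounds=None loops forever; the number of tickles issued is returned.
    """
    done = 0
    while rounds is None or done < rounds:
        read_random_file(directory)
        tickle()
        done += 1
        if rounds is None or done < rounds:
            time.sleep(period)
    return done
```

A read served entirely from cache might still succeed while the rest of
the machine is wedged, so this only raises the odds of the watchdog
noticing; it does not guarantee it.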

I find that the machine usually crashes overnight.  I suspected the daily
cron job, so took those jobs out, and last night it stayed up.  Running
"find . -type f > FILES" in my homedir resulted in find stuck in vmem and
not responsive to ^C.  I was able to ssh in and do 'reboot'.

My kernel is pretty normal, but has IPSEC (kame) and coda.  However, I
have disabled both coda and IPsec from running.

I have seen a similar problem with an earlier netbsd-6 snapshot in a
private tree.  In that case, there's a kernel thread running over a huge
amount of (kernel) ram, and per-packet processing also uses this huge
amount of ram.  This mostly works, but the machine locks up on the daily
cron job.  With a small fs (no sources, just the bare install), it seems
ok.

I do have
  kern.maxvnodes = 391680
to help with git and a repository with 269268 files.  I wonder if this
is what makes my setup unusual.

Is anyone else seeing anything that could be like this?



