Current-Users archive


Re: netbsd-6 instability - vmem



Dave B <spam%y2013.dberg.net@localhost> writes:

> On Thu, Feb 07, 2013 at 10:43:42AM -0500, Greg Troxel wrote:
>> 
>> Greg Troxel <gdt%ir.bbn.com@localhost> writes:
>> 
>> I have rebooted a few times (not because the machine was in trouble),
>> but so far with kern.maxvnodes much lower the machine is very stable.
>> I am leaving it at 104448, which is the default at boot.
>
>   Hmm.  That sounds like good news.
>
>   My sysctl.conf files (is my maxfiles a potential problem?):
>
>    MACPPC:
>
>     ddb.onpanic?=0
>     kern.maxfiles=32760
>
>    AMD64:
>
>     ddb.onpanic?=0
>     vm.anonmax=40

I don't really understand what's going on, but my working hypothesis is
that kernel virtual address space is being exhausted, and that this
isn't handled well.  So anything that causes more kernel virtual space
to be allocated than is typical could cause problems.  I don't know how
big the file structure is, but I think kern.maxfiles=3404 is typical.

On amd64, I am not at all clear on how much KVA is available.  You might
look at kern.maxvnodes and lower it.  On one amd64 system with 8 GB of
RAM, with neither one set, maxfiles is 3404 and maxvnodes is 415928.  So
far that system has been OK.
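For example, a minimal sketch of lowering it, reusing the 104448 value
from earlier in this thread.  That number is only an illustration; the
right cap for a given machine is an assumption you would have to tune:

    # /etc/sysctl.conf: cap vnodes at boot (104448 is just the example
    # value from this thread, not a recommendation)
    kern.maxvnodes=104448

and to change it on a running system:

    sysctl -w kern.maxvnodes=104448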

So you could have multiple problems, with the amd64 problem being unrelated.

If you can avoid running X and then provoke a lockup, ddb may be
interesting.  I found processes in vmem and tstile, and 'show pool'
indicated that the pool code was failing to get memory.

Also, without X, if there are any disk issues, you're more likely to see
the logs.

>   I later also ran the suite in /usr/tests, and while there were
> some unexpected failures, 
>
>     Failed test cases:
>         fs/tmpfs/t_rmdir:non_existent, fs/tmpfs/t_setattr:chowngrp,
>         fs/tmpfs/t_sockets:basic, fs/vfs/t_renamerace:lfs_renamerace_dirs,
>         lib/libc/net/getaddrinfo/t_getaddrinfo:basic,
>         lib/libc/net/getaddrinfo/t_getaddrinfo:empty_servname,
>         lib/libc/net/getaddrinfo/t_getaddrinfo:sock_raw,
>         lib/libc/net/t_servent:servent, rump/rumpkern/t_sp:reconnect,
>         toolchain/cc/t_hello:hello32
>     
> I couldn't tell whether any of them are related to my symptom.

Those don't immediately look related.

>   My stream of ideas may soon be running dry.  I guess I still
> haven't tried DIAGNOSTIC on the amd64 (just DEBUG + LOCKDEBUG), so
> I may do that next.

Certainly turn on DIAGNOSTIC.  Compared to DEBUG and LOCKDEBUG it costs
very little, and I run machines with DIAGNOSTIC all the time.
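As a sketch, assuming a custom kernel config that already enables DEBUG
and LOCKDEBUG as described, the only change is one more options line,
followed by the usual kernel rebuild:

    options 	DEBUG
    options 	LOCKDEBUG
    options 	DIAGNOSTIC	# the new line; the other two are already there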

>   Am willing to send my kernel config (and perhaps other info)
> privately if you'd be curious or happy to look at it, but didn't
> want to foist it upon you without permission.

You've described the diffs, so I don't think the problem is likely to be
anything in there.



