Current-Users archive


Re: netbsd-6 instability - vmem



On Thu, Feb 07, 2013 at 10:43:42AM -0500, Greg Troxel wrote:
> 
> Greg Troxel <gdt%ir.bbn.com@localhost> writes:
> 
> I have rebooted a few times (not because the machine was in trouble),
> but so far with kern.maxvnodes much lower the machine is very stable.
> I am leaving it at 104448, which is the default at boot.

  Hmm.  That sounds like good news.

  My sysctl.conf files (is my maxfiles a potential problem?):

   MACPPC:

    ddb.onpanic?=0
    kern.maxfiles=32760

   AMD64:

    ddb.onpanic?=0
    vm.anonmax=40

(leaving out comments); so, I guess I'm still at the drawing board.
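
  (A minimal sketch for double-checking what a given sysctl.conf
actually sets, e.g. to confirm that kern.maxvnodes is not being
overridden there.  The helper name conf_value is made up for
illustration; it only assumes the "name=value" and "name?=value"
line forms shown above.)

```shell
#!/bin/sh
# conf_value NAME [FILE]
# Print the last value assigned to NAME in a sysctl.conf-style file.
# Handles both "name=value" and "name?=value"; commented-out lines are
# ignored because the pattern is anchored at the start of the line.
# Note: dots in NAME are treated as regex wildcards, which is good
# enough for a quick check but not strictly exact.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*?\{0,1\}=[[:space:]]*//p" \
        "${2:-/etc/sysctl.conf}" | tail -n 1
}
```

  Usage: "conf_value kern.maxvnodes" prints nothing if the file
leaves that variable at the boot default; the live value can then be
compared with "sysctl kern.maxvnodes".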

  The other night, I decided to try to apply a figurative "large
hammer" to the problem on the amd64, and redo much from scratch:

 o  re-rsync'ed my local NetBSD repository copy
 o  re-checked-out -rnetbsd-6 src & xsrc
 o  removed my DESTDIR
 o  cleared my ccache cache
 o  rebuilt a "release" target, plus my kernel
 o  re-installed all binaries from above (skipping *etc.tgz)

in hopes that maybe I'd done something wrong in a prior update
cycle, which could then be fixed--especially given that I use
ccache for building OS stuff as well as pkgsrc, and that on
occasion, a corrupted ccache cache has caused problems.  Granted,
I've only noticed build failures from that, but I suppose there
could've been run-time issues from it too that were harder to spot.
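
  (For reference, the rebuild steps listed above could be sketched
roughly as follows.  This is an illustration under assumptions, not
the exact commands used: CVSROOT, WORKDIR, SRCDIR, DESTDIR and
KERNCONF are placeholder values, and the -x flag to build.sh is
assumed because xsrc was checked out.  The rsync and set-installation
steps are left out, since the mirror and install method aren't stated.
By default the script only prints what it would run.)

```shell
#!/bin/sh
# Placeholder locations; none of these come from the original message.
: "${CVSROOT:=/home/cvs/netbsd}"          # hypothetical local repository copy
: "${WORKDIR:=/usr}"                      # checkout creates src/ and xsrc/ here
: "${SRCDIR:=/usr/src}"
: "${DESTDIR:=/usr/obj/destdir.amd64}"
: "${KERNCONF:=MYKERNEL}"                 # hypothetical kernel config name
: "${DRYRUN:=1}"                          # set DRYRUN=0 to really execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_from_scratch() {
    # Fresh checkout of the netbsd-6 branch (src and xsrc).
    run cd "$WORKDIR" &&
    run cvs -d "$CVSROOT" checkout -P -r netbsd-6 src xsrc &&
    # Throw away the old DESTDIR and the whole ccache cache.
    run rm -rf "$DESTDIR" &&
    run ccache -C &&
    # build.sh has to be started from the top of the source tree.
    run cd "$SRCDIR" &&
    run ./build.sh -x -D "$DESTDIR" release &&
    run ./build.sh -u kernel="$KERNCONF"
}
```

  Calling rebuild_from_scratch with DRYRUN left at 1 lists the
commands for review before anything destructive (the rm -rf, the
cache clear) is actually run.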

  Unfortunately, it didn't help: some time in the evening after
doing the above, I was putting heavy load on the machine, including
rebuilding packages from scratch, and it locked up again. (For
disk I/O at least... interestingly, I was still able to issue a few
builtin commands from an idle shell, e.g., "kill"s, and see some
windows disappear, although "kill 1" did nothing--but once I
issued commands requiring I/O, that shell locked up too).

  I later also ran the suite in /usr/tests, and while there were
some unexpected failures, 

    Failed test cases:
        fs/tmpfs/t_rmdir:non_existent, fs/tmpfs/t_setattr:chowngrp,
        fs/tmpfs/t_sockets:basic, fs/vfs/t_renamerace:lfs_renamerace_dirs,
        lib/libc/net/getaddrinfo/t_getaddrinfo:basic,
        lib/libc/net/getaddrinfo/t_getaddrinfo:empty_servname,
        lib/libc/net/getaddrinfo/t_getaddrinfo:sock_raw,
        lib/libc/net/t_servent:servent, rump/rumpkern/t_sp:reconnect,
        toolchain/cc/t_hello:hello32
    
I couldn't tell whether any of them are related to my symptom.

  My stream of ideas may soon be running dry.  I guess I still
haven't tried DIAGNOSTIC on the amd64 (just DEBUG + LOCKDEBUG), so
I may do that next.

  I'm willing to send my kernel config (and perhaps other info)
privately if you'd be curious or happy to look at it, but didn't
want to foist it upon you without permission.

  Thanks for your ongoing help with this, by the way.  It's
certainly a source of frustration having my primary workstation
become unstable, so having capable people to collaborate with is a
big relief.

-D

