Subject: ioflush process spinup and memory when building on 2.0.2/i386
To: NetBSD tech-kern Mailing List <tech-kern@netbsd.org>
From: Douglas Wade Needham <cinnion@ka8zrt.com>
List: port-i386
Date: 06/27/2005 22:21:40
Sender: port-i386-owner@NetBSD.org

Greetings everyone,

Got an interesting problem which I have noticed while doing builds for
i386/current, prep/current, and hp700/current as part of a nightly
cron job running on a 2.0.2 machine.  Every few nights, generally
associated with one or more failed release builds, I get a problem
where ioflush starts using all available CPU (98%+ for ioflush), and
the host is nearly unresponsive (2-3 minutes to switch from X11 on ttyE5
to the console).  This condition can persist for hours, though I
generally do not let it go on much longer than that; instead, I have
been trying to generate a dump and collect other diagnostics.

I am in the process of collecting some additional information, but I
wanted to see if anyone else has seen this.  Here are the details:

    Kernel:		2.0.2/i386
    RAM:    		768MB
    dmesg.boot:		See http://cinnion.ka8zrt.com/netbsd/sandbox-current/

Other items to note:

    1) Builds are run via the attached script
    2) Build scripts can be seen via cvsweb by going to
       http://www.ka8zrt.com/cgi-bin/cvsweb.cgi/netbsd_build/
    3) Kernel is pretty much a trimmed-down GENERIC.  Unfortunately, I
       do not have the ability to get at this machine using serial kgdb,
       which is what it is configured for. 8/  The config file is in
       the same directory as the dmesg.boot, and I will be updating it
       for non-serial debugging in the next few days.
    4) No crash dump as of yet.
    5) All of these builds mount src, xsrc and an lsrc directory via
       union mounts.  [Union mounts are still too flaky for my pkgsrc
       builds 8( ]  In rare cases, the union mounts appear to
       hang during unmount, and df will block.
    6) Memory usage can be extremely high.  Free memory generally
       starts out at 500000 or more, but can drop to an average of 30K
       well after the builds complete, and to well under 1000 during builds.
    7) When the next instance of this happens, I will be placing the
       build logs on the web and sending a follow-up message.
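One way to put numbers on item 6 is to capture periodic vmstat output to a
file alongside the nightly builds (see vmstat(1) for the interval option) and
summarize the free-memory column afterwards.  The sketch below is a minimal,
hypothetical helper: the names `free_column` and `summarize` are made up for
illustration, and it assumes only that one header line contains a `fre`
column.  Values are reported exactly as vmstat prints them, with no
assumption about units.

```python
from typing import List, Tuple


def free_column(vmstat_output: str) -> List[int]:
    """Hypothetical helper: pull the 'fre' column out of vmstat-style text.

    Assumes a header line whose whitespace-separated fields include 'fre';
    repeated header lines and non-numeric fields are skipped.
    """
    lines = [ln for ln in vmstat_output.splitlines() if ln.strip()]
    col = None
    values: List[int] = []
    for ln in lines:
        fields = ln.split()
        if "fre" in fields:
            # Header line: (re)locate the column index of 'fre'.
            col = fields.index("fre")
            continue
        if col is None or len(fields) <= col:
            continue
        # Only keep plain integer samples; other header rows are ignored.
        if fields[col].isdigit():
            values.append(int(fields[col]))
    if col is None:
        raise ValueError("no 'fre' column header found")
    return values


def summarize(values: List[int]) -> Tuple[int, float]:
    """Return (minimum, mean) of the free-memory samples."""
    if not values:
        raise ValueError("no samples")
    return min(values), sum(values) / len(values)
```

Logging the minimum alongside the mean makes it easier to tell whether the
ioflush spin coincides with the deepest dips or with the long post-build
plateau.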

Anyone seen this or have any other diagnostics I have not yet thought
to collect?  I doubt it, and suspect that this may be yet another
problem with union mounts, but I wanted to run the flag up the pole
anyway.

Thanks!

- Doug

-- 
Douglas Wade Needham - KA8ZRT        UN*X Consultant & UW/BSD kernel programmer
Email:  cinnion @ ka8zrt . com       http://cinnion.ka8zrt.com
Disclaimer: My opinions are my own.  Since I don't want them, why
            should my employer, or anybody else for that matter!