NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



Frank Kardel <kardel%netbsd.org@localhost> writes:


> Thanks for your observation. Actually, "large memory" could be seen more
> as the case where vmem_size(kernel_arena, VMEM_ALLOC|VMEM_FREE) / 10,
> in pages, is significantly larger than uvmexp.freetarg.
> As you have observed, this can already happen on smaller systems.
>
> -Frank


Sure...
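The condition quoted above is simple arithmetic, sketched here in shell. arena_vs_freetarg is a hypothetical helper, not a NetBSD tool: the caller supplies the kernel_arena size in bytes (standing in for what vmem_size(kernel_arena, VMEM_ALLOC|VMEM_FREE) would report), the page size, and uvmexp.freetarg in pages. Nothing is read from a running kernel, and the example inputs are arbitrary.

```shell
#!/bin/sh
# arena_vs_freetarg ARENA_BYTES PAGE_SIZE FREETARG_PAGES
# Hypothetical helper: ARENA_BYTES stands in for
# vmem_size(kernel_arena, VMEM_ALLOC|VMEM_FREE) and FREETARG_PAGES for
# uvmexp.freetarg. All values come from the caller.
arena_vs_freetarg() {
    arena_bytes=$1
    page_size=$2
    freetarg=$3
    if [ "$freetarg" -le 0 ]; then
        echo "freetarg must be a positive page count" >&2
        return 1
    fi
    # One tenth of the arena, converted from bytes to pages
    # (shell integer arithmetic truncates).
    tenth_pages=$(( arena_bytes / 10 / page_size ))
    # How many times larger than freetarg that tenth is (integer ratio).
    ratio=$(( tenth_pages / freetarg ))
    echo "arena/10 = $tenth_pages pages, freetarg = $freetarg pages, ratio = $ratio"
}
```

With arbitrary inputs of a 4 GiB arena, 4096-byte pages and a freetarg of 4000 pages, `arena_vs_freetarg 4294967296 4096 4000` prints `arena/10 = 104857 pages, freetarg = 4000 pages, ratio = 26`; a ratio well above 1 is the "significantly larger" case.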

I was able to run the abusive build workload and make the system fall
over.  The abuse is the following:

Take a 10.0 PVH guest with 16GB of memory and 2 vCPUs.  Run the
following builds at the same time:

build.sh -j2 <- for amd64
build.sh -j2 <- for i386
build.sh -j2 <- for earmv7hf

The source tree is in a ZFS fileset and is shared by all of the builds.
The artifacts (obj, dist, release, etc.) are all in their own ZFS
filesets for each architecture.  That is, /artifacts/amd64 is its own
ZFS filesystem containing object, release and dist subdirectories, and
the -O, -R and -D flags to build.sh point to /artifacts/amd64/OBJ and so
on.  /artifacts/i386 and /artifacts/earmv7hf are likewise their own
filesets.

Everything hums along just fine until the earmv7hf build nears the end
and runs /usr/src/distrib/utils/embedded/mkimage, which does "dd bs=1
count=4456448 if=/dev/zero".  That dd runs with high CPU for a little
while and then causes all active reads and writes, both from the other
builds and from itself, to more or less deadlock.  CPU utilization falls
to zero, and so does disk utilization on the zpool.  The system stays
responsive, but any command that touches the files in use (ls, or
whatever) hangs.

As far as I can tell, the processes involved were two objcopy and two
rm along with the dd.  One objcopy was stuck in tstile and the other in
&zilog.  The dd was stuck in &tx->t and both rm were stuck in &zio->
... all according to top.

I can reproduce this almost on demand, as long as the amd64 and i386
builds are actively building something when the earmv7hf build hits the
mkimage call.  A clean build of all three will probably provoke it, and
update builds (the -u flag to build.sh) may as well.

All of this is probably unrelated to the patch that was provided and to
the problem being reported.  The patch does appear to make the situation
better.

It might be worth changing the arguments to that dd to "dd count=1
bs=4456448 if=/dev/zero", that is, writing one block of 4456448 bytes
instead of 4456448 one-byte blocks.  That replaces roughly 4.4 million
one-byte writes with typically a single write, which might be less
stressful.
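The suggested change can be checked with a small shell sketch. compare_dd is a hypothetical helper name, not part of mkimage: it writes SIZE zero bytes both ways into a temporary directory and confirms the two files are byte-identical. The difference lies only in the number of read(2)/write(2) calls: SIZE of each with bs=1, versus typically one of each with bs=SIZE count=1 when reading from /dev/zero. (An input such as a pipe could return a short read, in which case count=1 would copy fewer bytes; /dev/zero is assumed not to.)

```shell
#!/bin/sh
# compare_dd SIZE: write SIZE zero bytes with both dd forms and report
# whether the resulting files are identical.
compare_dd() {
    size=$1
    tmp=$(mktemp -d) || return 1

    # Original mkimage form: SIZE separate 1-byte reads and 1-byte writes.
    dd bs=1 count="$size" if=/dev/zero of="$tmp/small" 2>/dev/null
    # Suggested form: one block of SIZE bytes.
    dd bs="$size" count=1 if=/dev/zero of="$tmp/big" 2>/dev/null

    if cmp -s "$tmp/small" "$tmp/big"; then
        # tr strips the leading spaces some wc implementations print.
        echo "identical: $(wc -c < "$tmp/big" | tr -d ' ') bytes"
        rc=0
    else
        echo "outputs differ"
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

Running `compare_dd 4456448` uses the exact size from mkimage; the bs=1 half alone issues roughly 4.4 million reads and 4.4 million writes, so it is noticeably slower than the single-block form. dd's record summary goes to stderr and is discarded with `2>/dev/null` for portability across dd implementations.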





-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org

