Current-Users archive


Re: building netbsd-9 2 'sync' processes stuck in 'tstile'



At Mon, 3 May 2021 17:52:03 -0500 (CDT), "John D. Baker" <jdbaker%consolidated.net@localhost> wrote:
Subject: building netbsd-9 2 'sync' processes stuck in 'tstile'
>
> While building netbsd-9/amd64 with "-j 2", the build process got stuck
> while linking "GENERIC_KASLR" and "GENERIC".  'top' shows two 'sync'
> processes stuck in 'tstile'.  Although the build could be aborted with
> "Ctrl-C", the two 'sync' processes remain and cannot be killed (even
> with -9).
>
> The host is netbsd-9/amd64 as of 30 April.  The filesystem on which the
> build process operates resides on a local RAIDframe RAID-R of eight 1TB
> SATA disks.
>
> The same filesystem is also NFS exported and clients otherwise continue
> to operate on it normally.

So, I've had something similar, but less critical, happen, though in a
somewhat opposite configuration.

I.e. I've seen lots of processes get "stuck" and/or very slow (with
processes sitting in "tstile" for long periods) on a similar system.

However, the main problem seemed to be on a -current system that was
accessing an NFS filesystem on another (older) NetBSD system fairly
heavily.  (i.e. /usr/src and /home are NFS mounts from the other server)

I don't know whether these "tstile" processes were unkillable (though I've
seen that happen before, when a kernel deadlock caused it(*)).

However, they eventually completed, and, even more mysteriously, the whole
problem resolved itself without any intervention on my part that I know of!

I just left the machine to struggle along overnight and in the morning
it was running fine, and continued to do so for over a week until I
rebooted the other day to test some unrelated kernel fixes.

I never did find any possible cause for the slowness.

The older system that's serving NFS has an uptime of 117 days and didn't
seem to be suffering any ill effects during the slowness or since.


(*) The "tstile" hangs caused by a deadlock were on a Xen dom0 where
there were locking order problems in the xenstore interface and so "xl"
commands could deadlock in the kernel.  That bug has been fixed.
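In NetBSD, "tstile" is the wait channel shown for a thread blocked on a turnstile, i.e. waiting for a contended kernel lock. The footnote's failure mode, two code paths taking the same pair of locks in opposite orders, can be illustrated with a small userland analogy. This is only a hedged Python sketch: the lock names and the helpers `acquire_in_order`, `release_all`, and `run` are hypothetical, and it is not the code involved in that xenstore bug or its fix. It shows the usual way to rule out that kind of deadlock, which is to make every caller acquire the locks in one fixed global order.

```python
import threading

# Illustrative only: two locks standing in for two kernel locks that
# different code paths need to hold at the same time.
lock_a = threading.Lock()
lock_b = threading.Lock()


def acquire_in_order(*locks):
    """Acquire all locks in one fixed global order (here, by id()).

    If every caller uses the same order, no thread can hold one lock
    while waiting for a lock that another thread holds and vice versa,
    so the classic A-then-B / B-then-A deadlock cannot occur.
    """
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(ordered):
    """Release locks in the reverse of their acquisition order."""
    for lk in reversed(ordered):
        lk.release()


def run(iterations=10000):
    """Run two threads that name the locks in opposite orders.

    Returns (finished, count): whether both threads completed within the
    timeout, and how many critical sections were executed in total.
    """
    count = [0]  # shared counter, only modified while both locks are held

    def worker(first, second):
        for _ in range(iterations):
            held = acquire_in_order(first, second)
            try:
                count[0] += 1
            finally:
                release_all(held)

    # Daemon threads, so a deadlocked variant would not hang interpreter exit.
    t1 = threading.Thread(target=worker, args=(lock_a, lock_b), daemon=True)
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a), daemon=True)
    t1.start()
    t2.start()
    t1.join(timeout=30)
    t2.join(timeout=30)
    finished = not (t1.is_alive() or t2.is_alive())
    return finished, count[0]


if __name__ == "__main__":
    print(run())
```

If `worker` instead acquired `first` and then `second` exactly as passed, the two threads could each end up holding one lock while waiting forever for the other. That is the userland equivalent of kernel threads stuck in "tstile".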

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



