At Thu, 17 Jul 2025 18:09:15 -0700, Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:
Subject: Re: Processes getting stuck in "fstchg" with NetBSD-10.99.12/amd64
>
> Hello. Thanks for the reply. I'm still trying to work out what's going on. It may very
> well be a memory shortage issue, but I'm thinking it's either some kind of memory fragmentation
> issue or a network related problem. The issue appears to be triggered when ssh sessions are
> uncleanly terminated. Specifically, when dangling connections are left hanging by stateful
> firewalls which timeout between client and server, causing the server side to shutdown
> uncleanly. What appears to happen is that something gets hung up, a bunch of processes start,
> things get stuck in fstchg and everything hangs, though the kernel doesn't crash. Sometimes I
> see proc table full messages, but not always.
> The next time it happens I'll call fstrans_dump from ddb to see if that yields any results, but
> right now, I'm at a loss as to which process it is that gets stuck initially, causing the
> pileup. And, while I am pretty sure I know what triggers the problem, I haven't quite figured
> out how to reproduce it at will.
> Anyone seen anything like this?
> This is on amd64, NetBSD-10.99.12 on a xen VM with 2 processors.
> I have a bunch of other machines, both VM's and bare metal, running the same code without
> trouble.

So, since this is happening in a specific VM, is it not easy to adjust
the amount of memory allocated to it to see if that does make it more
likely (with less memory) or less likely (with more memory) for the
hangs to happen?

Also is it possible to clone the VM and run tests in the clone with
more/less RAM allocated?

-- 
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>