tech-kern archive
re: Kernel memory allocation oddities in NetBSD-10.99.12
Hello Matthew. This problem is related to the thread I created about 6 weeks ago with the
subject:
Processes getting stuck in "fstchg" with NetBSD-10.99.12/amd64
The reply I received from Hannken, quoted below, caused me to start looking at the memory
allocation failures I'm seeing on this particular VM.
While I'm still waiting for a full, outright failure, I've been watching the machine to see
whether I can identify a precursor to the event before it happens.
-thanks
-Brian
--- Forwarded mail from "J. Hannken-Illjes" <hannken%mailbox.org@localhost>
Subject: Re: Processes getting stuck in "fstchg" with NetBSD-10.99.12/amd64
From: "J. Hannken-Illjes" <hannken%mailbox.org@localhost>
Date: Sun, 29 Jun 2025 13:45:25 +0200
Cc: tech-kern%netbsd.org@localhost
> On 29. Jun 2025, at 02:21, Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:
>
> Hello. I have a number of machines running NetBSD-10.99.12/amd64, some on real hardware
> and some as VMs, mostly under Xen, but also as guests on KVM.
> Out of approximately 25 different instances, I have one Xen machine where processes "hang".
> The machine may run for a week before this happens, or it may run for months. To try and
> figure out what is going wrong, I installed a kernel with ddb in it and when the problem
> manifested itself, I discovered processes and threads that look like:
>
>
>   PID   LID S CPU FLAGS     STRUCT LWP * NAME    WAIT
> 28144 28144 3   0     0 ffff9dde28395400 sshd    fstchg
>  2416  2416 3   0     0 ffff9dde28395000 sshd    fstchg
>  1481  1481 3   0     0 ffff9ddcf99dfc00 sshd    fstchg
>  6400  6400 3   1     0 ffff9ddcf99df800 cron    fstchg
> 28098 28098 3   0     0 ffff9ddcf99df400 cron    fstchg
>  5484  5484 3   1     0 ffff9ddcf99df000 cron    fstchg
>  1394  1394 3   0     0 ffff9ddd25e8fc00 cron    fstchg
> 26447 26447 3   0     0 ffff9ddd25e8f800 cron    fstchg
>
> . . .
>
>     0   123 3   0   200 ffff9dddfd0bac00 ioflush fstchg
>
> The system is:
>
> NetBSD lothlorien.nfbcal.org 10.99.12 NetBSD 10.99.12 (MIRKWOOD_PVH_DDB)
> #0: Mon Apr 7 05:50:18 PDT 2025
> buhrow%loth-9.nfbcal.org@localhost:/usr/src/sys/arch/amd64/compile/MIRKWOOD_PVH_DDB amd64
>
> In looking at the code, I see these processes are waiting to do something with the
> filesystem or filesystems. There are a number of mounted partitions, all ffs, plus a ptyfs
> filesystem running in compatibility mode, i.e. /dev/ttypx, rather than /dev/pts/*, which means
> it doesn't show up as a filesystem at all. My questions are as follows:
>
> 1. How do I find which one of these is the blocking process?
>
>
> 2. Has anyone else seen this behavior?
>
> As I say, only one of my many machines exhibits this behavior, and it is a Xen guest on
> which other VMs running the exact same code are working fine for months at a time.
>
> Suggestions welcome.
> -thanks
> -Brian
These processes are waiting for a file system suspension. From ddb you may run

    call fstrans_dump(1)

to dump the current state of the suspension subsystem. You will see which processes / lwps
are "inside" a file system and which file systems are suspending / suspended.

The syncer (ioflush) waiting is generally bad. Is there still free kmem?
--
J. Hannken-Illjes - hannken%mailbox.org@localhost
--- End of forwarded message from "J. Hannken-Illjes" <hannken%mailbox.org@localhost>
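
When a process listing like the one quoted above has been saved from the ddb
console, a short script can tally how many LWPs sit on each wait channel. The
Python below is only a rough sketch. The function name tally_wait_channels is
made up for illustration, and it assumes each data row has exactly the eight
whitespace-separated fields shown in the listing (PID, LID, S, CPU, FLAGS, LWP
address, NAME, WAIT). Rows with an empty WAIT column are skipped. It does not
identify the blocking LWP; the fstrans_dump output is still needed for that.

```python
from collections import Counter

# Rows copied from the listing quoted in this message, with the mail
# quoting prefix left in place to show that it is tolerated.
SAMPLE = """\
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 28144 28144 3 0 0 ffff9dde28395400 sshd fstchg
> 2416 2416 3 0 0 ffff9dde28395000 sshd fstchg
> 1481 1481 3 0 0 ffff9ddcf99dfc00 sshd fstchg
> 6400 6400 3 1 0 ffff9ddcf99df800 cron fstchg
> 28098 28098 3 0 0 ffff9ddcf99df400 cron fstchg
> 5484 5484 3 1 0 ffff9ddcf99df000 cron fstchg
> 1394 1394 3 0 0 ffff9ddd25e8fc00 cron fstchg
> 26447 26447 3 0 0 ffff9ddd25e8f800 cron fstchg
>
> . . .
>
> 0 123 3 0 200 ffff9dddfd0bac00 ioflush fstchg
"""


def tally_wait_channels(text):
    """Count LWPs per (name, wait channel) pair in a saved process listing.

    Assumption: a data row has exactly 8 whitespace-separated fields and
    begins with a numeric PID. Anything else (header, blank lines, the
    ". . ." elision, rows without a wait channel) is ignored.
    """
    counts = Counter()
    for line in text.splitlines():
        # Drop any leading mail-quote markers and spaces before splitting.
        fields = line.lstrip("> ").split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        name, wait = fields[6], fields[7]
        counts[(name, wait)] += 1
    return counts


if __name__ == "__main__":
    for (name, wait), n in tally_wait_channels(SAMPLE).most_common():
        print(f"{n:3d}  {name:<10} {wait}")
```

Running the same tally on listings captured at intervals would show whether
the number of LWPs waiting on "fstchg" grows before a full hang.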