Re: pgdaemon high CPU consumption
Matthias Petermann <mp%petermann-it.de@localhost> writes:
> Hello,
>
> On 01.07.22 12:48, Brad Spencer wrote:
>> "J. Hannken-Illjes" <hannken%mailbox.org@localhost> writes:
>>
>>>> On 1. Jul 2022, at 07:55, Matthias Petermann <mp%petermann-it.de@localhost> wrote:
>>>>
>>>> Good day,
>>>>
>>>> For some time now I have noticed that on several of my systems running NetBSD/amd64 9.99.97/98, the kernel process pgdaemon completely claims a CPU core for itself after prolonged use, i.e. it constantly consumes 100%.
>>>> The affected systems do not have a shortage of RAM, and the problem does not disappear even when all workloads are stopped and thus no RAM is actually used by application processes.
>>>>
>>>> I noticed this especially in connection with accesses to the ZFS file systems set up on the respective machines - for example after a checkout from the local CVS replica hosted on ZFS.
>>>>
>>>> Is this a known problem, or what information would have to be collected to get to the bottom of it?
>>>>
>>>> I currently have such a case online, so I would be happy to pull diagnostic information this afternoon or evening. At the moment, all the info I have is from top.
>>>>
>>>> Normal view:
>>>>
>>>> ```
>>>> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
>>>> 0 root 126 0 0K 34M CPU/0 102:45 100% 100% [system]
>>>> ```
>>>>
>>>> Thread view:
>>>>
>>>> ```
>>>> PID LID USERNAME PRI STATE TIME WCPU CPU NAME COMMAND
>>>> 0 173 root 126 CPU/1 96:57 98.93% 98.93% pgdaemon [system]
>>>> ```
>>>
>>> Looks a lot like kern/55707: ZFS seems to trigger a lot of xcalls
>>>
>>> Last action proposed was to back out the patch ...
>>>
>>> --
>>> J. Hannken-Illjes - hannken%mailbox.org@localhost
>>
>>
>> Probably only a slightly related data point, but yes: if you have a
>> system / VM / Xen PV that does not have a whole lot of RAM and you
>> don't back out that patch, your system will become unusable in very
>> short order if you do much at all with ZFS (tested with a recent
>> -current building pkgsrc packages on a Xen PVHVM). The patch does
>> fix a real bug, as NetBSD doesn't have the define that it uses, but
>> the effect of running that code is needed if you use ZFS at all on a
>> "low" RAM system. I personally suspect that the ZFS ARC or some pool
>> is allowed to consume nearly all available "something" (pools, RAM,
>> etc.) without limit, but I have no specific proof (or there is a
>> leak somewhere). I mostly run 9.x ZFS right now (which may have
>> other problems), and have been setting maxvnodes way down for some
>> time. If I don't do that, the Xen PV will hang itself up after a
>> couple of 'build.sh release' runs when the source and build
>> artifacts are on ZFS filesets.
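
For reference, lowering maxvnodes can be done at runtime with
sysctl(8). A minimal sketch; the value here is only an example, not a
tuned recommendation:

```
# Show the current vnode limit
sysctl kern.maxvnodes
# Lower it at runtime (example value only)
sysctl -w kern.maxvnodes=50000
```

To keep the setting across reboots, it can also be added to
/etc/sysctl.conf.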
>
> Thanks for describing this use case. Apart from the fact that I don't
> currently use Xen on the affected machine, it performs a similar
> workload. I use it as a pbulk builder, with distfiles, build
> artifacts, and a CVS / Git mirror stored on ZFS. The builders
> themselves are located in chroot sandboxes on FFS. Anyway, I can
> trigger the problem by doing a NetBSD src checkout from the
> ZFS-backed CVS to the FFS partition.
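
A rough reproduction of that checkout, with placeholder paths
(repository on ZFS, working copy on FFS):

```
# Working directory on the FFS partition (placeholder path)
cd /ffs/scratch
# Check out NetBSD src from the ZFS-backed repository
# (placeholder CVSROOT)
cvs -d /tank/cvsroot checkout src
```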
>
> The maxvnodes trick at first made pgdaemon behave normally again, but
> the system froze shortly afterwards with no further evidence.
>
> I am not sure if this thread is the right one for pointing this out,
> but I experienced further issues with NetBSD-current and ZFS when I
> tried to perform a recursive "zfs send" of a particular snapshot of
> my datasets. It works initially, but after a couple of seconds the
> system freezes with no chance to recover (I could not even enter the
> kernel debugger). I will prepare a dedicated test VM for my cases and
> come back with more details.
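
For reference, a recursive send of a particular snapshot would look
roughly like this, with placeholder pool and dataset names:

```
# Snapshot the dataset and all of its children
zfs snapshot -r tank/data@mysnap
# Send it as a replication stream; -R includes child datasets,
# their snapshots, and properties
zfs send -R tank/data@mysnap > /mnt/backup/data.zstream
```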
>
> Kind regards
> Matthias
I saw something like that just one time, with a "zfs send..." and "zfs
receive..." locking up. I do that sort of thing fairly often to move
filesets between one system and another, and it has worked fine for me
except in one case... the destination was a NetBSD-current system with
a ZFS fileset set to use compression. The source is a FreeBSD system
with a ZFS fileset created in such a manner that NetBSD is happy with
it, and it is also set to use compression. No amount of messing around
would let 'zfs send <foo> | ssh destination "zfs receive <foo>"'
complete without locking up the destination. When I changed the
destination to not use compression, I was able to perform the zfs
send / receive pipeline without any problems. The destination is a
pretty recent -current Xen PVHVM guest and the source is a FreeBSD
12.1 system (running minio to back up my Elasticsearch cluster).
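
A sketch of that workaround, with placeholder dataset names: check
compression on the destination fileset, turn it off, and retry the
pipeline.

```
# On the destination: check whether compression is enabled
zfs get compression tank/backup
# Disable compression on the destination fileset
zfs set compression=off tank/backup
# Retry the pipeline from the source
zfs send tank/data@snap | ssh destination "zfs receive tank/backup/data"
```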
--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org