Current-Users archive


Re: pgdaemon high CPU consumption



Matthias Petermann <mp%petermann-it.de@localhost> writes:

> Hello,
>
> On 01.07.22 12:48, Brad Spencer wrote:
>> "J. Hannken-Illjes" <hannken%mailbox.org@localhost> writes:
>> 
>>>> On 1. Jul 2022, at 07:55, Matthias Petermann <mp%petermann-it.de@localhost> wrote:
>>>>
>>>> Good day,
>>>>
>>>> For some time I have noticed that on several of my systems running NetBSD/amd64 9.99.97/98, the kernel process pgdaemon completely claims a CPU core for itself after prolonged use, i.e. it constantly consumes 100%.
>>>> The affected systems are not short of RAM, and the problem does not disappear even when all workloads are stopped and thus no RAM is actually being used by application processes.
>>>>
>>>> I noticed this especially in connection with accesses to the ZFS filesystems set up on the respective machines - for example, after a checkout from the local CVS repository hosted on ZFS.
>>>>
>>>> Is this already a known problem, or what information would need to be collected to get to the bottom of it?
>>>>
>>>> I currently have such a case online, so I would be happy to pull diagnostic information this evening/afternoon. At the moment all the information I have is from top.
>>>>
>>>> Normal view:
>>>>
>>>> ```
>>>>   PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
>>>>     0 root     126    0     0K   34M CPU/0     102:45   100%   100% [system]
>>>> ```
>>>>
>>>> Thread view:
>>>>
>>>> ```
>>>>   PID   LID USERNAME PRI STATE       TIME   WCPU    CPU NAME      COMMAND
>>>>     0   173 root     126 CPU/1      96:57 98.93% 98.93% pgdaemon  [system]
>>>> ```
>>>
>>> Looks a lot like kern/55707: ZFS seems to trigger a lot of xcalls
>>>
>>> Last action proposed was to back out the patch ...
>>>
>>> --
>>> J. Hannken-Illjes - hannken%mailbox.org@localhost
>> 
>> 
>> Probably only a slightly related data point, but yes: if you have a
>> system / VM / Xen PV that does not have a whole lot of RAM and you
>> don't back out that patch, your system will become unusable in very
>> short order if you do much at all with ZFS (tested with a recent
>> -current building pkgsrc packages on a Xen PVHVM).  The patch does
>> fix a real bug, as NetBSD doesn't have the define that it uses, but
>> the effect of running that code is needed if you use ZFS at all on
>> a "low" RAM system.  I personally suspect that the ZFS ARC or some
>> pool is allowed to consume nearly all available "something" (pools,
>> RAM, etc.) without limit, but I have no specific proof (or there is
>> a leak somewhere).  I mostly run 9.x ZFS right now (which may have
>> other problems) and have been setting maxvnodes way down for some
>> time.  If I don't do that, the Xen PV will hang itself up after a
>> couple of 'build.sh release' runs when the source and build
>> artifacts are on ZFS filesets.
>
> Thanks for describing this use case. Apart from the fact that I don't
> currently use Xen on the affected machine, it runs a similar
> workload. I use it as a pbulk builder, with distfiles, build
> artifacts, and a CVS / Git mirror stored on ZFS. The builders
> themselves are located in chroot sandboxes on FFS. In any case, I can
> trigger the observed behaviour by doing a NetBSD src checkout from
> the ZFS-backed CVS repository to the FFS partition.
>
> The maxvnodes trick initially made pgdaemon behave normally again,
> but the system froze shortly afterwards with no further evidence.
>
> I am not sure if this thread is the right place to point this out,
> but I experienced further issues with NetBSD-current and ZFS when I
> tried to perform a recursive "zfs send" of a particular snapshot of
> my data sets. Although it initially works, the system freezes after a
> couple of seconds with no chance to recover (I could not even enter
> the kernel debugger). I will come back to this; I first need to
> prepare a dedicated test VM for my cases.
>
> Kind regards
> Matthias
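
On the maxvnodes workaround quoted above: it is just a sysctl knob, so
something along the following lines is all it takes.  This is only a
minimal sketch; the value shown is purely illustrative, and the right
number depends on how much RAM the machine has:

```
# Inspect the current vnode limit, then lower it.  The value is only
# an example; tune it to the machine.  The setting lasts until the
# next reboot unless added to /etc/sysctl.conf.
sysctl kern.maxvnodes
sysctl -w kern.maxvnodes=50000
```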


I saw something like that with a "zfs send..." and "zfs receive..."
locking up, just one time.  I do that sort of thing fairly often to
move filesets between one system and another, and it has worked fine
for me except in one case...  the destination was a NetBSD-current
ZFS fileset set to use compression.  The source is a FreeBSD ZFS
fileset created in such a manner that NetBSD is happy with it, and it
is also set to use compression.  No amount of messing around would
let 'zfs send <foo> | ssh destination "zfs receive <foo>"' complete
without locking up the destination.  When I changed the destination
to not use compression, I was able to run the zfs send / receive
pipeline without any problems.  The destination is a pretty recent
-current Xen PVHVM guest and the source is a FreeBSD 12.1 system
(running minio to back up my Elasticsearch cluster).
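
For anyone wanting to reproduce this, the shape of the pipeline was
roughly as follows.  This is only a sketch; the pool, fileset, and
host names are made up for illustration:

```
# On the destination (NetBSD-current), the parent fileset had
# compression enabled, which a received child fileset inherits:
zfs set compression=on tank/backup

# From the FreeBSD source, stream a snapshot over ssh; with
# compression enabled this would lock up the destination:
zfs send tank/data@snap | ssh netbsd-host "zfs receive tank/backup/data"

# Turning compression off on the destination side let the same
# pipeline run to completion:
zfs set compression=off tank/backup
```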



-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org


