NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



The following reply was made to PR kern/57558; it has been noted by GNATS.

From: Frank Kardel <kardel%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)
Date: Thu, 3 Aug 2023 20:22:10 +0200

 Hi Chuck !
 
 Thanks for looking into that.
 
 I came up with the first patch because the pgdaemon was looping with
 uvm_km_va_starved_p() returning true.
 
 vmstat -m shows that the pool statistics sum up to close to the 32Gb
 my DOM0 has.
 
 Counting the conditions while the pgdaemon is looping gives:
 
 /var/log/messages.0.gz:Jul 28 17:42:41 Marmolata /netbsd: [ 9789.2242179] pagedaemon: loops=16026699, cnt_needsfree=0, cnt_needsscan=0, cnt_drain=16026699, cnt_starved=16026699, cnt_avail=16026699, fpages=337385
 /var/log/messages.0.gz:Jul 28 17:42:41 Marmolata /netbsd: [ 9795.2244437] pagedaemon: loops=16024007, cnt_needsfree=0, cnt_needsscan=0, cnt_drain=16024007, cnt_starved=16024007, cnt_avail=16024007, fpages=335307
 /var/log/messages.0.gz:Jul 28 17:42:41 Marmolata /netbsd: [ 9801.2246381] pagedaemon: loops=16031141, cnt_needsfree=0, cnt_needsscan=0, cnt_drain=16031141, cnt_starved=16031141, cnt_avail=16031141, fpages=335331
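 
 Roughly, the conditions these counters track boil down to the checks
 modelled below.  This is only a simplified userland sketch, not the
 actual instrumentation in the pagedaemon loop; it plugs in the values
 from the last log line and assumes 4 KiB pages and the freetarg of
 2730 pages mentioned further down:
 
 #include <stdbool.h>
 #include <stdint.h>
 #include <stdio.h>
 
 int
 main(void)
 {
         const int64_t fpages = 335331;          /* uvm_availmem(false), from the log */
         const int64_t freetarg = 2730;          /* uvmexp.freetarg on this machine */
         const uint64_t kva_total = 32ULL << 30; /* kmem_arena sized to the 32Gb of RAM */
         const uint64_t kva_free = 3ULL << 30;   /* illustrative: below 10% of the arena */
 
         bool needsfree = fpages < freetarg;           /* -> cnt_needsfree */
         bool starved = kva_free < kva_total / 10;     /* -> cnt_starved, cnt_drain */
         bool avail = fpages >= freetarg;              /* -> cnt_avail */
 
         printf("needsfree=%d starved=%d avail=%d\n", needsfree, starved, avail);
         /* prints needsfree=0 starved=1 avail=1: each pass only drains pools */
         return 0;
 }
 
 So every pass through the loop sees plenty of free pages but a starved
 kmem arena, drains the pools without that changing anything, and goes
 around again.  For reference, these are the two kernel checks involved: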
 
 bool
 uvm_km_va_starved_p(void)
 {
          vmem_size_t total;
          vmem_size_t free;
 
          if (kmem_arena == NULL)
                  return false;
 
          total = vmem_size(kmem_arena, VMEM_ALLOC|VMEM_FREE);
          free = vmem_size(kmem_arena, VMEM_FREE);
 
          return (free < (total / 10));
 }
 
 int
 uvm_availmem(bool cached)
 {
          int64_t fp;
 
          cpu_count_sync(cached);
          if ((fp = cpu_count_get(CPU_COUNT_FREEPAGES)) < 0) {
                  /*
                   * XXXAD could briefly go negative because it's impossible
                    * to get a clean snapshot.  address this for other counters
                   * used as running totals before NetBSD 10 although less
                   * important for those.
                   */
                  fp = 0;
          }
          return (int)fp;
 }
 
 So, while uvm_km_va_starved_p() considers almost all memory used up,
 uvm_availmem(false) returns 337385 free pages (~1.28 Gb), well above
 uvmexp.freetarg.
 
 So, why do we count so many free pages when the free vmem for kmem_arena
 is less than 10% of the total kmem_arena?
 Maybe the pool pages have been allocated but not yet referenced - I
 didn't look that deeply into the vmem/ZFS interaction.
 
 I understand the reasoning why kmem size = physmem size should have worked.
 
 There are still inconsistencies, though.
 Even if uvm_availmem(false) accounted for all pages allocated/reserved
 in the kmem_arena vmem, on this 32Gb system the actual free target is
 only 2730 free pages (~10.7 Mb).
 10% of 32Gb would be 3.2Gb, several hundred times the free page target,
 so even then we would be stuck with a looping page daemon.
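 
 Put as numbers (again assuming 4 KiB pages; this is just arithmetic on
 the figures above):
 
 #include <stdio.h>
 
 int
 main(void)
 {
         const double gib = 1UL << 30;
         const double starve_thresh = 32 * gib / 10;   /* 10% of a 32Gb kmem_arena */
         const double freetarg_bytes = 2730 * 4096.0;  /* 2730 pages of 4 KiB */
 
         printf("starvation threshold: %.1f GiB\n", starve_thresh / gib);
         printf("free page target:     %.1f MiB\n", freetarg_bytes / (1UL << 20));
         printf("ratio:                %.0fx\n", starve_thresh / freetarg_bytes);
         return 0;
 }
 
 That prints a 3.2 GiB starvation threshold against a target of about
 10.7 MiB, a factor of roughly 300, so the two views can disagree over
 a huge range of free memory.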
 
 I think we need to find a better way of coping with the accounting
 differences between vmem and uvm free pages. Looking at the vmem
 statistics seemed logical to me, as ZFS allocates almost everything
 from kmem_arena via pools.
 I don't know what vmem does when there are fewer physical pages
 available than the vmem allocation would require. This was the case you
 tried to avoid.
 
 So, looking at the vmem statistics is consistent with the starved-flag
 logic - that is why it does not trigger the looping pgdaemon. What isn't
 covered is the case of fewer physical pages than the pool allocation
 requires.
 
 I think we have yet to find a correct, robust solution that does not
 trigger the near-endless pgdaemon loop.
 
 Frank
 
 
 On 08/03/23 18:30, Chuck Silvers wrote:
 > The following reply was made to PR kern/57558; it has been noted by GNATS.
 >
 > From: Chuck Silvers <chuq%chuq.com@localhost>
 > To: gnats-bugs%netbsd.org@localhost
 > Cc:
 > Subject: Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)
 > Date: Thu, 3 Aug 2023 09:27:50 -0700
 >
 >   On Thu, Aug 03, 2023 at 08:45:01AM +0000, kardel%netbsd.org@localhost wrote:
 >   > 	Patch 1:
 >   > 		let ZFS use a correct view on KVA memory:
 >   > 		With this patch arc reclaim now detects memory shortage and
 >   > 		frees pages. Also the ZFS KVA used by ZFS is limited to
 >   > 		75% KVA - could be made tunable
 >   >
 >   > 	Patch 1 is not sufficient though. arc reclaim thread kicks in at 75%
 >   > 	correctly, but pages are not fully reclaimed and ZFS depletes its cache
 >   > 	fully as the freed and now idle page are not reclaimed from the pools yet.
 >   > 	pgdaemon will now not trigger pool_drain, as uvm_km_va_starved_p() returns false
 >   > 	at this point.
 >   
 >   this patch is not correct.  it does not do the right thing when there
 >   is plenty of KVA but a shortage of physical pages.  the goal with
 >   previous fixes for ZFS ARC memory management problems was to prevent
 >   KVA shortages by making KVA big enough to map all of RAM, and thus
 >   avoid the need to consider KVA because we would always run low on
 >   physical pages before we would run low on KVA.  but apparently in your
 >   environment that is not working.  maybe we do something differently in
 >   a XEN kernel that we need to account for?
 >   
 >   
 >   > 	To reclaim the pages freed directly we need
 >   > 	Patch 2:
 >   > 		force page reclaim
 >   > 	that will perform the reclaim.
 >   
 >   this second patch is fine.
 >   
 >   -Chuck
 >   
 

