tech-kern archive


Solving the last piece of the uvm_pageqlock problem



This is a diff against a tree containing the allocator patch I posted
previously:

	http://www.netbsd.org/~ad/2019/pdpol.diff

The idea here is to buffer updates to the page's status (active, inactive,
dequeued) and then sync those to the pdpolicy / pagedaemon queues regularly,
a bit like the way the file system syncer works.  Notes:

- Since uvm_pageqlock was replaced with pg->interlock and a private lock for
  the pdpolicy code, pages can occasionally appear on a pdpolicy queue when
  they shouldn't be considered for pageout and reclaim (if the pagedaemon and
  the object owner race).  That's not a problem: the pagedaemon can take
  pg->interlock, determine that a page is wired or otherwise in a state of
  flux, and ignore it, because it'll be gone from the queues soon.

- This patch takes it a little further.  The pdpolicy code gets a dedicated
  TAILQ_ENTRY in struct vm_page so it doesn't need to share with the page
  allocator.  A page can be PG_FREE and still on a pdpolicy queue (but not
  for long).  We record the page's intended state (active, inactive,
  dequeued) in pg->pqflags using atomics, then put the page in a per-CPU
  buffer.  At some point in the near future the buffer is purged and the
  pending updates are made real in the pdpol code's global state.

- The pagedaemon can also see those updates in real time by inspecting
  pg->pqflags, and can make the page's status real itself.  So basically
  what I'm doing is batching the updates, trying not to let the global state
  fall too far behind, and always giving the pagedaemon enough information
  to know the true picture for individual pages when it does its laborious
  scan of the queues, even if the queues viewed globally are a little bit
  behind.
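
To make the notes above concrete, here is a minimal userspace sketch of the
scheme.  It is not the code from the patch: the flag names, struct layouts,
buffer size and function names are all assumptions made up for illustration,
locking is only indicated in comments, and the model is single threaded
apart from the atomic flag update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical intent bits; the real pqflags encoding may differ. */
enum {
	PQ_INTENT_A      = 0x1,	/* page should be on the active queue */
	PQ_INTENT_I      = 0x2,	/* page should be on the inactive queue */
	PQ_INTENT_D      = 0x3,	/* page should be dequeued */
	PQ_INTENT_MASK   = 0x3,
	PQ_INTENT_QUEUED = 0x4	/* page already sits in a per-CPU buffer */
};

enum pq_state { PQ_NONE, PQ_ACTIVE, PQ_INACTIVE };

struct page {
	_Atomic unsigned pqflags;	/* intended state, set without the lock */
	enum pq_state realstate;	/* global state, changed under the lock */
};

#define BUFSZ 8				/* assumed per-CPU buffer size */
struct cpu_buf {
	struct page *pg[BUFSZ];
	size_t n;
};

/* Make a pending intent real.  The caller is assumed to hold the lock. */
static void
page_realize(struct page *pg, unsigned clear)
{
	unsigned old = atomic_fetch_and(&pg->pqflags, ~clear);

	switch (old & PQ_INTENT_MASK) {
	case PQ_INTENT_A: pg->realstate = PQ_ACTIVE; break;
	case PQ_INTENT_I: pg->realstate = PQ_INACTIVE; break;
	case PQ_INTENT_D: pg->realstate = PQ_NONE; break;
	default: break;			/* nothing pending */
	}
}

/* Purge one buffer: one lock acquisition covers every buffered page. */
static void
buf_purge(struct cpu_buf *b)
{
	/* the pdpolicy lock would be taken here */
	for (size_t i = 0; i < b->n; i++)
		page_realize(b->pg[i], PQ_INTENT_MASK | PQ_INTENT_QUEUED);
	b->n = 0;
	/* ... and released here */
}

/* Record the intended state; buffer the page unless it already is. */
static void
page_intent_set(struct cpu_buf *b, struct page *pg, unsigned intent)
{
	unsigned old = atomic_load(&pg->pqflags), nflags;

	do {
		nflags = (old & ~(unsigned)PQ_INTENT_MASK) | intent |
		    PQ_INTENT_QUEUED;
	} while (!atomic_compare_exchange_weak(&pg->pqflags, &old, nflags));

	if (old & PQ_INTENT_QUEUED)
		return;			/* newer intent replaced the older one */
	if (b->n == BUFSZ)
		buf_purge(b);
	assert(b->n < BUFSZ);
	b->pg[b->n++] = pg;
}

/* Pagedaemon view: apply any pending intent before judging the page. */
static enum pq_state
pagedaemon_inspect(struct page *pg)
{
	page_realize(pg, PQ_INTENT_MASK);	/* page stays in its buffer */
	return pg->realstate;
}
```

In this sketch, setting "active" and then "inactive" on the same page leaves
a single buffer entry, and pagedaemon_inspect() reports the page as inactive
even before buf_purge() has run.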

This seems to work really well, I think because a page can have multiple
state transitions while it's in a queue waiting for its intended status
change to be purged and made global.
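
As a rough illustration of why coalescing helps, the toy model below counts
lock acquisitions for the same stream of state transitions with and without
buffering.  Everything in it is an assumption chosen for the example (the
page count, the buffer size, the access pattern of four transitions per page
in a row); it says nothing about the real workload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8	/* assumed number of pages being touched */
#define BUFSZ  4	/* assumed buffer capacity */

struct page {
	int intent;	/* latest intended state; only the last one matters */
	bool buffered;	/* already waiting in the buffer? */
};

struct stats {
	unsigned direct_locks;	/* lock taken once per transition */
	unsigned batched_locks;	/* lock taken once per buffer purge */
};

/* Purge the buffer under a single (counted) lock acquisition. */
static void
purge(struct page **buf, size_t *n, struct stats *st)
{
	st->batched_locks++;
	for (size_t i = 0; i < *n; i++)
		buf[i]->buffered = false;
	*n = 0;
}

/* Feed ntrans transitions through both schemes and count lock takes. */
static struct stats
simulate(unsigned ntrans)
{
	struct page pages[NPAGES] = { { 0, false } };
	struct page *buf[BUFSZ];
	size_t n = 0;
	struct stats st = { 0, 0 };

	for (unsigned i = 0; i < ntrans; i++) {
		/* each page sees four transitions in a row */
		struct page *pg = &pages[(i / 4) % NPAGES];

		st.direct_locks++;		/* unbuffered scheme */
		pg->intent = (int)(i % 3);	/* buffered: overwrite intent */
		if (pg->buffered)
			continue;		/* coalesced, no new entry */
		if (n == BUFSZ)
			purge(buf, &n, &st);
		pg->buffered = true;
		buf[n++] = pg;
	}
	if (n > 0)
		purge(buf, &n, &st);		/* final periodic sync */
	return st;
}
```

With this particular pattern, simulate(64) counts 64 lock acquisitions for
the unbuffered scheme and 4 for the buffered one.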

Shortly before composing this e-mail it occurred to me that FreeBSD may do
something similar, but to be honest I didn't dig into their code.

I still need to tweak this to allocate a smaller buffer for uniprocessor
systems, and maybe consider using prefetch instructions when purging.  I
also want to re-run the tests because I changed a couple of things, but I'm
basically happy with it.

Results on my kernel build test:

72.66 real      1653.86 user       593.19 sys   new allocator
71.26 real      1671.13 user       502.94 sys   new allocator + pdpol.diff

Lock contention before (first table) and after (second table):

Total%    Count   Time/ms          Lock                       Caller
------ -------- --------- ---------------------- ------------------------------
 28.86 44056935  77553.77 pdpol_state            <all>
 15.62 22177251  41978.93 pdpol_state            uvmpdpol_pageactivate+36
 13.12 21656129  35251.99 pdpol_state            uvmpdpol_pagedequeue+18
  0.12   223482    322.77 pdpol_state            uvmpdpol_pagedeactivate+18
  0.00       73      0.07 pdpol_state            uvmpdpol_pageenqueue+18

Total%    Count   Time/ms          Lock                       Caller
------ -------- --------- ---------------------- ------------------------------
  0.23    11301    362.35 pdpol_state            uvmpdpol_pageintent_set+b9

Andrew

