tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map



hi,

sorry for being so late to the party on this thread,
I should have replied long ago.


On Fri, May 11, 2018 at 05:28:18PM +0200, Jaromr Dole?ek wrote:
> I've modified the uvm_bio.c code to use direct map for amd64. Change
> seems to be stable, I've also run 'build.sh tools' to test out the
> system, build suceeded and I didn't observe any problems with
> stability. Test was on 6-core Intel CPU with HT on (i.e. "12" cores
> visible to OS), NVMe disk, cca 2GB file read done via:
> 
> dd if=foo of=/dev/null bs=64k msgfmt=human
> 
> Because of the proper formatting, the speed is now in proper Gibibytes.
> 
> 1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s
> 2. cache read: old: 2.2 GiB/s, new: 5.6 GiB/s
> 
> Seems the 1.9 GiB/s is device limit.
> 
> The patch for this is at:
> http://www.netbsd.org/~jdolecek/uvm_bio_direct.diff

the problem with keeping the pages locked (ie. PG_BUSY) while accessing
the user address space is that it can lead to deadlock if the page
we are accessing via the kernel mapping is the same page that we are
accessing via the user mapping.  if the user space access faults
(eg. because the page has not been accessed via that mapping yet)
then the fault handler will try to lock the same page again.
even if it's not the same page, it is generally not safe for a thread
to wait to lock a page while that thread has locked another page already.

however if you don't lock the object page while accessing user space,
then the object page can change identity out from under you,
corrupting whatever the page is being used for in its new identity.

my plan for this in the longer term was to add a mechanism in UVM
for preventing a page from changing identity, which would be somewhat
similar to the existing loan mechanism except that the page could
still be written to without making a copy of it.  linux and freebsd
(and solaris) each have something like this, though they are all
a bit different in the details.  using this new mechanism, we could
keep the page being accessed via the direct map from changing identity
while still allowing normal page faults to happen on the user mapping.

there are probably less intrusive ways to attack the problem at hand,
but the above approach seemed like the best way to go in the long run.

I was hoping the emap thing would work for the shorter term, but if
that's not practical to make work then we can look for something else.


> During testing I noticed that the read-ahead code slows down the
> cached case quite a lot:
> no read-ahead:
> 1. non-cached read: old: 851 MiB/s, new: 1.1 GiB/s
> 2. cached read: old: 2.3 GiB/s, new: 6.8 GiB/s
> 
> I've implemented a tweak to read-ahead code to skip the full
> read-ahead if last page of the range is already in cache, this
> improved things a lot:
> smarter read-ahead:
> 1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s (no change compared
> to old read-ahead)
> 2. cached read: old: 2.2 GiB/s, new: 6.6 GiB/s
> 
> Patch for this is at:
> http://www.netbsd.org/~jdolecek/uvm_readahead_skip.diff

the concept is good, but would using PGO_NOWAIT work about as well as
adding a new PGO_CACHEONLY flag?  if any pages are not in memory then
the pgo_get call would fail since that would require waiting for them
to be read.  the suggestion from others to just use uvm_pagelookup()
would be fine too.

it would also be nice to put this new logic for checking if an object page
is in memory into a separate function in case it's useful in other contexts
(and to isolate the implementation a bit).


> For the direct map, I've implemented new helper pmap function, which
> in turn calls a callback with the appropriate virtual address. This
> way the arch pmap is more free to choose an alternative
> implementations for this, e.g. without actually having a direct map,
> like sparc64.
> 
> The code right now has some instrumentation for testing, but the
> general idea should be visible.
> 
> Opinions? Particularly, I'm not quite sure if it safe to avoid
> triggering a write fault in ubc_uiomove(), and whether the new
> heuristics for read-ahead is good enough for general case.

is there a specific case about triggering a write fault in ubc_uiomove()
that you are concerned about?  the general rule is that a page being modified
must be marked dirty either as part of the modification (if it is being written
via a regular ref/mod-tracking pmap entry) or after the modifications are
complete (if it is being written via a non ref/mod-tracking mechanism such
as the direct map).

the read-ahead logic in post-UBC netbsd has never been very sophisticated,
I'd be impressed if you managed to make it worse.  :-)

-Chuck


Home | Main Index | Thread Index | Old Index