tech-kern archive


Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map



I've modified the uvm_bio.c code to use the direct map on amd64. The
change seems to be stable; I've also run 'build.sh tools' to exercise
the system, the build succeeded and I didn't observe any stability
problems. The test was on a 6-core Intel CPU with HT on (i.e. "12"
cores visible to the OS) with an NVMe disk, reading a ca. 2GB file via:

dd if=foo of=/dev/null bs=64k msgfmt=human

Thanks to the human-readable output format, the speeds below are now in proper gibibytes.

1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s
2. cache read: old: 2.2 GiB/s, new: 5.6 GiB/s

It seems 1.9 GiB/s is the device limit.

The patch for this is at:
http://www.netbsd.org/~jdolecek/uvm_bio_direct.diff

During testing I noticed that the read-ahead code slows down the
cached case quite a lot:
no read-ahead:
1. non-cached read: old: 851 MiB/s, new: 1.1 GiB/s
2. cached read: old: 2.3 GiB/s, new: 6.8 GiB/s

I've implemented a tweak to the read-ahead code to skip the full
read-ahead if the last page of the range is already in cache; this
improved things a lot:
smarter read-ahead:
1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s (no change compared
to old read-ahead)
2. cached read: old: 2.2 GiB/s, new: 6.6 GiB/s

Patch for this is at:
http://www.netbsd.org/~jdolecek/uvm_readahead_skip.diff
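
The heuristic itself is tiny; the following is only a sketch of the
idea, with a made-up function name, not the actual diff:

/*
 * Sketch of the read-ahead heuristic described above (the function
 * name is made up for illustration; this is not the actual diff).
 * The caller is assumed to hold the object lock.
 */
static bool
ra_last_page_cached(struct uvm_object *uobj, off_t off, size_t len)
{
	struct vm_page *pg;

	/* Look up the last page of the requested range. */
	pg = uvm_pagelookup(uobj, trunc_page(off + len - 1));

	/*
	 * If it is already in the page cache, assume the whole range
	 * is cached and skip issuing the full read-ahead.
	 */
	return pg != NULL;
}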

For the direct map, I've implemented a new pmap helper function, which
in turn calls a callback with the appropriate virtual address. This
way the arch pmap is free to choose an alternative implementation,
e.g. one that works without actually having a direct map, as on
sparc64.

The code right now has some instrumentation for testing, but the
general idea should be visible.
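
Roughly, on amd64 the helper just translates the physical address
through the direct map and hands the resulting virtual address to the
callback. Something like the sketch below; the name and signature are
assumptions, the actual patch may differ:

/*
 * Sketch of the pmap helper described above.  On amd64 the physical
 * address is translated via the direct map; a port without a direct
 * map could set up and tear down a temporary mapping here instead.
 */
int
pmap_direct_process(paddr_t pa, voff_t pgoff, size_t len,
    int (*process)(void *, size_t, void *), void *arg)
{
	vaddr_t va;

	/* PMAP_DIRECT_MAP() gives the direct-mapped virtual address
	 * of a physical address on amd64. */
	va = PMAP_DIRECT_MAP(pa);

	/* Hand the mapped address to the caller-supplied callback,
	 * e.g. one that does the uiomove() for ubc_uiomove(). */
	return (*process)((void *)(va + pgoff), len, arg);
}

A ubc_uiomove()-style caller would then pass a callback that just does
the copy on the supplied address, so no mapping ever has to be entered
into or removed from pmap_kernel() for the transfer.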

Opinions? In particular, I'm not quite sure whether it is safe to avoid
triggering a write fault in ubc_uiomove(), and whether the new
read-ahead heuristic is good enough for the general case.

Jaromir

2018-04-19 22:39 GMT+02:00 Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>:
> I've finally got my test rig setup, so was able to check the
> performance difference when using emap.
>
> The good news is that there is a significant speedup on the NVMe device,
> without observing any bad side effects so far.
>
> I've set up a test file of 2G, smaller than RAM so it all fits in cache; the test was:
> dd if=foo of=/dev/null bs=64k
>
> First read (not-cached): old: 1.7 GB/s, new: 2.1 GB/s
> Cached read: old: 2.2 GB/s, new: 3.1 GB/s
>
> Reads from the raw device were the same in both cases, around 1.7 GB/s,
> which is a bit bizarre.
>
> If we want to modify the uvm_bio.c code to optionally use the direct map,
> there is a problem with the fault technique it uses for read I/O.
> The code doesn't actually enter the pages into the KVA window; it lets
> uiomove() trigger the faults. Is there some advantage to this, i.e. why
> is this better than just mapping those 1-2 pages before calling
> uiomove()?
>
> Jaromir
>
> 2018-04-02 21:28 GMT+02:00 Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>:
>> 2018-03-31 13:42 GMT+02:00 Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>:
>>> 2018-03-25 17:27 GMT+02:00 Joerg Sonnenberger <joerg%bec.de@localhost>:
>>>> Yeah, that's what ephemeral mappings were supposed to be for. The other
>>>> question is whether we can't just use the direct map for this on amd64
>>>> and similar platforms?
>>>
>>> Right, we could/should use emap. I hadn't realized emap is actually already
>>> implemented. It's currently used by the pipe code for the loan/"direct" write.
>>>
>>> I don't know anything about emap though. Are there any known issues, and
>>> do you reckon it's ready to be used for general I/O handling?
>>
>> Okay, so I've hacked together a patch to switch uvm_bio.c to ephemeral
>> mappings:
>>
>> http://www.netbsd.org/~jdolecek/uvm_bio_emap.diff
>>
>> Seems to boot, no idea what else it will break.
>>
>> Looking at the state of its usage though, emap is only used in a disabled
>> code path in sys_pipe and nowhere else. That code was turned on and off
>> several times in 2009 and has seen no further use since then. That doesn't
>> inspire much confidence.
>>
>> The only port actually having an optimization for emap is x86. Since amd64
>> is also the only one supporting a direct map, we are really at liberty to
>> pick either one. I'd lean towards the direct map, since that doesn't require
>> adding/removing any mappings in pmap_kernel() at all. From looking at the
>> code, I gather a direct map is quite easy to implement for other archs like
>> sparc64; I'd say significantly easier than adding the necessary emap hooks
>> into MD pmaps.
>>
>> Jaromir
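
For reference, the alternative asked about in the quoted message above
(entering the one or two window pages before the copy instead of
letting uiomove() fault them in) would look roughly like the following
sketch. The function and its parameters are assumptions made for
illustration; this is not the actual uvm_bio.c code:

/*
 * Sketch: enter the window pages into the kernel pmap up front, do
 * the copy, then remove the temporary mappings again.
 */
static int
ubc_copy_premapped(vaddr_t va, struct vm_page **pgs, int npages,
    size_t off, size_t len, struct uio *uio)
{
	int i, error;

	/* Enter each cache page at its slot in the mapping window. */
	for (i = 0; i < npages; i++) {
		pmap_kenter_pa(va + ptoa(i), VM_PAGE_TO_PHYS(pgs[i]),
		    VM_PROT_READ | VM_PROT_WRITE, 0);
	}
	pmap_update(pmap_kernel());

	/* Copy to/from userland; no faults should be needed now. */
	error = uiomove((void *)(va + off), len, uio);

	/* Tear down the temporary mappings again. */
	pmap_kremove(va, (vsize_t)npages << PAGE_SHIFT);
	pmap_update(pmap_kernel());

	return error;
}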

