tech-kern archive


Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86



On 25/03/2018 at 13:48, Michael van Elst wrote:
jaromir.dolecek%gmail.com@localhost (Jaromír Doleček) writes:

The problem there is that FFS triggers a pathological case: the I/O transfer
maps and then unmaps each block into the kernel pmap, so that the data can be
copied into user memory.

For some reason it's not per block. There is a mechanism that moves
an 8 KB window, independent of the block size. You can easily change
the window size (I'm currently experimenting with 128 KB) by building
a kernel with, e.g., options UBC_WINSHIFT=17.

Independently of how the scaling problem ends up being handled, I suggest
increasing that window to at least MAXPHYS.

Yes.

3. change the UVM code so that it does not do this mapping via the kernel
map, so that pmap_update() and the IPIs are not triggered in the first place

What I currently don't understand is why we see only one TLB shootdown
per window, and why it happens when the window is _unmapped_. Mapping the
pages (except when overwriting an existing mapping) is done by the regular
fault handler, and apparently that does not flush the TLB.

IIRC that's normal. Each time we map a page, we know it wasn't mapped before,
so there is no need to flush the TLB, since the translation cannot be cached.
When unmapping, however, we do need to flush the TLB. It's an optimization: if
we map a page that we end up never touching, it won't have the "used" flag
set, and we can unmap it without flushing the TLB. That saves us one TLB
shootdown.

By the way, I saw your reply to the PR (by looking at mail-index.netbsd.org,
because you didn't CC me).

I think you are right. I was thinking more of a problem in pmap, but the
issue may simply be latency in CPU wakeup, as you said.

The main CPU (the one your program runs on) sends IPIs to the remote CPUs.
Those CPUs were idle and had entered the halted state, so they take some time
to wake up and process the IPI, and the main CPU spends that time waiting.

As you said yourself, this is not trivial to fix: the wakeup path can go
through any interrupt entry point, and adding code there would slow down
interrupt entry in general.

We could perhaps have a way to tell the main CPU not to wait for the CPUs
that are idle, and to wait only for those that aren't.

Maxime

