Hello,
I'd like to gather some feedback on how to best tackle kern/53124.
The problem there is that FFS triggers a pathological case. The I/O path maps each block into the kernel pmap and then unmaps it, so that the data can be copied into user memory. Each unmap triggers TLB shootdown IPIs for that FS block, sent to all CPUs that happen to be idle or are otherwise running on the kernel pmap. On systems with many idle CPUs, these TLB shootdowns cause a lot of synchronization overhead.
I see three possible ways to fix this particular case:
1. make lazy TLB invalidation possible for the kernel pmap too, i.e. make pmap_deactivate()/pmap_reactivate() work with the kernel pmap; this avoids sending TLB shootdown IPIs to idle CPUs
2. make the idle lwp run in its own pmap and address space; a variant of #1 that avoids changing the invariant that the kernel map is always loaded
3. change the UVM code so that it does not do this mapping via the kernel map, so pmap_update() and the IPIs are not triggered in the first place
I reckon #2 would add a lot of overhead to the idle routine, since it would require at least an address space switch. This switch might be fairly cheap on architectures with address space IDs (avoiding TLB flushes), but it would still cost much more than what idle entry/exit does now.
Variants of the problems with #3 have been discussed on and off over the years, as I recall, but there is no resolution yet, and I'm not aware of anyone actively working on it. I understand this would be Hard, and nobody currently has the time and courage.
This leaves #1 as a short-term practical solution. Does anyone foresee any particular problems with it; does it have a chance to fly? Any other ideas?
Jaromir