Port-amd64 archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: faults on copy operations

Le 29/04/2020 à 04:44, Chuck Silvers a écrit :
On Sun, Apr 26, 2020 at 09:45:04PM +0200, Maxime Villard wrote:
Le 26/04/2020 à 21:26, Chuck Silvers a écrit :
On Sat, Apr 25, 2020 at 09:50:26AM +0200, Maxime Villard wrote:
Small tactical mitigation I want to (re)introduce.

The copyin and copyout functions (plus their ufetch and ustore variants)
register a specific handler, which causes page faults in kernel mode to
be dismissed as user faults. This way, when userland passes an unmapped
page to copyin for example, the kernel will page-fault but will consider
it as a non-fatal user fault.

The problem I see, is that when discarding such faults, the kernel doesn't
check the VA that faulted, and this hides certain faults that should be
considered as fatal: those occuring on kernel pages, betraying a bug in the
caller of the copy functions.

Suppose there is a crazy syscall that passes a user-controllable length to
copyout (something I've encountered already in the past). An attacker can
do a gigantic copyout and retrieve kernel memory until a page fault occurs
on an unmapped kernel page; the page fault will actually be dismissed, and
the syscall will just error out, meaning that the system remains alive and
the attacker remains alive too. With simple forms of heap re-modelling, and
by repeating this operation, the attacker could manage to dump strictly all
of the kernel memory.

I want to apply this change [1], that causes the handler to panic if the
fault came from a kernel page. This isn't a very excellent mitigation, but
at least, it will expose bogus memory accesses and make it less easy for
severe buffer overflows to go unpunished.

The change is based on KASSERTs I initially added in 2016:
But which I quickly reverted without really understanding what was wrong
with them. In 2018 I realized that CR2 was getting accessed in non-page
faults, which was a bug:
In retrospect I see that my KASSERTs were firing because of this bug. Now
I want to re-introduce these KASSERTs as panics.


[1] https://m00nbsd.net/garbage/x86/copy-trap.diff

Improving the detection of buggy copyin/out()s would be great, however
any copyin/out where the both the kernel and user side are pageable
can legitimately fail on either side.  The examples of this that I can
think of right now are ubc_uiomove() and copyinargs().  Please make sure
that we recover from such legitimate fault failures than treat a failure
in such a situation as a bug.

I am aware of that.

My patch enforces the policy when the fault couldn't be resolved, AND for a
reason different than ENOMEM. The cases you mentioned therefore do not get
classified as fatal faults.


Kernel faults can legitimately fail for more reasons than just ENOMEM,
for example they can fail with EIO if a page cannot be read back
from a storage device.  Other error codes might also be legitimate,
I haven't gone through all of them.

I see. In fact, there is just no way to make this work, because the error
code could be polluted by an underlying driver. Down the drain.


Home | Main Index | Thread Index | Old Index