NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/56535: Memory corruption in multi-threaded Go parent process following fork() on AMD CPUs
The following reply was made to PR kern/56535; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: chs%NetBSD.org@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56535: Memory corruption in multi-threaded Go parent process following fork() on AMD CPUs
Date: Mon, 14 Aug 2023 05:47:12 +0000
> Module Name: src
> Committed By: chs
> Date: Sun Aug 13 23:06:07 UTC 2023
>
> Modified Files:
> src/sys/uvm: uvm_fault.c
>
> Log Message:
> uvm: prevent TLB invalidation races during COW resolution
> [...]
> This fixes PR 56535 as well as the netbsd versions of problems
> described in various bug trackers:
I don't think this is accurate. I have no theory for how this could
actually fix PR 56535, and I don't think anyone else has presented
such a theory either. This change avoids a race between memory access
and TLB invalidation IPI, as in the following code:
struct { long x; char pad[512]; int y; } *f = mmap(...);
/* thread A */
for (;;)
f->y = 1; /* trigger cow */
/* thread B */
for (long x = 0;; x++) {
lock();
f->x = x; /* write to new page after cow tlb flush */
unlock();
}
/* thread C */
for (;;) {
lock();
long x0 = f->x; /* read from old page before tlb flush */
/* XXX tlb flush may happen asynchronously here */
long x1 = f->x; /* read from new page after tlb flush */
unlock();
assert(x0 == x1);
}
/* main thread */
for (;;) {
pid_t pid = fork();
if (pid == 0) _exit(0);
waitpid(pid, NULL, 0);
}
When thread A triggers copy-on-write, the kernel allocates a new
backing page for f, copies the old page content to the new page, and
then updates the page tables to point at the new page with write
permission.
However, the new page may be exposed writably to thread B while thread
C still has TLB entries pointing at the old page. So the following
sequence of events may happen:
thread B thread C
-------- --------
*interrupt* tlb flush
lock();
f->x = x; /* write to new page */
unlock();
lock();
x0 = f->x; /* read from old page */
*interrupt* tlb flush
x1 = f->x; /* read from new page */
unlock();
assert(x0 == x1);
*crash*
This change addresses the problem by making the old page nonwritable
before copying its content to the new page and publishing the new
page; that way, in this scenario, threads B and C will both fault and
wait for the cow handling to complete.
(Actually, I'm a little confused: this change only changes _one
virtual mapping_ of the old page, but there can be other virtual
mappings of the same page. Why isn't it necessary to use
pmap_page_remove here?)
But by instrumenting the reproducer here with lfence;rdtsc around
page->b = 2;
uint64_t pb = page->b;
we have found latencies as small as 10ns, without migration across
cores which might have desynchronized TSCs, and yet pb is shown to be
102, not 2. It is physically impossible for the TLB invalidation IPI
handler to be triggered between the two reads of page->b within a span
of 10ns.
My best hypothesis is still that this is an AMD CPU bug, and the fix
happens to paper over the obscure microarchitectural circumstances
that lead to the bug.
Home |
Main Index |
Thread Index |
Old Index