NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/56535: Memory corruption in multi-threaded Go parent process following fork() on AMD CPUs



The following reply was made to PR kern/56535; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: chs%NetBSD.org@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56535: Memory corruption in multi-threaded Go parent process following fork() on AMD CPUs
Date: Mon, 14 Aug 2023 05:47:12 +0000

 > Module Name:	src
 > Committed By:	chs
 > Date:		Sun Aug 13 23:06:07 UTC 2023
 >
 > Modified Files:
 > 	src/sys/uvm: uvm_fault.c
 >
 > Log Message:
 > uvm: prevent TLB invalidation races during COW resolution
 > [...]
 > This fixes PR 56535 as well as the netbsd versions of problems
 > described in various bug trackers:
 
 I don't think this is accurate.  I have no theory for how this could
 actually fix PR 56535, and I don't think anyone else has presented
 such a theory either.  This change avoids a race between memory access
 and TLB invalidation IPI, as in the following code:
 
         struct { long x; char pad[512]; int y; } *f = mmap(...);
 
         /* thread A */
         for (;;)
                 f->y = 1;  /* trigger cow */
 
         /* thread B */
         for (long x = 0;; x++) {
                 lock();
                 f->x = x; /* write to new page after cow tlb flush */
                 unlock();
         }
 
         /* thread C */
         for (;;) {
                 lock();
                 long x0 = f->x; /* read from old page before tlb flush */
                 /* XXX tlb flush may happen asynchronously here */
                 long x1 = f->x; /* read from new page after tlb flush */
                 unlock();
                 assert(x0 == x1);
         }
 
         /* main thread */
         for (;;) {
                 pid_t pid = fork();
                 if (pid == 0) _exit(0);
                 waitpid(pid, NULL, 0);
         }
 
 When thread A triggers copy-on-write, the kernel allocates a new
 backing page for f, copies the old page content to the new page, and
 then updates the page tables to point at the new page with write
 permission.
 
 However, the new page may be exposed writably to thread B while thread
 C still has TLB entries pointing at the old page.  So the following
 sequence of events may happen:
 
         thread B                thread C
         --------                --------
         *interrupt* tlb flush
         lock();
         f->x = x; /* write to new page */
         unlock();
                                 lock();
                                 x0 = f->x; /* read from old page */
                                 *interrupt* tlb flush
                                 x1 = f->x; /* read from new page */
                                 unlock();
                                 assert(x0 == x1);
                                 *crash*
 
 This change addresses the problem by making the old page nonwritable
 before copying its content to the new page and publishing the new
 page; that way, in this scenario, threads B and C will both fault and
 wait for the cow handling to complete.
 
 (Actually, I'm a little confused: this change only changes _one
 virtual mapping_ of the old page, but there can be other virtual
 mappings of the same page.  Why isn't it necessary to use
 pmap_page_remove here?)
 
 But by instrumenting the reproducer here with lfence;rdtsc around
 
         page->b = 2;
         uint64_t pb = page->b;
 
 we have found latencies as small as 10ns, without migration across
 cores which might have desynchronized TSCs, and yet pb is shown to be
 102, not 2.  It is physically impossible for the TLB invalidation IPI
 handler to be triggered between the two reads of page->b within a span
 of 10ns.
 
 My best hypothesis is still that this is an AMD CPU bug, and the fix
 happens to paper over the obscure microarchitectural circumstances
 that lead to the bug.
 


Home | Main Index | Thread Index | Old Index