tech-kern archive


Re: Memory corruption after fork, only on AMD CPUs



Michael Pratt <mpratt%google.com@localhost> writes:

> On Tue, Dec 14, 2021 at 1:06 PM Michael Pratt <mpratt%google.com@localhost> wrote:
>>
>> [This is a reply to
>> https://mail-index.netbsd.org/tech-kern/2021/12/01/msg027830.html. I
>> just joined the mailing list and can't seem to find the metadata
>> required for a proper reply. Apologies.]
>>
>> I filed https://gnats.netbsd.org/56535 for this a while ago, which has
>> an even simpler reproducer: a direct fork() call with a child that
>> immediately exits sometimes causes memory corruption in the parent
>> process.
>>
>> We've kept looking since filing https://gnats.netbsd.org/56535 but
>> haven't had luck on further simplification. No C reproducer yet,
>> unfortunately. (No crashes if the Go parent process is single-threaded
>> either.)
>
> I spoke too soon here, we managed to get a reproducer in C today,
> which I've posted at
> https://github.com/golang/go/issues/34988#issuecomment-994115345.

I don't have a big collection of AMD systems, but I do have a couple.
Everything here is Xen, however, and nothing is really very recent,
either in hardware or, in a lot of cases, in the OS...

Ryzen 3 2200G - 2 vcpu DOMU running 9.0_STABLE and a 1 processor DOM0
running 8.99.25: could not reproduce this running the code from either
the DOMU or the DOM0.

Athlon 64 X2 5600+ - 1 vcpu DOMU running 9.99.74 and a 1 processor DOM0
running 8.0_STABLE: could not reproduce this running the code from
either the DOMU or the DOM0.


As a control test a 2 vcpu 9.0_STABLE DOMU running on an Intel system
could also not reproduce this.

Since this test is a bit brutal and the systems are doing other stuff,
I didn't let it run too long, but it ran for several minutes with no
failures reported.  All of the systems are NetBSD/amd64.
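
For reference, the general shape of this kind of test, going by the
quoted description (a multithreaded parent that forks a child which
exits immediately), might look like the sketch below. It is not the
reproducer posted in the Go issue. The names run_fork_stress and
checker, the buffer size, and the pattern check are all made up for
illustration, and a clean run proves nothing about whether a given
system is affected.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORDS 4096              /* arbitrary buffer size (assumption) */

static atomic_int stop_flag;    /* set by the forking thread when done */
static atomic_long mismatches;  /* words that read back wrong */

/* Worker thread: keep writing a pattern to private heap memory and
 * reading it back while the other thread is busy forking. */
static void *checker(void *arg)
{
    (void)arg;
    volatile uint64_t *buf = malloc(WORDS * sizeof(uint64_t));
    if (buf == NULL)
        return NULL;

    uint64_t round = 0;
    while (!atomic_load(&stop_flag)) {
        round++;
        for (size_t i = 0; i < WORDS; i++)
            buf[i] = round ^ i;
        for (size_t i = 0; i < WORDS; i++)
            if (buf[i] != (round ^ i))
                atomic_fetch_add(&mismatches, 1);
    }
    free((void *)buf);
    return NULL;
}

/* Hypothetical helper: fork `iterations` no-op children while the
 * checker thread runs. Returns the number of mismatched words seen
 * in the parent, or -1 if the thread could not be started. */
long run_fork_stress(int iterations)
{
    pthread_t tid;

    atomic_store(&stop_flag, 0);
    atomic_store(&mismatches, 0);
    if (pthread_create(&tid, NULL, checker, NULL) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);               /* child: exit immediately */
        if (pid > 0)
            waitpid(pid, NULL, 0);  /* parent: reap the child */
    }

    atomic_store(&stop_flag, 1);
    pthread_join(tid, NULL);
    return atomic_load(&mismatches);
}
```

Calling run_fork_stress() from main() with a large iteration count and
treating any non-zero return as a failure gives a bounded run; build
with -pthread.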

>> This feels like a bug in memory management somewhere (TLB invalidation
>> issue, bug in copy-on-write?). Fundamentally, we have the parent
>> process getting corrupt memory after calling fork with an
>> (effectively) no-op child. That just shouldn't happen.
>>
>> I think we need someone familiar with NetBSD memory management
>> internals to help take a look. Otherwise I'm afraid we won't figure it
>> out and will have to declare that Go doesn't work on NetBSD on AMD
>> CPUs.
>>
>> gdt: that does sound like a different issue to me. It may be worth
>> filing a bug at https://github.com/golang/go/issues with the crash
>> details.
>>
>> Thanks,
>> Michael



-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org

