On Jun 21, 2021, at 5:14 AM, Shin'ichiro TAYA <taya%ba2.so-net.ne.jp@localhost> wrote:
> Increasing PERCPU_IMPORT_SIZE may solve this issue.
That seems like the wrong way to fix the issue, though, and won’t actually fix anything (just maybe paper over the problem). The root of the problem is that the curlwp of the newly-hatched CPU is not marked as LSONPROC, and it’s that way because it’s the idle lwp for that CPU.
I was just looking at arch/aarch64/aarch64/cpu.c:cpu_hatch(), and it looks like there are multiple opportunities for this issue to fire. For example, the first thing that happens is to acquire cpu_hatch_lock, which could also block, and thus trigger the same problem (I guess we’re just lucky that it basically never blocks? :-). fpu_attach() also calls evcnt_attach_dynamic(), which acquires a mutex, and thus could theoretically also block (although the nature of the adaptive mutex code would make that extremely unlikely in this particular scenario).
Anyway, the aarch64 cpu_hatch() does quite a lot of stuff, and obviously has ample opportunities to block. If you look at e.g. the alpha cpu_hatch(), it does nothing that would block (there are spin lock acquires and busy loops to be sure, but nothing that would sleep), so it does not suffer from this problem.
I was thinking that it might be sufficient to mark the CPU’s idle lwp as LSONPROC while the initialization is done and then mark it as LSIDL again when done (I think that’s the state it’s in at this point, but I’d have to go double-check), since the next thing that happens when cpu_hatch() is over is to jump into the idle loop… BUT… I don’t think that’s quite correct, either, because what will happen if you do actually block? The idle lwp is doing something other than idling!
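To make the idea concrete, here is a rough userland sketch of the "mark it LSONPROC, then put it back" approach. This is a toy model, not kernel code: every model_* name is made up for illustration, the rule that a blocking primitive requires the caller to be on-processor is an assumption standing in for whatever check actually fires, and the starting state (modelled as LSIDL) is the part I said I'd have to double-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for lwp states; the names only echo the kernel's. */
enum model_lstat { MODEL_LSIDL, MODEL_LSONPROC };

/* Hypothetical number of init steps in the hatch path that might block. */
enum { MODEL_HATCH_STEPS = 3 };

struct model_lwp {
	enum model_lstat stat;	/* current state of this lwp */
	bool is_idle;		/* true if this is a CPU's idle lwp */
};

/*
 * Assumed rule: a primitive that may sleep is only legal when the
 * calling lwp is marked on-processor.
 */
static bool
model_can_block(const struct model_lwp *l)
{
	return l->stat == MODEL_LSONPROC;
}

/*
 * Model of the hatch path.  If mark_onproc is true, the idle lwp is
 * temporarily marked on-processor for the duration of the init steps
 * and restored to its saved state afterwards.  Returns how many steps
 * would have violated the assumed rule.
 */
static int
model_cpu_hatch(struct model_lwp *idle, bool mark_onproc)
{
	assert(idle != NULL && idle->is_idle);

	enum model_lstat saved = idle->stat;
	int violations = 0;

	if (mark_onproc)
		idle->stat = MODEL_LSONPROC;

	/* Each step stands for something like taking an adaptive mutex. */
	for (int step = 0; step < MODEL_HATCH_STEPS; step++) {
		if (!model_can_block(idle))
			violations++;
	}

	/* Put the state back before "jumping into the idle loop". */
	idle->stat = saved;
	return violations;
}
```

Note that the model only checks the state at each step; it says nothing about what happens if one of those steps really does put the idle lwp to sleep, which is exactly the part that worries me.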