Port-arm archive


Re: aarch64{,eb} fails to boot on RPI3 (Re: 2021-06-18-netbsd-raspi-earmv6hf.img (Re: Raspberry Pi update please.))

On 21/06/2021 14:53, Jason Thorpe wrote:

On Jun 21, 2021, at 5:14 AM, Shin'ichiro TAYA <taya%ba2.so-net.ne.jp@localhost> wrote:

Increasing PERCPU_IMPORT_SIZE may solve this issue.

That seems like the wrong way to fix the issue, though, and won’t actually fix anything (just maybe paper over the problem).  The root of the problem is that the curlwp of the newly-hatched CPU is not marked as LSONPROC, and it’s that way because it’s the idle lwp for that CPU.

I was just looking at arch/aarch64/aarch64/cpu.c:cpu_hatch(), and it looks like there are multiple opportunities for this issue to fire.  For example, the first thing that happens is to acquire cpu_hatch_lock, which could also block, and thus trigger the same problem (I guess we’re just lucky that it basically never blocks? :-)  fpu_attach() also calls evcnt_attach_dynamic(), which acquires a mutex, and thus could theoretically also block (although the nature of the adaptive mutex code would make that extremely unlikely in this particular scenario).
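To illustrate the remark about adaptive mutexes, here is a minimal toy sketch of the spin-or-sleep decision. It assumes the usual adaptive behaviour: a waiter spins while the owner is running on another CPU and sleeps only when the owner is off-CPU. All names (toy_mutex, toy_adaptive_wait, the TOY_* values) are made up for illustration and are not the kernel's mutex implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for a waiter; not kernel definitions. */
enum toy_wait { TOY_ACQUIRED, TOY_SPIN, TOY_SLEEP };

/* Hypothetical, simplified view of an adaptive mutex. */
struct toy_mutex {
	bool held;          /* some lwp currently owns the mutex */
	bool owner_running; /* that owner is currently on a CPU */
};

/*
 * Decide what a waiter would do under the assumed adaptive policy:
 * take a free mutex, spin if the owner is on a CPU (it should release
 * soon), and sleep only if the owner is off-CPU.
 */
static enum toy_wait
toy_adaptive_wait(const struct toy_mutex *m)
{
	assert(m != NULL);
	if (!m->held)
		return TOY_ACQUIRED;
	return m->owner_running ? TOY_SPIN : TOY_SLEEP;
}
```

Under this model, only the TOY_SLEEP case corresponds to blocking, which would fit the "extremely unlikely" assessment if the only other contender during hatch is running on another CPU.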

Anyway, the aarch64 cpu_hatch() does quite a lot of stuff, and obviously has ample opportunities to block.  If you look at e.g. the alpha cpu_hatch(), it does nothing that would block (there are spin lock acquires and busy loops to be sure, but nothing that would sleep), so it does not suffer from this problem.

I was thinking that it might be sufficient to mark the CPU’s idle lwp as LSONPROC while the initialization is done and then mark it again as LSIDL (which I think is the state it’s in at this point, but I’d have to go double check) when done, since the next thing that happens when cpu_hatch() is over is to jump into the idle loop…. BUT…. I don’t think that’s quite correct, either, because what will happen if you do actually block? The idle lwp is doing something other than idling!
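A toy model of that idea, to make the objection concrete. This is only a sketch under stated assumptions: the toy_* names, the two-state enum and the counters are hypothetical stand-ins, not the kernel's lwp structure or the real cpu_hatch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the two LWP states discussed; not kernel values. */
enum toy_lwp_state { TOY_LSIDL, TOY_LSONPROC };

struct toy_lwp {
	enum toy_lwp_state l_stat;
	bool is_idle_lwp;
	int bad_blocks;  /* blocked while not marked on-processor */
	int idle_blocks; /* the idle lwp blocked at all */
};

/* Models a primitive that sleeps only when contended. */
static void
toy_might_block(struct toy_lwp *l, bool contended)
{
	assert(l != NULL);
	if (!contended)
		return;
	if (l->l_stat != TOY_LSONPROC)
		l->bad_blocks++;   /* the failure described in this thread */
	if (l->is_idle_lwp)
		l->idle_blocks++;  /* idle lwp doing something other than idling */
}

/* Hatch path: optionally mark the idle lwp on-processor around init. */
static void
toy_cpu_hatch(struct toy_lwp *idle, bool mark_onproc, bool contended)
{
	enum toy_lwp_state saved = idle->l_stat;

	if (mark_onproc)
		idle->l_stat = TOY_LSONPROC;
	toy_might_block(idle, contended); /* e.g. taking the hatch lock */
	toy_might_block(idle, contended); /* e.g. attaching event counters */
	idle->l_stat = saved;             /* restore before the idle loop */
}
```

In this model, marking the lwp clears bad_blocks but leaves idle_blocks non-zero whenever there is contention, which is the remaining concern: the state check is satisfied, yet the idle lwp still sleeps.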

Most/some of the problem here is that cpus start "late" on arm. This is
mostly because of autoconf interrupt controllers, I think.

Added to this, the mp_online handling seems wrong to me:

    	mp_online = true;
    #if defined(MULTIPROCESSOR)
    	cpu_boot_secondary_processors();
    #endif

I'm not sure why mp_online is set to true before
cpu_boot_secondary_processors. So, I made this...


which might be making the problem worse.

I was going to try and find the time to make cpus start earlier (right
after interrupt controllers attach, I guess), but I have not managed to
do that.

