NetBSD-Bugs archive


port-arm/56459: Raspberry Pi 3B+ boot failure and solution



>Number:         56459
>Category:       port-arm
>Synopsis:       Raspberry Pi 3B+ boot failure and solution
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Oct 19 19:05:00 +0000 2021
>Originator:     Rory Bolt
>Release:        9.99.91
>Organization:
Kioxia
>Environment:
NetBSD arm64 9.99.91 NetBSD 9.99.91 (GENERIC64) 
>Description:
The Raspberry Pi 3B+ platform has been broken since 9.99.88 with various boot problems. The latest is a panic with the following backtrace (sorry, this is from a picture of the panic and I did not type in all the addresses/details):

panic: kernel diagnostic assertion "l->l_stat == LSONPROC" failed in kern_sleepq.c

vpanic()
kern_assert()
sleepq_enqueue()
cv_enter()
cv_wait()
xc_wait()
pic_establish_intr()
bcm2836mp_intr_init()
arm_fdt_cpu_hatch()
cpu_hatch()
cpu_mpstart()

By adding debugging info I was able to verify that l->l_stat was LSIDL; in other words, we were trying to sleep on the idle lwp.

The fundamental problem is the same as the earlier ones Rin fixed: when the secondary processors are initializing on the idle lwp, they cannot suspend/sleep. As has been previously mentioned on the port-arm mailing list, there are MANY opportunities for locking in the processor initialization path, and it would be great if this were reworked.

The specific problem here is that the "cold" flag has been cleared before pic_establish_intr() was called, and as a result xc_broadcast() and xc_wait() are being executed instead of just pic_unblock_irqs(). 
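To illustrate the decision described above, here is a minimal user-space sketch. It is NOT the NetBSD source: the enum values, the function name model_establish_intr(), and the PATH_* results are hypothetical stand-ins, and only the two lwp states named in this report are modelled.

```c
#include <stdbool.h>

/* Simplified stand-ins for illustration only, not the kernel definitions. */
enum lwp_state { LSIDL, LSONPROC };
enum unblock_path { PATH_DIRECT, PATH_XCALL, PATH_PANIC };

/*
 * Model of the choice made when an interrupt is established:
 * while "cold" is set the IRQ is simply unblocked; once "cold" has been
 * cleared the cross-call path is used, and waiting for the cross call
 * sleeps, which is only legal when the lwp is LSONPROC.
 */
static enum unblock_path
model_establish_intr(bool cold, enum lwp_state l_stat)
{
	if (cold)
		return PATH_DIRECT;	/* pic_unblock_irqs() only */
	if (l_stat != LSONPROC)
		return PATH_PANIC;	/* xc_wait() -> cv_wait() -> assertion fails */
	return PATH_XCALL;		/* xc_broadcast() + xc_wait() */
}
```

In this model, model_establish_intr(false, LSIDL) corresponds to the panic in the backtrace, while model_establish_intr(true, LSIDL) corresponds to the case where the secondary processor never tries to sleep.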

>How-To-Repeat:
Attempt to boot any of the daily builds since June 2021 on a Raspberry Pi 3B+.
>Fix:
In this case the fix is easy, although as mentioned in the description I see many other opportunities to enter sleepq_enqueue() during the arm secondary processor initialization path.

The solution to the current problem is to move the "cold = 0" statement in sys/kern/init_main.c from its current location in configure2() at line 808 to AFTER the call to cpu_boot_secondary_processors() at line 827. I inserted it immediately prior to the "mp_ready = true" line.
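The effect of the reordering can be sketched with a small user-space model. This is not a patch against init_main.c: model_boot() and model_cpu_boot_secondary_processors() are hypothetical names, and the model assumes the secondary processors run their interrupt setup during the call that boots them.

```c
#include <stdbool.h>

/* Stand-ins for the kernel globals named in this report. */
static bool cold = true;
static bool mp_ready = false;

/* Records what a secondary processor would observe while hatching. */
static bool secondary_saw_cold;

static void
model_cpu_boot_secondary_processors(void)
{
	/* Assumption: interrupt setup on the secondaries happens here. */
	secondary_saw_cold = cold;
}

/*
 * Returns true if the secondaries hatched while "cold" was still set,
 * i.e. they would take the direct unblock path and never sleep.
 */
static bool
model_boot(bool apply_fix)
{
	cold = true;
	mp_ready = false;

	if (!apply_fix)
		cold = false;	/* current order: cleared before the secondaries boot */

	model_cpu_boot_secondary_processors();

	if (apply_fix)
		cold = false;	/* proposed order: cleared just before mp_ready */
	mp_ready = true;

	return secondary_saw_cold;
}
```

With the current ordering, model_boot(false) returns false, matching the failing case; with the proposed ordering, model_boot(true) returns true.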

By doing this I can successfully boot the latest development kernel on my Raspberry Pi 3B+.


