NetBSD-Bugs archive


re: kern/57737: netbsd-10 panics on current Epyc CPU



The following reply was made to PR kern/57737; it has been noted by GNATS.

From: matthew green <mrg%eterna23.net@localhost>
To: gnats-bugs%netbsd.org@localhost, hf%spg.tu-darmstadt.de@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
    netbsd-bugs%netbsd.org@localhost
Subject: re: kern/57737: netbsd-10 panics on current Epyc CPU
Date: Wed, 13 Dec 2023 19:20:51 +1100

 > 	netbsd-10 panics early on current multi-core Ryzen cpus.
 >
 > 	See the boot log for an Epyc 9554P cpu on a Gigabyte R263-Z70
 > 	board at
 >
 > 	<ftp://oak.causeuse.org/pub/NetBSD/netbsd-10-GA_R263-Z70_epyc9554p.bootlog.gz>
 >
 > 	and the related discussion on current-users, where Martin
 > 	suggested
 >
 > 	"That sounds like an fpu xsave size issue Taylor looked at
 > 	recently (but it is not fixed)."
 
 there are multiple issues with this system, ouch.
 
 no CPUs attach in this dmesg.  cpu0 remains half-attached.  this
 is some problem with the MADT parser i guess (i don't know this
 very well.)
 
 [   1.0000040] bogus MADT X2APIC entry (id = 0x0)
 [   1.0000040] bogus MADT X2APIC entry (id = 0x2)
 ...
 [   1.0000040] bogus MADT X2APIC entry (id = 0x7e)
 ...
 [   1.0000040] bogus MADT X2APIC entry (id = 0x5e)
 [   1.0000040] bogus MADT X2APIC entry (id = 0x1)
 ...
 [   1.0000040] bogus MADT X2APIC entry (id = 0x7f)
 ...
 [   1.0000040] bogus MADT X2APIC entry (id = 0x5f)
 
 ie, 128 cpu threads fail to attach (which matches the specs for
 epyc 9554p - 64c/128t.)  some devices still attach things to
 cpu0 for affinity, even though it's in UP mode:
 
 [   1.0525126] nvme0: for io queue 1 interrupting at msix0 vec 1 affinity to cpu0
 ... plus nvme1/2/3.
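 
 the "128 threads" count above can be checked mechanically against a
 saved copy of the boot log.  the following is only a rough sketch,
 not part of the original report: the function names are made up for
 illustration, and it assumes the dmesg has been saved as plain text
 and that every rejected entry uses exactly the "bogus MADT X2APIC
 entry" wording shown above.

```python
import re

# Matches lines such as
#   [   1.0000040] bogus MADT X2APIC entry (id = 0x7e)
# "=3D" is tolerated in case the log is still quoted-printable encoded.
BOGUS_RE = re.compile(
    r"bogus MADT X2APIC entry \(id\s*=(?:3D)?\s*(0x[0-9a-fA-F]+)\)"
)


def bogus_x2apic_ids(dmesg_text):
    """Return the sorted, de-duplicated APIC ids reported as bogus."""
    return sorted({int(m.group(1), 16) for m in BOGUS_RE.finditer(dmesg_text)})


def summarize(dmesg_text, expected_threads):
    """Compare the number of rejected entries with the expected thread count."""
    ids = bogus_x2apic_ids(dmesg_text)
    return {
        "rejected": len(ids),
        "expected": expected_threads,
        "all_rejected": len(ids) == expected_threads,
    }
```

 for the Epyc 9554P (64c/128t), summarize(text, 128) should report
 all_rejected as true if every hardware thread's X2APIC entry was
 rejected, where text is the decompressed boot log read into a string.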
 
 some of the dmesg items seem to have 'nul' chars in them:
 
 [   1.0000040] ACPI: XSDT 0x00000000A4E13728 000^@0DC (v01 GBT   BTUACPI 03042021 AMI  01000013)
 
 [   1.0525126] AMD 19h/1xh RCEC (Root Complex^@ Event Collectosystem) at pci0 dev 0 function 3 not configured
 
 and then the final crash as reported in this PR:
 
 [   1.0525126] fatal privileged instruction fault in supervisor mode
 [   1.0525126] trap type 0 code 0 rip 0xffffffff8023c24e cs 0x8 rflags 0x10256 cr2 0 ilevel 0x6 rsp 0xffffffff81d3bab8
 [   1.0525126] curlwp 0xffffffff8188ac00 pid 0.0 lowest kstack 0xffffffff81d362c0
 kernel: privileged instruction fault trap, code=0
 Stopped in pid 0.0 (system) at  netbsd:xrstor+0xa:      fxsavel
 xrstor() at netbsd:xrstor+0xa
 aes_selftest() at netbsd:aes_selftest+0x26
 aes_modcmd() at netbsd:aes_modcmd+0xe9
 module_do_builtin() at netbsd:module_do_builtin+0x142
 module_do_builtin() at netbsd:module_do_builtin+0xfa
 module_init_class() at netbsd:module_init_class+0x142
 
 
 .mrg.
 

