On Tue, 26 Oct 2021, Martin Husemann wrote:
On Tue, Oct 26, 2021 at 01:30:31PM +0200, BALATON Zoltan wrote:
The current 9.2 version is not affected and still boots before and after, but
older versions which booted without this change now get strange errors as
reported in the above OpenBIOS list thread. Any idea what may be wrong or how
to debug it further?
The memory disk root file system, which for this kernel is embedded
into the kernel data section, is corrupt.
So either something goes horribly wrong when reading the kernel from the
boot medium (ofwboot.xcf does that using OF callbacks, so very unlikely),
or the new OF node makes the kernel code corrupt some memory post-loading.
I don't remember related fixes (nor see any in the CHANGES-9.2 file), so maybe
it is just pure luck that 9.2 boots (because the corruption hits in a
different location).
I also saw errors about claim and out of memory with some other versions, so
could it be that OpenBIOS allocates memory differently than Apple's
OpenFirmware and this collides with the kernel data somehow? I've tried
enabling OpenBIOS client interface debug and got this when booting
NetBSD-8.0-macppc.iso:
claim(0x00000000, 4096, 4096) = 0x1feff000
claim(0x00000000, 12288, 4096) = 0x1fefc000
claim(0x00000000, 20480, 4096) = 0x1fef7000
claim(0x00000000, 8192, 4096) = 0x1fef5000
claim(0x00000000, 36864, 4096) = 0x1feec000
claim(0x00000000, 4096, 4096) = 0x1feeb000
6075428+127808>> claim(0x00000000, 4096, 4096) = 0x1feea000
But I think these are made by ofwboot.xcf, so I don't know what happens after
that.
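
The claim() results above can be turned into address ranges to see where they
sit. This is only a minimal sketch: it assumes the second argument is the size
in bytes and the value after "=" is the returned base address, which is how
the debug output reads but is not confirmed anywhere in this thread.

```python
import re

# Debug lines copied from the OpenBIOS client interface trace above.
LOG = """\
claim(0x00000000, 4096, 4096) = 0x1feff000
claim(0x00000000, 12288, 4096) = 0x1fefc000
claim(0x00000000, 20480, 4096) = 0x1fef7000
claim(0x00000000, 8192, 4096) = 0x1fef5000
claim(0x00000000, 36864, 4096) = 0x1feec000
claim(0x00000000, 4096, 4096) = 0x1feeb000
6075428+127808>> claim(0x00000000, 4096, 4096) = 0x1feea000
"""

# Assumed layout: claim(virt, size, align) = base
CLAIM_RE = re.compile(
    r"claim\((0x[0-9a-fA-F]+), (\d+), (\d+)\) = (0x[0-9a-fA-F]+)"
)


def parse_claims(text):
    """Return a list of half-open (start, end) ranges, one per claim()."""
    ranges = []
    for m in CLAIM_RE.finditer(text):
        size = int(m.group(2))
        base = int(m.group(4), 16)
        ranges.append((base, base + size))
    return ranges


claims = parse_claims(LOG)
for start, end in claims:
    print(f"{start:#010x}-{end:#010x} ({end - start} bytes)")

lowest = min(start for start, _ in claims)
highest = max(end for _, end in claims)
print(f"claimed span: {lowest:#x}-{highest:#x}")
```

Under those assumptions the seven claims form one contiguous block from
0x1feea000 up to 0x1ff00000, each one placed directly below the previous.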
Would it help if we print the memory range where the MD image lives
early and you could use qemu tracing to catch any writes to that range
afterwards? The range is probably constant for consecutive boots with
the same args and the MD file system should still be read-only at the
point where it crashes for you.
I don't know how to trace writes, but I can dump memory and look at the
result. The netbsd-GENERIC_MD (from that 8.0 iso) ELF file says:
Idx Name   Size      VMA       LMA       File off  Algn
 11 .data  002c668c  00a1ad80  00a1ad80  0091ae00  2**6
But I could not find it at the load address. I found most of it in memory at
0x4383C0, but comparing the data from the ELF file with memory, these ranges
differ (decimal offsets):
10-29
35-37
44-45
62-64
67-69
79-81
83-85
87-89
91-93
101
378-385
389
404-405
408-409
412-413
583-585
619-621
627-629
635-637
639-641
643-645
657
then they are the same until
2227271-2227373
2622103-
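
A list like the one above can be produced by comparing the two byte strings
and collapsing the differing offsets into ranges. The following is a rough
sketch of that comparison, not the tool that was actually used here. The
sample bytes are made up for the self-check.

```python
def diff_ranges(a, b):
    """Yield inclusive (first, last) offsets where bytes a and b differ.

    Only the common prefix length of a and b is compared.
    """
    n = min(len(a), len(b))
    start = None
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            yield (start, i - 1)   # the run ended on the previous byte
            start = None
    if start is not None:
        yield (start, n - 1)       # run extends to the end of the data


def format_ranges(ranges):
    """Format ranges the way the list above does: '101' or '10-29'."""
    return [str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges]


# Tiny self-check with invented bytes (not real kernel data).
elf_data = bytes([0, 1, 2, 3, 4, 5, 6, 7])
mem_data = bytes([0, 9, 9, 3, 4, 9, 6, 9])
result = format_ranges(diff_ranges(elf_data, mem_data))
print(result)
```

For real data the two inputs would be read from files, for example a raw copy
of .data extracted with objcopy and a dump of 0x2c668c bytes starting at
0x4383c0 saved from the QEMU monitor; the exact commands and file names are
left out because they are not part of this report.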
The differences don't look like anything I recognise, but near the end the
memory is mostly zeroed with some strings from the device tree while the
.data segment still has data there, so maybe something is using memory that
overlaps the end of the loaded MD image? (In the .data part I see copyright
messages by Express Logic Inc. ThreadX THUMB-F/ARM Version G3.0f.3.0b where
memory already has 0s and some strings from the device tree.) Where should the
MD image be loaded, and what decides its location in memory, and how?
The interesting thing is that it only happens if I add another /pci node. If I
name it differently then it boots; if I add only a /pci node with device-type
pci and no other info it also breaks, so it looks like processing these /pci
nodes makes it use memory that overwrites the ram disk somehow?
Here are the logs from the boot, which contain some memory addresses, but I
don't know how to interpret them:
s>> et_property: NULL phandle
=============================================================
OpenBIOS 1.1 [Oct 16 2021 13:31]
Configuration device id QEMU version 1 machine id 1
CPUs: 1
Memory: 512M
UUID: 00000000-0000-0000-0000-000000000000
CPU type PowerPC,G4
milliseconds isn't unique.
Welcome to OpenBIOS v1.1 built on Oct 16 2021 13:31
Trying cd:,\ofwboot.xcf...
switching to new context:
Invalid form of CMPI at 0x00e00038, L = 1
NetBSD/macppc OpenFirmware Boot, Revision 1.12 (Tue Jul 17 14:59:51 UTC 2018)
open /netbsd: No such file or directory
open /netbsd.gz: No such file or directory
6075428+127808=0x5eab28
start=0x100000
mem region 0 start=0 size=20000000
avail region 0 start=0x4000 size=0x3ffc000
avail region 1 start=0x4800000 size=0x1b458000
avail region 2 start=0x1fe10000 size=0xda000
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018 The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 8.0 (INSTALL) #0: Tue Jul 17 14:59:51 UTC 2018
mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/macppc/compile/INSTALL
total memory = 512 MB
avail memory = 482 MB
found openpic PIC at 80040000
OpenPIC Version 1.2: Supports 1 CPUs and 64 interrupt sources.
bootpath: /pci@f2000000/mac-io@c/ata-3@21000/cdrom@0:0/netbsd.macppc
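
The "avail region" lines in the log above can be converted to end addresses
and compared with the range the kernel is loaded into. This is a small sketch
under the assumption that the "=0x5eab28" total is the number of bytes placed
at start=0x100000; the log itself does not say so explicitly.

```python
# (start, size) pairs copied from the "avail region" lines of the boot log.
avail = [
    (0x4000, 0x3ffc000),
    (0x4800000, 0x1b458000),
    (0x1fe10000, 0xda000),
]

# Assumed kernel load range: start=0x100000 plus the reported 0x5eab28 total.
kernel = (0x100000, 0x100000 + 0x5eab28)


def overlaps(a, b):
    """True if the half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


regions = [(start, start + size) for start, size in avail]
for i, (lo, hi) in enumerate(regions):
    note = " <- contains/overlaps kernel load range" if overlaps((lo, hi), kernel) else ""
    print(f"avail region {i}: {lo:#x}-{hi:#x}{note}")

# Holes between consecutive avail regions, i.e. memory not reported as free.
gaps = [(regions[i][1], regions[i + 1][0]) for i in range(len(regions) - 1)]
for lo, hi in gaps:
    print(f"not in avail list: {lo:#x}-{hi:#x} ({hi - lo:#x} bytes)")
```

With these numbers avail region 0 runs from 0x4000 to 0x4000000 and so covers
the assumed load range 0x100000-0x6eab28, and region 2 ends at 0x1feea000.
The sketch only reports the overlap; whether it matters is not something the
log shows.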
If this loads 0x5eab28 bytes from 0x100000 (although 6075428+127808 is
0x5EA764 when I calculate it) then that lasts until 0x6eab28. The data I found
starts around 0x4383c0, but what remains from there to the end is less than
the data segment size, as 0x6EAB28-0x2C668C is 0x42449C (and there are other
segments in the file too, so it's not loaded linearly anyway).
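
The hex arithmetic above can be rechecked with a few lines. All inputs are the
numbers quoted in this mail; the variable names are only labels for this
sketch and say nothing about what ofwboot means by the two summands.

```python
first, second = 6075428, 127808   # "6075428+127808" as printed by ofwboot
reported_total = 0x5eab28         # "=0x5eab28" as printed by ofwboot
start = 0x100000                  # "start=0x100000"
data_size = 0x2c668c              # .data size from the section header
found_at = 0x4383c0               # where most of .data was found in memory

summed = first + second
load_end = start + reported_total
data_start_if_last = load_end - data_size   # .data start if it ended at load_end
data_end_if_found = found_at + data_size    # .data end if it starts at found_at

print(f"sum of the two numbers : {summed:#x}")
print(f"reported minus sum     : {reported_total - summed} bytes")
print(f"end of loaded range    : {load_end:#x}")
print(f"load_end - data_size   : {data_start_if_last:#x}")
print(f"found_at + data_size   : {data_end_if_found:#x}")
print(f"overshoot past load_end: {data_end_if_found - load_end:#x}")
```

This confirms 0x5EA764 for the sum (964 bytes less than the reported total),
0x6eab28 for the end of the loaded range and 0x42449C for the subtraction. It
also shows that a full-size .data starting at 0x4383c0 would end at 0x6fea4c,
past the end of the loaded range.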
Regards,
BALATON Zoltan