Port-mac68k archive


Re: Stability of netbsd-10 on real hardware?



On 04/03/2023 00:34, Paul Ripke wrote:

On Wed, Mar 01, 2023 at 06:40:11AM +0000, Mark Cave-Ayland wrote:
On 26/02/2023 22:56, Paul Ripke wrote:

I've managed to get netbsd-10 installed running under qemu-system-m68k,
which emulates an Apple Macintosh Quadra 800. Building a few random things
and running some random stuff, I've tripped over a bunch of little stability
issues, and I'm wondering if these are as a result of the emulation, netbsd,
or perhaps compilers. MacOS 8.1 seems stable, although I'd imagine that
netbsd stresses the emulation far more. I've seen internet breadcrumbs
saying that Linux is also stable.

Can I ask specifically which branch you are currently using? If you are
using a branch with "upstream" in the name then those branches are
constantly being rebased onto QEMU git master for testing with the aim of
submitting upstream.

I've just pushed the latest version of the patches to
https://github.com/mcayland/qemu/tree/q800.upstream3 if you can confirm
whether the issues still exist there.

Yes, that's the branch I've been using. Just updated and rebuilt, and...
crash on first boot:

/home/stix/src/github/q800-upstream3/build/qemu-system-m68k \
         -M q800 -cpu m68040 -m 256 -bios Quadra800.rom \
         -rtc base=localtime \
         -g 1152x870x8 \
         -boot d \
         -drive file=pram-macos.img,format=raw,if=mtd \
         -device scsi-hd,scsi-id=0,drive=hd0,vendor="SEAGATE",product="ST225N",ver="1.0" \
         -drive id=hd0,file=MacHD.img,media=disk,format=raw,if=none \
         -device scsi-hd,scsi-id=1,drive=hd1,vendor="SEAGATE",product="ST225N",ver="1.0" \
         -drive id=hd1,file=netbsd-10.img,media=disk,format=raw,if=none \
         -device scsi-cd,scsi-id=3,drive=cd1,vendor="MATSHITA",product="CD-ROM CR-8005",ver="1.0k" \
         -drive id=cd1,file=MacOS_81.toast,media=cdrom,if=none \
         -nic tap,model=dp83932,ifname=tap3,script=no,downscript=no,mac=aa:00:04:4d:52:a5 \
         -serial mon:stdio

A couple of things I spotted here:

- Firstly, a real Q800 can only use a maximum of 132MB of RAM (which is 128MB for QEMU, since QEMU requires a power-of-2 amount of RAM)

- There is no need to supply the SCSI vendor/product/version information on the command line, since newer versions of the patchset now do this automatically
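
To make the RAM point concrete, here is a minimal sketch of the arithmetic only. It is not QEMU's actual code: the helper name is hypothetical, and the 132MB figure is simply the one quoted above.

```python
def largest_power_of_two_at_most(limit_mb: int) -> int:
    """Return the largest power of two that does not exceed limit_mb."""
    if limit_mb < 1:
        raise ValueError("limit must be at least 1 MB")
    # bit_length() - 1 is the index of the highest set bit.
    return 1 << (limit_mb.bit_length() - 1)


REAL_Q800_MAX_MB = 132  # figure quoted in this message, not verified here
qemu_max_mb = largest_power_of_two_at_most(REAL_Q800_MAX_MB)
print(qemu_max_mb)  # 128
```

By this reckoning the -m 256 in the command line is above what real hardware could hold, which is why it is worth ruling out.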

In theory the extra RAM shouldn't make any difference, unless something is also playing with the djMEMC controller. Can you try the updated command line below instead?


 /home/stix/src/github/q800-upstream3/build/qemu-system-m68k \
          -M q800 -cpu m68040 -m 256 -bios Quadra800.rom \
          -rtc base=localtime \
          -g 1152x870x8 \
          -boot d \
          -drive file=pram-macos.img,format=raw,if=mtd \
          -device scsi-hd,scsi-id=0,drive=hd0 \
          -drive id=hd0,file=MacHD.img,media=disk,format=raw,if=none \
          -device scsi-hd,scsi-id=1,drive=hd1 \
          -drive id=hd1,file=netbsd-10.img,media=disk,format=raw,if=none \
          -device scsi-cd,scsi-id=3,drive=cd1 \
          -drive id=cd1,file=MacOS_81.toast,media=cdrom,if=none \
          -nic tap,model=dp83932,ifname=tap3,script=no,downscript=no,mac=aa:00:04:4d:52:a5 \
          -serial mon:stdio


Also note that QEMU forces the first 3 bytes of the MAC address to 08:00:07 (seen in the output below) as otherwise MacOS refuses to recognise the NIC.
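
As an illustration of that rewrite (a sketch of the effect only, not the QEMU source; the function name is hypothetical), the first three octets of the mac= value are replaced and the last three are kept:

```python
APPLE_OUI = "08:00:07"  # prefix named in this message


def guest_visible_mac(configured_mac: str, oui: str = APPLE_OUI) -> str:
    """Replace the first three octets of configured_mac with oui."""
    octets = configured_mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected six colon-separated octets")
    # Keep the device-specific half, swap the vendor half.
    return ":".join(oui.split(":") + octets[3:])


print(guest_visible_mac("aa:00:04:4d:52:a5"))  # 08:00:07:4d:52:a5
```

The result matches the address that sn0 reports in the boot log.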

Loaded Quadra800ROM.elf symbols!
[   1.0000000] Loaded initial symtab at 0x473594, strtab at 0x4d6ab4, # entries 24481
[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[   1.0000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[   1.0000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
[   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 10.0_BETA (QEMUM68K) #3: Sun Feb 26 12:32:43 AEDT 2023
[   1.0000000]  stix@slave:/home/netbsd/netbsd-10/obj.mac68k/home/netbsd/netbsd-10/src/sys/arch/mac68k/compile/QEMUM68K
[   1.0000000] Apple Macintosh Quadra 800  (68040)
[   1.0000000] cpu: delay factor 9045
[   1.0000000] fpu: mc68040
[   1.0000000] total memory = 256 MB
[   1.0000000] avail memory = 247 MB
[   1.0000000] mrg: 'Quadra/Centris ROMs' ROM glue, tracing off, debug off, silent traps
[   1.0000000] mrg: I/O map kludge for ROMs that use hardware addresses directly.
[   1.0000000] mainbus0 (root)
[   1.0000000] obio0 at mainbus0
[   1.0000000] esp0 at obio0 addr 0 (quick): address 0x55e000: NCR53C96, 16MHz, SCSI ID 7
[   1.0000000] scsibus0 at esp0: 8 targets, 8 luns per target
[   1.0000000] adb0 at obio0
[   1.0000000] intvid0 at obio0 @ f9000080: DAFB video subsystem, monitor sense 7
[   1.0000000] intvid0: 1152 x 870, monochrome
[   1.0000000] macfb0 at intvid0
[   1.0000000] wsdisplay0 at macfb0 (kbdmux ignored)
[   1.0000000] sn0 at obio0: integrated SONIC Ethernet adapter
[   1.0000000] sn0: Ethernet address 08:00:07:4d:52:a5
[   1.0000000] zsc0 at obio0 chip type 0
[   1.0000000] zsc0 channel 0: d_speed   9600 DCD clk 0 CTS clk 0
[   1.0000000] zstty0 at zsc0 channel 0 (console i/o)
[   1.0000000] zsc0 channel 1: d_speed   9600 DCD clk 0 CTS clk 0
[   1.0000000] zstty1 at zsc0 channel 1
[   1.0000000] nubus0 at mainbus0
[   1.0212576] scsibus0: waiting 2 seconds for devices to settle...
[   1.0495805] adb0 (direct, II series): 2 targets
[   1.0955249] aed0 at adb0 addr 0: ADB Event device
[   1.0955249] akbd0 at adb0 addr 2: standard keyboard
[   1.1027680] wskbd0 at akbd0 (mux ignored)
[   1.1027680] ams0 at adb0 addr 3: 1-button, 200 dpi mouse
[   1.1027680] wsmouse0 at ams0 (mux ignored)
[   1.1357911] WARNING: system needs entropy for security; see entropy(7)
[   3.1377085] sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST225N, 1.0> disk fixed
[   3.1875349] sd0: 1024 MB, 2080 cyl, 16 head, 63 sec, 512 bytes/sect x 2097152 sectors
[   3.2543943] sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST225N, 1.0> disk fixed
[   3.2713878] sd1: 2048 MB, 4161 cyl, 16 head, 63 sec, 512 bytes/sect x 4194304 sectors
[   3.2713878] cd0 at scsibus0 target 3 lun 0: <MATSHITA, CD-ROM CR-8005, 1.0k> cdrom removable
[   3.2886621] swwdog0: software watchdog initialized
[   3.3038773] boot device: sd1
[   3.3378745] root on sd1a dumps on sd1b
[   3.3535671] mountroot: trying lfs...
[   3.3535671] mountroot: trying ffs...
[   3.3709040] root file system type: ffs
[   3.3870664] kern.module.path=/stand/mac68k/10.0/modules
[   3.3870664] time read from PRAM: 0xe026a61d
[   3.3870664] Date and time: March 2, 2023   19:14:37
[   3.4871698] init: copying out path `/sbin/init' 11
Thu Mar  2 19:14:38 AEDT 2023
Starting root file system check:
/dev/rsd1a: file system is clean; not checking
Setting sysctl variables:
ddb.onpanic: 1 -> 0
vfs.generic.magiclinks: 0 -> 1
[   7.1043854] WARNING: cd0: end of partition `g' exceeds the size of cd0 (820764)
[   7.1043854] WARNING: cd0: end of partition `h' exceeds the size of cd0 (820764)
swapctl: setting dump device to /dev/sd1b
swapctl: adding /dev/sd1b as swap device at priority 0
Starting file system checks:
[   8.7712474] entropy: ready
Loaded entropy from /var/db/entropy-file.
Setting tty flags.
Starting network.
Hostname: qemu-m68k
IPv6 mode: host
Configuring network interfaces: sn0.
Adding interface aliases:.
Waiting for duplicate address detection to finish...
Starting dhcpcd.
[  14.5227422] uvm_fault(0x460f10, 0x36652000, 0x1) -> 0xe
[  14.5227422]   type 8, code [mmu,,ssw]: 545
[  14.5227422] trap type 8, code = 0x545, v = 0x36653635
[  14.5227422] kernel program counter = 0x257a28
[  14.5227422] kernel: MMU fault trap
[  14.5227422] pid = 381, lid = 381, pc = 00257A28, ps = 2010, sfc = 1, dfc = 1
[  14.5227422] Registers:
[  14.5227422]              0        1        2        3        4        5        6        7
[  14.5227422] dreg: 0BD35E80 FFFFFFF8 00000000 00004450 00000000 00000003 00000000 00000000
[  14.5227422] areg: 00000000 0BD35EF0 0BD35E80 0BD35EE0 00000000 00D0ACB4 0BD35EB4 FFFF99AC

[  14.5227422] Kernel stack (0BD35CCC):
[  14.5227422] D35CCC: 00026FF6 0BD35DB0 00000080 00000000 00004450 00000000 00000003 00000000
[  14.5227422] D35CEC: 00000000 0BD35E80 0BD35EE0 00000000 00D0ACB4 36652000 00460F10 00000001
[  14.5227422] D35D0C: 00D15AA0 001C7D1A 00D15AA4 00000001 00000000 00000000 00000000 00000000
[  14.5227422] D35D2C: 00000000 00000000 00000000 00000000 00000008 00000000 00000000 00000000
[  14.5227422] D35D4C: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  14.5227422] D35D6C: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  14.5227422] D35D8C: 00000000 00000000 0405C000 0BD35EB4 0000308C 0BD35DB0 00000008 00000545
[  14.5227422] D35DAC: 36653635 0BD35E80 FFFFFFF8 00000000 00004450 00000000 00000003 00000000
[  14.5227422] D35DCC: 00000000 00000000 0BD35EF0 0BD35E80 0BD35EE0 00000000 00D0ACB4 0BD35EB4
[  14.5227422] D35DEC: FFFF99AC 00000000 20100025 7A287008 36653635 05450000 00000000 36653635
[  14.5227422] D35E0C: 36653635 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  14.5227422] D35E2C: 00000000 00232B96 00000003 00000001 0BD35EE0 0BD35E78 0BD35EA0 00000000
[  14.5227422] D35E4C: 0BD35EE0 00000000 00000003 00000000 00000000 0BD35F38 004375D4 00D34EC0
[  14.5227422] D35E6C: 00D0ACB4 0BD35EC0 0BD35FB4 00D0ACB4 0BD35F40 00026BE0 0035CFE1 00000074
[  14.5227422] D35E8C: 00000000 00000000 00000002 00000000 00000000 00000011 FFFFFFFF FFFFFFFD
[  14.5227422] D35EAC: 00D34EC0 00D0ACB4 0BD35F04 00232EAE 00D34EC0 00000003 00000001 0BD35EE0
[  14.5683611] panic: MMU fault
[  14.5683611] cpu0: Begin traceback...
[  14.5683611] ?(?)
[  14.5683611] db_panic(8,255300,bd35ccc,2554ec,35cf88) at 0
[  14.5733167] vpanic(35cf88,bd35cd8,bd35d98,27014,35cf88) + 18c
[  14.5733167] panic(35cf88,0,4450,0,3) + c
[  14.5733167] trap(bd35db0,8,545,36653635) + 2b0
[  14.5733167] ts2timo(?)
[  14.5733167] nanosleep1(d34ec0,3,1,bd35ee0,0) + 8
[  14.5733167] sys_clock_nanosleep(d34ec0,bd35f38,bd35f30,2,0) + 48
[  14.5733167] syscall_plain(1dd,d34ec0,bd35fb4,0,3) + ce
[  14.5733167] syscall(1dd) + 70
[  14.5733167] trap0() + e
[  14.5733167] cpu0: End traceback...
[  14.5733167] rebooting...
qemu: fatal: DOUBLE MMU FAULT

D0 = 000000df   A0 = 4080271a   F0 = 0000 0000000000000000  (           0)
D1 = 4080280e   A1 = 4080280e   F1 = 0000 0000000000000000  (           0)
D2 = 00000003   A2 = 40800833   F2 = 7fff ffffffffffffffff  (         nan)
D3 = 619608f8   A3 = 00180024   F3 = 7fff ffffffffffffffff  (         nan)
D4 = 4080413a   A4 = 40800870   F4 = 7fff ffffffffffffffff  (         nan)
D5 = 0000c000   A5 = 0ffbe024   F5 = 7fff ffffffffffffffff  (         nan)
D6 = 00000002   A6 = 4080413a   F6 = 7fff ffffffffffffffff  (         nan)
D7 = 408027a6   A7 = 00000002   F7 = 7fff ffffffffffffffff  (         nan)
PC = 4080282e   SR = 2600 T:0 I:6 SI -----
FPSR = 04000000 --Z-
                                 FPCR =     0000 X RN
   A7(MSP) = 00000000   A7(USP) = ffff99ac ->A7(ISP) = 00000002
VBR = 0x00000000
SFC = 1 DFC 1
SSW 00000525 TCR 0000c000 URP 0f472000 SRP 0ffffa00
DTTR0/1: f900c060/807fc040 ITTR0/1: f900c060/807fc040
MMUSR 00000000, fault at fffffffe
[1]   Abort trap (core dumped) /home/stix/src/github/q800-upstream3/build/qem...
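
As a side check on the emulated clock, the "time read from PRAM" value in the log above can be decoded by hand. This is a minimal sketch, assuming the usual classic Mac convention of seconds since 1904-01-01 and that the value is already local time (consistent with -rtc base=localtime); the helper name is hypothetical.

```python
from datetime import datetime, timedelta

MAC_EPOCH = datetime(1904, 1, 1)  # classic Mac OS time origin (assumed)


def pram_to_datetime(pram_seconds: int) -> datetime:
    """Convert a PRAM seconds counter to a naive local datetime."""
    return MAC_EPOCH + timedelta(seconds=pram_seconds)


print(pram_to_datetime(0xE026A61D))  # 2023-03-02 19:14:37
```

That agrees with the "Date and time" line the kernel prints, so the RTC path at least looks sane on this boot.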


Second boot appeared fine, but I'm still seeing random crashes:

Core was generated by `tmux'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0425b04a in je_extent_heap_remove () from /usr/lib/libc.so.12
(gdb) bt
#0  0x0425b04a in je_extent_heap_remove () from /usr/lib/libc.so.12
#1  0x042719ec in ?? () from /usr/lib/libc.so.12
#2  0x042731a4 in je_arena_dalloc_bin_junked_locked () from /usr/lib/libc.so.12
#3  0x04234e4e in je_tcache_bin_flush_small () from /usr/lib/libc.so.12
#4  0x04235f62 in je_tcache_event_hard () from /usr/lib/libc.so.12
#5  0x042795d2 in free () from /usr/lib/libc.so.12
#6  0x00042246 in grid_destroy ()
#7  0x0005cbfe in screen_free ()
#8  0x000380e6 in format_draw ()
#9  0x00068986 in status_redraw ()
#10 0x00057efe in screen_redraw_screen ()
#11 0x00061d84 in server_client_loop ()
#12 0x0006354a in server_loop ()
#13 0x00056668 in proc_loop ()
#14 0x00063c48 in server_start ()
#15 0x0001c7a8 in client_main ()
#16 0x00088b90 in main ()


Core was generated by `make'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00017916 in Parse_File ()
(gdb) bt
#0  0x00017916 in Parse_File ()
#1  0x04248804 in ?? ()
#2  0x0003e714 in ?? ()
#3  0x00032c44 in ?? ()
#4  0x00000000 in ?? ()


And dhcpcd spins:
0x0000e786 in eloop_q_timeout_delete ()
(gdb) bt
#0  0x0000e786 in eloop_q_timeout_delete ()
#1  0x000311de in ipv6nd_recvmsg ()
#2  0x0001f1d8 in ps_inet_dispatch ()
#3  0x0001da38 in ps_recvpsmsg ()
#4  0x0001ef12 in ps_inet_dodispatch ()
#5  0x0000ec46 in eloop_start ()
#6  0x00039bec in main ()

The only other differences I can see between your setup and my setup are:

- My test installation is running NetBSD 9.1 rather than 10.0
- I am using a userspace network instead of a tap interface

If you temporarily drop the -nic tap part of the command line, does that improve reliability at all? I'd also be interested to see if you see the same problem with NetBSD 9.1.


ATB,

Mark.

