Port-mac68k archive


Re: Stability of netbsd-10 on real hardware?



On 26/02/2023 22:56, Paul Ripke wrote:

I've managed to get netbsd-10 installed and running under qemu-system-m68k,
which emulates an Apple Macintosh Quadra 800. Building a few random things
and running some random stuff, I've tripped over a bunch of little stability
issues, and I'm wondering whether they are caused by the emulation, NetBSD,
or perhaps the compilers. MacOS 8.1 seems stable, although I'd imagine that
NetBSD stresses the emulation far more. I've seen internet breadcrumbs
saying that Linux is also stable.

Can I ask specifically which branch you are currently using? If you are using a branch with "upstream" in the name, note that those branches are constantly being rebased onto QEMU git master for testing, with the aim of submitting the patches upstream.

I've just pushed the latest version of the patches to https://github.com/mcayland/qemu/tree/q800.upstream3; could you confirm whether the issues still exist there?

Booted kernel is:
NetBSD qemu-m68k 10.0_BETA NetBSD 10.0_BETA (QEMUM68K) #3: Sun Feb 26 12:32:43 AEDT 2023  stix@slave:/home/netbsd/netbsd-10/obj.mac68k/home/netbsd/netbsd-10/src/sys/arch/mac68k/compile/QEMUM68K mac68k
which is basically GENERIC with:
options MAXDSIZE=268435456
options DIAGNOSTIC
options DEBUG
options LOCKDEBUG

FWIW I did my NetBSD installation and testing for the QEMU patches using NetBSD-9.1-mac68k.iso.

So, issues I've noticed:

First off, top(1) occasionally dies:

Core was generated by `top'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0417904a in je_extent_heap_remove () from /usr/lib/libc.so.12
(gdb) bt
#0  0x0417904a in je_extent_heap_remove () from /usr/lib/libc.so.12
#1  0x0418f9ec in ?? () from /usr/lib/libc.so.12
#2  0x041911a4 in je_arena_dalloc_bin_junked_locked () from /usr/lib/libc.so.12
#3  0x04152e4e in je_tcache_bin_flush_small () from /usr/lib/libc.so.12
#4  0x04153f62 in je_tcache_event_hard () from /usr/lib/libc.so.12
#5  0x041975d2 in free () from /usr/lib/libc.so.12
#6  0x041df6c6 in __vfprintf_unlocked_l () from /usr/lib/libc.so.12
#7  0x041d3dee in vsprintf_l () from /usr/lib/libc.so.12
#8  0x041d3e36 in vsprintf () from /usr/lib/libc.so.12
#9  0x041d3e76 in sprintf () from /usr/lib/libc.so.12
#10 0x0000dd1a in format_next_process ()
#11 0x0000b284 in do_display ()
#12 0x0000e3f2 in main ()

I've seen pretty much the same backtrace from make building archivers/zstd:

Core was generated by `make'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0413904a in je_extent_heap_remove () from /usr/lib/libc.so.12
(gdb) bt
#0  0x0413904a in je_extent_heap_remove () from /usr/lib/libc.so.12
#1  0x041390f8 in ?? () from /usr/lib/libc.so.12
#2  0x0413a816 in ?? () from /usr/lib/libc.so.12
#3  0x0414f372 in ?? () from /usr/lib/libc.so.12
#4  0x04150aac in je_arena_tcache_fill_small () from /usr/lib/libc.so.12
#5  0x0411290c in je_tcache_alloc_small_hard () from /usr/lib/libc.so.12
#6  0x04155ebe in malloc () from /usr/lib/libc.so.12
#7  0x041baee2 in __smakebuf () from /usr/lib/libc.so.12
#8  0x041bab8e in __swsetup () from /usr/lib/libc.so.12
#9  0x0419ce98 in __vfprintf_unlocked_l () from /usr/lib/libc.so.12
#10 0x0419f8e2 in vfprintf () from /usr/lib/libc.so.12
#11 0x0419b3dc in printf () from /usr/lib/libc.so.12
#12 0x00009ae6 in Compat_RunCommand ()
#13 0x00009ed2 in Compat_Make ()
#14 0x00009d54 in Compat_Make ()
#15 0x00009d54 in Compat_Make ()
#16 0x0000a3f4 in Compat_MakeAll ()
#17 0x000279ec in main ()

And also ksh, trying to build devel/cmake:

Core was generated by `ksh'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0412f04a in je_extent_heap_remove () from /lib/libc.so.12
(gdb) bt
#0  0x0412f04a in je_extent_heap_remove () from /lib/libc.so.12
#1  0x0412f0f8 in ?? () from /lib/libc.so.12
#2  0x0412f1a2 in ?? () from /lib/libc.so.12
#3  0x0412f418 in ?? () from /lib/libc.so.12
#4  0x0412f67c in ?? () from /lib/libc.so.12
#5  0x04145172 in je_arena_extents_dirty_dalloc () from /lib/libc.so.12
#6  0x04129d00 in je_large_dalloc () from /lib/libc.so.12
#7  0x0414d968 in free () from /lib/libc.so.12
#8  0x041b403a in _finidir () from /lib/libc.so.12
#9  0x041b3cc8 in closedir () from /lib/libc.so.12
#10 0x00012260 in globit ()
#11 0x00012176 in globit ()
#12 0x00012344 in glob_str ()
#13 0x00012392 in ksh_glob ()
#14 0x000137d8 in expand ()
#15 0x0000db68 in x_file_glob ()
#16 0x0000e326 in x_cf_glob ()
#17 0x0002a282 in complete_word.isra ()
#18 0x0002bb58 in x_vi ()
#19 0x0000dd9c in x_read ()
#20 0x0001bd74 in getsc__ ()
#21 0x0001c216 in getsc_bn.part ()
#22 0x0001c478 in yylex ()
#23 0x0002302e in get_command ()
#24 0x00023f28 in pipeline ()
#25 0x0002404e in c_list ()
#26 0x000247e8 in compile ()
#27 0x0001eaf6 in shell ()
#28 0x0002d05e in main ()
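
The three cores above (top, make, ksh) all fault in je_extent_heap_remove, and it may help to check whether they fault at the same place inside that function. The following is only a rough sketch: it pulls frame #0 out of pasted gdb output and compares the low bits of each address. It assumes libc is mapped at a page-aligned base that differs between processes and that pages are 4 KiB; the helper name crash_sites and the page_bits parameter are made up for illustration.

```python
import re

# Matches frame #0 of a gdb backtrace, e.g.
# "#0  0x0417904a in je_extent_heap_remove () from /usr/lib/libc.so.12"
FRAME0 = re.compile(r"^#0\s+(0x[0-9a-fA-F]+) in (\S+) \(\)", re.MULTILINE)


def crash_sites(text, page_bits=12):
    """Return the set of (function, offset within page) for every frame #0.

    page_bits=12 assumes 4 KiB pages; this is an assumption, not something
    stated in the report.
    """
    mask = (1 << page_bits) - 1
    return {(func, int(addr, 16) & mask) for addr, func in FRAME0.findall(text)}


# Frame #0 lines copied from the three backtraces above.
backtraces = """
#0  0x0417904a in je_extent_heap_remove () from /usr/lib/libc.so.12
#0  0x0413904a in je_extent_heap_remove () from /usr/lib/libc.so.12
#0  0x0412f04a in je_extent_heap_remove () from /lib/libc.so.12
"""

for func, offset in sorted(crash_sites(backtraces)):
    print(f"{func}: page offset {offset:#05x}")
```

If the resulting set has a single entry, the crashes share a faulting offset, which would suggest one code path being hit repeatedly rather than unrelated corruption. It does not by itself say whether the emulation, the kernel, or the compiler is at fault.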

Building gcc10 from pkgsrc consistently spins while compiling the same file,
at (apparently?) the same instruction:

gmake[3]: Entering directory '/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/build/gcc'
c++ -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. -I../../gcc-10.4.0/gcc -I../../gcc-10.4.0/gcc/. -I../../gcc-10.4.0/gcc/../include -I./../intl -I../../gcc-10.4.0/gcc/../libcpp/include -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/build/./gmp -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/gcc-10.4.0/gmp -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/build/./mpfr/src -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/gcc-10.4.0/mpfr/src -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/gcc-10.4.0/mpc/src  -I../../gcc-10.4.0/gcc/../libdecnumber -I../../gcc-10.4.0/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc-10.4.0/gcc/../libbacktrace -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/build/./isl/include -I/home/tmp/pkgwrk.qemu-m68k/lang/gcc10/work/gcc-10.4.0/isl/include -I/usr/include -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c

(gdb) bt
#0  0x000c5c84 in memset@plt ()
#1  0x0098e8f4 in ?? ()
#2  0x0098fa1a in ggc_collect() ()
#3  0x00402382 in execute_one_pass(opt_pass*) ()
#4  0x0040246e in ?? ()
#5  0x004024d4 in execute_pass_list(function*, opt_pass*) ()
#6  0x003ffb62 in ?? ()
#7  0x00402a76 in execute_ipa_pass_list(opt_pass*) ()
#8  0x0090887e in ?? ()
#9  0x0090ae30 in symbol_table::finalize_compilation_unit() ()
#10 0x003822bc in ?? ()
#11 0x00c9f616 in toplev::main(int, char**) ()
#12 0x00c9dfc4 in main ()
(gdb) x/i 0x000c5c84
=> 0xc5c84 <memset@plt>:        jmp %pc@(0x1087f78 <memset%got.plt@localhost>)@(00000000)

Also, periodically I find dhcpcd's "manager" process spinning, requiring
a dhcpcd restart:

(gdb) bt
#0  0x0000e756 in eloop_q_timeout_delete ()
#1  0x000311c8 in ipv6nd_recvmsg ()
#2  0x0001f1d8 in ps_inet_dispatch ()
#3  0x0001da38 in ps_recvpsmsg ()
#4  0x0001ef12 in ps_inet_dodispatch ()
#5  0x0000ec46 in eloop_start ()
#6  0x00039bec in main ()

I've also seen an illegal instruction kernel crash while starting dhcpcd.
Unfortunately, I didn't copy down the stack, but somehow I don't think
it's all that important. If this were real hardware, I'd be wondering if
we're flushing caches correctly.


ATB,

Mark.


