NetBSD-Users archive


Re: NVMM not working, NetBSD 9x amd64



On Thu, 16 Jul 2020 at 21:21, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
>
> On Wed, 15 Jul 2020 at 14:25, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> >
> > Checking once more, it appears I hadn't tried qemu-nvmm on the
> > kernel from the 12th of July; the last successful run was on the
> > 11th, with a kernel and system from the 9th of July, so the window is a
> > bit wider than initially expected.
>
>
> Does anybody use nvmm under -current at all? It hasn't been functional
> since at least the 12th of July. For example:
>
> (crash immediately after starting up a Linux guest)
> .....
>  crash -M netbsd.22.core -N netbsd.22
> Crash version 9.99.69, image version 9.99.69.
> crash: _kvm_kvatop(0)
> Kernel compiled without options LOCKDEBUG.
> System panicked: trap
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at ffffa0819ba16000
> sys_reboot() at sys_reboot
> vpanic() at vpanic+0x15b
> snprintf() at snprintf
> startlwp() at startlwp
> calltrap() at calltrap+0x19
> kqueue_register() at kqueue_register+0x43e
> kevent1() at kevent1+0x138
> sys___kevent50() at sys___kevent50+0x33
> syscall() at syscall+0x26e
> --- syscall (number 435) ---
> syscall+0x26e:
>
> and
>
> (starting a Windows 10 x86 guest: no panic, and the system continues to
> respond to ping, but no further input is possible. In this case I
> can get into the debugger; here is the not very useful trace)
>
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
> sys_reboot() at sys_reboot
> db_fncall() at db_fncall
> db_command() at db_command+0x127
> db_command_loop() at db_command_loop+0xa6
> db_trap() at db_trap+0xe6
> kdb_trap() at kdb_trap+0xe1
> trap() at trap+0x2b7
> --- trap (number 1) ---
> breakpoint() at breakpoint+0x5
> wskbd_translate() at wskbd_translate+0xff5
> wskbd_input() at wskbd_input+0xbe
> pckbd_input() at pckbd_input+0x7f
> pckbcintr() at pckbcintr+0x6a
> intr_biglock_wrapper() at intr_biglock_wrapper+0x23
>
> When I try to start a FreeBSD 12 guest, the machine locks up hard.
>
>
> >
> > On Wed, 15 Jul 2020 at 14:20, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> > >
> > > On Wed, 15 Jul 2020 at 11:08, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I decided to reuse this thread; nvmm stopped working again yesterday.
> > > >
> > > > On
> > > >
> > > > # uname -a
> > > > NetBSD ymir 9.99.69 NetBSD 9.99.69 (GENERIC) #15: Tue Jul 14 11:07:52
> > > > BST 2020  sysbuild@ymir:/home/sysbuild/amd64/obj/home/sysbuild/src/sys
> > > > /arch/amd64/compile/GENERIC amd64
> > > >
> > > > I can 'modload nvmm', but when I try to start a VM with nvmm
> > > > acceleration, I get a hard lock immediately after the message about
> > > > the interface being initialized. I cannot break into the debugger to
> > > > get a trace, and I don't get a dump on reboot. The machine appears to be
> > > > stuck in a tight CPU loop, although it doesn't seem to get particularly hot.
> > > >
> > > > I then tried booting onetbsd, which is from the 12th of July and on
> > > > which nvmm used to work just fine. It is also the same micro version,
> > > > 9.99.69, so in theory it should work, but in this case I get a panic
> > > > when I 'modload nvmm'. Again, I see the short panic message on the
> > > > screen, and the machine apparently gets into another loop, from which
> > > > I cannot break into the debugger the usual way; the only thing I can
> > > > do is hit the power button. There weren't that many kernel changes in
> > > > this period, most notably the per-CPU IDT patch, but I don't know if
> > > > it is relevant.
> > > >
> > >
> > > I rebuilt my system again today, this time I managed to get a core
> > > dump after the panic:
> > >
> > >  crash -M netbsd.22.core -N netbsd.22
> > > Crash version 9.99.69, image version 9.99.69.
> > > crash: _kvm_kvatop(0)
> > > Kernel compiled without options LOCKDEBUG.
> > > System panicked: trap
> > > Backtrace from time of crash is available.
> > > crash> bt
> > > _KERNEL_OPT_NARCNET() at 0
> > > ?() at ffffa0819ba16000
> > > sys_reboot() at sys_reboot
> > > vpanic() at vpanic+0x15b
> > > snprintf() at snprintf
> > > startlwp() at startlwp
> > > calltrap() at calltrap+0x19
> > > kqueue_register() at kqueue_register+0x43e
> > > kevent1() at kevent1+0x138
> > > sys___kevent50() at sys___kevent50+0x33
> > > syscall() at syscall+0x26e
> > > --- syscall (number 435) ---
> > > syscall+0x26e:
> > >
> > > Any ideas?
> > >
> > > The dmesg shows, BTW:
> > >
> > > Jul 15 14:09:33 ymir /netbsd: [ 108.7517032] nvmm0: attached, using backend x86-vmx
> > > Jul 15 14:11:40 ymir syslogd[946]: restart
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] fatal protection fault in supervisor mode
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] trap type 4 code 0x323 rip 0xffffffff80c89e21 cs 0x8 rflags 0x10282 cr2 0x784321f9f000 ilevel 0 rsp 0xffffa0819ba1ac50
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] curlwp 0xffffd066fc45b100 pid 2869.2869 lowest kstack 0xffffa0819ba162c0
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] panic: trap
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] cpu0: Begin traceback...
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] vpanic() at netbsd:vpanic+0x152
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] snprintf() at netbsd:snprintf
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] startlwp() at netbsd:startlwp
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] alltraps() at netbsd:alltraps+0xc3
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] kqueue_register() at netbsd:kqueue_register+0x43e
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] kevent1() at netbsd:kevent1+0x138
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] sys___kevent50() at netbsd:sys___kevent50+0x33
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] syscall() at netbsd:syscall+0x26e
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2116186] --- syscall (number 435) ---
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2216185] netbsd:syscall+0x26e:
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2216185] cpu0: End traceback...
> > > Jul 15 14:11:40 ymir /netbsd:
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2216185] dumping to dev 168,2 (offset=8, size=5225879):
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2216185] dump <5>ktrace timeout
> > > Jul 15 14:11:40 ymir /netbsd: ktrace timeout
> > > Jul 15 14:11:40 ymir /netbsd: [ 131.2216185] ktrace timeout
> > > Jul 15 14:11:40 ymir syslogd[946]: last message repeated 2 times
> > >
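A panic report like the one above can be read mechanically: the `trap type` line names the fault, and in the backtrace the first frame below the trap-entry stub (`calltrap` in the crash(8) output, `alltraps` in the console traceback) is where the fault happened, while the frames above it belong to the trap and panic path. Below is a minimal Python sketch of that reading, assuming the wrapped log lines have been rejoined. `parse_trap_line`, `faulting_frame` and the regular expressions are hypothetical helpers, not part of any NetBSD tool, and the trap-number table lists only T_BPTFLT (1), T_PROTFLT (4) and T_PAGEFLT (6) from NetBSD's x86 `trap.h`; check your own source tree for the rest.

```python
import re

# A few trap numbers from NetBSD's x86 <machine/trap.h>. Types 1 and 4 match
# the traces in this thread; verify anything else against your source tree.
TRAP_NAMES = {
    1: "T_BPTFLT (breakpoint)",
    4: "T_PROTFLT (protection fault)",
    6: "T_PAGEFLT (page fault)",
}

# Matches the one-line "trap type ..." report printed before "panic: trap".
TRAP_RE = re.compile(
    r"trap type (?P<type>\d+) code (?P<code>0x[0-9a-f]+) "
    r"rip (?P<rip>0x[0-9a-f]+) cs (?P<cs>0x[0-9a-f]+) "
    r"rflags (?P<rflags>0x[0-9a-f]+) cr2 (?P<cr2>0x[0-9a-f]+) "
    r"ilevel (?P<ilevel>0x[0-9a-f]+|\d+)\s+rsp (?P<rsp>0x[0-9a-f]+)"
)

# Matches "func() at func+0x..." with an optional "netbsd:" prefix (console form).
FRAME_RE = re.compile(r"(\w+)\(\) at (?:\w+:)?(\S+)")

# Assembly stubs that enter the C trap handler.
TRAP_ENTRY_STUBS = {"calltrap", "alltraps"}


def parse_trap_line(text):
    """Return the fields of the first trap report in text as integers."""
    m = TRAP_RE.search(text)
    if m is None:
        raise ValueError("no 'trap type' report found")
    fields = {key: int(value, 0) for key, value in m.groupdict().items()}
    fields["name"] = TRAP_NAMES.get(fields["type"], "unknown")
    return fields


def faulting_frame(backtrace):
    """Return the location of the frame just below the trap-entry stub, or None."""
    after_stub = False
    for line in backtrace.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skips "?() at ..." and "--- syscall ---" lines
        func, location = m.groups()
        if after_stub:
            return location
        if func in TRAP_ENTRY_STUBS:
            after_stub = True
    return None


if __name__ == "__main__":
    report = ("trap type 4 code 0x323 rip 0xffffffff80c89e21 cs 0x8 "
              "rflags 0x10282 cr2 0x784321f9f000 ilevel 0 rsp 0xffffa0819ba1ac50")
    print(parse_trap_line(report)["name"])  # T_PROTFLT (protection fault)
    print(faulting_frame("calltrap() at calltrap+0x19\n"
                         "kqueue_register() at kqueue_register+0x43e\n"))  # kqueue_register+0x43e
```

Applied to both traces in this thread, the sketch points at the same spot, `kqueue_register+0x43e`, reached from the kevent(2) system call.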
> > > > Chavdar
> > > >
> > > > On Wed, 20 May 2020 at 22:09, Maxime Villard <max%m00nbsd.net@localhost> wrote:
> > > > >
> > > > > On 09/05/2020 at 10:54, Maxime Villard wrote:
> > > > > > On 01/05/2020 at 19:13, Chavdar Ivanov wrote:
> > > > > >> On Fri, 1 May 2020 at 13:59, Rhialto <rhialto%falu.nl@localhost> wrote:
> > > > > >>>
> > > > > >>> On Sun 26 Apr 2020 at 21:39:12 +0200, Maxime Villard wrote:
> > > > > >>>> Maybe I should add a note in the man page to say that you cannot expect a CPU
> > > > > >>>> from before ~2010 to have virtualization support.
> > > > > >>>
> > > > > >>> Or, even better, say what one should look for in the output of, for
> > > > > >>> example, "cpuctl identify 0". Since I didn't know exactly, I made some
> > > > > >>> guesses and assumed that my CPU ("Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz")
> > > > > >>> didn't have the required features (it is from 2009 or so). But this
> > > > > >>> thread inspired me to modload nvmm, which actually helped: I found
> > > > > >>> out that it even works on this CPU.
> > > > > >
> > > > > > On Intel CPUs the information is hidden in privileged registers that cpuctl
> > > > > > cannot access, so no, it won't be possible.
> > > > > >
> > > > > > However, the day before, I had added clear warnings:
> > > > > >
> > > > > >      https://mail-index.netbsd.org/source-changes/2020/04/30/msg116878.html
> > > > > >
> > > > > > So now it will tell you what's missing.
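Part of the answer is visible without privileges: CPUID only says whether the VMX (Intel) or SVM (AMD) extension exists, while the finer-grained VMX capabilities, such as EPT support, are reported in the IA32_VMX_* capability MSRs, which only ring 0 can read. A minimal Python sketch of the unprivileged half, assuming you already have the raw ECX values of CPUID leaves 0x1 and 0x80000001 from some other tool; `hw_virt_extension` is a hypothetical helper. A set bit is necessary but not sufficient, since firmware can still leave VMX disabled, which is why loading nvmm and reading its warnings remains the reliable check.

```python
# CPUID feature bits, per Intel's SDM and AMD's APM.
VMX_BIT = 5  # CPUID.01H:ECX[5]: Intel VT-x (VMX)
SVM_BIT = 2  # CPUID.80000001H:ECX[2]: AMD-V (SVM)


def hw_virt_extension(leaf1_ecx, ext_leaf1_ecx):
    """Name the virtualization extension CPUID advertises, or None.

    This only reflects CPUID: firmware may still have VMX/SVM disabled, and
    VMX sub-features (EPT etc.) live in privileged MSRs not checked here.
    """
    if (leaf1_ecx >> VMX_BIT) & 1:
        return "vmx"
    if (ext_leaf1_ecx >> SVM_BIT) & 1:
        return "svm"
    return None


if __name__ == "__main__":
    print(hw_virt_extension(1 << VMX_BIT, 0))  # vmx
    print(hw_virt_extension(0, 0))             # None
```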
> > > > > >
> > > > > >>> Of course I immediately tried it with Haiku (the BeOS clone) from
> > > > > >>> https://download.haiku-os.org/nightly-images/x86_64/ and I got mixed
> > > > > >>> results. Once it manages to boot it works fine and nicely fast (much
> > > > > >>> better than without nvmm), but quite often it crashes into its kernel
> > > > > >>> debugger during the first 10 seconds of booting, with different messages
> > > > > >>> (I have seen "General Protection Exception" and "ASSERT failed ...
> > > > > >>> fCPUCount >= 0").  ("qemu-system-x86_64 -accel nvmm -m 2G -cdrom
> > > > > >>> haiku-master-hrev54106-x86_64-anyboot.iso" on a 9.0 GENERIC kernel)
> > > > > >
> > > > > > This was caused by missing filtering in the CPU identification on CPUs
> > > > > > that have SMT, which led Haiku to believe it had SMT threads that it didn't.
> > > > > >
> > > > > >      https://mail-index.netbsd.org/source-changes/2020/05/09/msg117188.html
> > > > > >
> > > > > > As far as I can tell, your CPU has SMT.
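To illustrate the kind of filtering involved: in CPUID leaf 0xB, sub-leaf 0 describes the SMT level, and EBX[15:0] gives the number of logical processors per core. If a hypervisor passes the host's value through unchanged, a guest on an SMT host concludes that its vCPUs are hyperthread siblings. The Python sketch below is hypothetical and is not NVMM's code (the actual change is in the commit linked above); it only shows how a guest reads the value and a filter that reports one thread per core. Guests may also derive topology from leaves 0x1 and 0x4, and a real implementation must keep the x2APIC IDs in EDX consistent; this sketch ignores both.

```python
LEVEL_TYPE_SMT = 1  # CPUID.0BH:ECX[15:8] level type 1 = SMT


def level_type(ecx):
    """Extract the level type from ECX of a leaf 0xB sub-leaf."""
    return (ecx >> 8) & 0xFF


def threads_per_core(ebx, ecx):
    """Logical processors per core, as a guest would read leaf 0xB sub-leaf 0."""
    if level_type(ecx) != LEVEL_TYPE_SMT:
        return 1
    return ebx & 0xFFFF


def filter_smt_level(eax, ebx, ecx, edx):
    """Hypothetical filter: present each vCPU as a single-threaded core.

    EAX[4:0] (x2APIC ID shift to the next level) becomes 0 and
    EBX[15:0] (logical processors at this level) becomes 1.
    """
    if level_type(ecx) != LEVEL_TYPE_SMT:
        return eax, ebx, ecx, edx
    eax &= ~0x1F
    ebx = (ebx & ~0xFFFF) | 1
    return eax, ebx, ecx, edx


if __name__ == "__main__":
    host = (1, 2, 0x100, 0)  # example values: 2 threads per core, SMT level
    print(threads_per_core(host[1], host[2]))    # 2
    guest = filter_smt_level(*host)
    print(threads_per_core(guest[1], guest[2]))  # 1
```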
> > > > > >
> > > > > >> I've never used Haiku so far; upon reading this I decided to try it on
> > > > > >> my NetBSD-current laptop with nvmm.
> > > > > >>
> > > > > >> So far, over several attempts, it has worked with no problem whatsoever,
> > > > > >> booting the newest image directly from the site pointed to above.
> > > > > >>
> > > > > >> Another OS to play with...
> > > > > >>
> > > > > >> The host cpu is Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz, id 0x306a9.
> > > > > >
> > > > > > This CPU too has SMT.
> > > > > >
> > > > > > On 01/05/2020 at 20:10, Rhialto wrote:
> > > > > >> There might well be an improvement between 9.0 and -current, of course.
> > > > > >> It's good to hear that it works for you; I might upgrade to a -current
> > > > > >> kernel.
> > > > > >
> > > > > > Overall, no, each improvement in -current is propagated to 9, so you should
> > > > > > get the same results on both (modulo kernel bugs added in places not
> > > > > > related to NVMM).
> > > > > >
> > > > > > On 01/05/2020 at 20:52, Chavdar Ivanov wrote:
> > > > > >> Earlier I had similar issues with OmniOS under qemu-nvmm - sometimes
> > > > > >> it worked without a problem, sometimes I couldn't even boot. I still
> > > > > >> have no idea why.
> > > > > >
> > > > > > Maybe that's the same problem, I'll test.
> > > > >
> > > > > I tested the other day and saw no problem. While debugging, I noticed that
> > > > > OmniOS, too, uses the CPU information that used to be misreported by NVMM,
> > > > > so my fix probably helped.
> > > > >
> > > > > Please confirm the issues are fixed (HaikuOS+OmniOS).
> > > >
> > > >
> > > >
> > > > --
> > > > ----


All good now, with the odd exception of one Linux guest that crashes on
boot and has to be reset with the QEMU monitor command system_reset on
every boot; otherwise it works fine.


-- 
----

