Current-Users archive
Re: Panic on a -current from 13/12/2018
I managed to build the VBox v6.0 additions under 8.99.28; now, when
using the vioif interface, I get reasonable results:
...
PS C:\bin\iperf-3.1.3-win64> .\iperf3.exe -c marge
Connecting to host marge, port 5201
[ 4] local 192.168.0.35 port 10152 connected to 192.168.0.6 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 68.2 MBytes 572 Mbits/sec
[ 4] 1.00-2.00 sec 71.9 MBytes 603 Mbits/sec
[ 4] 2.00-3.00 sec 69.8 MBytes 585 Mbits/sec
[ 4] 3.00-4.00 sec 71.9 MBytes 603 Mbits/sec
[ 4] 4.00-5.00 sec 68.9 MBytes 578 Mbits/sec
[ 4] 5.00-6.00 sec 69.4 MBytes 581 Mbits/sec
[ 4] 6.00-7.00 sec 70.2 MBytes 590 Mbits/sec
[ 4] 7.00-8.00 sec 75.6 MBytes 634 Mbits/sec
[ 4] 8.00-9.00 sec 70.1 MBytes 589 Mbits/sec
[ 4] 9.00-10.00 sec 73.8 MBytes 619 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 710 MBytes 595 Mbits/sec sender
[ 4] 0.00-10.00 sec 710 MBytes 595 Mbits/sec receiver
iperf Done.
.....
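As a sanity check on the units in that output, here is a minimal sketch
(my own helper, not part of iperf3; the function name is hypothetical).
It assumes what the figures above imply: the Transfer column counts
MBytes of 2^20 bytes, while the Bandwidth column counts Mbits of 10^6
bits. That is why 710 MBytes in 10 seconds shows up as 595 Mbits/sec
rather than 568.

```c
#include <assert.h>

/*
 * Convert an iperf3-style transfer figure to a bandwidth figure.
 * Assumption: "MBytes" means 2^20 bytes and "Mbits" means 10^6 bits,
 * which matches the numbers printed in the output above.
 */
static double
mbytes_to_mbits_per_sec(double mbytes, double seconds)
{
	assert(seconds > 0.0);	/* an interval must have a positive length */
	return mbytes * 1048576.0 * 8.0 / 1e6 / seconds;
}
```

Called as mbytes_to_mbits_per_sec(710.0, 10.0), this gives roughly 595.6,
in line with the summary lines of the run.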
It is also interesting that when NetBSD is run under XenServer (XCP-NG
actually) in PV mode, benchmarked against the same 8.99.28 version
running on a physical machine, with everything on a 1Gb interface and
switch, I get a fully saturated line (~ 933Mb/s). When the iperf3
server is on the same XCP-NG guest and the client is a CentOS guest,
the figures approach 2.3Gb/sec.
On Wed, 19 Dec 2018 at 12:36, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
>
> The workaround is fine. In the meantime I upgraded my VirtualBox
> installation to 6.0 (released yesterday) and will check again.
>
> While here I did some, admittedly not very scientific, benchmarks on
> network performance under VirtualBox. I started a single guest of a
> different type, had iperf3 installed and running as server on the
> guest and tested the iperf3 client connection from the host. All
> guests were configured to use bridged adapter to the active (WiFi, in
> my case Intel AC-7265, but it shouldn't matter), using the first
> (desktop) Intel emulation (82540EM). The results varied wildly between
> different guests, the best being the latest Linux guests (OpenSUSE
> Tumbleweed and Fedora 29), the worst happened to be NetBSD-current. I
> also tested on a few systems the difference in speed between the above
> chosen adapter type and the virtio one; this again showed differences
> - NetBSD was better, on some tests by a factor of two, when using
> virtio, whereas OpenBSD was the other way round - the Intel emulation
> was twice as fast. I've attached the log file of some of these
> attempts for reference. I didn't have Guest additions running on any
> of the BSD guests, which perhaps is relevant; the other systems had it
> configured. I also switched the emulation on the NetBSD host from KVM
> to default, as you suggested.
>
> As I said, we shouldn't read too much into this, but it is
> still a point.
>
>
> On Wed, 19 Dec 2018 at 02:35, Masanobu SAITOH <msaitoh%execsw.org@localhost> wrote:
> >
> > On 2018/12/18 20:13, Masanobu SAITOH wrote:
> > > Hi!
> > >
> > > On 2018/12/17 19:38, Chavdar Ivanov wrote:
> > >> I went through a series of tests. It is indeed that point the panic
> > >> takes place, the two parts of the screendump are in
> > >>
> > >> http://ci4ic4.tx0.org/nb-panic-wm-03.png and
> > >> http://ci4ic4.tx0.org/nb-panic-wm-04.png .
> > >
> > > Thanks. This is the workaround code for broken lapic timer
> > > counter which was added in:
> > >
> > > http://mail-index.netbsd.org/source-changes/2017/11/23/msg089946.html
> > > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/lapic.c.diff?r1=1.63&r2=1.64&f=h
> > >
> > > Your VM is configured to act as KVM
> > > (See system->acceleration(L) tab or see .box file's "Paravirt provider=")
> > >
> > > I set up my vm to KVM and
> > >
> > >> VirtualBox gives three Intel NIC options:
> > >>
> > >> Intel PRO/1000 MT Desktop (82540EM)
> > >> Intel PRO/1000 T Server (82543GC)
> > >> Intel PRO/1000 MT Server (82545EM)
> > >>
> > >> I was able to get a panic with the same kernel from 13/12/2018 only
> > >> when I select the second option:
> > >
> > > I changed my VM's setting to use 82543GC. I tried hibernation
> > > three times but I couldn't reproduce the problem. Even so, the
> > > problem must exist because you saw it.
> > >
> > > The possibilities are:
> > > a) VirtualBox's lapic is not good.
> > > b) Our workaround code is not perfect or somewhere is not good.
> > > c) any others
> > >
> > > I suspect this problem is not from if_wm.c. but from
> > >> There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.
> > >
> > >
> > > I read vbox/src/VBox/Devices/Network/DevE1000.cpp. One of the
> > > differences between the 82543GC emulation and the other two is that
> > > it generates an interrupt when a chip reset occurs. If the other network
> > > device emulations work well, I suspect that the reset timing in vbox
> > > is off and that it keeps the lapic timer from being updated.
> > >
> > > Workarounds are:
> > > a) Don't use KVM mode; use "Default" or another provider.
> > > With VirtualBox on my Windows 7 host, "Default" leaves the
> > > CPUID2_RAZ bit unset, so NetBSD recognizes that
> > > it's not on KVM.
> >
> > If the problem of the lapic timer stopping also exists in "Default" mode,
> > that workaround isn't used and delay() won't work. If so, b) is the best
> > way to avoid the problem.
> >
> > > b) Use something other than 82543GC.
> > > c) any others
> > >
> > > BTW, when I use 82543GC emulation, I got the following bug:
> > >> makphy0 at wm0 phy 0: Marvell 88E1000 Gigabit PHY, rev. 0
> > >> makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > >> makphy1 at wm0 phy 1: Marvell 88E1000 Gigabit PHY, rev. 0
> > >> makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > > (snip)
> > >> makphy31 at wm0 phy 31: Marvell 88E1000 Gigabit PHY, rev. 0
> > >> makphy31: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > >> ifmedia_match: multiple match for 0x20/0xfbff9ff, selected instance 0
> > >
> > > This _IS_ a bug of VirtualBox's 82543GC emulation.
> > > DevE1000Phy.cpp line 568 says:
> > >
> > > /* Note: A single PHY is supported, ignore PHYADR */
> > >
> > > So I recommend all users not to use 82543GC emulation until this PHY
> > > bug is fixed.
> > >
> > >> ......
> > >> -rw------- 1 root wheel 2199810 Dec 17 09:24 netbsd.9
> > >> -rw------- 1 root wheel 147348504 Dec 17 09:24 netbsd.9.core
> > >> /var/crash # gdb netbsd.9
> > >> GNU gdb (GDB) 8.0.1
> > >> Copyright (C) 2017 Free Software Foundation, Inc.
> > >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > >> This is free software: you are free to change and redistribute it.
> > >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > >> and "show warranty" for details.
> > >> This GDB was configured as "x86_64--netbsd".
> > >> Type "show configuration" for configuration details.
> > >> For bug reporting instructions, please see:
> > >> <http://www.gnu.org/software/gdb/bugs/>.
> > >> Find the GDB manual and other documentation resources online at:
> > >> <http://www.gnu.org/software/gdb/documentation/>.
> > >> For help, type "help".
> > >> Type "apropos word" to search for commands related to "word"...
> > >> Reading symbols from netbsd.9...(no debugging symbols found)...done.
> > >> (gdb) target kvm netbsd.9.core
> > >> 0xffffffff80222d75 in cpu_reboot ()
> > >> (gdb) bt
> > >> #0 0xffffffff80222d75 in cpu_reboot ()
> > >> #1 0xffffffff8076e6f7 in db_reboot_cmd ()
> > >> #2 0xffffffff8076ee92 in db_command ()
> > >> #3 0xffffffff8076f20c in db_command_loop ()
> > >> #4 0xffffffff80772b80 in db_trap ()
> > >> #5 0xffffffff8021f5c2 in kdb_trap ()
> > >> #6 0xffffffff802244b1 in trap ()
> > >> #7 0xffffffff8021d568 in alltraps ()
> > >> #8 0xffffffff8021de45 in breakpoint ()
> > >> #9 0xffffffff809d54b0 in vpanic ()
> > >> #10 0xffffffff809d5550 in panic ()
> > >> #11 0xffffffff802514f0 in lapic_delay ()
> > >> #12 0xffffffff80353270 in wm_gmii_i82543_readreg ()
> > >> #13 0xffffffff807b1aa5 in makphy_status ()
> > >> #14 0xffffffff807b1cf7 in makphy_service ()
> > >> #15 0xffffffff807a826c in mii_tick ()
> > >> #16 0xffffffff80360926 in wm_tick ()
> > >> #17 0xffffffff809b6b96 in callout_softclock ()
> > >> #18 0xffffffff809aaa55 in softint_dispatch ()
> > >> #19 0xffffffff8021d21f in Xsoftintr ()
> > >>
> > >>
> > >> I rebuilt the kernel (on a different physical host, but there may
> > >> have been an update on the 14th there) and tried to get a panic with
> > >> the .gdb kernel, but it never happened.
> > >>
> > >> Obviously it is not a problem for me or anyone running NetBSD as a
> > >> VirtualBox guest, as using vioif / virtio is almost twice as fast,
> > >> but I reported the panic thinking it may be relevant in other use
> > >> cases.
> > >
> > > Thank you for your report!
> > >
> > >
> > >
> > >> On Mon, 17 Dec 2018 at 07:49, Masanobu SAITOH <msaitoh%execsw.org@localhost> wrote:
> > >>>
> > >>> On 2018/12/17 1:09, Chavdar Ivanov wrote:
> > >>>> I have no idea. As I said, it is running under VirtualBox on a Windows
> > >>>> 10 host; I put the host in hibernation whilst the NetBSD guest is
> > >>>> running.
> > >>>
> > >>> I tested today's -current on VirtualBox 5.2.22 on Windows 7 64bit
> > >>> (on Core i7-2600). I tried hibernate (shutdown -> hibernate(H)) a few times
> > >>> but I couldn't reproduce the problem yet.
> > >>>
> > >>>>>>> while (deltat > 0) {
> > >>>>>>>     xtick = lapic_gettick();
> > >>>>>>>     if (lapic_broken_periodic && xtick == 0 && otick == 0) {
> > >>>>>>>         lapic_initclocks();
> > >>>>>>>         xtick = lapic_gettick();
> > >>>>>>>         if (xtick == 0)
> > >>>>>>>             panic("lapic timer stopped ticking"); <=========== here!
> > >>>>>>>     }
> > >>>
> > >>> If that panic is from this, lapic_broken_periodic must be true, but it's set only
> > >>> when the VM is KVM:
> > >>>> /*
> > >>>>  * Apply workaround for broken periodic timer under KVM
> > >>>>  */
> > >>>> if (vm_guest == VM_GUEST_KVM) {
> > >>>>     lapic_broken_periodic = true;
> > >>>>     lapic_timecounter.tc_quality = -100;
> > >>>>     aprint_debug_dev(ci->ci_dev,
> > >>>>         "applying KVM timer workaround\n");
> > >>>> }
> > >>>
> > >>> Could you try to reproduce the problem and see the panic message?
> > >>> ci4ic4-panic-01.png has backtrace and it wiped out the panic message.
> > >>>
> > >>> Regards.
> > >>>
> > >>>> Previously it survived this, using the Intel Desktop NIC
> > >>>> emulation within VirtualBox, even my ssh connections (from the host to
> > >>>> the guest) remained active. I switched the NIC emulation for the
> > >>>> NetBSD guest to virtio-net, now it behaves as before, surviving a
> > >>>> hibernation.
> > >>>>
> > >>>> There was a VirtualBox upgrade a few weeks ago, perhaps the problem is there.
> > >>>> On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu <msaitoh%execsw.org@localhost> wrote:
> > >>>>>
> > >>>>> Hi.
> > >>>>>
> > >>>>> On 2018/12/16 18:09, Chavdar Ivanov wrote:
> > >>>>>> Repeated this morning. Happens when the host hibernates when the
> > >>>>>> machine is running. The initial trace is slightly different, but the
> > >>>>>> lines with wm_gmii are the same, so for now I will switch to a
> > >>>>>> different NIC emulator.
> > >>>>>>
> > >>>>>
> > >>>>> In your .png:
> > >>>>>> vpanic()
> > >>>>>> lapic_delay()
> > >>>>>> wm_gmii_mdic_readreg()
> > >>>>>> .
> > >>>>>> .
> > >>>>>> .
> > >>>>>
> > >>>>> There is no panic message itself, but I suspect it's:
> > >>>>>> static void
> > >>>>>> lapic_delay(unsigned int usec)
> > >>>>>> {
> > >>>>>>     int32_t xtick, otick;
> > >>>>>>     int64_t deltat; /* XXX may want to be 64bit */
> > >>>>>>
> > >>>>>>     otick = lapic_gettick();
> > >>>>>>
> > >>>>>>     if (usec <= 0)
> > >>>>>>         return;
> > >>>>>>     if (usec <= 25)
> > >>>>>>         deltat = lapic_delaytab[usec];
> > >>>>>>     else
> > >>>>>>         deltat = (lapic_frac_cycle_per_usec * usec) >> 32;
> > >>>>>>
> > >>>>>>     while (deltat > 0) {
> > >>>>>>         xtick = lapic_gettick();
> > >>>>>>         if (lapic_broken_periodic && xtick == 0 && otick == 0) {
> > >>>>>>             lapic_initclocks();
> > >>>>>>             xtick = lapic_gettick();
> > >>>>>>             if (xtick == 0)
> > >>>>>>                 panic("lapic timer stopped ticking"); <=========== here!
> > >>>>>>         }
> > >>>>>>         if (xtick > otick)
> > >>>>>>             deltat -= lapic_tval - (xtick - otick);
> > >>>>>>         else
> > >>>>>>             deltat -= otick - xtick;
> > >>>>>>         otick = xtick;
> > >>>>>>
> > >>>>>>         x86_pause();
> > >>>>>>     }
> > >>>>>> }
> > >>>>>
> > >>>>> What causes it?
> > >>>>>
> > >>>>>
> > >>>>>> And yes, it used to survive many hibernations of the hosts before. I
> > >>>>>> only had to adjust the time after waking the host up.
> > >>>>>> On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov <ci4ic4%gmail.com@localhost> wrote:
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> On 8.99.27 AMD64 running under VirtualBox I got this morning the panic
> > >>>>>>> in http://ci4ic4.tx0.org/ci4ic4-panic-01.png
> > >>>>>>>
> > >>>>>>> I have the coredump, if it is of interest. I thought it might be
> > >>>>>>> useful, as it is apparently in the wm driver.
> > >>>>>>>
> > >>>>>>> Chavdar
> > >>>>>>> --
> > >>>>>>> ----
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> -----------------------------------------------
> > >>>>> SAITOH Masanobu (msaitoh%execsw.org@localhost
> > >>>>> msaitoh%netbsd.org@localhost)
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> -----------------------------------------------
> > >>> SAITOH Masanobu (msaitoh%execsw.org@localhost
> > >>> msaitoh%netbsd.org@localhost)
> > >>
> > >>
> > >>
> > >
> > >
> >
> >
> > --
> > -----------------------------------------------
> > SAITOH Masanobu (msaitoh%execsw.org@localhost
> > msaitoh%netbsd.org@localhost)
>
>
>
> --
> ----
--
----