Port-xen archive


Re: HVM performance deficiencies



On Thu, Jan 14, 2010 at 18:21:30 +0100, Wolfgang Solfrank wrote:
> NetBSD has several performance deficiencies when run as an HVM guest
> under Xen.  The problem stems from the fact that the qemu hardware
> emulation isn't completely compatible with real hardware, and our
> drivers misidentify and/or misuse some features:

I noticed the very same thing when I experimented with NetBSD HVM
guests a few years ago.

I patched the ata and re driver sources (see below). The changes are
conditional on preprocessor symbols (ATA_IGNORE_BAD_PIO for ata,
RE_NO_TX_MODERATION for re), so the corresponding options have to be
set in the HVM-optimized kernel config file. The patches are against
NetBSD 4, which I was using at the time, but should be trivial to port
to NetBSD 5 or -current.
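
As a rough illustration of the config side, the fragment below is a
sketch only. It assumes that options not declared in the files.*
definitions are passed to the compiler as -D flags, and the include
path is a placeholder for whatever base config is actually in use.

```
# Hypothetical HVM-optimized kernel config (sketch, not a tested file)
include "arch/i386/conf/GENERIC"

options 	ATA_IGNORE_BAD_PIO	# keep probing DMA even if no PIO mode is reported
options 	RE_NO_TX_MODERATION	# re(4): use 'TX done' interrupts, not the timer
```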

> 1. While the emulated IDE-controller (part of the PIIX3 chipset)
> is capable of DMA (but not UDMA), and the emulated disk claims to
> support DMA mode 2 and UDMA mode 5, this is not detected.  The
> problem is that the disk claims to not support any PIO modes.
> Our ata driver finds this suspicious and assumes that DMA would
> be buggy, too (see sys/dev/ata.c, around line 1260), using only
> PIO mode 0 subsequently.  This results in excess interrupt
> handling overhead in the wd driver.

This is a trivial change: just ignore (and log) the PIO error.

Index: ata.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata.c,v
retrieving revision 1.83
diff -u -r1.83 ata.c
--- ata.c       16 Nov 2006 01:32:47 -0000      1.83
+++ ata.c       20 Jan 2010 12:18:53 -0000
@@ -1227,7 +1227,13 @@
                         * We didn't find a valid PIO mode.
                         * Assume the values returned for DMA are buggy too
                         */
+#ifdef ATA_IGNORE_BAD_PIO
+                       aprint_normal("%s: drive supports no PIO",
+                           drv_dev->dv_xname);
+                       sep = ",";
+#else
                        return;
+#endif
                }
                s = splbio();
                drvp->drive_flags |= DRIVE_MODE;
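
To make the effect of the option easier to see outside the driver,
here is a minimal stand-alone sketch of the decision. The struct and
function names are invented for illustration and are not the real
ata(4) data structures; a mode of -1 simply means "none reported".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what the drive reports. */
struct sketch_drive_caps {
	int best_pio;	/* highest valid PIO mode, or -1 if none */
	int best_dma;	/* highest multiword DMA mode, or -1 if none */
};

/*
 * Returns the DMA mode the probe would go on to consider, or -1 if
 * the probe gives up and the drive stays on PIO mode 0.
 */
static int
sketch_pick_dma_mode(const struct sketch_drive_caps *caps,
    bool ignore_bad_pio)
{
	if (caps->best_pio < 0 && !ignore_bad_pio) {
		/* Unpatched behaviour: assume the DMA values are buggy too. */
		return -1;
	}
	/* Patched behaviour (or a sane drive): carry on with DMA. */
	return caps->best_dma;
}
```

For a drive like the emulated one (no PIO mode, DMA mode 2), this
returns -1 without the option and 2 with it.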



> 2. The emulated realtek 8139C+ device doesn't implement the
> timer interrupt (while there is a compile time option in qemu
> to enable this, it's normally disabled, and I'm not sure whether
> it would be of much help either).  This interrupt is unconditionally
> (in contrast to the FreeBSD driver, which our driver claims to
> be based upon) used within our re(4) driver to reduce the transmission
> interrupt rate, i.e., the driver doesn't use 'TX done' interrupts, but
> instead posts several packets for transmission and then starts
> the timer for some short period, handling transmission completion
> when the timer triggers.  As the emulated device doesn't implement
> this timer, we get a lot of watchdog timeouts (see PR#41679).

This one is slightly more complex. If I remember correctly, I
ported the relevant parts from the FreeBSD 8139C+ driver
(sys/dev/re/if_re.c), which, as you note, already compiles its
interrupt moderation conditionally (it is turned off by default,
actually).

Index: rtl8169.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/rtl8169.c,v
retrieving revision 1.72.2.9
diff -u -r1.72.2.9 rtl8169.c
--- rtl8169.c   24 Mar 2008 20:50:33 -0000      1.72.2.9
+++ rtl8169.c   20 Jan 2010 12:19:50 -0000
@@ -110,6 +110,10 @@
  * driver is 7500 bytes.
  */
 
+#ifndef RE_NO_TX_MODERATION
+#define RE_TX_MODERATION
+#endif
+
 #include "bpfilter.h"
 #include "vlan.h"
 
@@ -1356,14 +1360,16 @@
        if (sc->re_ldata.re_txq_free > RE_NTXDESC_RSVD)
                ifp->if_flags &= ~IFF_OACTIVE;
 
-       /*
-        * If not all descriptors have been released reaped yet,
-        * reload the timer so that we will eventually get another
-        * interrupt that will cause us to re-enter this routine.
-        * This is done in case the transmitter has gone idle.
-        */
        if (sc->re_ldata.re_txq_free < RE_TX_QLEN) {
+#ifdef RE_TX_MODERATION
+               /*
+                * If not all descriptors have been released reaped yet,
+                * reload the timer so that we will eventually get another
+                * interrupt that will cause us to re-enter this routine.
+                * This is done in case the transmitter has gone idle.
+                */
                CSR_WRITE_4(sc, RTK_TIMERCNT, 1);
+#endif
                if ((sc->sc_quirk & RTKQ_PCIE) != 0) {
                        /*
                         * Some chips will ignore a second TX request
@@ -1495,8 +1501,12 @@
                if (status & (RTK_ISR_RX_OK | RTK_ISR_RX_ERR))
                        re_rxeof(sc);
 
-               if (status & (RTK_ISR_TIMEOUT_EXPIRED | RTK_ISR_TX_ERR |
-                   RTK_ISR_TX_DESC_UNAVAIL))
+#ifdef RE_TX_MODERATION
+               if (status & (RTK_ISR_TIMEOUT_EXPIRED |
+#else
+               if (status & (RTK_ISR_TX_OK |
+#endif
+                       RTK_ISR_TX_ERR | RTK_ISR_TX_DESC_UNAVAIL))
                        re_txeof(sc);
 
                if (status & RTK_ISR_SYSTEM_ERR) {
@@ -1749,6 +1759,7 @@
                else
                        CSR_WRITE_1(sc, RTK_GTXSTART, RTK_TXSTART_START);
 
+#ifdef RE_TX_MODERATION
                /*
                 * Use the countdown timer for interrupt moderation.
                 * 'TX done' interrupts are disabled. Instead, we reset the
@@ -1758,6 +1769,7 @@
                 * the timer count is reset to 0.
                 */
                CSR_WRITE_4(sc, RTK_TIMERCNT, 1);
+#endif
 
                /*
                 * Set a timeout in case the chip goes out to lunch.
@@ -1923,6 +1935,7 @@
        CSR_WRITE_1(sc, RTK_COMMAND, RTK_CMD_TX_ENB | RTK_CMD_RX_ENB);
 #endif
 
+#ifdef RE_TX_MODERATION
        /*
         * Initialize the timer interrupt register so that
         * a timer interrupt will be generated once the timer
@@ -1930,18 +1943,18 @@
         * reloaded on each transmit. This gives us TX interrupt
         * moderation, which dramatically improves TX frame rate.
         */
-
        if ((sc->sc_quirk & RTKQ_8139CPLUS) != 0)
                CSR_WRITE_4(sc, RTK_TIMERINT, 0x400);
-       else {
+       else
                CSR_WRITE_4(sc, RTK_TIMERINT_8169, 0x800);
+#endif
 
-               /*
-                * For 8169 gigE NICs, set the max allowed RX packet
-                * size so we can receive jumbo frames.
-                */
+       /*
+        * For 8169 gigE NICs, set the max allowed RX packet
+        * size so we can receive jumbo frames.
+        */
+       if ((sc->sc_quirk & RTKQ_8139CPLUS) == 0)
                CSR_WRITE_2(sc, RTK_MAXRXPKTLEN, 16383);
-       }
 
        if (sc->re_testmode)
                return 0;
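
The interrupt handler part of the change boils down to which status
bit counts as "transmit completed". The stand-alone sketch below shows
that selection as a runtime flag rather than an #ifdef. All names are
made up for the example, and the bit values are placeholders; the real
definitions are in the driver's register header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder status bits, for illustration only. */
#define SK_ISR_TX_OK		0x0004u
#define SK_ISR_TX_ERR		0x0008u
#define SK_ISR_TX_DESC_UNAVAIL	0x0080u
#define SK_ISR_TIMEOUT_EXPIRED	0x4000u

/* Status bits that should trigger transmit completion handling. */
static uint32_t
sketch_tx_mask(bool tx_moderation)
{
	uint32_t mask = SK_ISR_TX_ERR | SK_ISR_TX_DESC_UNAVAIL;

	/* With moderation the timer signals completion, else 'TX done'. */
	mask |= tx_moderation ? SK_ISR_TIMEOUT_EXPIRED : SK_ISR_TX_OK;
	return mask;
}

/* True if the given interrupt status calls for reaping TX descriptors. */
static bool
sketch_should_run_txeof(uint32_t status, bool tx_moderation)
{
	return (status & sketch_tx_mask(tx_moderation)) != 0;
}
```

With moderation on, a device that never raises the timer bit only
reaches the completion path on errors, which matches the watchdog
timeouts described above. With moderation off, a plain 'TX done'
status is enough.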




Regards,
Michael Eriksson

