
Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)



Hello,

I'm trying to find the cause of a performance problem and don't really know how to proceed.


## Test Setup

The host is an Intel NUC7CJYHN (2 physical cores, 8 GB RAM, 500 GB SSD) running a fresh NetBSD/amd64 9.3_STABLE. The SSD contains an ESP, an FFS root partition, swap, and a large ZPOOL.

The host is to act as a virtualization host: VMs are run with Qemu 7.0.0 (from pkgsrc 2022Q2) and nvmm, the guests also run NetBSD 9.3, and storage is provided by ZVOLs through virtio.
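For reference, each guest disk is a dedicated ZVOL, created along these lines (the 20g size here is just illustrative):

```
# creates a 20 GB volume that appears as /dev/zvol/rdsk/tank/vol/test1
zfs create -V 20g tank/vol/test1
```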

Before explaining the issue I face, here are some rounded numbers showing the performance of the host OS (sampled with iostat):

1) Starting one writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test1 bs=4m &
```

---> ~ 200 MByte/s

2) Adding another writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test2 bs=4m &
```

---> ~ 300 MByte/s

3) Adding another writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test3 bs=4m &
```

---> ~ 500 MByte/s

From my understanding, this represents the write performance I can expect from my hardware when writing raw data in parallel to discrete ZVOLs located on the same physical storage (SSD).
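All throughput numbers in this mail were sampled on the host with an iostat invocation roughly like this (see iostat(8) for the exact flags):

```
# -d: show only disk statistics, -w 1: print a new report every second
iostat -d -w 1
```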

This picture changes completely when Qemu comes into play. I installed a basic NetBSD 9.3 on each of the ZVOLs (standard layout with FFSv2 + WAPBL) and run the VMs with this QEMU command:

```
qemu-system-x86_64 -machine pc-q35-7.0 -smp $VM_CORES -m $VM_RAM -accel nvmm \
    -k de -boot cd \
    -machine graphics=off -display none -vga none \
    -object rng-random,filename=/dev/urandom,id=viornd0 \
    -device virtio-rng-pci,rng=viornd0 \
    -object iothread,id=t0 \
    -device virtio-blk-pci,drive=hd0,iothread=t0 \
    -device virtio-net-pci,netdev=vioif0,mac=$VM_MAC \
    -chardev socket,id=monitor,path=$MONITOR_SOCKET,server=on,wait=off \
    -monitor chardev:monitor \
    -chardev socket,id=serial0,path=$CONSOLE_SOCKET,server=on,wait=off \
    -serial chardev:serial0 \
    -pidfile /tmp/$VM_ID.pid \
    -cdrom $VM_CDROM_IMAGE \
    -drive file=$VM_HDD_VOLUME,if=none,id=hd0,format=raw \
    -netdev tap,id=vioif0,ifname=$VM_NETIF,script=no,downscript=no \
    -device virtio-balloon-pci,id=balloon0
```

The command already includes the following optimizations:

 - use the virtio block driver instead of an emulated SCSI device
 - use a separate I/O thread for block device access


## Test Case 1

The environment is set for this test:

 - VM_CORES: 1
 - VM_RAM: 256
 - ....
 - VM_HDD_VOLUME (e.g. /dev/zvol/rdsk/tank/vol/test3); each VM has its dedicated ZVOL
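A concrete environment for vm1 then looks roughly like this (values illustrative):

```
VM_CORES=1
VM_RAM=256
VM_ID=vm1
VM_HDD_VOLUME=/dev/zvol/rdsk/tank/vol/test1
```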

My test case is the following:

0) Launch iostat on the Host (e.g. the invocation sketched above) and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~140 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~180 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~220 MByte/s

Intermediate summary:

 - pretty good results :-)
 - with each additional writer, the bandwidth utilization rises


## Test Case 2

The environment is modified for this test:

 - VM_CORES: 2

The same test case yields completely different results:

0) Launch iostat on the Host (e.g. the invocation sketched above) and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~ 30 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~ 3 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows < 1 MByte/s

Intermediate summary:

 - unexpectedly bad performance: even with a single writer, throughput is far
   below the values achieved with only one core per VM
 - bandwidth drops dramatically with each additional writer

## Summary and Questions

 - adding more cores to a Qemu VM seems to considerably degrade disk I/O performance
 - Is this expected / known behavior?
 - What could I do to mitigate this or help find the root cause?
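
One experiment I could run myself is to vary the caching/AIO settings of the backing drive, in case the SMP slowdown interacts with the host page cache; a sketch, using options documented in qemu(1) but not yet tested here:

```
# replace the -drive line from the command above with explicit settings:
# cache=none bypasses the host page cache, aio=threads selects the
# (default) thread-pool AIO backend
-drive file=$VM_HDD_VOLUME,if=none,id=hd0,format=raw,cache=none,aio=threads \
```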

By the way: apart from this hopefully solvable problem, I am pleasantly surprised how well the combination of NetBSD, ZVOL, Qemu and NVMM works.


Kind regards
Matthisa
