Hello,
I'm trying to find the cause of a performance problem and don't really know how to proceed.
## Test Setup
The host is an Intel NUC7CJYHN (2 physical cores, 8 GB RAM, 500 GB SSD) with a fresh NetBSD/amd64 9.3_STABLE installation. The SSD contains an ESP, an FFS root partition, swap, and a large ZPOOL.
The host is to be used as a virtual host for VMs. For this, VMs are run with Qemu 7.0.0 (from pkgsrc 2022Q2) and nvmm. The VMs also run NetBSD 9.3. Storage is provided by ZVOLs through virtio.
Before explaining the issue I face, here are some rounded numbers showing the performance of the host OS (sampled with iostat):
1) Starting one writer
```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test1 bs=4m &
```
---> ~ 200 MByte/s
2) Adding another writer
```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test2 bs=4m &
```
---> ~ 300 MByte/s
3) Adding another writer
```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test3 bs=4m &
```
---> ~ 500 MByte/s
From my understanding, this represents the write performance I can expect with my hardware when I write raw data in parallel to discrete ZVOLs located on the same physical storage (SSD).
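To put the host numbers in perspective, here is a minimal sketch that divides the aggregate throughput by the number of writers. The helper name `per_writer` is made up for illustration; the input values are the rounded iostat readings from above, nothing is measured by the script itself.

```shell
#!/bin/sh
# Hypothetical helper: average throughput per writer.
# $1 = number of parallel writers, $2 = aggregate MByte/s seen in iostat
per_writer() {
    awk -v n="$1" -v total="$2" 'BEGIN {
        printf "%d writer(s): %d MByte/s total, ~%.0f MByte/s each\n", n, total, total / n
    }'
}

# Rounded readings from the three host-side dd runs
per_writer 1 200
per_writer 2 300
per_writer 3 500
```

The point is only that the per-writer share stays in the same order of magnitude on the host as writers are added.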
This picture changes completely when Qemu comes into play. I installed a basic NetBSD 9.3 on each of the ZVOLs (standard layout with FFSv2 + WAPBL) and run them with this QEMU command:
```
qemu-system-x86_64 -machine pc-q35-7.0 -smp $VM_CORES -m $VM_RAM -accel nvmm \
                        -k de -boot cd \
                        -machine graphics=off -display none -vga none \
                        -object rng-random,filename=/dev/urandom,id=viornd0 \
                        -device virtio-rng-pci,rng=viornd0 \
                        -object iothread,id=t0 \
                        -device virtio-blk-pci,drive=hd0,iothread=t0 \
                        -device virtio-net-pci,netdev=vioif0,mac=$VM_MAC \
                        -chardev socket,id=monitor,path=$MONITOR_SOCKET,server=on,wait=off \
                        -monitor chardev:monitor \
                        -chardev socket,id=serial0,path=$CONSOLE_SOCKET,server=on,wait=off \
                        -serial chardev:serial0 \
                        -pidfile /tmp/$VM_ID.pid \
                        -cdrom $VM_CDROM_IMAGE \
                        -drive file=$VM_HDD_VOLUME,if=none,id=hd0,format=raw \
                        -netdev tap,id=vioif0,ifname=$VM_NETIF,script=no,downscript=no \
                        -device virtio-balloon-pci,id=balloon0
```
The command already includes the following optimizations:
 - use virtio driver instead of emulated SCSI device
 - use a separate I/O thread for block device access
## Test Case 1
The environment is set for this test:
 - VM_CORES: 1
 - VM_RAM: 256
 - ....
 - VM_HDD_VOLUME (e.g. /dev/zvol/rdsk/tank/vol/test3), each VM has its dedicated ZVOL
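As a sketch of how the per-VM environment could be derived for the three instances: the function name `vm_env`, the `VM_ID` values and the mapping of vmN to the ZVOL testN are assumptions for illustration. The variables elided above are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: print the settings that differ per VM.
# $1 = VM number (1..3), $2 = number of cores given to the VM
# Assumption: vmN is backed by the ZVOL tank/vol/testN.
vm_env() {
    n=$1
    cores=$2
    echo "VM_ID=vm$n"
    echo "VM_CORES=$cores"
    echo "VM_RAM=256"
    echo "VM_HDD_VOLUME=/dev/zvol/rdsk/tank/vol/test$n"
}

# Settings for the three instances with one core each
for n in 1 2 3; do
    vm_env "$n" 1
    echo
done
```

Passing the core count as a parameter keeps it the only value that needs to change between test runs.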
My test case is the following:
0) Launch iostat -c on the Host and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~140 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~180 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~220 MByte/s
Intermediate summary:
 - pretty good results :-)
 - with each additional writer, the bandwidth utilization rises
## Test Case 2
The environment is modified for this test:
 - VM_CORES: 2
The same test case yields completely different results:
0) Launch iostat -c on the Host and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~30 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~3 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows < 1 MByte/s
Intermediate summary:
 - unexpectedly bad performance
 - even with one writer, performance is far below the values obtained with only one core per VM
 - bandwidth drops dramatically with each additional writer
## Summary and Questions
 - Adding more cores to Qemu seems to considerably impact disk I/O performance
 - Is this expected / known behavior?
 - What could I do to mitigate / help to find the root cause?
By the way - except for this hopefully solvable problem, I am surprised how well the team of NetBSD, ZVOL, Qemu and NVMM works.
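For reference, the gap between the two test cases can be expressed as a factor. This is only a sketch over the rounded iostat readings reported above; the helper name `slowdown` is made up, and since the last reading with 2 cores was below 1 MByte/s, the value 1 is used as an upper bound, so the real factor for three writers is at least the one printed.

```shell
#!/bin/sh
# Hypothetical helper: ratio of host throughput with 1 core vs. 2 cores per VM.
# $1 = number of writers, $2 = MByte/s with 1 core, $3 = MByte/s with 2 cores
slowdown() {
    awk -v w="$1" -v one="$2" -v two="$3" 'BEGIN {
        printf "%d writer(s): %d -> %d MByte/s, factor ~%.1f\n", w, one, two, one / two
    }'
}

slowdown 1 140 30
slowdown 2 180 3
# "< 1 MByte/s" was observed; 1 is an upper bound, so the factor is a lower bound
slowdown 3 220 1
```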
Kind regards
Matthisa