Port-xen archive


NetBSD 9.2 AMD64: Starts first DOMU and disk performance drops by about 100x?



I have an extremely strange and frustrating issue with the performance of a box that runs many (~80) Xen DOMUs set up as a virtual training lab. The server has 12 cores, 32GB quality memory, 4 SSD drives and an M.2 card for storage (mostly VM images). I’ve used this server for years and while I’ve replaced disks and upgraded NetBSD the rest of the h/w has just kept ticking with no issues. Until this weekend.

Last week I taught a class on this server and my students used more than 90 VMs in a virtual topology for various lab exercises. Everything worked perfectly, no issues whatsoever. Then I shut down the server, took it with me to a new city and unpacked the box, and as I prepare for a new class that starts tomorrow… setting up takes literally forever. It turns out that the issue is disk performance, and more specifically disk performance *after* booting the first VM.

Here is an example of disk performance copying a file from one SSD to another *before* starting any DOMUs:

witch# time dd if=/usr/store/hdd0/vm/test.img of=/usr/store/hdd1/vm/test.img bs=4m count=16
16+0 records in
16+0 records out
67108864 bytes transferred in 0.503 secs (133417224 bytes/sec)

real    0m0.733s
user    0m0.000s
sys     0m0.320s

Then I start the first DOMU and do exactly the same thing again:

witch# time dd if=/usr/store/hdd0/vm/test.img of=/usr/store/hdd1/vm/test.img bs=4m count=16
16+0 records in
16+0 records out
67108864 bytes transferred in 53.595 secs (1252147 bytes/sec)

real    0m53.604s
user    0m0.010s
sys     0m27.902s

The difference is more than 100x! The numbers change a little depending on block size, size of file, etc, but that really doesn’t matter. It is still around 100x which is just insane.
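
As a quick sanity check on that figure, the ratio can be computed directly from the two throughput values dd reported above. This is only a minimal sketch using shell integer arithmetic; the variable names are made up for illustration.

```shell
# Throughput reported by dd, in bytes/sec (copied from the two runs above).
before=133417224   # before any DOMU was started
after=1252147      # after the first DOMU was started

# Integer slowdown factor; shell arithmetic truncates the fraction.
ratio=$((before / after))
echo "slowdown: about ${ratio}x"
```

The wall-clock times (0.733 s versus 53.604 s) give a somewhat smaller factor, since they include overhead outside the transfer itself.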

There is nothing in the system logs, nothing in the xen logs, nothing (that I have found) anywhere. It is just that disk performance has dropped by 100x. Furthermore it doesn’t matter whether I read and write to the same disk or between disks. It doesn’t matter if I shut down the DOMU (so that only the DOM0 is running). Performance remains abysmal once a DOMU has been running at any point since the DOM0 booted.

The server is running stock NetBSD 9.2 installed in February 2022, i.e. it has been running perfectly for a long time and I have not made any upgrades since.

I’m at a complete loss. As there is nothing in any logs anywhere I don’t even know where to start. I have not touched the hardware in any way for a long time, other than putting the box in its travel case when I take it on airplanes (as carry-on; it has been with me the entire time).

Any suggestions welcome.

Regards,
Johan


