Hi Mathew,

On 23.05.23 15:11, Mathew, Cherry G.* wrote:
     MP> I came across Qemu/NVMM more or less out of necessity, as I had
     MP> been struggling for some time to set up a proper Xen
     MP> configuration on newer NUCs (UEFI only). The issue I encountered
     MP> was with the graphics output on the virtual host, meaning that
     MP> the screen remained black after switching from Xen to NetBSD
     MP> DOM0. Since the device I had at my disposal lacked a serial
     MP> console or a management engine with Serial over LAN
     MP> capabilities, I had to look for alternatives and therefore got
     MP> somewhat involved in this topic.
     MP> I'm using the combination of NetBSD 9.3_STABLE + Qemu/NVMM on
     MP> small low-end servers (Intel NUC7CJYHN), primarily for classic
     MP> virtualization, which involves running multiple independent
     MP> virtual servers on a physical server. The setup I have come up
     MP> with works stably and with acceptable performance.
I have a follow-on question about this - Xen has some config tooling
related to startup - so you can say something like
'xendomains = dom1, dom2' in /etc/rc.conf, and these domains will be
started during bootup.
If you did want that for nvmm, what do you use ?
Unfortunately, I didn't find anything suitable, and I was in a big hurry to get the setup under control. So I wrote a quick-and-dirty shell script. It covers starting VMs both from the command line and from an rc script, and it creates Unix domain sockets that serve each guest's serial terminal and the Qemu monitor console. If you want to have a look, I have uploaded it here (unfortunately without documentation, and with a big warning that it was all thrown together in a hurry):
https://forge.petermann-it.de/mpeterma/vmctl
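As a rough illustration of the idea (this is not taken from vmctl), the sketch below prints a Qemu command line that uses NVMM acceleration and exposes the serial console and the monitor on Unix domain sockets. The function name, the run directory, the pool name "tank" and the ZVOL device path are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper, not part of vmctl: print the Qemu command line for one VM.
# VM_RUNDIR, the pool name "tank" and the zvol path are assumptions.
VM_RUNDIR=${VM_RUNDIR:-/var/run/vm}

vm_cmdline() {
    name=$1     # VM name, also used for the zvol and the socket names
    mem_mb=$2   # guest RAM in MB
    # echo joins its arguments with single spaces, giving one command line.
    echo "qemu-system-x86_64 -accel nvmm -m $mem_mb -smp 1" \
        "-drive file=/dev/zvol/rdsk/tank/$name,if=virtio,format=raw" \
        "-serial unix:$VM_RUNDIR/$name.serial,server=on,wait=off" \
        "-monitor unix:$VM_RUNDIR/$name.monitor,server=on,wait=off" \
        "-display none -daemonize"
}
```

Calling `vm_cmdline net 1024` only prints the command, so it can be inspected before it is run from a shell or from an rc script. A client that can connect to Unix domain sockets can then attach to the guest's serial console or to the monitor.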
     MP> Scenario:
     MP> I have a small root filesystem with FFS on the built-in SSD, and
     MP> the backing store for the VMs is provided through ZFS ZVOLs. The
     MP> ZVOLs are replicated alternately every night (full and
     MP> incremental) to an external USB hard drive.
Are these 'zfs send' style backups ? or is the state on the backup USB
hard drive ready for swapping, if the primary fails for eg ?
Yes, I use zfs send and save the stream to files on the USB drive for my regular backups, so they are not directly usable. The idea is interesting, though. I chose this approach back then because I do it quite similarly on my FFS systems with dump, and the incremental aspect was important to me. On the other hand, I have also tested pulling a zfs send of all ZVOLs from the mini-server to my laptop and then playing around locally with Qemu/nvmm on a "production copy".
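To make the full/incremental scheme concrete, here is a minimal sketch that prints the commands for one nightly run. The function name, the snapshot names, the file naming and the mount point are hypothetical; only `zfs snapshot`, `zfs send` and `zfs send -i` are standard ZFS commands.

```shell
#!/bin/sh
# Hypothetical sketch: print the commands for one nightly backup of a ZVOL.
# An empty "prev" snapshot name selects a full stream, otherwise an
# incremental stream relative to "prev" is produced.
zvol_backup_cmds() {
    vol=$1 new=$2 prev=$3 dest=$4
    # Flatten "pool/vol" into a file name component (assumed naming scheme).
    base=$dest/$(echo "$vol" | tr '/' '_')-$new
    echo "zfs snapshot $vol@$new"
    if [ -z "$prev" ]; then
        echo "zfs send $vol@$new > $base.full.zfs"
    else
        echo "zfs send -i $vol@$prev $vol@$new > $base.incr.zfs"
    fi
}
```

For example, `zvol_backup_cmds tank/net tue mon /mnt/usb` prints the snapshot command followed by an incremental send from `mon` to `tue`. Because the result is a stream file rather than a dataset, restoring means feeding the full stream and then each incremental, in order, to `zfs receive`.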
     MP> There are a total of 5 VMs:
     MP>     net (DHCP server, NFS and SMB server, DNS server) app
     MP> (Apache/PHP-FPM/PostgreSQL hosting some low-traffic web apps)
     MP> comm (ZNC) iot (Grafana, InfluxDB for data collection from two
     MP> smart meters every 10 seconds) mail (Postfix/Cyrus IMAP for a
     MP> handful of mailboxes)
     MP> Most of the time, the CPU usage of the host with this
     MP> "load" is around 20%. The provided services consistently respond
     MP> quickly.
Ok - and these are accounted as the container qemu processes' quota
scheduling time, I assume ? What about RAM ? Have you had a situation
where the host OS has to swap out ? Does this cause trouble ? Or does
qemu/nvmm only use pinned memory ?
I sized the VMs' RAM so that a few hundred MB remain as a buffer on the host. Memory has run out in the past, especially when zfs send makes heavy use of the buffer cache. Swapping then set in, and together with the I/O load that zfs send had already increased, the system slowed down so badly that response times were no longer acceptable. In that state, only a restart of the host brought a complete recovery. I got this under control with a tip someone gave me in #netbsd: I now pipe the output of zfs send into dd, which writes the file with the oflag "direct" set. Apparently this bypasses some of the caching and avoids the situation.
Regarding pinned memory I can't say anything. The memory consumption of the VMs is stable from the host's point of view; I haven't really tried ballooning with it yet.
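The dd workaround described above could look roughly like the following sketch, which prints the pipeline instead of running it. The function name, the snapshot and file names, and the default block size of 1m (NetBSD dd syntax) are assumptions and not values from the actual setup.

```shell
#!/bin/sh
# Hypothetical sketch: print a pipeline that writes a zfs send stream
# through dd with direct I/O. The block size default is an assumption.
zfs_send_direct_cmd() {
    snap=$1       # snapshot to send, e.g. pool/vol@name
    outfile=$2    # target file on the backup drive
    bs=${3:-1m}   # dd block size (NetBSD suffix syntax), assumed default
    echo "zfs send $snap | dd of=$outfile bs=$bs oflag=direct"
}
```

`zfs_send_direct_cmd tank/net@mon /mnt/usb/net.zfs` prints the pipeline for one snapshot. The design point is that dd, rather than the shell redirection, opens and writes the output file, so the direct flag applies to those writes.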
     MP> However, I have noticed that depending on the load, the clocks
     MP> of the VMs can deviate significantly. This can be compensated
     MP> for by using a higher HZ in the host kernel (HZ=1000) and
     MP> tolerant ntpd configuration in the guests. I have also tried
     MP> various settings with schedctl, especially with the FIFO
     MP> scheduler, which helped in certain scenarios with high I/O
     MP> load. However, this came at the expense of stability.
I assume this is only *within* your VMs, right ? Do you see this across
guest Operating Systems, or just specific ones ?
The time deviation is caused by missed interrupts in the guests. As I said, there are a number of workarounds for this, and a number of very good explanations in this thread:
https://mail-index.netbsd.org/netbsd-users/2022/08/31/msg028894.html
I do not use operating systems other than NetBSD as guests in this setup. As a test, I also had various Linux distributions running under nvmm. I didn't test in depth, but I had a test VM with Alpine Linux running for a while and had the impression that it ran as well as NetBSD.
     MP> Furthermore, in my system configuration, granting a guest more
     MP> than one CPU core does not seem to provide any
     MP> advantage. Particularly in the VMs where I am concerned about
     MP> performance (net with Samba/NFS), my impression is that
     MP> allocating more CPU cores actually decreases performance even
     MP> further. I should measure this more precisely someday...
ic - this is interesting - are you able to run some tests to nail this
down more precisely ?
I should definitely do that, and if you have a specific idea of what I should try, feel free to let me know. I think the observations from back then also have to be seen in the context of my particular system. I have only two CPU cores available, so with one Qemu process and one Qemu I/O thread running outside the Qemu process, both cores are already fully occupied under the full I/O load of a single VM. In this setup it therefore does not seem so improbable to me that resources become scarce when another Qemu process is added (for the VM's second CPU).
     MP> If you have specific questions or need assistance, feel free to
     MP> reach out. I have documented everything quite well, as I
     MP> intended to contribute it to the wiki someday. By the way, I am
     MP> currently working on a second identical system where I plan to
     MP> test the combination of NetBSD 10.0_BETA and Xen 4.15.
There's quite a bit of goodies wrt Xen in 10.0 - mainly you can now run
accelerated as a Xen guest (hvm with the PV drivers active).
For now I only use "conventional" PV for my guest systems. But I also have a pure NetBSD setup here at the moment, and I'm curious about the comparison myself. So far I have measured about 5 times the bandwidth with Xen on identical hardware, with Samba in a VM, when transferring a large file with all caching effects excluded. This is my focus at the moment, because on the Xen system I use VNDs on FFS, while on the Qemu/nvmm system ZVOLs are in use. There are too many variables in the equation at the moment ;-)
Kind regards
Matthias