Re: Problems with many DOMUs on a single DOM0.
At the risk of being completely and utterly wrong in a public forum, I would
suggest you look at your open file descriptor limits, to at least rule out the
possibility that xenconsoled is running out of file descriptors for the ptys it's
managing. I haven't looked into this too deeply or examined the source, but lsof
seems to indicate that there are 2 fds used per domU (which makes sense), plus a
few used for overhead. It wouldn't take long to run out if you didn't take some
steps to raise the limits from the defaults.
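To put rough numbers on that, here is a minimal Python sketch. It assumes the 2-descriptors-per-domU figure from lsof above. The OVERHEAD value is a hypothetical allowance and was not measured. Note that it reports the limit of whatever process runs it, not xenconsoled's own limit. The daemon's limit is whatever it inherited when it was started.

```python
import resource

# Assumption from the lsof observation: ~2 descriptors per domU in xenconsoled.
FDS_PER_DOMU = 2
# Hypothetical allowance for xenstore connections, logs, stdio, etc.
OVERHEAD = 16


def fds_needed(n_domus, per_domu=FDS_PER_DOMU, overhead=OVERHEAD):
    """Rough descriptor count a console daemon would need for n_domus guests."""
    return per_domu * n_domus + overhead


def max_domus(fd_limit, per_domu=FDS_PER_DOMU, overhead=OVERHEAD):
    """Largest guest count that fits under fd_limit with the same estimate."""
    return max(0, (fd_limit - overhead) // per_domu)


if __name__ == "__main__":
    # Limits of *this* process; xenconsoled may have been started with others.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"RLIMIT_NOFILE: soft={soft} hard={hard}")
    for n in (30, 40, 60):
        need = fds_needed(n)
        fits = soft == resource.RLIM_INFINITY or need <= soft
        print(f"{n} domUs -> ~{need} fds ({'ok' if fits else 'over soft limit'})")
```

No sample output is shown because it depends on the host. The practical fix would be to raise the limit in whatever starts xenconsoled, so that the daemon inherits it.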
I've been using Xen for a long time now (nearly a decade), and with each major
version the console support seems to improve. Once 4.2 hits pkgsrc and has had
a chance to gel a bit, you may want to consider upgrading.
Also, with that many domUs, even with SSDs, that's a lot of backend I/O, so
you'll also want to take the normal steps to make sure dom0 gets the resources it
needs.
Best of luck!
On Mon, 7 Jan 2013 18:53:06 +0100
Johan Ihrén <johani%johani.org@localhost> wrote:
> All of this is NetBSD-6.0, XEN 3.3.2, with ptyfs mounted, all VND-devices
> created, etc. However, the results are basically the same for 5.2. I have
> looked at the XEN logs, but haven't found any clues there.
> I run many DOMUs on the same DOM0. No need for optimal performance, but
> strong need for many separate DOMUs. They are all file-backed, using VND and
> PV (not HVM). The DOM0 is always amd64, while the DOMUs used to be i386pae,
> but I'm migrating them to also be amd64.
> Over the years I've been limited by CPU, disk I/O, available memory, etc.,
> which put the reasonable limit at around 30 DOMUs on a quad-core box
> with 8GB memory and four SSDs, and that works like a charm. I.e. I've been
> constrained by the hardware, not the OS.
> But I would like to get to around 50-60 DOMUs and current hardware has enough
> cores and memory to provide that without too much fuss. I.e. if there are
> constraints now, they are likely OS or XEN constraints.
> And I'm running into problems. Several problems actually.
> As I start more DOMUs, I eventually reach a point where the consoles no longer
> work:
> witch:labconfig# xm console domu38
> NetBSD/amd64 (domu38) (console)
> login: # login prompt, this DOMU is fine
> witch:labconfig# xm console domu39 # this one, however, is not:
> xenconsole: Could not read tty from store: No such file or directory
> It is interesting to note that the limit is "soft" in the sense that if I
> kill a couple of machines it is possible to start a few other ones that will
> then get working consoles. I.e. it is not a permanent resource exhaustion.
> What's also interesting, though, is that sometimes (but not always) "domu39"
> is fine, except for the lack of a console. I.e. as long as I don't screw up
> my networking, I can add some more DOMUs... until I hit the next problem.
> This time, all machines up to and including "domu44" were ok. But "domu45" is
> not working ("not working" defined as "doesn't respond to ping").
> There's another problem with non-working DOMUs, and that is that they tend to
> go to 100% CPU and stay there. It is not exactly clear to me when this
> happens. Sometimes it is immediately when the DOMU is created, sometimes I've
> been able to use a DOMU for hours with no problems (except lack of console)
> and then it goes to 100% CPU when I try to kill it off with "xm shutdown"
> (which doesn't work). "xm destroy" does kill them off, though.
> And now it gets really strange. If I kill off the non-working DOMUs with "xm
> destroy" and then start them again then sometimes they work (still no
> console, but networking ok, so it is possible to get to them). This way, by
> booting DOMUs, and destroying and rebooting them until they work, I've been
> able to get to 52 working DOMUs, which is enough for me. But the last few
> machines are really skittish and may require several restarts before they
> work at all.
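The create/destroy/retry routine described above can be scripted. Below is a minimal Python sketch. `start`, `is_healthy` and `destroy` are hypothetical callables that you would wire to "xm create", a ping check and "xm destroy" respectively. `max_attempts` and `settle_seconds` are arbitrary defaults. This only automates the workaround and does not address whatever makes the later domUs flaky.

```python
import time
from typing import Callable


def start_until_healthy(
    start: Callable[[], None],
    is_healthy: Callable[[], bool],
    destroy: Callable[[], None],
    max_attempts: int = 5,
    settle_seconds: float = 0.0,
) -> int:
    """Start a guest; destroy and restart it until is_healthy() passes.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        start()
        time.sleep(settle_seconds)  # give the guest time to boot before probing
        if is_healthy():
            return attempt
        # "xm shutdown" hangs on these guests; destroy is what reliably works.
        destroy()
    return 0
```

Keeping the Xen commands behind callables means the loop itself can be tested without a dom0. It also lets you swap the ping check for anything else that proves the guest is usable.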
> And sometimes (but not always) I get problems with xend:
> Unable to connect to xend: Connection refused. Is xend running?
> xend IS running, but not functioning for some reason.
> When this happens, it is not possible to restart xend with "/etc/rc.d/xend
> restart". The only way to kill xend is with "kill -9" (it is in state "Il").
> But once xend is restarted it is possible to recover without rebooting.
> The first problem (no console for machines ~40 and up) is likely some sort of
> PTY resource exhaustion, although I don't understand why or where. When it
> happens I've run a small Python script to check whether the Python openpty
> function is able to allocate a PTY, and that seems to work ok. I used Python
> only because the Xen tools are written in Python. Other suggestions for what
> to try would be appreciated.
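One caveat with that test: a single successful openpty() call only shows that one more pty is free. A closer approximation is to hold one pty per guest open at the same time. Below is a minimal Python sketch, assuming one console pty per domU. It runs under your shell's descriptor limit rather than xenconsoled's, so a pass here does not rule out the fd-limit explanation. If it fails with "Too many open files" (EMFILE), that points at the descriptor limit rather than the pty pool.

```python
import os


def count_ptys(limit):
    """Hold up to `limit` pty pairs open at once; return how many succeeded."""
    opened = []
    try:
        for _ in range(limit):
            try:
                master, slave = os.openpty()
            except OSError as exc:
                # Report where allocation stopped and why (errno message).
                print(f"openpty failed after {len(opened)} pairs: {exc}")
                break
            opened.append((master, slave))
        return len(opened)
    finally:
        # Release every descriptor so the test itself doesn't starve the system.
        for master, slave in opened:
            os.close(master)
            os.close(slave)
```

For example, `count_ptys(60)` while the other domUs are running would show whether 60 simultaneous ptys are available. Each pair costs this process two descriptors.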
> The second problem (some DOMUs going to 100% CPU and in general not
> functioning) is probably harder, and without a console it is difficult
> to diagnose.
> The third problem (xend becoming catatonic) happens less frequently, and
> sometimes not at all. And as it is possible to recover by killing xend and
> restarting it, it is less of a pain than the others. But there's still a
> problem in there somewhere.
> Suggestions anyone?
> Johan Ihrén