Port-xen archive


"route_enqueue: queue full, dropped message" blast from an 8.99.32 amd64 domU



So, I had a weird thing happen on one of my regularly used Xen-hosted
virtual servers this morning...

The host is a Dell PE2950, running Xen-4.5 with 8.99.32 amd64.

The domU is also 8.99.32 amd64.

(I'm in the slow process of upgrading packages so I can upgrade Xen, but
that's not been completed quite yet.)

From all observations in shell and xterm sessions, the domU just
seemed very sluggish and sometimes non-responsive.  "systat vm" reported
a very high "interrupts" count, and a rather high "sys" CPU use.

The console seemed completely dead, but had reported a stream of
messages like:

[Thu May  9 09:24:08 2019][ 6442662.0806318] route_enqueue: queue full, dropped message

There were thousands of identical lines, all separated by a few
microseconds.  No doubt this spew was the real cause of the apparent
interrupt storm and the resulting sluggishness.
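
To put numbers on a spew like this, the console log can be summarized
with a short script.  What follows is only a sketch: it assumes every
log line has the shape shown above (a wall-clock stamp, then the kernel
uptime stamp, then the message), and the function name and the
60-second gap threshold are made up for illustration.

```python
import re

# Assumed line shape, copied from the console excerpt in this message:
# [Thu May  9 09:24:08 2019][ 6442662.0806318] route_enqueue: queue full, dropped message
LINE_RE = re.compile(
    r"^\[([^\]]+)\]\[\s*(\d+\.\d+)\]\s+"
    r"route_enqueue: queue full, dropped message"
)


def summarize_bursts(lines, gap=60.0):
    """Group matching lines into bursts.

    A new burst starts when the kernel uptime stamp jumps forward by
    more than `gap` seconds, or goes backwards (i.e. after a reboot).
    `gap` is an arbitrary illustrative threshold.
    """
    bursts = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # some other console message
        wall, up = m.group(1), float(m.group(2))
        cur = bursts[-1] if bursts else None
        if cur is None or up < cur["last"] or up - cur["last"] > gap:
            cur = {"start": wall, "first": up, "last": up, "count": 0}
            bursts.append(cur)
        cur["last"] = up
        cur["count"] += 1
    # Report each burst's starting wall-clock time, line count and duration.
    return [
        {"start": b["start"], "count": b["count"],
         "seconds": b["last"] - b["first"]}
        for b in bursts
    ]
```

Feeding it an open file object for the saved console log (wherever that
happens to be kept) returns one entry per burst, giving when it started,
how many lines it produced, and how long it lasted by the kernel's clock.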

The other domUs and the dom0 seemed A-OK.

So I decided to reboot it from the dom0 and it did the right thing:

[Thu May  9 10:09:46 2019][ 6445400.3265991] xenbus_shutdown_handler: xenbus_rm 13
[Thu May  9 10:09:46 2019]May  9 10:09:46 future shutdown: poweroff by root: power button pressed 
[Thu May  9 10:10:05 2019]May  9 10:10:05 future syslogd[155]: Exiting on signal 15
[Thu May  9 10:10:40 2019][ 6445454.6233182] syncing disks... 2 done
[Thu May  9 10:10:40 2019][ 6445454.8073215] unmounting 0xffffbe00102cb008 /more/archive (more.local:/archive)...
[Thu May  9 10:10:40 2019][ 6445454.9233295] ok
[Thu May  9 10:10:40 2019][ 6445454.9233295] unmounting 0xffffbe00102c6008 /more/home (more.local:/home)...

But "Because NFS" it stuck there trying to unmount /home and I ended up
typing the unfortunate command:

	xl destroy future

I've never had to be quite so emphatic before!  :-)

However rebooting got the "future" running quite happily again!

As mentioned it's been taking a while to upgrade, and the whole Xen
server and all its production domains have been running for 87 days.

However when I looked back through the console log I was surprised to
find another blast of these messages from two months ago (after nearly a
month of uptime).  That spew stopped without me knowingly intervening,
after nearly 7000 lines (but just 20 seconds elapsed), though curiously
there's another odd message within seconds of the spew stopping.

[Wed Mar 20 16:19:01 2019][ 2147554.5719048] route_enqueue: queue full, dropped message
[Wed Mar 20 16:19:09 2019][ 2147562.9851727] pid 28947 (emacs): user write of 1019904@0x3640000 at 48052784 failed: 28

If that last message is from a core dump, it might have been caused by
the route_enqueue problem (emacs likes to dump core when it loses its
X11 connection), or it might have caused the problem, since it would
have been dumping to an NFS server (emacs on rare occasions ups and
dumps core when you least expect it to, though thankfully far less so
in recent releases).
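
The trailing "28" in that emacs message looks like an errno value.
Assuming it is, it can be decoded with a couple of lines of Python.
This is only a sketch: errno numbering is platform-specific, so the
result should be checked against <sys/errno.h> on the NetBSD system
itself, though 28 is ENOSPC on the BSDs as well as on Linux.

```python
import errno
import os

# Assumption: the number after "failed:" in the console message is an errno.
code = 28
name = errno.errorcode.get(code, "unknown")

# Print the symbolic name and the local system's description of it.
print(code, name, os.strerror(code))
```

If that reading is right, the write failed because the target
filesystem reported that it was out of space.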

Today though I don't think there was a core dump -- I was using two
different emacs sessions on that host while experiencing the sluggish
behaviour right up until it got too sluggish to use.  There are no other
interesting messages in my console logs.

Does anyone have any clues/suggestions/questions for me?

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

+1 250 762-7675                           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



