tech-kern archive


Re: Understanding PR kern/43997 (kernel timing problems / qemu)



Robert Elz wrote:
> I want to leave /bin/sh to percolate for a while, make sure there are
> no issues with it as it is, before starting on the next round of
> cleanups and bug fixes, so I was looking for something else to poke
> my nose into ...
> 
> [Aside: the people I added to the cc of this message are those who have
>  added text to PR kern/43997 and so I thought might be interested, if you're
>  not, just say...]
> 
> kern/43997 is the "qemu is too slow, clock interrupts get lost, timing
> gets all messed up" problem that plagues many of the ATF tests that kind
> of expect time to be maintained rationally.

Thank you for looking into this.

> Now there's no question that qemu is slow, for example, on my amd64 Xen
> DomU test system, the shell arithmetic test of ++x (etc) takes:
> 	var_preinc: [0.077617s] Passed.
> whereas from the latest completed b5 (qemu) test run (as of this e-mail)
> 	var_preinc   Passed   N/A   6.200489s
> 
> That's about 80 times slower (and most of the other tests show similar
> factors).   I don't think we can blame qemu for that, given what it is
> doing.
> 
> So, it is hardly surprising that, to borrow Paul's words from the PR:
> 	On (at least) amd64 architecture, qemu cannot simulate clock
> 	interrupts at 100Hz.

I don't think the slowness of qemu's emulation is the actual cause of
its inability to simulate clock interrupts at 100 Hz.  Rather, I think
it is more likely caused by the inability of qemu to sleep for periods
shorter than 10 ms due to limitations of the underlying host OS, such
as that documented in the BUGS section of nanosleep(2).
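One way to check this on a given host is to request sleeps of various
lengths and compare them with the time that actually elapses.  The
following is a minimal sketch, not a rigorous benchmark: the function
name measure_sleep and the chosen sleep lengths are my own, and it goes
through Python's time.sleep() rather than calling nanosleep(2) directly,
so it includes some interpreter overhead.

```python
import time


def measure_sleep(requested_s, samples=20):
    """Return the mean observed duration of time.sleep(requested_s), in seconds."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)
        total += time.perf_counter() - start
    return total / samples


if __name__ == "__main__":
    # Hypothetical set of request lengths, chosen to straddle a 10 ms tick.
    for requested_ms in (1, 5, 10, 20):
        observed = measure_sleep(requested_ms / 1000.0)
        print(f"requested {requested_ms:3d} ms, observed {observed * 1000:7.3f} ms")
```

If the host cannot sleep for less than one tick, the observed time for
the 1 ms and 5 ms requests should come out close to or above 10 ms
rather than close to the requested value.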

That this is at least partly a host system issue is supported by the
observation that when qemu is hosted on a Linux system, the timing in
the NetBSD guest is much more accurate than when qemu is hosted on
NetBSD, on similar hardware:

  NetBSD-on-qemu-on-NetBSD# time sleep 10
	 13.00 real         0.00 user         0.03 sys

  NetBSD-on-qemu-on-Linux# time sleep 10
	 10.13 real         0.02 user         0.02 sys

If my theory is correct, there are at least three ways the problem
could be fixed:

 - Improve the time resolution of sleeps on the host system, as
   recently discussed on tech-kern in a thread starting with
   http://mail-index.netbsd.org/tech-kern/2017/07/02/msg022024.html

 - Make qemu deal better with hosts unable to sleep for short
   periods of time, or

 - Make the guest system deal better with missed timer interrupts.
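To illustrate the third option, here is a toy model of two ways a guest
could account for time.  It is only a sketch under stated assumptions:
it assumes the guest can read some free-running counter that keeps
correct time even when interrupts arrive late, and the class names, the
simulate() helper and the 13 ms delivery interval are hypothetical.  It
is not a description of how the NetBSD kernel currently handles clock
interrupts.

```python
TICK_NS = 10_000_000  # one tick at HZ = 100, in nanoseconds


class NaiveClock:
    """Advances by exactly one tick per delivered interrupt."""

    def __init__(self):
        self.ticks = 0

    def on_interrupt(self, counter_ns):
        self.ticks += 1


class CatchUpClock:
    """Advances by as many whole ticks as the counter says have elapsed."""

    def __init__(self, start_ns=0):
        self.ticks = 0
        self.last_ns = start_ns

    def on_interrupt(self, counter_ns):
        elapsed = (counter_ns - self.last_ns) // TICK_NS
        self.ticks += elapsed
        # Keep the remainder so partial ticks are not lost.
        self.last_ns += elapsed * TICK_NS


def simulate(clock, interrupt_times_ns):
    """Deliver interrupts at the given counter values; return ticks counted."""
    for t in interrupt_times_ns:
        clock.on_interrupt(t)
    return clock.ticks


if __name__ == "__main__":
    # Assumed scenario: interrupts arrive every 13 ms instead of every
    # 10 ms, over 10 s of real time.
    times = range(13_000_000, 10_000_000_001, 13_000_000)
    print("naive:   ", simulate(NaiveClock(), times) * TICK_NS / 1e9, "s")
    print("catch-up:", simulate(CatchUpClock(), times) * TICK_NS / 1e9, "s")
```

In this model the naive clock counts 769 ticks (7.69 s) over 10 s of
real time, while the catch-up clock counts 999 ticks (9.99 s), because
it derives the tick count from the counter rather than from the number
of interrupts delivered.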

-- 
Andreas Gustafsson, gson%gson.org@localhost

