Subject: Re: HEAD instability on Xen
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Andrew Doran <ad@NetBSD.org>
List: port-xen
Date: 11/19/2007 21:19:14
On Mon, Nov 19, 2007 at 12:47:26PM +0100, Manuel Bouyer wrote:

> On Mon, Nov 19, 2007 at 12:00:00PM +0100, Manuel Bouyer wrote:
> > On Sun, Nov 18, 2007 at 11:12:38PM +0100, Manuel Bouyer wrote:
> > > On Sun, Nov 18, 2007 at 10:08:03PM +0000, Andrew Doran wrote:
> > > > 
> > > > Actually, it already has all the necessary changes. Is there a point where
> > > > the kernel is entered with Xen that the direction flag might not be cleared?
> > > 
> > > I'll have a look. Could it be a missing lock or splxx() in the pmap ?
> > > 
> > > > I can't see one. It is cleared for the copyout() in the traceback that you
> > > > posted.
> > > > 
> > > > > I've also seen it in the pool code. It was always handling a trap after a
> > > > > copyin or copyout though.
> > > > 
> > > > I did many hours of low-memory stress testing on the updated pool code
> > > > before checking it in, so I don't believe that there is an (obvious) problem
> > > > there. It could perhaps be related to the removal of the _CPU options.
> > > 
> > > No, I've seen it with a kernel from before the _CPU options removal
> > > (the bouyer-xenamd64-base2 tag in src/sys). 
> > 
> > I've tracked it down to a change between 2007.11.05.10.25.03 and
> > 2007.11.08.10.25.03 on HEAD. Here's another instance of the panic (with
> > 2007.11.08.10.25.03):
> 
> It's between 2007.11.07.00.00.00 and (2007.11.07.02.00.00 +
> xennetback_xenbus.c 1.19). 

I think I know what the problem is. pool_cache_invalidate() can't invalidate
CPU-local caches, so stale entries are being allocated from the pmap's PDP
cache. I'll fix it shortly.
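
To illustrate the failure mode, here is a toy user-space model. It is not
the NetBSD pool_cache code: every name in it (toy_cache, layer, obj,
generation, NCPU, CACHE_SLOTS) is hypothetical, and it only assumes a cache
with one shared layer plus one layer per CPU, where invalidation empties the
shared layer alone.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU        2   /* hypothetical CPU count for the model */
#define CACHE_SLOTS 4   /* hypothetical capacity of each layer */

struct obj {
	int generation;         /* state the object was constructed against */
};

struct layer {
	struct obj *slot[CACHE_SLOTS];
	size_t n;               /* number of cached objects in this layer */
};

struct toy_cache {
	struct layer global;        /* shared layer */
	struct layer percpu[NCPU];  /* CPU-local layers */
};

static void
layer_put(struct layer *l, struct obj *o)
{
	assert(l->n < CACHE_SLOTS);
	l->slot[l->n++] = o;
}

static struct obj *
layer_get(struct layer *l)
{
	return l->n > 0 ? l->slot[--l->n] : NULL;
}

/* Allocation prefers the CPU-local layer, then falls back to the shared one. */
static struct obj *
toy_cache_get(struct toy_cache *c, int cpu)
{
	struct obj *o = layer_get(&c->percpu[cpu]);

	return o != NULL ? o : layer_get(&c->global);
}

/* Models the suspected bug: only the shared layer is emptied. */
static void
toy_cache_invalidate(struct toy_cache *c)
{
	c->global.n = 0;
}

/* Returns 1 if a stale object is handed out after invalidation, else 0. */
int
stale_after_invalidate(void)
{
	static struct obj a = { .generation = 1 }, b = { .generation = 1 };
	struct toy_cache c = { 0 };
	int current_generation = 1;

	layer_put(&c.global, &a);       /* sits in the shared layer */
	layer_put(&c.percpu[0], &b);    /* sits in CPU 0's local layer */

	current_generation++;           /* constructed state is now out of date */
	toy_cache_invalidate(&c);       /* drops a, but never sees b */

	struct obj *o = toy_cache_get(&c, 0);
	return o != NULL && o->generation != current_generation;
}
```

In this model stale_after_invalidate() returns 1: the object left in CPU 0's
layer survives the invalidation and is handed back on the next allocation.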

Andrew