Subject: Re: i386 pmap bug
To: Neil Ludban <nludban@columbus.rr.com>
From: Stephan Uphoff <ups@stups.com>
List: port-i386
Date: 02/05/2003 00:49:09
Neil Ludban wrote:

> The pmap lock only prevents another CPU from allocating a former pmap
> page (PDE or PTE) and modifying it for use as a new pmap page?  A couple
> situations left that could use a quick explanation:
> 
> Thread 1, on CPU 1, unmaps a 4MB memory region.  At the same time, thread
> 2 of the same process, running on CPU 2, is in a loop writing one byte to
> each page in the region in a non-random attempt to generate table walks
> using the cached PDE.
> 
> The former PTE page, which is still pointed to by the PDE in the TLB, is
> allocated from the free list by a process or interrupt running on CPU 3,
> for internal use by any kernel subsystem other than pmap.
>

Exactly !
Thanks for the nice example.
 
Frank van der Linden wrote:
> > As for 'random table walks'.. I guess the manual text you quoted doesn't
> > rule that out 100%, but it would seem very unlikely. If this really would
> > be a problem, you'd have to wait with freeing the page until the shootdown
> > has been handled. Which might not be hard to implement.  Though I really
> > don't see this happening.. I think I need to be convinced by actual
> > occurences of the problem :-) Or maybe an Intel engineer who will
> > confirm that this can happen.
> > 
> > - Frank
> 

'Random table walks' is probably a bad description of what is going on.
(Although it is a good description of the overall effect.)

As far as I understand it, the following happens:
	Speculative TLB loads are triggered by speculative data loads.
	This means every mispredicted branch might lead to
	(speculatively) dereferencing an invalid pointer.

	If an invalid pointer holds an address inside the affected 4MB region and
	is speculatively dereferenced while the PTP is being reused and the PDE is
	not yet flushed, 'speculative page walks' can cause messy problems.

Unlikely ?    - Definitely.
Impossible ?  - No

A little searching also turns up a good example:
Consider pmap_collect: it unmaps the lowest 4MB region (and more).
However, there are probably hundreds of speculative dereferences of NULL
pointers before the TLB entry for the PDE is finally flushed.
(And TLB entries speculatively loaded from the reused PTP might have the
global bit set!)
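
To make the hazard above concrete, here is a minimal sketch of a toy model
(not pmap code). It assumes a hypothetical struct cached_pde that simply
remembers which page the PDE pointed at when it was cached; the helper names
are made up for illustration. Only the PTE bit positions are the real i386
layout. It shows that whatever the new owner writes into the reused page is
interpreted as a PTE by a walk through the stale PDE.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural i386 PTE bits. */
#define PTE_P     0x001u        /* present */
#define PTE_RW    0x002u        /* writable */
#define PTE_G     0x100u        /* global: kept across CR3 reloads if CR4.PGE is set */
#define PTE_FRAME 0xfffff000u   /* physical frame address */

#define NPTE 1024               /* PTEs per page table page */

/* Hypothetical model of a PDE cached in a TLB: it still names the old PTP. */
struct cached_pde {
	const uint32_t *ptp;
};

/* What a hardware walk through the cached PDE would fetch for va. */
static uint32_t
walk_via_cached_pde(const struct cached_pde *pde, uint32_t va)
{
	assert(pde->ptp != 0);
	return pde->ptp[(va >> 12) & (NPTE - 1)];
}

/* A fetched word only becomes a TLB entry if its present bit is set. */
static int
would_enter_tlb(uint32_t pte)
{
	return (pte & PTE_P) != 0;
}

/* A present entry with the global bit set would survive a CR3 reload. */
static int
is_global(uint32_t pte)
{
	return (pte & PTE_P) != 0 && (pte & PTE_G) != 0;
}
```

With the page zeroed nothing is loaded, but once the new owner stores an
arbitrary word such as 0x12345103 at offset 0, a walk for a NULL-ish address
yields a present, writable, global mapping of frame 0x12345000.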

How can you ever find a problem like this in the field?
The random entry in the TLB is invisible, might have disappeared by the time
you drop into the debugger, and never shows up in a core dump.
(A true heisenbug? :-)
 
Frank, did this convince you?

I agree that the fix is simple: postpone freeing the page until the PDE is
flushed.
(I am willing to contribute a patch, but any excuse not to work is always
welcome - just let me know.)
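
The idea of postponing the free can be sketched roughly as follows. This is
not the proposed patch and not existing pmap code: struct ptp, struct
ptp_lists and both functions are hypothetical names, locking is left out, and
the sketch assumes that an acknowledgement from a CPU means it has flushed
every entry queued so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one page table page awaiting release. */
struct ptp {
	struct ptp *next;
	uint32_t    pending_cpus;   /* CPUs that may still cache the PDE */
};

struct ptp_lists {
	struct ptp *deferred;       /* unmapped, but not yet safe to reuse */
	struct ptp *free;           /* safe to hand out again */
};

/* Park the page instead of freeing it while any CPU may still use the PDE. */
static void
ptp_defer_free(struct ptp_lists *l, struct ptp *p, uint32_t cpumask)
{
	p->pending_cpus = cpumask;
	if (cpumask == 0) {         /* nobody can have it cached: free now */
		p->next = l->free;
		l->free = p;
		return;
	}
	p->next = l->deferred;
	l->deferred = p;
}

/* Called once CPU `cpu` has handled the shootdown (assumed full flush). */
static void
ptp_shootdown_ack(struct ptp_lists *l, unsigned cpu)
{
	struct ptp **pp = &l->deferred;

	assert(cpu < 32);
	while (*pp != NULL) {
		struct ptp *p = *pp;

		p->pending_cpus &= ~(UINT32_C(1) << cpu);
		if (p->pending_cpus == 0) {
			*pp = p->next;      /* unlink from the deferred list */
			p->next = l->free;  /* only now may the page be reused */
			l->free = p;
		} else {
			pp = &p->next;
		}
	}
}
```

A page deferred with mask 0x6 (CPUs 1 and 2) stays off the free list after
CPU 1 acknowledges and moves there only after CPU 2 does as well.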

-------

Some more nitpicking:
 
I still have other concerns for multiprocessor systems:
	Case I
		Processor A loads PTE X into its TLB (X allows read/write)
		for a read access.
		Processor B zaps PTE X.
		Processor A modifies (for the first time) a page described by X.
		(X is still resident in the TLB of A.)

	Case II - not a problem - just interesting
		Processor A speculatively loads PTE X into its TLB.
		Processor B zaps PTE X.
		Processor A uses PTE X. (X is still resident in the TLB of A.)

Will processor A, in both cases,
	(1) reload the PTE into the TLB before setting the referenced or dirty bits?
	(2) blindly set the bits without updating the TLB?

The Intel documentation does not describe what will happen.

If (2) is true, the TLB shootdown code could lose a dirty bit.
If (1) is true, we could avoid TLB shootdown for entries that were never
referenced, and avoid TLB shootdown when remapping read-only entries that
never had the dirty bit set.
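
The two possibilities for Case I can be played through in a small
single-threaded model. This is only a sketch: enum dirty_policy, struct
cpu_model and the functions are hypothetical, and neither policy is claimed
to be what real hardware does. Only the PTE bit positions are the real i386
ones.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural i386 PTE bits. */
#define PTE_P  0x001u   /* present */
#define PTE_RW 0x002u   /* writable */
#define PTE_A  0x020u   /* accessed (referenced) */
#define PTE_D  0x040u   /* dirty */

/* The two behaviours asked about; both are assumptions of this model. */
enum dirty_policy { REWALK_BEFORE_SET, BLIND_SET };

struct cpu_model {
	uint32_t tlb_entry;     /* cached copy of PTE X */
	int      tlb_valid;
};

/* Processor B zaps the PTE and keeps the old value for its bookkeeping. */
static uint32_t
zap_pte(uint32_t *pte)
{
	uint32_t old = *pte;

	*pte = 0;
	return old;
}

/* First write by processor A. Returns 1 if the write completes, 0 on fault. */
static int
cpu_first_write(struct cpu_model *c, uint32_t *pte, enum dirty_policy pol)
{
	assert(c->tlb_valid);
	if (pol == REWALK_BEFORE_SET) {
		uint32_t cur = *pte;    /* (1): consult memory first */

		if (!(cur & PTE_P) || !(cur & PTE_RW)) {
			c->tlb_valid = 0;
			return 0;
		}
	}
	*pte |= PTE_A | PTE_D;      /* (2) reaches this without looking */
	c->tlb_entry |= PTE_D;
	return 1;
}

/* Under (1) only, a never-referenced entry would need no shootdown. */
static int
shootdown_needed_if_rewalk(uint32_t old_pte)
{
	return (old_pte & PTE_A) != 0;
}
```

In this model, under BLIND_SET the write succeeds after the zap, the value B
saved has no dirty bit, and the dirty bit ends up in a PTE that is no longer
present. Under REWALK_BEFORE_SET the same write faults and the zapped PTE
stays zero.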

Does anyone know an Intel engineer who could help?
Is there any documentation that might answer these questions?

I might be able to write some TLB blackbox tests in the next week or so ...
... but results, from limited blackbox tests on a single CPU type, are not
what I would want to base the TLB shootdown code on. 


	Stephan