tech-kern archive


Re: Why NetBSD x86's bus_space_barrier does not use [sml]fence?



On Mon, Dec 02, 2019 at 06:30:21AM +0000, Taylor R Campbell wrote:

> > Date: Fri, 29 Nov 2019 14:49:33 +0000
> > From: Andrew Doran <ad%netbsd.org@localhost>
> > 
> > If nothing else, the CALL instruction that calls bus_space_barrier()
> > produces a write to the stack when storing the return address.  On x86,
> > stores are totally ordered, and loads are never reordered around stores.
> > No further barrier is needed.  That's the idea, anyway; sometimes
> > reality does not match.
> 
> I don't understand this.  The Intel manual (volume 3 of the 3-volume
> manual, System Programming Guide, Sec. 8.2.2 `Memory Ordering in P6
> and More Recent Processor Families') says:

I had not been paying attention, refused to engage my brain, and got
confused.  My fault.  On the question of ordering in WB regions, I put a
comment in sys/arch/x86/include/lock.h describing the situation the last
time this discussion came up, when it was about locks.  I have strong
suspicions about what Intel writes, but it's probably better not to muddy
the discussion with them!

We're talking about UC/WC mappings here, though, so:

> It's not entirely clear to me from the Intel manual, but in the AMD
> manual (e.g., Volume 2, chapter on Memory System) it is quite clear
> that write-combining `WC' regions may issue stores out of order -- and
> that's what you get from BUS_SPACE_MAP_PREFETCHABLE.
> 
> So it seems to me we really do need
> 
> switch (flags) {
> case BUS_SPACE_BARRIER_READ:
> 	lfence or locked instruction;
> 	break;
> case BUS_SPACE_BARRIER_WRITE:
> 	sfence or locked instruction;
> 	break;
> case BUS_SPACE_BARRIER_READ|BUS_SPACE_BARRIER_WRITE:
> 	mfence or locked instruction;
> 	break;
> }
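
Concretely, that switch would come out something like the below (an
untested sketch only; the signature follows bus_space(9), and the fences
are done with inline asm):

void
x86_bus_space_barrier(bus_space_tag_t t, bus_space_handle_t h,
    bus_size_t offset, bus_size_t len, int flags)
{

	switch (flags & (BUS_SPACE_BARRIER_READ|BUS_SPACE_BARRIER_WRITE)) {
	case BUS_SPACE_BARRIER_READ:
		/* Complete earlier loads before later loads issue. */
		__asm volatile("lfence" ::: "memory");
		break;
	case BUS_SPACE_BARRIER_WRITE:
		/* Make earlier stores globally visible before later ones. */
		__asm volatile("sfence" ::: "memory");
		break;
	case BUS_SPACE_BARRIER_READ|BUS_SPACE_BARRIER_WRITE:
		/* Order all earlier loads and stores against later ones. */
		__asm volatile("mfence" ::: "memory");
		break;
	}
}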

I see a couple of problems with what's in x86_bus_space_barrier() now:

- It will apply to UC mappings too, and that's an unneeded pessimisation,
  because the majority of bus_space accesses on x86 don't need any kind of
  barrier.  Maybe the other systems are fine with that, but I think we
  should try to do better.

- It will also apply to I/O space mappings.  There are, if I recall
  correctly, some drivers that can work with either I/O- or memory-mapped
  chips, and bus_space is designed to allow this (see the sketch after
  this list).  In the case of I/O port access there is no memory access
  going on (at least from the kernel's POV; I don't know how the chipset
  implements it under the covers).  There will also be drivers that are
  I/O port mapped only, where the author has diligently followed the
  bus_space manual page and put in barriers anyway.
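
To illustrate the dual-mapping point, the x86 read routines dispatch on
the tag, conceptually something like this (a sketch from memory; the tag
name and the details are not verbatim from the tree):

static inline uint8_t
bus_space_read_1(bus_space_tag_t t, bus_space_handle_t h, bus_size_t o)
{

	/* I/O tag: "in" instruction; memory tag: plain volatile load. */
	if (t == x86_bus_space_io)
		return inb(h + o);
	return *(volatile uint8_t *)(h + o);
}

A barrier issued around the inb() case has no memory access to order, so
for port-mapped devices it's pure overhead.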

I think the way to avoid both of those would be to key off the
bus_space_tag_t.  There are already two, one for I/O and one for memory.
We could do multiple memory tags.  That's probably very easy; I think the
harder bit would be passing those down to PCI and having it choose.  I
don't know that code.
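
Roughly, with a hypothetical third tag for prefetchable (WC) memory, the
barrier could then become (sketch only; the x86_bus_space_mem_wc name is
made up):

void
x86_bus_space_barrier(bus_space_tag_t t, bus_space_handle_t h,
    bus_size_t o, bus_size_t len, int flags)
{

	/*
	 * I/O port and UC memory mappings are already strongly
	 * ordered; only the WC tag needs an explicit fence.
	 */
	if (t != x86_bus_space_mem_wc)
		return;

	if ((flags & BUS_SPACE_BARRIER_READ) != 0 &&
	    (flags & BUS_SPACE_BARRIER_WRITE) != 0)
		__asm volatile("mfence" ::: "memory");
	else if ((flags & BUS_SPACE_BARRIER_WRITE) != 0)
		__asm volatile("sfence" ::: "memory");
	else if ((flags & BUS_SPACE_BARRIER_READ) != 0)
		__asm volatile("lfence" ::: "memory");
}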
 
Andrew

