Port-amd64 archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Why NetBSD x86's bus_space_barrier does not use [sml]fence?



> Date: Fri, 29 Nov 2019 14:49:33 +0000
> From: Andrew Doran <ad%netbsd.org@localhost>
> 
> It no, the CALL instruction that calls bus_space_barrier() produces a write
> to the stack when storing the return address.  On x86, stores are totally
> ordered, and loads are never reordered around stores.  No further barrier
> is needed.  This is the idea anyway, sometimes reality does not match..

I don't understand this.  The Intel manual (volume 3 of the 3-volume
manual, System Programming Guide, Sec. 8.2.2 `Memory Ordering in P6
and More Recent Processor Families') says:

- Writes are not reordered with _older_ reads.
- Reads _may be_ reordered with older writes to different locations
  but not with older writes to the same location.
[emphasis added]

I understand this to mean that

	*q = y; x = *p;

can be reordered into

	x = *p; *q = y;

but not the other way around.  (For example, *q = y might be the store
into the stack for the return address of `call bus_space_barrier', and
x = *p might be bus_space_read_4(bst, bsh, off).)

Your reasoning seems at odds with our x86 definitions of membar_*:
membar_consumer (L|L) and membar_sync (LS|LS) either use a locked
instruction or are respectively LFENCE and MFENCE if available.  This
is especially puzzling because the revision history suggests that you
(ad) wrote the x86 membar_* and bus_space_barrier two months apart in
2007!


The above applies to normal write-back memory regions.  For
uncacheable `UC' regions -- which is most bus_space mappings for
device registers -- memory operations happen strictly in program
order, so for them, an empty bus_space_barrier is fine.

It's not entirely clear to me from the Intel manual, but in the AMD
manual (e.g., Volume 2, chapter on Memory System) it is quite clear
that write-combining `WC' regions may issue stores out of order -- and
that's what you get from BUS_SPACE_MAP_PREFETCHABLE.

So it seems to me we really do need

switch (flags) {
case BUS_SPACE_BARRIER_READ:
	lfence or locked instruction;
	break;
case BUS_SPACE_BARRIER_WRITE:
	sfence or locked instruction;
	break;
case BUS_SPACE_BARRIER_READ|BUS_SPACE_BARRIER_WRITE:
	mfence or locked instruction;
	break;
}

and, really, have needed it for the past 15-20 years, which would be a
little embarrassing.


Home | Main Index | Thread Index | Old Index