Port-amd64 archive


Re: AVX support



On 26/12/2016 at 23:04, David Laight wrote:
On Mon, Aug 15, 2016 at 09:57:05AM +0200, Maxime Villard wrote:
On 12/08/2016 at 18:22, Erik Fair wrote:
Has AVX support in NetBSD (context switching) been revisited yet?


I did check for that two or three weeks ago, and as far as I can tell,
NetBSD supports AVX-256 and AVX-512 (contrary to the other BSDs).

I added support for everything that would trigger 'fpu exception'
interrupts.
I think the latest extensions are fpu ones - so won't be saved
correctly.

They are saved correctly, since xcr0 also has the AVX flags (see XCR0_FPU), and
the size of the save area is computed with %ecx of leaf 0x0D, which is the
maximum save area available on the CPU (see x86_fpu_save_size).

The latter means that if the CPU supports more instructions than are enabled
in our XCR0_FPU, we end up with an x86_fpu_save_size that is bigger than the
actual saved states, which means we are copying some garbage. But I remember
convincing myself it wasn't harmful.
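
The size mismatch described above can be modelled with a small sketch. It is not
NetBSD code: the legacy region, header and AVX rows are architectural for the
standard (non-compacted) XSAVE format, while the MPX and AVX-512 rows are typical
values reported by CPUID leaf 0x0D and are assumptions here, as are the two
masks. Real code must query CPUID instead of using a table.

```python
# Illustrative model of the standard (non-compacted) XSAVE layout.
LEGACY_SIZE = 512   # x87 + SSE image
HEADER_SIZE = 64    # XSAVE header

# XCR0 bit number -> (name, offset, size).
# Rows for bits 3..7 are typical CPUID-reported values (assumption).
COMPONENTS = {
    2: ("AVX (YMM_Hi128)", 576, 256),
    3: ("MPX BNDREGS", 960, 64),
    4: ("MPX BNDCSR", 1024, 64),
    5: ("AVX-512 opmask", 1088, 64),
    6: ("AVX-512 ZMM_Hi256", 1152, 512),
    7: ("AVX-512 Hi16_ZMM", 1664, 1024),
}

XCR0_X87, XCR0_SSE, XCR0_AVX = 1 << 0, 1 << 1, 1 << 2


def standard_size(mask):
    """Bytes needed by the standard format for the features set in `mask`."""
    end = LEGACY_SIZE + HEADER_SIZE
    for bit, (_name, offset, size) in COMPONENTS.items():
        if mask & (1 << bit):
            end = max(end, offset + size)
    return end


cpu_supported = 0xFF                          # hypothetical CPU: all rows above
os_enabled = XCR0_X87 | XCR0_SSE | XCR0_AVX   # hypothetical enabled mask

max_size = standard_size(cpu_supported)   # analogue of leaf 0x0D %ecx
used_size = standard_size(os_enabled)     # what the enabled states occupy
unused_tail = max_size - used_size        # bytes copied but never written
```

With these assumed values the tail that is copied without ever being written by
the save instruction is `max_size - used_size` bytes.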

Back then I had written a patch to switch to a dynamic FPU area - that is, the
kernel dynamically allocates an fpu area for each lwp, the first time it uses
the fpu. But for some reason I never committed it.
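
The general idea of a lazily allocated per-lwp FPU area can be sketched as
follows. This is only a toy model of the technique, not the uncommitted patch
mentioned above; `Lwp`, `on_first_fpu_use` and `SAVE_SIZE` are hypothetical
names, and the size assumes x87 + SSE + AVX in the standard XSAVE format.

```python
SAVE_SIZE = 832  # hypothetical save size: x87 + SSE + AVX, standard format


class Lwp:
    """Toy stand-in for an lwp; holds no FPU area until one is needed."""

    def __init__(self, name):
        self.name = name
        self.fpu_area = None


def on_first_fpu_use(lwp, save_size=SAVE_SIZE):
    """Model of the device-not-available path: allocate on first use only."""
    if lwp.fpu_area is None:
        lwp.fpu_area = bytearray(save_size)
    return lwp.fpu_area


integer_only = Lwp("never touches the fpu")
fpu_user = Lwp("uses the fpu")
area = on_first_fpu_use(fpu_user)
```

In this model an lwp that never touches the fpu costs no save-area memory, and
the area lives outside the kernel stack.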


When context-switching, the fpu is saved, and the AVX states are part of
this since they are explicitly enabled in XCR0 [1].
By the way, there are already definitions for the AVX structures
in the XSAVE area [2] (which are never used
since they don't need to be).

One huge problem that struck me was that the fpu state is placed in the
kernel stack, which means we lose ~2500 bytes of stack.
With the latest MPX states and the future extensions, we will lose
even more memory. The priority is moving the fpu out of the pcb.
...

There isn't really a problem.

There, I'm not sure I understand correctly:

All of the AVX (and later) registers are defined to be caller saved.

?

The system call entry path can 'invalidate' all the extra state so
that a sleep with a deep kernel stack will only ever save the x87
registers.

I'm not sure what you say is correct (unless I am completely mistaken about how
our fpu stuff works). We don't use a "compact" save area, and it means that each
register state is stored at a fixed offset in the stack. If two lwps use AVX on
the same cpu, the second lwp will trigger a DNA (device-not-available) fault,
which will force the CPU to store data at that fixed offset. And given that the
AVX states are stored at the end of the area, we are writing closer to the
kernel stack than if we just had to save x87.

Also, storing fpu data is done regardless of how big the kernel stack is; so
even the compact format wouldn't quite solve our problem. In short, I don't see
what you mean.
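
The difference between fixed offsets and a "compact" area can be illustrated
with a short sketch. The AVX row is architectural for the standard format; the
other rows are typical CPUID leaf 0x0D values and are assumptions. The compacted
model also assumes that no component requests 64-byte alignment, which real
hardware may do.

```python
BASE = 576  # legacy region (512) + XSAVE header (64)

# (XCR0 bit, size, standard-format offset); rows after the first are
# typical CPUID-reported values, used here only for illustration.
COMPONENTS = [
    (2, 256, 576),    # AVX (YMM_Hi128)
    (3, 64, 960),     # MPX BNDREGS
    (4, 64, 1024),    # MPX BNDCSR
    (5, 64, 1088),    # AVX-512 opmask
    (6, 512, 1152),   # AVX-512 ZMM_Hi256
    (7, 1024, 1664),  # AVX-512 Hi16_ZMM
]


def standard_offsets(mask):
    """Standard format: each component sits at its fixed offset."""
    return {bit: off for bit, _size, off in COMPONENTS if (mask >> bit) & 1}


def compacted_offsets(mask):
    """Compacted format: selected components are packed back to back."""
    offsets = {}
    cursor = BASE
    for bit, size, _off in COMPONENTS:
        if (mask >> bit) & 1:
            offsets[bit] = cursor
            cursor += size
    return offsets


avx512_only = (1 << 5) | (1 << 6) | (1 << 7)
fixed = standard_offsets(avx512_only)
packed = compacted_offsets(avx512_only)
```

For a mask that selects only AVX (bit 2) both layouts place the state at offset
576, so the two formats differ only when lower-numbered components are left out.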

The full 2.5k+ would only need saving after a context switch from
a hardware interrupt.

? Unless you include the DNA interrupt in the "hardware interrupt" class?

So the save area can overlap the stack.
I didn't code that in though.

?

