Port-amd64 archive


New FPU code



Here is a patch [1] that reworks and greatly simplifies the FPU code.

The current API is:
	fpusave_lwp(l, bool)
	fpusave_cpu(bool)
This API needs special care because the FPU state is context-switched in
place, and it also requires IPL_HIGH to prevent IPI recursion. It's a big
headache.

The new API is:
	fpu_save(void)
When the kernel wants to touch the FPU state of curlwp, it calls fpu_save(),
and it can then touch the in-memory state. If the target is not curlwp but a
remote lwp, there is nothing to do: the kernel can touch the in-memory state
directly.

The internal logic is:

 - We introduce a per-lwp flag called MDL_FPU_IN_CPU. When set, it means that
   the lwp's FPU state is in the CPU, and not in memory. Symmetrically, when
   not set, it means that the state is in memory and not in the CPU.

 - fpu_save() checks MDL_FPU_IN_CPU: if it is set, it copies the FPU values
   from the CPU into memory, and then clears MDL_FPU_IN_CPU.

 - When returning to userland, if MDL_FPU_IN_CPU is clear, we restore the
   FPU state, i.e. we copy the in-memory state into the CPU registers, and
   then we set MDL_FPU_IN_CPU.

 - During context switches, we just do fpu_save(), which saves the state of
   the previous lwp. The FPU state of the new lwp will be installed next time
   the new lwp returns to userland.

In pseudo-code:

	fpu_save()
	{
		if (MDL_FPU_IN_CPU is set) {
			save the FPU state in memory
			clear MDL_FPU_IN_CPU
		}
	}

	context_switch()
	{
		fpu_save();
	}

	return_to_userland()
	{
		if (MDL_FPU_IN_CPU is cleared) {
			restore the FPU state from memory
			set MDL_FPU_IN_CPU
		}
	}

What's nice about this design is:

 - That once you call fpu_save(), you are guaranteed to be able to access
   the in-memory state safely, without having to care about preemption, spl
   or context. In the case of NVMM, this actually improves performance: we
   can now process several VMEXITs without having to switch the FPU state.

 - That we can avoid data movement: for example, if there are N context
   switches inside the kernel with no FPU usage in between, the FPU state
   will only get saved/restored once, and not N times.

 - That the save/restore sequences are now a lot easier to understand at the
   CPU level

For now, I am keeping stts/clts/fpudna for debugging, but strictly speaking
they are not needed and can be removed. In the future we may also want to
hotpatch the return-to-userland code for better inlining.

I have tested this code on amd64-KASAN and i386, with full ATF for each, and
also mlelstv's stream.c [2]. I have compile-tested it on i386-pae-dom0 and
amd64-dom0, but I don't have Xen setups, so I can't test more than that.

I'd like to commit this change within a week or two. This is a prerequisite
for future changes. Note that a kernel bump should come with it.

Maxime

[1] https://m00nbsd.net/garbage/fpu/fpu-new.diff
[2] https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52966

