Subject: Re: radeon driver design (was Re: generic virtual consoles)
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 12/28/2005 17:56:43
> My understanding of the kernel/user boundary is that there are two
> reasons for it being slow: 1) data copy across the boundary, and 2)
> the context switch is expensive largely due to the rescheduling of
> processes.

Also, I understand that on some CPUs, merely taking the syscall trap
(regardless of data copy and scheduling overhead) can be expensive.
I've never done measurements, so I don't know how true this actually is
on various CPUs, but it certainly seems plausible.  Switching
protection rings can involve massive cache pushing and reloading, all
sorts of instruction pipeline bubbles, etc., etc.
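For anyone who wants numbers rather than plausibility, a rough user-space microbenchmark along these lines could be a starting point. It is a minimal sketch and not something anyone in this thread has run. It assumes a POSIX system where libc does not cache getppid(), so every call really takes the syscall trap. The helper names ns_per_syscall and ns_per_plain_call are made up for illustration. It times a cheap syscall against an ordinary, non-inlinable function call in the same loop shape.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per getppid() call.
 * Assumes libc does not cache getppid(), so every call takes the
 * syscall trap and returns; the kernel-side work itself is trivial. */
double ns_per_syscall(unsigned long iters)
{
    volatile pid_t sink = 0;    /* keeps the calls from being elided */
    uint64_t start, end;

    assert(iters > 0);
    start = now_ns();
    for (unsigned long i = 0; i < iters; i++)
        sink = getppid();
    end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}

/* Baseline: an ordinary function called through a volatile pointer,
 * so the compiler cannot inline it away.  No protection-ring switch. */
static pid_t plain_call(void) { return 1; }
static pid_t (*volatile plain_fn)(void) = plain_call;

/* Hypothetical helper: average nanoseconds per plain function call. */
double ns_per_plain_call(unsigned long iters)
{
    volatile pid_t sink = 0;
    uint64_t start, end;

    assert(iters > 0);
    start = now_ns();
    for (unsigned long i = 0; i < iters; i++)
        sink = plain_fn();
    end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

Call both from main() with a large iteration count, say a million, and print the two averages. The difference between them includes kernel dispatch and getppid()'s own (tiny) work as well as the trap. Treat it as an upper bound on the trap cost rather than a clean measurement of it. Expect it to vary a lot between CPUs, which is rather the point.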

Perhaps we need to invent some alternative form of syscall trap?  Maybe
a magic address that's never mapped, with the kernel foo done inside
the page fault handler?  Page faults are usually relatively cheap as
far as the hardware is concerned....
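To put a number on the existing fault path for comparison, a similar user-space sketch could time first-touch faults on fresh anonymous memory. This is only an illustration under stated assumptions, not the proposed mechanism. It times the fault handling the kernel already does (trap, fault resolution, page allocation and zero-filling), not a syscall-through-fault scheme, which would need kernel changes. The name ns_per_page_fault is hypothetical. The sketch assumes mmap() supports MAP_ANONYMOUS or the older BSD spelling MAP_ANON.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling of the same flag */
#endif

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: map `npages` fresh anonymous pages, write one
 * byte to each, and return the average nanoseconds per first touch.
 * Each first write takes a page fault that the kernel resolves by
 * allocating and zero-filling a page, so this measures fault entry and
 * exit PLUS that work, overstating the bare trap cost.  Where transparent
 * huge pages are in use, one fault may populate many pages and lower the
 * average.  Returns -1.0 if the mapping cannot be created. */
double ns_per_page_fault(size_t npages)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t len;
    void *base;
    volatile char *p;
    uint64_t start, end;

    assert(npages > 0 && pagesz > 0);
    len = npages * (size_t)pagesz;
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1.0;
    p = base;

    start = now_ns();
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)pagesz] = 1;   /* first touch of each page faults */
    end = now_ns();

    munmap(base, len);
    return (double)(end - start) / (double)npages;
}
```

Comparing the result against a timed cheap syscall on the same machine gives a rough sense of whether the fault path is actually cheaper there. Because the zero-filling is included, a fault looking more expensive in this test does not by itself rule the idea out.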

> As I said, this is future work.  Right now I just want to get the
> linear framebuffer working, which won't have these problems.

Right.  I've been considering doing something vaguely similar myself,
but so far it hasn't got past the "pipe dream" stage; I'm definitely
going to be watching your progress, since you will almost certainly get
something working sooner than I will.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B