Subject: Re: radeon driver design (was Re: generic virtual consoles)
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 12/28/2005 12:08:23
der Mouse wrote:

>>I'm also a little reassured that no one has found serious flaws with
>>my design, only refinements.  It sounds like this is the way for me
>>to proceed.
>
>Well, I would really hate to have to do an ioctl every time I want to
>draw a line, or display a character, or, or, or...that kernel/user
>crossing penalty is a real performance killer.
>
>It wasn't clear to me whether your design allowed batching across the
>kernel/user interface or not, but if not, *I* see that as a flaw.
>
Agreed.  We need some kind of batching support to get good
performance.  The first version will just export a memory-mapped
framebuffer anyway, but future versions will need to provide some kind
of interface for acceleration functions, and it should probably be
batched, most likely across a shared memory region (maybe by exporting
a ring structure or something).
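
A minimal sketch of what such a shared ring could look like, assuming a
single producer (the application) and a single consumer (the driver)
sharing one mapped region.  Every name here (gfx_ring, gfx_cmd,
GFX_RING_SLOTS, the command layout) is hypothetical and is not an
existing NetBSD or radeon interface; it also assumes C11 atomics, where
a real kernel interface would probably use its own barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GFX_RING_SLOTS 256u	/* hypothetical size; must be a power of two */

/* Hypothetical chip-independent command, not any real driver's format. */
struct gfx_cmd {
	uint32_t op;		/* operation code, e.g. "draw line" */
	uint32_t args[7];	/* operands; meaning depends on op */
};

/* Lives in the region shared between the application and the driver. */
struct gfx_ring {
	_Atomic uint32_t head;	/* free-running; stored only by the producer */
	_Atomic uint32_t tail;	/* free-running; stored only by the consumer */
	struct gfx_cmd slot[GFX_RING_SLOTS];
};

static void
gfx_ring_init(struct gfx_ring *r)
{
	/* Index masking below only works for power-of-two sizes. */
	assert((GFX_RING_SLOTS & (GFX_RING_SLOTS - 1)) == 0);
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side: queue one command.  Returns false if the ring is full. */
static bool
gfx_ring_put(struct gfx_ring *r, const struct gfx_cmd *c)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == GFX_RING_SLOTS)
		return false;
	r->slot[head & (GFX_RING_SLOTS - 1)] = *c;
	/* Release: the slot contents become visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: fetch one command.  Returns false if the ring is empty. */
static bool
gfx_ring_get(struct gfx_ring *r, struct gfx_cmd *out)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = r->slot[tail & (GFX_RING_SLOTS - 1)];
	/* Release: the slot is free for reuse only after it has been copied. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

With a layout like this, queuing a batch of commands costs only stores
into shared memory; no kernel/user crossing is needed per command.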

What I don't want to expose to userland are the intricate details of
the chip; otherwise we wind up writing multiple graphics drivers --
one for user space and one for kernel space.

My understanding of the kernel/user boundary is that it is slow for two
reasons: 1) data has to be copied across the boundary, and 2) the
context switch is expensive, largely due to the rescheduling of processes.

It seems that having memory exported via DMA or some such solves the
first problem.  The second problem is harder, but maybe we need some
form of "light-weight" syscall that doesn't automatically force the
scheduler to reschedule processes.  (I don't know NetBSD's syscall
layer that well, but Solaris, for example, has a "fast syscall" with
these kinds of attributes; it is used for performance measurement, e.g.
the high-resolution timer, and is an extremely cheap call.)  We
could use something like this as a signaling mechanism to kick-start
graphics in the driver.  Then the calling application only needs to
sleep if the ring is full or it wants to wait for all jobs to complete.

As I said, this is future work.  Right now I just want to get the linear
framebuffer working, which won't have these problems.

    -- Garrett

>/~\ The ASCII				der Mouse
>\ / Ribbon Campaign
> X  Against HTML	       mouse@rodents.montreal.qc.ca
>/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


-- 
Garrett D'Amore                          http://www.tadpolecomputer.com/
Sr. Staff Engineer          Extending the Power of 64-bit UNIX Computing
Tadpole Computer, Inc.                             Phone: (951) 325-2134