tech-kern: Re: Stuff

Subject: Re: Stuff
To: Bill Paul <wpaul@FreeBSD.ORG>
From: Alan Ritter <rittera@cc.wwu.edu>
List: tech-kern
Date: 08/20/2005 15:35:20
Hi, thanks for your email :-)

> Couple of things:
> I realized it might be really useful if you had the manual for the Intel 82559
> chipset so you could better understand just what the sample e100bex driver is
> doing. Luckily, Intel has an 'open source developer's manual' available here:
> http://sourceforge.net/projects/e1000/
> You want the document called "8255x Developer Manual Revision 1.0." It's actually
> missing some info related to checksum offload support, but the e100bex driver
> doesn't implement checksum offload so it doesn't matter.

Thanks, I was looking at the data sheet for the 82559
(http://www.intel.com/design/network/datashts/738259.htm), which seems to have some
similar information, but I don't think it's exactly the same chip.  The developer
manual looks really helpful.

> Also, please don't make a big deal out of the fact that you can
> ping/ssh/whatever to the ndis0 interface on the local host. I don't know why
> people ascribe any importance to this, yet for some reason everyone seems to think
> it means something. It doesn't: all you're doing is exercising the loopback
> interface and the routing code. You aren't going anywhere near the driver code.
> Only data transfers between two machines matter. I cringe whenever I see someone
> say "...but, I can ping foo0 on my machine -- that means something, right?"

No, I actually can ssh from one computer to another that's plugged into an Ethernet
hub.  I just statically set the IP of the machine without ndis0 (machine A) to
something like 179.168.1.10, then on the other machine (machine B) do something like
"ifconfig ndis0 179.168.1.11".  Then from machine A I can do "ssh 179.168.1.11" and it
connects, and I can do all the normal stuff (cd, ls, dmesg, etc.).  I can also connect
and disconnect multiple times, and I tried scping some files this way too (I think
the biggest I tried was a /netbsd kernel).  I can also ssh from machine B, the one
running the ndis driver for the e100bex, to machine A.

Unfortunately I can't seem to get DHCP to work, and I'm not quite sure why.  I don't
know that much about networking stuff (I haven't taken the networking class yet.
Actually it's the last one I need to graduate - I should have done this earlier).
Anyway, I suspect that this might be a problem to do with multicast.

I tried comparing the e100bex DbgPrint() trace output from FreeBSD to that from
NetBSD, and I noticed that in FreeBSD NICSetMulticastList() was getting called a
bunch, but in NetBSD it isn't.  I think this might just have to do with a difference
between the FreeBSD and NetBSD networking code.  Specifically, in ndis_setmulti() I'm
using sc->arpcom.ec_multiaddrs in place of ifp->if_multiaddrs (the NetBSD ifnet
structure doesn't have this field).  When ndis_setmulti() gets called in FreeBSD,
ifp->if_multiaddrs seems to be non-empty, but sc->arpcom.ec_multiaddrs appears to
always be empty in NetBSD.

> Next is the subject of Windows spinlocks and DISPATCH_LEVEL. I struggled for a
> while to simulate dispatch level and the Windows spinlock mechanism in FreeBSD.
> The main problem is that native FreeBSD spinlocks didn't do exactly what Windows
> spinlocks do. The basic premise of spinlocks is to a) block out preemption on the
> CPU and then b) test and set a lock variable. In the case where you have more than
> one CPU, both steps are required. On a uniprocessor system, you can skip step b)
> and just block out preemption. (In fact, it used to be that Windows had two
> versions of NTOSKRNL.EXE: one for SMP that did both steps a) and b) when acquiring
> a spinlock, and another just for UP systems which did only step a). This was done
> to provide better performance on UP systems. I'm not sure if they still do this.)
> This ensures no other thread can run and modify the critical data you're working
> with.
> The question is what "blocking preemption" entails. On FreeBSD, it means doing a
> critical_enter(), which boils down to a cli instruction that blocks all
> interrupts. (critical_exit() does a sti instruction to restore them). But on
> Windows, it means "raise the CPU interrupt priority level to 'dispatch level'"
> which is not the same thing. When you're at DISPATCH_LEVEL, hardware interrupts
> may still occur, and they can make other threads runnable, but the threads
> themselves can't run until you leave DISPATCH_LEVEL. Basically, you're running in
> the thread dispatcher code itself.

I just wrote functions (originally #defines) mtx_lock(), mtx_unlock(),
mtx_lock_spin(), mtx_unlock_spin() to simulate the FreeBSD functions using NetBSD
calls.  I don't think NetBSD spinlocks (simplelocks) block off interrupts
automatically when they are held.  Right now in place of mtx_lock_spin() I'm raising
the IPL to IPL_NET (using splnet()), and acquiring the lock.  I don't think NetBSD
kernel threads can be preempted by anything other than interrupts right now, so I
don't think there's any reason to raise the IPL higher than IPL_NET, but I'm not an
expert on this.  Also right now I'm just using a uniprocessor system, so I have no
idea if this will work on anything else.
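
The wrapper described above (raise the IPL, then take the lock, and undo both in
reverse order) can be modelled in user space roughly as follows.  This is only a
sketch under stated assumptions: every sim_* name and the SIM_IPL_* values are
hypothetical, sim_splraise()/sim_splx() merely stand in for splnet()/splx(), and
the atomic flag stands in for the simplelock.

```c
#include <stdatomic.h>

/* Hypothetical priority levels; the numeric values are arbitrary. */
enum { SIM_IPL_NONE = 0, SIM_IPL_NET = 5 };

/* Simulated current interrupt priority level of the (single) CPU. */
static int sim_cur_ipl = SIM_IPL_NONE;

/* Stand-in for splnet(): raise the level, return the previous one. */
static int sim_splraise(int ipl)
{
    int old = sim_cur_ipl;
    if (ipl > sim_cur_ipl)
        sim_cur_ipl = ipl;
    return old;
}

/* Stand-in for splx(): restore a previously saved level. */
static void sim_splx(int old)
{
    sim_cur_ipl = old;
}

/* Hypothetical spin mutex: a lock flag plus the IPL saved at acquire time. */
struct sim_mtx {
    atomic_flag locked;
    int saved_ipl;
};

static void sim_mtx_init(struct sim_mtx *m)
{
    atomic_flag_clear(&m->locked);
    m->saved_ipl = SIM_IPL_NONE;
}

/* Step a) block out interrupts up to SIM_IPL_NET, step b) test and set. */
static void sim_mtx_lock_spin(struct sim_mtx *m)
{
    int s = sim_splraise(SIM_IPL_NET);
    while (atomic_flag_test_and_set(&m->locked))
        ;                       /* spin until the holder clears the flag */
    m->saved_ipl = s;           /* written only while holding the lock */
}

/* Release in reverse order: drop the lock first, then lower the IPL. */
static void sim_mtx_unlock_spin(struct sim_mtx *m)
{
    int s = m->saved_ipl;
    atomic_flag_clear(&m->locked);
    sim_splx(s);
}
```

Saving the old level inside the mutex keeps the lock/unlock signatures the same
as the FreeBSD ones; whether that is the right choice for the real port is an
open question.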

> Furthermore, in Windows you are allowed to allocate memory from the heap using
> ExAllocatePoolWithTag() when you're at DISPATCH_LEVEL, provided you allocate from
> NonPagedPool. This means allocating from NonPagedPool will _NOT_ result in your
> thread sleeping, i.e. you will not acquire any sleep locks. In FreeBSD, this is
> not the case: interrupts are implemented using interrupt threads, and you are
> allowed to sleep in an interrupt thread. Consequently, FreeBSD's malloc(9) will
> always acquire sleep locks. You can't even swing a dead cat without acquiring a
> sleep lock, even if you use M_NOWAIT. (This is sort of counter-intuitive: one
> expects M_NOWAIT to mean no sleeping at all, but in FreeBSD it means 'no sleeping
> to wait for enough memory to be free()d to satisfy the malloc() request'.)

I don't think NetBSD interrupts occur in a thread context...

> In FreeBSD, locks are represented by struct mtx, even spin locks. In Windows,
> spinlocks are represented by a uint32_t. Also, in FreeBSD, locks must be
> initialized with mtx_init() and discarded with mtx_destroy().
> In Windows, spinlocks are just initialized with KeInitializeSpinLock() (which just
> does lock = 0;) and don't need any special action to be destroyed (after you give
> up the spinlock the last time, you just release the memory in which it resides
> back to the OS). All of these things drove me a little nuts.
> I finally settled on the following mechanism:
> - To block pre-emption, I would raise the current thread priority as
>   high as it would go. This blocks any thread from running until the priority
>   is restored, except for interrupt threads, which are considered special.

I assume you mean raising the scheduling priority to PI_REALTIME.  NetBSD doesn't
have anything like this, I think because its kernel threads run to completion, so
this behavior is normal.  But again I'm not 100% sure about this.

> - To test and set the spinlock variable, I would use the atomic op
>   routines in atomic.h.

I copied some of these into compat/ndis/nbcompat.h for use on NetBSD, although I'm
not sure I'm doing this correctly...
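
For reference, the test-and-set step on a Windows-style spinlock (a plain 32-bit
word that starts out as 0) can be sketched with C11 atomics as below.  This is a
user-space illustration, not the contents of atomic.h or nbcompat.h; the sim_*
names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical type: the lock is just a 32-bit word, 0 = free, 1 = held. */
typedef _Atomic uint32_t sim_spinlock_t;

/* Mirrors what the quoted text says KeInitializeSpinLock() does: lock = 0. */
static void sim_spin_init(sim_spinlock_t *l)
{
    atomic_store(l, 0);
}

/* One test-and-set attempt: succeeds only if the word was 0. */
static int sim_spin_try_acquire(sim_spinlock_t *l)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(l, &expected, 1);
}

/* Spin until the attempt succeeds. */
static void sim_spin_acquire(sim_spinlock_t *l)
{
    while (!sim_spin_try_acquire(l))
        ;
}

/* Clearing the word releases the lock; no destroy step is needed. */
static void sim_spin_release(sim_spinlock_t *l)
{
    atomic_store(l, 0);
}
```

The compare-and-swap makes the test and the set a single indivisible step, which
is the property the copied atomic routines need to preserve.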

> I don't know how thread priorities are manipulated in NetBSD, or how NetBSD
> spinlocks and sleep locks work. FreeBSD no longer implements the spl mechanism.
> If NetBSD still has it, you may be able to use splhigh() to block out
> pre-emption, but check if you can still do malloc(9) when at splhigh().

Yes, NetBSD definitely uses the splXXX() functions.  This is what I have been using.
In fact what got ssh working for me was raising the IPL to IPL_NET with splnet() while
at DISPATCH_LEVEL (I know this isn't really the right thing to do).

> I thought up an alternate method using sleep locks once which seems like it might
> work, though I haven't tested it extensively. The idea is that we only need the
> Windows spinlock mechanism to lock data which will be used within Project Evil and
> the Windows drivers themselves. This being the case, we can create a locking
> mechanism that will provide the proper protection within the context of Project
> Evil like this:
> - When Project Evil starts up, create and initialize a sleep lock
>   for each CPU, called disp_lock.
> - When calling KeRaiseIrql(DISPATCH_LEVEL), acquire the disp_lock
>   for the current CPU.
> - Now test and set the spinlock variable
> - <do work inside locked section>
> - Clear the spinlock variable
> - Release the disp_lock for the current CPU.
> So if CPU0 acquires disp_lock0, any other thread running on CPU0 that tries to
> acquire disp_lock0 will sleep (until the first thread gives up the lock). If a
> thread on CPU1 tries to acquire the lock, it will acquire disp_lock1 (which will
> block out any other threads on CPU1 that try to acquire disp_lock1) and spin
> waiting for the lock variable to be cleared by the other thread running on CPU0.
> Note that one potential problem is a thread on CPU0 acquiring disp_lock0, then
> doing something which makes it sleep (like a malloc()) and then being scheduled
> onto CPU1 before it releases the lock. In FreeBSD, I think you can avoid this
> with sched_pin(), which is supposed to pin you to the current CPU (until you do
> sched_unpin()).
> I'm not sure how feasible this is on NetBSD, but it may work for you.

This sounds like a good idea.  I haven't had a chance yet to test out what I've got
on a multiprocessor system, as I don't have one, but I can probably get access to
one at school.
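
The ordering in the quoted per-CPU scheme can be traced with a small
single-threaded model.  It is a sketch only: the sim_* names are hypothetical,
the per-CPU sleep lock is reduced to a flag, and a return value of 0 marks the
point where a real thread would sleep (for disp_lock) or spin (for the lock
variable).

```c
#include <stdatomic.h>
#include <stdint.h>

#define SIM_NCPU 2              /* hypothetical CPU count for the model */

/* One disp_lock per CPU, modelled as a flag instead of a real sleep lock. */
static atomic_flag sim_disp_lock[SIM_NCPU];

static void sim_disp_init(void)
{
    for (int i = 0; i < SIM_NCPU; i++)
        atomic_flag_clear(&sim_disp_lock[i]);
}

/* "KeRaiseIrql(DISPATCH_LEVEL)": take this CPU's disp_lock.
 * Returns 1 on success, 0 where a real thread would go to sleep. */
static int sim_raise_irql(int cpu)
{
    return !atomic_flag_test_and_set(&sim_disp_lock[cpu]);
}

/* Leaving DISPATCH_LEVEL: give the disp_lock back. */
static void sim_lower_irql(int cpu)
{
    atomic_flag_clear(&sim_disp_lock[cpu]);
}

/* Test and set the shared spinlock variable.
 * Returns 1 on success, 0 where a real thread would keep spinning. */
static int sim_spin_try(_Atomic uint32_t *lock)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Clear the spinlock variable. */
static void sim_spin_clear(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0);
}
```

With two simulated CPUs, a second sim_raise_irql(0) fails while CPU0 holds its
disp_lock, whereas sim_raise_irql(1) succeeds and only the shared spinlock
variable makes CPU1 wait.  The model says nothing about the migration problem
mentioned in the quoted text.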

> I'm puzzled by the problems you're having with running out of TX packets. You can
> query an NDIS driver to find out how many packets it can queue up internally at
> any one time. ndis_attach() does this and then creates a pool of TX packets that
> it can use for transmission. When a transmission completes, which is signalled by
> a TX done interrupt, ndis_txeof() should be called to release the packet back to
> the TX pool. (The BSD mbufs associated with the NDIS_PACKET are also released.)

Yes, I've noticed that if after getting a seg-fault in NdisAllocatePacket() I force
it to return using GDB, I no longer run into any problems.  I was thinking about
putting a check in to test if pool is null, and if so just returning, but I'm not
sure this is the way to do it...

Also I can see that ndis_txeof() is getting called a bunch by putting a breakpoint
on it.
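
To make the pool behaviour concrete, here is a minimal user-space model of a
fixed-size TX packet pool with the NULL check mentioned above.  It is a sketch
under stated assumptions, not the NdisAllocatePacket() code: the sim_* names and
the pool size are hypothetical.

```c
#include <stddef.h>

#define SIM_TX_MAX 4            /* hypothetical: packets the driver can queue */

struct sim_pkt {
    int in_use;                 /* 1 while owned by a transmission */
};

struct sim_pool {
    struct sim_pkt pkts[SIM_TX_MAX];
    int free_count;
};

/* Created once at attach time, sized from the driver's answer. */
static void sim_pool_init(struct sim_pool *p)
{
    for (int i = 0; i < SIM_TX_MAX; i++)
        p->pkts[i].in_use = 0;
    p->free_count = SIM_TX_MAX;
}

/* Returns NULL instead of faulting when the pool is missing or exhausted. */
static struct sim_pkt *sim_pkt_alloc(struct sim_pool *p)
{
    if (p == NULL)
        return NULL;
    for (int i = 0; i < SIM_TX_MAX; i++) {
        if (!p->pkts[i].in_use) {
            p->pkts[i].in_use = 1;
            p->free_count--;
            return &p->pkts[i];
        }
    }
    return NULL;
}

/* Called from the TX-done path to hand the packet back to the pool. */
static void sim_pkt_free(struct sim_pool *p, struct sim_pkt *pkt)
{
    if (p == NULL || pkt == NULL || !pkt->in_use)
        return;
    pkt->in_use = 0;
    p->free_count++;
}
```

A NULL check like this only hides the symptom; if the pool pointer is NULL at
that point, the more interesting question is why it was never set up or was
torn down early.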

> The way the interrupt handling works, ndis_intr() should be invoked at interrupt
> context (it's the routine directly attached to the interrupt). This routine in
> turn calls MiniportISR() in the driver to find out if the underlying device really
> did signal an interrupt event that needs to be handled. (Bear in mind that PCI
> devices can share interrupts, so it's possible for ndis_intr() to be invoked
> because some other device on the same IRQ asserted the interrupt line.)
> MiniportISR() is very small. If it sets is_our_intr and call_isr to 1, we then
> schedule MiniportHandleInterrupt() to run in a DPC. This is what IoRequestDpc()
> does. This is really just a macro that calls KeInsertQueueDpc(). In Windows, all
> devices have one DPC structure embedded in their DEVICE_OBJECTs. This DPC is
> reserved exclusively for running the device's interrupt handler. It's initialized
> with IoInitializeDpcRequest() (which is also just a macro).
> MiniportHandleInterrupt() should eventually realize that yes, a
> transmission was completed and it's now safe to release the NDIS_PACKET
> associated with the transmission. It does this by calling
> NdisMSendComplete(), which is another macro that invokes the
> 'nmb_senddone_func' handler from the miniport block. We install
> ndis_txeof() as the nmb_senddone_func routine.
> This is how things _should_ work. I'm not sure why they're not
> working that way. I would check to make sure your DPC is being
> called to run MiniportHandleInterrupt() correctly.

I can see that MiniportHandleInterrupt() is getting called after MPIsr() disables
interrupts and schedules it.  I think my original problem was that MPIsr() was
getting called in the middle of the MiniportHandleInterrupt() handler, and this was
somehow screwing things up.  This is originally why I decided to raise the IPL with
splnet() when at DISPATCH_LEVEL, and this got ssh working for me.  Now I don't raise
the IPL while at DISPATCH_LEVEL, and instead do so when acquiring a spin lock, and
this seems to work (there must be a spin lock that is held while running
MiniportHandleInterrupt()).
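
The interrupt path discussed above (a small ISR that decides whether the
interrupt is ours, then a deferred handler that completes the send) can be
modelled in user space as follows.  This is a sketch, not the ndis code: all
sim_* names are hypothetical, and the comments say which real routine each one
stands in for.

```c
/* Hypothetical device state for the model. */
struct sim_dev {
    int irq_pending;    /* the device has raised its interrupt */
    int intr_enabled;   /* device interrupts are unmasked */
    int dpc_queued;     /* the deferred handler has been scheduled */
    int tx_done_calls;  /* how often the send-complete hook ran */
};

/* Stand-in for MiniportISR(): decide whether the interrupt is ours and,
 * if so, mask the device until the deferred handler has run. */
static void sim_isr(struct sim_dev *d, int *is_our_intr, int *call_isr)
{
    *is_our_intr = 0;
    *call_isr = 0;
    if (d->intr_enabled && d->irq_pending) {
        d->intr_enabled = 0;
        *is_our_intr = 1;
        *call_isr = 1;
    }
}

/* Stand-in for ndis_intr(): runs at interrupt context, and may be entered
 * for another device sharing the same IRQ line. */
static void sim_intr(struct sim_dev *d)
{
    int is_our_intr, call_isr;

    sim_isr(d, &is_our_intr, &call_isr);
    if (is_our_intr && call_isr)
        d->dpc_queued = 1;      /* stands in for IoRequestDpc() */
}

/* Stand-in for the DPC running MiniportHandleInterrupt(). */
static void sim_run_dpc(struct sim_dev *d)
{
    if (!d->dpc_queued)
        return;
    d->dpc_queued = 0;
    d->irq_pending = 0;         /* acknowledge the device */
    d->tx_done_calls++;         /* stands in for NdisMSendComplete() */
    d->intr_enabled = 1;        /* unmask again when the work is done */
}
```

In this model the ISR does nothing while the device is masked, so it cannot
disturb a deferred handler that is still pending or running; whether the real
driver gives the same guarantee without the raised IPL is exactly what is being
probed above.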

Anyway, thanks again, you've been very helpful :-)