Subject: Re: Sheesh. More LKMs. And some firewall stupidity.
To: gabriel rosenkoetter <gr@cs.swarthmore.edu>
From: Bill Studenmund <wrstuden@zembu.com>
List: port-macppc
Date: 08/21/2000 12:49:22
On Sat, 19 Aug 2000, gabriel rosenkoetter wrote:

> So, my last email to the list from eclipsed.net doesn't seem to have
> come through (though I sent it to about half of you privately, I
> guess, so...). My SMTP server is fine and BIND is perfectly happy, and
> will even do zone transfers to machines outside of 130.58.0.0/16, but
> no nslookup from outside that IP range can get a response from it.
> (Again, not the DNS machine's fault; it never sees the request.) I
> expect that netbsd.org's MX (intelligently) refuses to deliver mail
> from servers with irresolvable addresses. Oh well.
> 
> This is all because of a fancy Cisco PIX firewall swarthmore.edu
> recently installed. So, DNS is broken, so mail to (and from, with smart
> SMTP daemons) eclipsed.net won't work. Just so no one gets confused
> about bounced messages and such. (If anyone has some experience that
> resembles this, preferably including a fix, feel free to let me know.)
> 
> Anyway, back to topic...
> 
> A few replies to messages I missed (I'm grabbing these off of
> mail-index.netbsd.org):
> 
> At 08/14/2000 18:58:59, Bill Studenmund <wrstuden@zembu.com> wrote:
> > It looks like there are two open issues. 1) getting -mlong-call working,
> 
> Sure. Great. I'm basically content stomping on prototypes, largely
> because...
> 
> > and 2) getting it to work for built-ins (mainly memcpy).
> 
> ... the only way to make this situation better (and I have done so) is
> to define the modified prototype as an extern in the definition
> section of the function in which memcpy() is used. This works, and the
> relocations all go through ld just dandy. So, miscmod.c (originally
> from /usr/share/lkm/misc/module/miscmod.c) looks like
> this, starting around line 86:
> 
> static int 
> miscmod_handle( lkmtp, cmd)
> struct lkm_table    *lkmtp;
> int         cmd;
> {       
>     int         i;
>     struct lkm_misc     *args = lkmtp->private.lkm_misc;
>     int         err = 0;    /* default = success*/
>     extern int sys_lkmnosys __P((struct proc *, void *, register_t *));
>     extern void   *memcpy __P((void *, const void *, size_t)) __attribute__((longcall));
> [...]
> 
> Anyway, ld's content now, but when I do a modload, I get this gem:
> 
> achemar:misc/module# make load
> modload -o miscmod -emiscmod combined.o
> modload: error loading buffer: Cannot allocate memory
> *** Error code 11
> 
> Stop.
> 
> Using my modified copy of modload with DEBUG=1 is a little more
> illustrative:
> 
> achemar:misc/module# ~gr/src/modload/modload -o miscmod -emiscmod combined.o
> ld -R /netbsd -e miscmod -o miscmod -Ttext 0x0 combined.o
> .text: addr = 0x0 size = 0x460 align = 0x4
> .rodata: addr = 0x460 size = 0x118 align = 0x4
> .sdata2: addr = 0x578 size = 0 align = 0x4
> .data: addr = 0x40578 size = 0x10 align = 0x4
> .data section forced to offset 0x578 (was 0x40578)
> .got: addr = 0x40588 size = 0x10 align = 0x4
> .sdata: addr = 0x40598 size = 0x10 align = 0x4
> .bss: addr = 0x405a8 size = 0 align = 0x1
> ld -R /netbsd -e miscmod -o miscmod -Ttext 0xe9209000 -Tdata 0xe9209578 combined.o
> loading `.text': addr = 0xe9209000, size = 0x460
> loading `.rodata': addr = 0xe9209460, size = 0x118
> loading `.data': addr = 0xe9209578, size = 0x10
> loading `.sdata2': addr = 0xe9209578, size = 0
> modload: error loading buffer: Cannot allocate memory
> 
> Um... what's .sdata2? (I wasn't aware that was even a legal name for a
> segment.) Why are we having difficulty allocating memory for a zero-
> size buffer?

You will probably need to add debugging printfs to modload for this one.
From my skimming of the code, I couldn't at first see where that's coming
from. Oh, I think I see it now: it comes from loadbuf(), but the routine
issuing the message, elf_mod_load(), looks like it shouldn't call loadbuf()
on a 0-length section. All I can think of is an optimizer bug or a memory
error (like something else scribbling on this memory). Debugging printfs
are the key here. :-)
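A minimal user-space sketch of the kind of guard and debug printf meant here, assuming a loop that walks the module's sections and hands each one to a loader. The names mod_section, load_sections() and strict_loadbuf() are hypothetical stand-ins, not modload's actual code; strict_loadbuf() just mimics a loader that fails on zero-size buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one ELF section as modload sees it. */
struct mod_section {
	const char *name;
	unsigned long addr;
	size_t size;
};

/* Hypothetical loader callback: returns 0 on success, -1 on failure. */
typedef int (*loadbuf_fn)(unsigned long addr, size_t size);

/*
 * Walk the sections, logging each one when debug is set, and never
 * hand a zero-length section to the loader. Returns the number of
 * sections actually loaded, or -1 if the loader failed.
 */
static int
load_sections(const struct mod_section *secs, size_t nsecs,
    loadbuf_fn loadbuf, int debug)
{
	int loaded = 0;
	size_t i;

	assert(secs != NULL || nsecs == 0);
	for (i = 0; i < nsecs; i++) {
		if (debug)
			fprintf(stderr, "section `%s': addr = 0x%lx, size = 0x%zx\n",
			    secs[i].name, secs[i].addr, secs[i].size);
		if (secs[i].size == 0) {
			if (debug)
				fprintf(stderr, "  skipping empty section\n");
			continue;
		}
		if (loadbuf(secs[i].addr, secs[i].size) != 0) {
			fprintf(stderr, "loadbuf failed for `%s'\n", secs[i].name);
			return -1;
		}
		loaded++;
	}
	return loaded;
}

/* Fake loader that, like the failure reported, rejects zero-size buffers. */
static int
strict_loadbuf(unsigned long addr, size_t size)
{
	(void)addr;
	return size == 0 ? -1 : 0;
}
```

Fed the four sections from the debug output above (with .sdata2 at size 0), this loads three and skips the empty one instead of failing.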

Another option might be "objcopy -R .sdata2 <lkm> <new lkm name>".

Oh, .sdata2 is a ppc-specific section name. I'm not sure what it's good
for.

> I remain confused, and I'm spending time on this that I really ought
> to be spending on my work, so I'm pulling a couple of sparcs up to
> -current and will move my development there. It hurts though. They're
> so slow compared to these 7x00s...

[snip, as Todd can answer better]

> On 08/16/2000 16:24:57, Henry B. Hotz <hotz@jpl.nasa.gov> wrote:
> 
> > There is already support for swapping to a vnode filesystem.  Sounds
> > like what you should be doing is defining the vnode calls to swap
> > page data, not modifying UVM itself.
> 
> Now, maybe I'm missing something, but I don't understand how vnodes
> help my situation.
> 
> As I understand it UVM won't use the vnode pager for standard memory,
> it uses aobj/uvm_obj pairs for that, and making it use something else
> would be more work than what I'm thinking of.

You really should talk to the Chucks here, as they traipse through this
code all the time. :-) Chuck Cranor wrote UVM as his dissertation, so get
a copy of that. :-)

> Also, I don't see how a vnode FS (which exists as a file in the local
> file system, no?) helps me get pages of data into the live memory of
> another machine on the network.

There's no requirement that an FS use local disk. :-) You get to set up
all of the things it does, so it can read data from another disk if you
want.

From the point of view of the vm system, all the vnode needs to do is
respond to VOP_READ() and VOP_WRITE(), and a few other calls.
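A user-space sketch of that idea, assuming the caller only ever goes through an operations vector: whatever sits behind the read and write entries (here a plain buffer standing in for the remote address space) is invisible to it. The fake_vnode, fake_vnodeops and FAKE_VOP_* names are hypothetical and deliberately simpler than the real struct vnode and VOP interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct fake_vnode;

/* Hypothetical op vector: only the two operations the pager needs here. */
struct fake_vnodeops {
	int (*vop_read)(struct fake_vnode *, size_t off, void *buf, size_t len);
	int (*vop_write)(struct fake_vnode *, size_t off, const void *buf,
	    size_t len);
};

struct fake_vnode {
	const struct fake_vnodeops *ops;
	unsigned char *store;	/* stands in for the remote address space */
	size_t size;
};

static int
remote_read(struct fake_vnode *vp, size_t off, void *buf, size_t len)
{
	if (off > vp->size || len > vp->size - off)
		return EINVAL;
	/* A real driver would send a request over the network here. */
	memcpy(buf, vp->store + off, len);
	return 0;
}

static int
remote_write(struct fake_vnode *vp, size_t off, const void *buf, size_t len)
{
	if (off > vp->size || len > vp->size - off)
		return EINVAL;
	memcpy(vp->store + off, buf, len);
	return 0;
}

static const struct fake_vnodeops remote_ops = { remote_read, remote_write };

/* Callers (the "vm system") only ever dispatch through the op vector. */
#define FAKE_VOP_READ(vp, o, b, l)  ((vp)->ops->vop_read((vp), (o), (b), (l)))
#define FAKE_VOP_WRITE(vp, o, b, l) ((vp)->ops->vop_write((vp), (o), (b), (l)))
```

Swapping remote_ops for a different table changes where the data goes without touching any caller, which is the property that lets the paging code stay as-is.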

> Could you describe how you would use a vnode fs to do this? If you
> meant to use a file that was NFS mounted for the vnode, I don't think
> this is a good idea. First off, I'm ditching NFS for a modified
> version of Berkeley's xFS in my cluster, second, I don't want file
> system interaction at all when paging, that's the point. Dealing with
> fs lookups is a slowdown I don't want, especially when network
> transactions are added to the mess (even if they are asynchronous,
> excess network traffic is simply something I don't want in a cluster).

No, this would be using drivers of your own making to do read & write. :-)
So it would have whatever latencies you designed in.

If you went with an fs-type interface, you'd be making something more like
kernfs or procfs than an on-disk fs. Your fs doesn't really have to
support lookups; you could just have an empty / directory (the "root"
vnode is a directory which doesn't support VOP_LOOKUP and always reads as
an empty directory). Then you create vnodes as you need them, and they
read from & write to the other process's address space.
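A sketch of such a namespace-free fs, assuming vnodes are handed out directly by the code that sets up a migration rather than found by name. The pvnode type and the pfs_* functions are hypothetical illustrations, not NetBSD interfaces; returning EOPNOTSUPP from lookup is likewise just one plausible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum pvtype { PV_ROOT, PV_PAGER };

/* Hypothetical vnode for the paging fs; not the real struct vnode. */
struct pvnode {
	enum pvtype type;
	int target;	/* hypothetical id of the remote process/host */
};

static struct pvnode pfs_root = { PV_ROOT, -1 };

/* The root directory refuses every name lookup. */
static int
pfs_lookup(struct pvnode *dvp, const char *name, struct pvnode **vpp)
{
	(void)dvp;
	(void)name;
	*vpp = NULL;
	return EOPNOTSUPP;
}

/* Reading the root directory always yields zero entries. */
static int
pfs_readdir(struct pvnode *dvp, size_t *nentries)
{
	if (dvp->type != PV_ROOT)
		return ENOTDIR;
	*nentries = 0;
	return 0;
}

/*
 * Vnodes are created on demand by the (hypothetical) migration code,
 * never found by name, so there is nothing in the namespace to look up.
 */
static struct pvnode *
pfs_make_vnode(int target)
{
	struct pvnode *vp = malloc(sizeof *vp);

	if (vp != NULL) {
		vp->type = PV_PAGER;
		vp->target = target;
	}
	return vp;
}
```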

> My plan is to modify a few functions in uvm_aobj.c (uh, should be in
> src/sys/uvm, but I'm looking at a printout I've scribbled on, not at
> the file). Especially of interest are uao_get(), which, when it checks
> for the swap slot a requested page resides in, could be made to check
> a separate table for a record of placing the page on another machine,
> and retrieve it from there, and uao_pagein, of which a modified copy
> could be made (uao_pagein_NW?) to get pages back from the network
> rather than swap.

More on this in a bit.

> All of this brings me to a message a while ago from Bill, which has
> been nagging at the back of my mind...
> 
> On 08/14/2000 18:16:12, Bill Studenmund <wrstuden@zembu.com> wrote:
> 
> > Also, I think that it's hard to change existing kernel routines. The
> > linking process, as I understand it, will modify unresolved symbols in
> > your lkm to point to the in-kernel ones. But it will not modify existing
> > kernel references to other kernel functions (like existing calls to
> > uvm_page()) to point out to your lkm code.
> 
> Yes, that is a profound problem. If that's true, then it's time to
> give up on LKMs entirely.

No, it's not. It's time to think about doing things a little differently.
:-)

LKMs ADD things. So to use them, you need to make what you're doing
involve ADDing things to the kernel.

> I would ask how LKMs would ever be useful for developing new additions
> to the kernel, then... which I thought was the point of their
> existence. If you can't add calls to symbols from your LKM or modify
> in-kernel symbols to behave differently, how could you ever add
> anything to an existing kernel except by recompiling the kernel?

You find all of the places where the kernel jumps to a routine whose
address is in a table. Those you can change. For instance, all the file
systems have tables with their entry points, in a linked list. So you just
add your vfs to the list, and presto, it works. All socket interfaces go
through a function whose address is in a table. All of the device
interfaces, both block and character, go through tables of driver entry
points. All system calls are addressed via a table. So there are lots of
ways to add to the kernel. :-)
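A user-space model of why table dispatch is the hook an LKM can use: the kernel's caller always indirects through the table, so a module's load routine only has to fill in an unused slot or link its own entry onto a list. Everything here (systab, nosys, fsops, fs_list, mod_load) is a hypothetical simplification; the real system call table and vfs list have richer layouts that aren't reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical system call table: each slot holds a function pointer. */
typedef int (*sy_call_t)(int arg, int *retval);

static int
nosys(int arg, int *retval)
{
	(void)arg;
	(void)retval;
	return ENOSYS;
}

#define NSYSENT 4
static sy_call_t systab[NSYSENT] = { nosys, nosys, nosys, nosys };

/* Kernel-side dispatcher: always calls through the table. */
static int
syscall_dispatch(int code, int arg, int *retval)
{
	if (code < 0 || code >= NSYSENT)
		return ENOSYS;
	return systab[code](arg, retval);
}

/* Hypothetical filesystem registry: a linked list of entry-point tables. */
struct fsops {
	const char *name;
	struct fsops *next;
};

static struct fsops ffs_ops = { "ffs", NULL };
static struct fsops *fs_list = &ffs_ops;

static struct fsops *
fs_find(const char *name)
{
	struct fsops *f;

	for (f = fs_list; f != NULL; f = f->next)
		if (strcmp(f->name, name) == 0)
			return f;
	return NULL;
}

/* ---- what a module's load routine does ---- */

static int
mod_syscall(int arg, int *retval)
{
	*retval = arg * 2;
	return 0;
}

static struct fsops mod_fsops = { "netpagefs", NULL };

static int
mod_load(int slot)
{
	if (slot < 0 || slot >= NSYSENT)
		return EINVAL;
	if (systab[slot] != nosys)
		return EEXIST;		/* slot already taken */
	systab[slot] = mod_syscall;	/* patch one table entry */
	mod_fsops.next = fs_list;	/* and add ourselves to the fs list */
	fs_list = &mod_fsops;
	return 0;
}
```

Note that nothing in syscall_dispatch() or fs_find() changes; existing kernel code simply starts reaching the module through the tables it already uses.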

Getting back to what you were talking about. Yes, you can't change an
existing kernel routine. But I don't think you want to. I think it would be
easier if you just provided routines which provide and consume pages,
either via a swap device or a vnode, and leave the uvm system as-is. It
makes your maintenance easier, and makes a cleaner integration.

You were talking about something which would have a separate table of
blocks on another computer. I don't think you need that, partly because
you can make your own vnode which just reads from or writes to the other
process space. :-) It's your fs after all. Also, as I understand it, you
(or a daemon) will initiate the process migration. So all of these pages
will get created/added by routines you write. So you can enter page
mappings such that when they are used on the vnode you made, the right
things get read & written. :-)
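A sketch of what "entering page mappings" could look like, assuming the mapping lives inside your own fs (one table per vnode) rather than alongside UVM's swap-slot bookkeeping. The migration code records where each page went; the fs's read/write path translates a vnode offset back into that remote location. The names, the 4 KB page size, and the fixed table size are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assume 4 KB pages */
#define NPAGES_SKETCH 8		/* tiny fixed table, for illustration only */

/* Where one page of the migrated process lives on the other machine. */
struct remote_loc {
	int valid;
	int host;		/* hypothetical node id */
	uint64_t raddr;		/* address in the remote process */
};

/* Per-vnode mapping, owned by the hypothetical paging fs. */
struct pager_map {
	struct remote_loc pages[NPAGES_SKETCH];
};

/* Called by the migration code as it ships each page out. */
static int
pager_map_enter(struct pager_map *m, size_t pgidx, int host, uint64_t raddr)
{
	if (pgidx >= NPAGES_SKETCH)
		return -1;
	m->pages[pgidx].valid = 1;
	m->pages[pgidx].host = host;
	m->pages[pgidx].raddr = raddr;
	return 0;
}

/*
 * Called from the fs's read/write path: translate a byte offset in the
 * vnode into the remote page that should be read or written, or NULL
 * if no mapping was ever entered for it.
 */
static const struct remote_loc *
pager_map_lookup(const struct pager_map *m, uint64_t off)
{
	size_t pgidx = (size_t)(off >> PAGE_SHIFT_SKETCH);

	if (pgidx >= NPAGES_SKETCH || !m->pages[pgidx].valid)
		return NULL;
	return &m->pages[pgidx];
}
```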

The reason I didn't recommend making a new swap device, the other way to
add storage to the vm system, is that as I understand it, the vm system
will use that space for generalized storage. You want only your processes
doing this.

> On second thought, maybe I won't move to the sparcs. Maybe I'll just
> toss the whole LKM towel in and live with building kernels. I do have
> nine macppc machines to play with, no reason I can't move to another
> to try something new while one's building.

You could do that too. :-) Though I think you could get this working with
a system call & a modified file system, which could all be done with
LKMs. If you do go with full kernel building, you can also do a
"make -j 6", which will run up to 6 jobs at once to build the kernel.
That helps especially if you have only changed .c files: you'll need one
process for each of the libraries (there are usually 2) and one per
modified .c file, and they will all complete in about the same time.

Take care & good luck!

Bill