port-macppc: Sheesh. More LKMs. And some firewall stupidity.

Subject: Sheesh. More LKMs. And some firewall stupidity.
To: None <wrstuden@zembu.com, tv@wasabisystems.com, hotz@jpl.nasa.gov>
From: gabriel rosenkoetter <gr@cs.swarthmore.edu>
List: port-macppc
Date: 08/19/2000 03:12:37
So, my last email to the list from eclipsed.net doesn't seem to have
come through (though I sent it to about half of you privately, I
guess, so...). My SMTP server is fine and BIND is perfectly happy, and
will even do zone transfers to machines outside of 130.58.0.0/16, but
no nslookup from outside that IP range can get a response from it.
(Again, not the DNS machine's fault; it never sees the request.) I
expect that netbsd.org's MX (intelligently) refuses to deliver mail
from servers with irresolvable addresses. Oh well.

This is all because of a fancy Cisco PIX firewall swarthmore.edu
recently installed. So, DNS is broken, so mail to (and from, with smart
SMTP daemons) eclipsed.net won't work. Just so no one gets confused
about bounced messages and such. (If anyone has some experience that
resembles this, preferably including a fix, feel free to let me know.)

Anyway, back to topic...

A few replies to messages I missed (I'm grabbing these off of
mail-index.netbsd.org):

At 08/14/2000 18:58:59, Bill Studenmund <wrstuden@zembu.com> wrote:
> It looks like there are two open issues. 1) getting -mlong-call working,

Sure. Great. I'm basically content stomping on prototypes, largely
because...

> and 2) getting it to work for built-ins (mainly memcpy).

... the only way to make this situation better (and I have done so) is
to declare the modified prototype as an extern in the declaration
section of the function in which memcpy() is used. This works, and the
relocations all go through ld just dandy. So, miscmod.c (originally
from /usr/share/lkm/misc/module/miscmod.c) looks like
this, starting around line 86:

static int 
miscmod_handle( lkmtp, cmd)
struct lkm_table    *lkmtp;
int         cmd;
{       
    int         i;
    struct lkm_misc     *args = lkmtp->private.lkm_misc;
    int         err = 0;    /* default = success*/
    extern int sys_lkmnosys __P((struct proc *, void *, register_t *));
    extern void   *memcpy __P((void *, const void *, size_t)) __attribute__((longcall));
[...]
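
The same trick can be shown outside the kernel. This is a minimal
sketch, assuming a GCC that accepts the longcall attribute on PowerPC;
the LONGCALL macro and the copy_bytes() wrapper are hypothetical names
made up for illustration, and on other targets the macro expands to
nothing so the code still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: only ask for longcall where GCC has it. */
#if defined(__GNUC__) && defined(__powerpc__)
#define LONGCALL __attribute__((longcall))
#else
#define LONGCALL
#endif

/*
 * Hypothetical wrapper: redeclare memcpy() at block scope with the
 * attribute, so that only calls made inside this function see the
 * modified prototype.
 */
void *
copy_bytes(void *dst, const void *src, size_t n)
{
    extern void *memcpy(void *, const void *, size_t) LONGCALL;

    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

Keeping the declaration inside the function limits the modified
prototype to that one scope instead of changing it for the whole file.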

Anyway, ld's content now, but when I do a modload, I get this gem:

achemar:misc/module# make load
modload -o miscmod -emiscmod combined.o
modload: error loading buffer: Cannot allocate memory
*** Error code 11

Stop.

Using my modified copy of modload with DEBUG=1 is a little more
illustrative:

achemar:misc/module# ~gr/src/modload/modload -o miscmod -emiscmod combined.o
ld -R /netbsd -e miscmod -o miscmod -Ttext 0x0 combined.o
.text: addr = 0x0 size = 0x460 align = 0x4
.rodata: addr = 0x460 size = 0x118 align = 0x4
.sdata2: addr = 0x578 size = 0 align = 0x4
.data: addr = 0x40578 size = 0x10 align = 0x4
.data section forced to offset 0x578 (was 0x40578)
.got: addr = 0x40588 size = 0x10 align = 0x4
.sdata: addr = 0x40598 size = 0x10 align = 0x4
.bss: addr = 0x405a8 size = 0 align = 0x1
ld -R /netbsd -e miscmod -o miscmod -Ttext 0xe9209000 -Tdata 0xe9209578 combined.o
loading `.text': addr = 0xe9209000, size = 0x460
loading `.rodata': addr = 0xe9209460, size = 0x118
loading `.data': addr = 0xe9209578, size = 0x10
loading `.sdata2': addr = 0xe9209578, size = 0
modload: error loading buffer: Cannot allocate memory

Um... what's .sdata2? (I wasn't aware that was even a legal name for a
section.) Why are we having difficulty allocating memory for a
zero-size buffer?
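
Whether the empty section is really what triggers the error is not
confirmed by the output above. As a minimal sketch of one possible
workaround, assuming it is, a loader could simply skip zero-size
sections before asking for a buffer. The struct and function names
below are hypothetical and are not taken from modload; only the
section names, addresses and sizes come from the second ld pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one section, as printed in the debug output. */
struct section {
    const char *name;
    uint32_t    addr;
    uint32_t    size;
};

/* Sections from the second ld pass in the log. */
const struct section example_sections[] = {
    { ".text",   0xe9209000, 0x460 },
    { ".rodata", 0xe9209460, 0x118 },
    { ".data",   0xe9209578, 0x10  },
    { ".sdata2", 0xe9209578, 0     },
};

/*
 * Count the sections that actually need a transfer, skipping empty
 * ones, and store the sum of their sizes in *total.
 */
size_t
count_loadable(const struct section *secs, size_t n, uint32_t *total)
{
    size_t i, loadable = 0;

    assert(secs != NULL && total != NULL);
    *total = 0;
    for (i = 0; i < n; i++) {
        if (secs[i].size == 0)
            continue;   /* nothing to copy: never request a 0-byte buffer */
        loadable++;
        *total += secs[i].size;
    }
    return loadable;
}
```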

I remain confused, and I'm spending time on this that I really ought
to be spending on my work, so I'm pulling a couple of sparcs up to
-current and will move my development there. It hurts though. They're
so slow compared to these 7x00s...

On 08/14/2000 23:58:17 Todd Vierling <tv@wasabisystems.com> wrote:

> On Mon, 14 Aug 2000, Bill Studenmund wrote:
> 
> : It looks like there are two open issues. 1) getting -mlong-call working,
>    
> This means incorporating the public patch to do so,

What are the material problems with this code as it was released that
kept it from being added to the distribution at the time? It does too
much? It doesn't follow our style guide? If it only needs some trivial
modifications to get in, I'm willing to do them, provided they're
only moderately time-consuming (I do have course work and a thesis to
worry about too), but if it's just a lost cause, I don't think I could
care much less.

Since I can get down to just memory errors using changed prototypes,
I can (probably) do what I want to do now without beating on gcc, and
switch over to -mlong-call when someone else gets around to
incorporating it. (Even more, I can just ditch working on the ppc
platform, but I'd rather not have to do that.)

Speaking of that work...

On 08/16/2000 16:24:57, Henry B. Hotz <hotz@jpl.nasa.gov> wrote:

> There is already support for swapping to a vnode filesystem.  Sounds
> like what you should be doing is defining the vnode calls to swap
> page data, not modifying UVM itself.

Now, maybe I'm missing something, but I don't understand how vnodes
help my situation.

As I understand it, UVM won't use the vnode pager for standard memory;
it uses aobj/uvm_obj pairs for that, and making it use something else
would be more work than what I'm thinking of.

Also, I don't see how a vnode FS (which exists as a file in the local
file system, no?) helps me get pages of data into the live memory of
another machine on the network.

Could you describe how you would use a vnode fs to do this? If you
meant to use a file that was NFS mounted for the vnode, I don't think
this is a good idea. First off, I'm ditching NFS for a modified
version of Berkeley's xFS in my cluster. Second, I don't want file
system interaction at all when paging; that's the point. Dealing with
fs lookups is a slowdown I don't want, especially when network
transactions are added to the mess (even if they are asynchronous,
excess network traffic is simply something I don't want in a cluster).

My plan is to modify a few functions in uvm_aobj.c (uh, should be in
src/sys/uvm, but I'm looking at a printout I've scribbled on, not at
the file). Two are of particular interest. uao_get(), when it checks
for the swap slot a requested page resides in, could be made to check
a separate table for a record of placing the page on another machine,
and retrieve it from there. uao_pagein() could get a modified copy
(uao_pagein_NW()?) that gets pages back from the network rather than
swap.
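
The "separate table" idea can be sketched in userland C. This is only
an illustration under stated assumptions: remote_table,
remote_record() and remote_lookup() are hypothetical names, nothing
here is UVM code, and a real version would need locking and a better
data structure than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REMOTE_SLOTS 64     /* hypothetical table size */

/* Hypothetical record: which cluster node holds a paged-out page. */
struct remote_page {
    int      in_use;
    uint64_t pageidx;       /* page index within the object */
    uint32_t node;          /* id of the machine holding the page */
};

static struct remote_page remote_table[REMOTE_SLOTS];

/* Note that a page now lives on `node`. Returns 0, or -1 if the table is full. */
int
remote_record(uint64_t pageidx, uint32_t node)
{
    struct remote_page *free_slot = NULL;
    size_t i;

    for (i = 0; i < REMOTE_SLOTS; i++) {
        if (remote_table[i].in_use && remote_table[i].pageidx == pageidx) {
            remote_table[i].node = node;    /* page moved: update in place */
            return 0;
        }
        if (!remote_table[i].in_use && free_slot == NULL)
            free_slot = &remote_table[i];   /* remember first free slot */
    }
    if (free_slot == NULL)
        return -1;
    free_slot->in_use = 1;
    free_slot->pageidx = pageidx;
    free_slot->node = node;
    return 0;
}

/*
 * Returns 1 and fills *node if the page is recorded as remote,
 * 0 otherwise (the caller would then fall back to the swap slot).
 */
int
remote_lookup(uint64_t pageidx, uint32_t *node)
{
    size_t i;

    assert(node != NULL);
    for (i = 0; i < REMOTE_SLOTS; i++) {
        if (remote_table[i].in_use && remote_table[i].pageidx == pageidx) {
            *node = remote_table[i].node;
            return 1;
        }
    }
    return 0;
}
```

The lookup is the check a modified uao_get() would make before it
consults the swap slot; a miss leaves the existing swap path untouched.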

All of this brings me to a message a while ago from Bill, which has
been nagging at the back of my mind...

On 08/14/2000 18:16:12, Bill Studenmund <wrstuden@zembu.com> wrote:

> Also, I think that it's hard to change existing kernel routines. The
> linking process, as I understand it, will modify unresolved symbols in
> your lkm to point to the in-kernel ones. But it will not modify existing
> kernel references to other kernel functions (like existing calls to
> uvm_page()) to point out to your lkm code.

Yes, that is a profound problem. If that's true, then it's time to
give up on LKMs entirely.

I would ask how LKMs would ever be useful for developing new additions
to the kernel, then... which I thought was the point of their
existence. If you can't make the kernel call symbols in your LKM or
modify in-kernel symbols to behave differently, how could you ever add
anything to an existing kernel except by recompiling the kernel?
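
The distinction Bill describes can be illustrated in plain C. This is
a sketch with hypothetical names, not kernel code: a direct call is
bound when the caller is linked, while a call made through a table
slot can be redirected later by whoever overwrites the slot.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int);

/* Stand-in for a routine that is already linked into the kernel. */
int builtin_handler(int x) { return x + 1; }

/* Stand-in for a routine supplied later by a loaded module. */
int module_handler(int x) { return x * 2; }

/* Hypothetical dispatch table; slot 0 starts out as the built-in. */
static handler_t dispatch[4] = { builtin_handler, NULL, NULL, NULL };

/* Direct call: bound to builtin_handler when this file is linked. */
int call_direct(int x) { return builtin_handler(x); }

/* Indirect call: whatever currently sits in the slot gets called. */
int
call_indirect(size_t slot, int x)
{
    assert(slot < 4 && dispatch[slot] != NULL);
    return dispatch[slot](x);
}

/* Replace a slot and hand back the old entry so it can be restored. */
handler_t
hook_install(size_t slot, handler_t new_handler)
{
    handler_t old;

    assert(slot < 4);
    old = dispatch[slot];
    dispatch[slot] = new_handler;
    return old;
}
```

After hook_install(0, module_handler), call_indirect(0, 5) reaches the
module's code, while call_direct(5) still runs the built-in. Loading a
module can only redirect the second kind of call if the kernel already
goes through a pointer or table at that point.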

On second thought, maybe I won't move to the sparcs. Maybe I'll just
toss the whole LKM towel in and live with building kernels. I do have
nine macppc machines to play with; no reason I can't move to another
to try something new while one's building.

       ~ g r @cs.swarthmore.edu

PS, again, including newhall@cs.swarthmore.edu and
hinojosa@cs.swarthmore.edu in replies would be appreciated.