Subject: Adventures in Assembly (Is it even possible? Yes! Here's the magic
To: None <port-i386@netbsd.org>
From: Marc Tooley <sudog@sudog.com>
List: port-i386
Date: 08/21/2001 11:01:20
Another story in the adventures of Marc,

So in an effort to learn exactly how to make a syscall to output
"Hello World" to the screen--in assembly--I've been trying to use
gdb's built-in disassembler and stepi routines to trace through a C
version of int main(){fprintf(stdout,"hello world\n");} ... in a vain
attempt to learn something about how I can begin to write hand-coded
assembly under NetBSD.

Endless (and I do mean endless, I've had my finger on the return key
to repeat the stepi command for literally 10 minutes now) iterations
later, I'm no further along than when I started, and the following
routines are being called endlessly, over and over, looking up
symbols, fiddling with hashes, binding this, starting that..

_rtld_elf_hash ()
_rtld_find_symdef ()
_rtld_symlook_list ()
_rtld_symlook_obj ()
strcmp ()
_rtld_relocate_plt_object ()
_rtld_bind ()
_rtld_bind_start ()
__divdi3 ()
sysconf ()
_sysctl ()

I realize fprintf() probably isn't the most efficient way to find my
answers. Fifteen minutes after the start of this process, I re-write
hello.c to:

int main(){write(2,"Hey there\n",10);}

..and gdb stepi through it some more. God hates a coward, after all.
In my travails, I take a side-trip into the rtld source out of
curiosity.. in libexec, ld.elf_so source.. There's plenty in there..
what does it do? Beats me. About 75% of what I see is uncommented,
arcane references to undocumented structures, #ifdef's for various
architectures, a debug routine that does nothing
(_rtld_debug_state()), a separate error function that turns out to be
an independent implementation of snprintf that further uses a function
I've never seen before (xvsnprintf()--try a man on that one), and a
message that states, "This isn't correct."

Hurt me..! I know the gist of it--the name is Dynamic Linker for ELF
after all. Leave that behind and close the door on it for now. It's
not helping us in our quest of Understanding.

But perhaps the taste of success is nigh--I see a return to write()
and set an address breakpoint at it. Disassembly the next time through
gives up:

0x480d8ab4 <write>:     movl   $0x4,%eax
0x480d8ab9 <write+5>:   int    $0x80
0x480d8abb <write+7>:   jb     0x480d8a9c <getpid+8>

int 0x80 is a generic "let's visit the kernel" interrupt, and 0x4 is the
write() kernel syscall according to /usr/src/sys/kern/syscalls.master.
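
For anyone who wants to poke at the same trap from C without going
through the libc write() stub, here is a minimal sketch. It assumes the
syscall(2) wrapper and the SYS_write constant from <sys/syscall.h>
(which on NetBSD should be the same 4 seen in eax above). raw_write is
a made-up name for illustration, not anything from libc.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_write: the syscall number for write() */
#include <unistd.h>       /* syscall(): generic "call the kernel by number" */

/* Hypothetical helper: ask the kernel for write() by number,
 * bypassing the libc write() wrapper. Returns the byte count
 * on success or -1 on error, like write() itself. */
long raw_write(int fd, const void *buf, size_t nbyte)
{
    assert(buf != NULL);
    return (long)syscall(SYS_write, fd, buf, nbyte);
}
```

Called as raw_write(2, "Hey there\n", 10), it should put the string on
stderr and hand back the byte count, the same as the write() call in
hello.c.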

info registers shows me:

eax            0x4      4
ebx            0x8049918        134519064
ecx            0x480e5620       1208899104
edx            0x480e5628       1208899112
esp            0xbfbfda54       0xbfbfda54
ebp            0xbfbfda64       0xbfbfda64
esi            0xbfbfdaac       -1077945684
edi            0xbfbfdff0       -1077944336

let's see... ebx is pointing at __ps_strings, .. no, that's all wrong
and probably irrelevant. The important thing is the stack for
syscalls--we're not Linux after all. esp is pointing at 0xbfbfda54,
so:

(gdb) x/20xw 0xbfbfda54
0xbfbfda54:     0x08048809      0x00000002      0x080488e7      0x0000000a
0xbfbfda64:     0xbfbfda88      0x080485b9      0x00000001      0xbfbfdaac
[...etc...]

The 0x02 looks like the file descriptor I gave it (2 for stderr),
0x080488e7 is the "Hey there\n" string (x/10c 0x080488e7), and
0x0000000a is most definitely the length. (10 bytes). So what the heck
is 0x08048809? That looks like the return address in main (run bt) where
we invoked write() to begin with.

Okay, fine. Looks good. "Hey there" pops out to the screen.

Well?! Bloody heck. What's the syscall invocation again? (less
/usr/src/sys/kern/syscalls.master)

ssize_t sys_write (int fd, const void *buf, size_t nbyte);

Now you'd think the syscalls.master would be enough to get something
in assembly working. Let's resurrect the old hello.s:

------------------------------
section .data
msg   db "Hello World!",0xa
len   equ   $-msg

section .text
   global _start
_start:
   push  dword len
   push  dword msg
   push  dword 0x02
   mov   eax,0x04
   int   0x80
   add   esp,4
   push  dword 0
   mov   eax,0x1
   int   0x80
----------------

Compile with NASM and execute. Nothing. I smell victory not far off
though: Remember the stack from the hello.c, above? It had a
mysterious extra number on it--the return address in the main program
just after the write() invocation.
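
Why the first version printed nothing can be sketched in C. This is a
toy model, not kernel code: it assumes the kernel ignores the word at
esp (the return-address slot) and reads the three write() arguments
from the words after it, which is what the gdb stack dump above
suggests. kernel_view and write_args are names invented for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* What the kernel would take as write()'s arguments (toy model). */
struct write_args {
    uint32_t fd;
    uint32_t buf;
    uint32_t nbyte;
};

/* stack[0] is the word at esp. The model skips it, treating it as
 * the return-address slot, and reads the arguments that follow. */
struct write_args kernel_view(const uint32_t stack[4])
{
    assert(stack != NULL);
    struct write_args a = { stack[1], stack[2], stack[3] };
    return a;
}

/* Print one decoded stack so two layouts can be compared side by side. */
void show(const char *label, const uint32_t stack[4])
{
    struct write_args a = kernel_view(stack);
    printf("%s: fd=0x%x buf=0x%x nbyte=%u\n", label,
           (unsigned)a.fd, (unsigned)a.buf, (unsigned)a.nbyte);
}
```

Fed the four words from the gdb dump (0x08048809, 0x2, 0x080488e7,
0xa), kernel_view() gives fd 2 and nbyte 10. Fed the stack from the
first hello.s, where 0x02 sits at esp, everything shifts by one slot:
the fd becomes the address of msg and the buffer pointer becomes len,
so the kernel has nothing sensible to write.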

Let's alter the assembly program one more time:

-----------------------------
section .data
msg     db      "Hello World!",0xa
len     equ     $-msg
section .text
        global _start
_start:
        push    dword len
        push    dword msg
        push    dword 0x02
; now we're pushing the address right after the int 0x80 call as well
        push    dword myret
        mov     eax,0x04
        int     0x80
myret:
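        add     esp,16          ; drop the return slot and the three write() arguments
        push    dword 0         ; exit status
        push    dword 0         ; dummy return slot, which the kernel skips
        mov     eax,0x1         ; sys_exit
        int     0x80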
-----------------------

Make:

nasm -f elf hello.s
ld -s -o hello hello.o

run it:

./hello

Hello World!

SUCCESS! Just goes to show you, perseverance pays off in the end. I'm
forwarding this little "journal" to the mailing list in case others
want to adapt the "Hello World" that's all over the place in the
NASM/Linux/FreeBSD assembly howtos and tutorials to NetBSD.
Currently their instructions are incorrect, as are all the other
tutorials that lump NetBSD in with the rest of the rabble with
their sample hello world routines.

Looks like there was a change recently in the "way" to use syscalls in
NetBSD, because on the older kernels (around 1.5) the routine works
fine without pushing a return address onto the stack.

An extremely helpful program in implementing the above assembly
routines is called "ALD". It's sort of like a lower-level gdb that
works a bit better on assembly programs and gives you cleaner access to
the gritty details without all the overhead.

Getting ALD running under NetBSD was an exercise in itself.

Hope this helps someone else, someday,

Marc Tooley