Subject: Re: Adventures in Assembly (Is it even possible? Yes! Here's the magic formula.)
To: Marc Tooley <sudog@sudog.com>
From: Nathan J. Williams <nathanw@MIT.EDU>
List: port-i386
Date: 08/21/2001 14:44:21

Marc Tooley <sudog@sudog.com> writes:

> So in an effort to learn exactly how to make a syscall to output
> "Hello World" to the screen--in assembly--I've been trying to use
> gdb's built-in disassembler and stepi routines to trace through a C
> version of int main(){fprintf(stdout,"hello world\n");} ... in a vain
> attempt to learn something about how I can begin to write hand-coded
> assembly under NetBSD.

Why in the name of anything holy do you want to write in assembly?
It's painful, it's tedious, there's no error checking, and you have to
rewrite it for each of NetBSD's 14-odd CPU types. If you really enjoy
assembly this much, divert your energies somewhere more useful, like
writing compilers.

Maybe you shouldn't answer that. I'm going to try to help you anyway.

> Endless (and I do mean endless, I've had my finger on the return key
> to repeat stepi command for literally 10 minutes now) iterations
> later, I'm no further along than when I started, and the following
> routines are being called endlessly, over and over, looking up
> symbols, fiddling with hashes, binding this, starting that..

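What you are stepping through is the run-time linker resolving the
symbol the first time it is called. If this bothers you, link static
binaries (cc -static), which have no run-time symbol lookup at all.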

> int main(){write(0,"Hey there\n",10);}
> 
> ..and gdb stepi through it some more. God hates a coward, after all.

Unnecessarily painful. Use gdb's disassemble command, or better yet
objdump --source --disassemble on a binary compiled with -g, to find
the call you want, and set a breakpoint there. Using your code as an
example:

38 crash-test-dummy:nathanw>gdb foo
...
(gdb) disassemble main
Dump of assembler code for function main:
0x804878c <main>:       pushl  %ebp
0x804878d <main+1>:     movl   %esp,%ebp
0x804878f <main+3>:     pushl  $0xa
0x8048791 <main+5>:     pushl  $0x80488e7
0x8048796 <main+10>:    pushl  $0x0
0x8048798 <main+12>:    call   0x80484dc <write>
0x804879d <main+17>:    addl   $0xc,%esp
0x80487a0 <main+20>:    leave  
0x80487a1 <main+21>:    ret    
End of assembler dump.
(gdb) b *0x8048798
Breakpoint 1 at 0x8048798: file foo.c, line 3.
(gdb) display/i $eip
(gdb) run
Starting program: /d/home/nathanw/foo 

Breakpoint 1, 0x8048798 in main () at foo.c:3
3               write(0,"Hey there\n",10);
1: x/i $eip  0x8048798 <main+12>:       call   0x80484dc <write>

> But perhaps the taste of success is nigh--I see a return to write()
> and set an address breakpoint at it. Disassembly the next time through
> gives up:
> 
> 0x480d8ab4 <write>:     movl   $0x4,%eax
> 0x480d8ab9 <write+5>:   int    $0x80
> 0x480d8abb <write+7>:   jb     0x480d8a9c <getpid+8>
> 
> int 0x80 is a generic "let's visit the kernel" interrupt, a 0x4 is the
> write() kernel syscall according to /usr/src/sys/kern/syscalls.master

You should look at src/lib/libc/arch/<whatever>/SYS.h to see how
system calls are defined on a given platform.
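
As a rough C-level picture of the split between the libc wrapper and
the system call number, here is a minimal sketch. It assumes a
Unix-like system that provides the generic syscall(2) entry point and
SYS_write in <sys/syscall.h>; the helper name raw_write is made up for
this example. It is not what SYS.h expands to, only the same idea seen
from C.

```c
/* glibc only declares syscall() when this is set; harmless elsewhere. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <sys/types.h>
#include <sys/syscall.h>  /* SYS_write: the system call number (4 in the
                             disassembly quoted above) */
#include <unistd.h>       /* syscall(), write() */

/*
 * raw_write is a hypothetical helper for this sketch. It asks libc's
 * generic syscall() entry point to issue system call SYS_write directly,
 * instead of going through the write() wrapper.
 */
static ssize_t
raw_write(int fd, const void *buf, size_t nbyte)
{
	return (ssize_t)syscall(SYS_write, fd, buf, nbyte);
}
```

Calling raw_write(1, "Hey there\n", 10) should behave like the
equivalent write() call; the wrapper in libc adds little beyond loading
the number and handling the error return.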

> The 0x02 looks like the file descriptor I gave it (2 for stderr),
> 0x080488e7 is the "Hey there\n" string (x/10c 0x080488e7), and
> 0x0000000a is most definitely the length. (10 bytes). So what the heck
> is 0x08048809? That looks like the return code in main (run bt) where
> we invoked write() to begin with.

Yes. main() doesn't invoke a syscall directly; it calls the function
write() in libc, which invokes a syscall. The call instruction puts
the return address on the stack.
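
To see the saved return address from C rather than from a stack dump,
a small sketch follows. It relies on __builtin_return_address, which is
a GCC/Clang extension and not standard C; callee and
show_return_address are hypothetical names, with callee standing in for
libc's write().

```c
#include <stdio.h>

/*
 * callee() is reached through a call instruction, like write() in libc.
 * __builtin_return_address(0) (GCC/Clang extension) yields the address
 * that the call saved, i.e. where execution resumes in the caller.
 * noinline keeps the compiler from folding the call away.
 */
__attribute__((noinline)) static void *
callee(void)
{
	return __builtin_return_address(0);
}

/* Calls callee() and prints the return address it observed. */
static void *
show_return_address(void)
{
	void *ret = callee();

	printf("callee returned to %p\n", ret);
	return ret;
}
```

The printed address should fall inside show_return_address, just after
its call instruction, in the same way that the extra word on your stack
falls inside main.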

You have missed the forest for the trees.

> ssize sys_write (int fd, const void *buf, size_t nbyte);
> 
> Now you'd think the syscalls.master would be enough to get something
> in assembly working.

Here's where you're wrong. System calls are complex beasts, and you
need to know everything that the processor does with the trap invoked
by the program and everything that the kernel does with that
information before passing it off to the sys_write() C routine. In the
i386 case, you'd need to look at IDTVEC(syscall) in
src/sys/arch/i386/i386/locore.s, line 2413 in my version, and at
syscall.c in the same directory.
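
As a very rough picture of what happens between the trap and
sys_write(), here is a toy user-space model in C. It is not kernel
code. Every name in it (toy_trapframe, toy_syscall, and so on) is
invented for the sketch, and it assumes that the kernel takes the
system call number from %eax, skips one word on the user stack for the
return address before reading the arguments, and reports failure
through the carry flag, which is what the quoted stack values and the
jb after int $0x80 suggest. Check locore.s and syscall.c in your own
tree for the real thing.

```c
#include <stdint.h>

/* Toy model only: the fields mirror the registers discussed above. */
struct toy_trapframe {
	uint32_t eax;         /* system call number on entry, result on exit */
	const uint32_t *esp;  /* user stack pointer at the time of int $0x80 */
	int carry;            /* stands in for the carry flag: 1 means error */
};

#define TOY_SYS_WRITE 4   /* write is number 4, per the disassembly above */
#define TOY_ERR       1   /* placeholder error value, not a real errno */

/* Stand-in for sys_write(): pretend every byte was written. */
static uint32_t
toy_sys_write(uint32_t fd, uint32_t buf, uint32_t nbyte)
{
	(void)fd;
	(void)buf;
	return nbyte;
}

/* Toy dispatcher: pick the handler from eax, fetch arguments from the stack. */
static void
toy_syscall(struct toy_trapframe *tf)
{
	/* Slot 0 holds the return address pushed by "call write"; skip it. */
	const uint32_t *args = tf->esp + 1;

	switch (tf->eax) {
	case TOY_SYS_WRITE:
		tf->eax = toy_sys_write(args[0], args[1], args[2]);
		tf->carry = 0;
		break;
	default:
		tf->eax = TOY_ERR;
		tf->carry = 1;
		break;
	}
}

/* Replays the stack values quoted earlier through the toy dispatcher. */
static uint32_t
toy_demo(void)
{
	const uint32_t stack[] = { 0x08048809, 0x02, 0x080488e7, 0x0000000a };
	struct toy_trapframe tf = { TOY_SYS_WRITE, stack, 0 };

	toy_syscall(&tf);
	return tf.eax;
}
```

Under these assumptions, hand-written assembly that issues int $0x80
without going through a call would still need some word in the
return-address slot, or every argument would be read one position off.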

        - Nathan