Subject: Re: questions about various .S files in i386/boot
To: None <port-i386@NetBSD.ORG>
From: Craig M. Chase <chase@orac.ece.utexas.edu>
List: port-i386
Date: 03/21/1996 09:47:25
A reasonable person once wrote:
> 
> > I've been scampering around some more in i386/boot of late, and I
> > discovered that there are lots of things in there that I simply do not
> > understand at all.

Welcome to the club.

I just hacked my own boot/loader for an operating systems class I am
teaching.  Real-mode x86 code is a pain.

At power-up, the CPU is essentially emulating an 8086.  The default
operand and address sizes are 16 bits.  All addresses are 20-bit
values formed from segment:offset register pairs.

Anyway, the problem is, gas knows nothing about such a brain-damaged
mode of execution.  When told to generate code for something like:

    movl        $0x42, %eax

it generates the machine code for "execute a move immediate instruction
using the natural word size of the machine (32 bits) with immediate
data '0x00000042'".

This makes perfect sense to gas, because it thinks that the CPU is a
32-bit machine.  Alas, when we run the bootstrap code, we are not
using a 32-bit computer, and the CPU interprets the machine code as:

"execute a move immediate instruction using the natural word size of
the machine (16 bits) with the immediate data '0x0042', and then
execute the machine code '0x0000'" (which in 16-bit mode happens to be
the instruction "addb   %al,(%bx,%si)").  So, we're off executing our
data.
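
To make the mis-decoding concrete, here is a toy Python model of it.
It is only a sketch: it understands the single opcode 0xB8 (move
immediate into the accumulator), and the function name and the
default_bits parameter are made up for illustration, not taken from
the boot sources.

```python
# Toy model of how the CPU reads "mov imm, %eAX" (opcode 0xB8).
# Assumption: only this one opcode is modelled. default_bits is the
# current default operand size (16 in real mode, 32 in a 32-bit segment).

def decode_mov_imm(code, default_bits):
    """Return (immediate, leftover_bytes) for a 0xB8 instruction."""
    assert code[0] == 0xB8, "sketch only understands opcode 0xB8"
    imm_len = default_bits // 8      # 2 bytes in 16-bit mode, 4 in 32-bit
    imm = int.from_bytes(code[1:1 + imm_len], "little")
    return imm, code[1 + imm_len:]

# What gas emits for "movl $0x42, %eax" when it assumes a 32-bit CPU.
gas_output = bytes([0xB8, 0x42, 0x00, 0x00, 0x00])

imm32, rest32 = decode_mov_imm(gas_output, 32)   # what gas intended
imm16, rest16 = decode_mov_imm(gas_output, 16)   # what real mode sees

print(hex(imm32), rest32.hex())   # immediate 0x42, nothing left over
print(hex(imm16), rest16.hex())   # immediate 0x42, "0000" left to run as code
```

The two leftover bytes in the 16-bit case are the stray "0x0000"
described above.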


On the other hand, if we tell gas to generate code for:

    movw        $0x42, %ax

gas will insert the 0x66 byte prefix automatically.  But when the
machine code is run, the CPU is *actually* in 16-bit mode, so the
prefix switches that one instruction to 32-bit operand size, and we
are now loading part of our code into %eax (the first two bytes of the
next assembly language instruction(s) are loaded into the high-order
half of %eax).
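
The same kind of toy model shows the prefix problem.  Again this is a
sketch under assumptions: it handles only opcode 0xB8 with an optional
0x66 prefix, the function name is hypothetical, and the two 0x90 (nop)
bytes simply stand in for "whatever instruction comes next".

```python
# Toy model: opcode 0xB8 with an optional 0x66 operand-size prefix.
# The prefix flips the operand size for this one instruction only.

def decode_prefixed_mov(code, default_bits):
    """Return (operand_bits, immediate, leftover_bytes)."""
    bits, pos = default_bits, 0
    if code[pos] == 0x66:                      # operand-size override
        bits = 32 if default_bits == 16 else 16
        pos += 1
    assert code[pos] == 0xB8, "sketch only understands opcode 0xB8"
    pos += 1
    imm_len = bits // 8
    imm = int.from_bytes(code[pos:pos + imm_len], "little")
    return bits, imm, code[pos + imm_len:]

# gas (assuming a 32-bit CPU) output for "movw $0x42, %ax",
# followed by two stand-in bytes for the next instruction.
stream = bytes([0x66, 0xB8, 0x42, 0x00, 0x90, 0x90])

as_gas_meant = decode_prefixed_mov(stream, 32)
as_real_mode = decode_prefixed_mov(stream, 16)

print(as_gas_meant)   # 16-bit operand, immediate 0x42, next bytes untouched
print(as_real_mode)   # 32-bit operand, next bytes swallowed into the immediate
```

In the 16-bit case the immediate comes out as 0x90900042: the next
instruction's bytes have ended up in the upper half of the register.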

The solution in the boot code is to fake out the assembler.
By interleaving data with the code, the programmers trick gas into
producing the correct machine code for a CPU running in 16-bit mode
and acting on 32-bit data.

So, when we want to do a 32-bit instruction, we insert data32 or
addr32 (or both) in front of the mnemonic, and use a 32-bit register
name.

When we want to do a 16-bit instruction (e.g. movw %ax,%bx), then we
leave the data32 out, and *still* use the 32-bit register name (so we
write movl %eax,%ebx).
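
As a rough check on both rules, the sketch below builds the bytes a
32-bit-only gas would emit and then reads them back the way a CPU with
a 16-bit default operand size would.  The helper names are invented
for illustration, only two encodings are modelled, and 0x1000 is a
placeholder value, not the real BOOTSEG.

```python
# Sketch of the boot-code trick. Assumptions: gas always encodes for a
# 32-bit CPU, and only "mov imm, %eax" (0xB8) and "mov %eax, %ebx"
# (0x89 0xC3) are modelled.

DATA32 = bytes([0x66])       # what "#define data32 .byte 0x66" emits

def gas32_movl_imm_eax(value):
    """Bytes for 'movl $value, %eax' as encoded for a 32-bit CPU."""
    return bytes([0xB8]) + value.to_bytes(4, "little")

def gas32_movl_eax_ebx():
    """Bytes for 'movl %eax, %ebx' as encoded for a 32-bit CPU."""
    return bytes([0x89, 0xC3])

def real_mode_view(code):
    """Return (operand_bits, immediate_or_None, unread_byte_count)."""
    bits, pos = 16, 0
    if code[pos] == 0x66:                    # data32 prefix: 32-bit operand
        bits, pos = 32, pos + 1
    if code[pos] == 0xB8:                    # mov imm, accumulator
        n = bits // 8
        imm = int.from_bytes(code[pos + 1:pos + 1 + n], "little")
        return bits, imm, len(code) - (pos + 1 + n)
    if code[pos:pos + 2] == bytes([0x89, 0xC3]):   # mov accumulator, BX/EBX
        return bits, None, len(code) - (pos + 2)
    raise ValueError("encoding not modelled in this sketch")

PLACEHOLDER_SEG = 0x1000     # hypothetical stand-in for BOOTSEG

wide = real_mode_view(DATA32 + gas32_movl_imm_eax(PLACEHOLDER_SEG))
narrow = real_mode_view(gas32_movl_eax_ebx())

print(wide)     # 32-bit operand, full immediate consumed, no stray bytes
print(narrow)   # 16-bit operand: the CPU performs movw %ax,%bx
```

In both cases no bytes are left over, which is the point of the trick:
the CPU and the assembler agree on where each instruction ends.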

That's what's going on in the code.  

> > 
> > For instance, take the enclosed snippet of code, from start.S. All
> > through the code, we find these data32 macros. Could someone please
> > explain to me what they mean and why they are there? I feel sort of
> > stupid, but I just "don't get it" I suppose. Does it have something to
> > do with 16 bit vs. 32 bit *86 code?
> > 
> > [...]
> > #define	addr32	.byte 0x67
> > #define	data32	.byte 0x66
> > [...]
> > data32
> > movl	$BOOTSEG, %eax
> 

and Matt Beal had explained:
> Exactly (16- vs 32-bit). On the x86's, when opcode 0x66 precedes an
> instruction, such as the one above, it tells the CPU to use 16-bit
> accesses in 32-bit mode, and vice versa. In x86 instruction encodings,
> 16- and 32-bit instructions look exactly the same, and depend on which
> mode (16- or 32-bit) the CPU is in. To change the meaning of the
> instruction, from a 16-bit access to a 32-bit one, you prefix that opcode.
> In the case of the initial boot code, which runs in 16-bit mode, the opcode
> is required for 32-bit register accesses.
> 
> matt
> 

-- 
Craig Chase --- Assistant Professor      |  my PGP public key is available 
Electrical and Computer Engineering      |  upon request, or
The University of Texas at Austin        |  finger chase@orac.ece.utexas.edu
Austin, TX 78721   --- (512) 471-7457    |