Subject: Fixing Linux emulated brk()
To: None <tech-kern@netbsd.org>
From: Emmanuel Dreyfus <p99dreyf@criens.u-psud.fr>
List: tech-kern
Date: 03/14/2001 23:04:51
Hello everybody

With the help of Kevin B. Hendricks of the Blackdown team, I'm trying to
fix the way Linux brk() is emulated, so that Linux's JDK-1.3 will work
in emulation on the PowerPC.

We use a sample program that reproduces the problem occurring in the JVM.
The program basically makes two sbrk() calls, the second being adjusted so
that it ends up on a page boundary. [A sbrk() library call results in one
brk(0) system call to get the break value, and another brk() system call
to set the new value with the appropriate offset. Is that right?]
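
To make the bracketed assumption concrete, here is a minimal sketch of sbrk() built from two brk() calls. It does not touch the real kernel: fake_break, fake_brk() and fake_sbrk() are hypothetical names invented for illustration, and fake_brk() only models a Linux-style raw brk that returns the resulting break and treats an argument of 0 as a query.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's idea of the break.
 * The start value is taken from the native Linux trace. */
static uintptr_t fake_break = 0x100109d8;

/* Model of a Linux-style raw brk: returns the resulting break.
 * An argument of 0 only queries the current value. */
static uintptr_t fake_brk(uintptr_t addr)
{
    if (addr != 0)
        fake_break = addr;
    return fake_break;
}

/* sbrk() expressed as two brk() calls: query, then set.
 * Returns the previous break, or (uintptr_t)-1 on failure. */
static uintptr_t fake_sbrk(intptr_t incr)
{
    uintptr_t old = fake_brk(0);             /* first call: brk(0) */
    uintptr_t want = old + (uintptr_t)incr;

    assert(want != 0);                       /* 0 would be read as a query */
    if (fake_brk(want) != want)              /* second call: brk(old + incr) */
        return (uintptr_t)-1;
    return old;
}
```

With this model, fake_sbrk(0x21) issues brk(0) and then brk(0x100109f9), which is the pattern seen in the first two lines of the native trace.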

Here is the trace under NetBSD's Linux emulation:
brk(0)          = 0x10011000
brk(0x10011021) = 0x10011021
brk(0)          = 0x10012000
brk(0x12012fdf) = 0x12012fdf

and here is the native Linux counterpart:
brk(0)          = 0x100109d8
brk(0x100109f9) = 0x100109f9
brk(0)          = 0x100109f9
brk(0x20021000) = 0x20021000

The problem in the emulated trace is that the break value moves between
the two sbrk() calls. After setting the break, we end up with 0x10011021
(on the second line), and when we read it again, we get 0x10012000 (on the
third line), so the adjustment fails. Here is the explanation in more
detail:

The program first calls sbrk() because it wants to allocate some memory:

brk(0)          = 0x10011000
brk(0x10011021) = 0x10011021

The program wants to add 0xfdf so that we reach a page-aligned
address: 0x10011021 + 0xfdf = 0x10012000. It calls sbrk() again.

sbrk() calls brk(0) to get the break address:

brk(0)          = 0x10012000

But the break is not at 0x10011021 anymore. The sbrk() library call does
not care, though: it has just been asked to add 0xfdf.

brk(0x12012fdf) = 0x12012fdf

And the result is not page aligned, which causes the JDK to fail.
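
The sequence above can be replayed with a small model. This is only a sketch under stated assumptions: the page size is taken to be 0x1000 (which is what the arithmetic above implies), and model_break, model_brk(), model_sbrk() and replay() are hypothetical names, not kernel code. The model echoes the caller's value when the break is set but reports a page-rounded value when it is queried, as in the emulated trace. It uses the 0xfdf increment from the explanation, so only the offset within the page should be compared with the trace.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 0x1000u   /* assumption: 4 KiB pages */

/* Round addr up to the next page boundary. */
static uintptr_t page_round_up(uintptr_t addr)
{
    /* The mask trick only works for a power-of-two page size. */
    assert((PAGE_SIZE_ASSUMED & (PAGE_SIZE_ASSUMED - 1)) == 0);
    return (addr + PAGE_SIZE_ASSUMED - 1) & ~(uintptr_t)(PAGE_SIZE_ASSUMED - 1);
}

/* Bytes needed to bring addr up to a page boundary (0 if aligned). */
static uintptr_t pad_to_page(uintptr_t addr)
{
    return page_round_up(addr) - addr;
}

/* Hypothetical model of the emulated break, kept page-rounded. */
static uintptr_t model_break = 0x10011000;

/* Model of the observed behaviour: a set echoes the caller's
 * value, while a query reports the page-rounded value. */
static uintptr_t model_brk(uintptr_t addr)
{
    if (addr == 0)
        return model_break;
    model_break = page_round_up(addr);
    return addr;
}

/* sbrk() as query-then-set; returns the break seen by the query. */
static uintptr_t model_sbrk(uintptr_t incr)
{
    uintptr_t old = model_brk(0);
    return model_brk(old + incr) == old + incr ? old : (uintptr_t)-1;
}

/* Replays the two sbrk() calls and returns the final break value
 * that was requested from the model. */
static uintptr_t replay(void)
{
    uintptr_t first_old = model_sbrk(0x21);   /* requests 0x10011021 */
    uintptr_t believed = first_old + 0x21;    /* what the caller thinks */
    uintptr_t pad = pad_to_page(believed);    /* 0xfdf */
    uintptr_t second_old = model_sbrk(pad);   /* query now says 0x10012000 */
    return second_old + pad;                  /* 0x10012fdf, not aligned */
}
```

In this model the padding is computed from 0x10011021 but applied to 0x10012000, so the final break has a page offset of 0xfdf instead of 0.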

The problem is caused by linux_sys_brk(), in
/sys/compat/linux/common/linux_misc.c, and the code is annotated as
being a problem:

   oldbrk = vm->vm_daddr + ctob(vm->vm_dsize);
   /*
    * XXX inconsistent.. Linux always returns at least the old
    * brk value, but it will be page-aligned if this fails, 
    * and possibly not page aligned if it succeeds (the user 
    * supplied pointer is returned).
    */
   SCARG(&oba, nsize) = nbrk;
 
   if ((caddr_t) nbrk > vm->vm_daddr && sys_obreak(p, &oba, retval) == 0)
      retval[0] = (register_t)nbrk;
   else
      retval[0] = (register_t)oldbrk;
 
   return 0;

I need some information about this ctob(vm->vm_dsize). What is it for?
As I understand it, vm->vm_daddr is the current break value, so why do
we add something to it?

-- 
Emmanuel Dreyfus.  
With Windows 3.1 they were on the edge of the abyss...
With Windows 95 they took a great leap forward.
p99dreyf@criens.u-psud.fr