Subject: Re: bad kdump output with 64bit syscalls under 32bit emul
To: Nicolas Joly <njoly@pasteur.fr>
From: David Laight <david@l8s.co.uk>
List: tech-kern
Date: 10/31/2007 22:20:09
On Wed, Oct 31, 2007 at 02:11:18AM +0100, Nicolas Joly wrote:
> 
> Hi,
> 
> While tracing some 32bit linux process on my -current NetBSD/amd64
> workstation, i noticed that kdump(1) show an incorrect number of
> arguments for some syscalls.
> 
>   4190      1 exit     CALL  close(3,3)
>   4190      1 exit     RET   close 0
> 
> After looking at it, i discovered that kdump(1) argument count is bad
> when dealing with 64bit syscalls under 32bit emulation.
> 
> 6       NOARGS  { int sys_close(int fd); }
> 
> The problem is that ktrace argsize depends on syscall args struct
> size, which depends on syscallarg macro definition. this size which
> should be, under compat linux32, a multiple of register32_t. But with
> a 64bit syscall, it will be a multiple of register_t; leading to an
> argsize two times larger than expected.
> 
> It looks like that 64bit syscalls should not be used directly under
> 32bit emulation in order have correct kdump output.
> 
> Unless there is a better way to fix it, that i'm currently
> missing. I'll plan to fix this by replacing all native syscalls with
> the netbsd32 equivalents, likewise for linux32 vs. linux calls.

I've looked at the code....
It really shouldn't work at all.
If you look in src/sys/compat/linux32/arch/amd64/linux32_sysent.c
there shouldn't be any argument structures whose names don't contain '32'.

For netbsd the syscall argument structures have to match the stack frame
for the user function call.  Each argument is assumed to be promoted to
(at least) the size of a register_t (traditionally 32 bits).
But if you look at the i386 code a chunk of user stack is copied into
the kernel 'args' buffer.

However Linux passes the arguments in registers (probably to save the
copyin).

Consider what happens for the close() system call under linux32 (amd64):
1) userspace puts the fd into %ebx
2) system call entry has args of 'struct sys_close_args' which is 8 bytes.
3) args[0] = %ebx (from user trap frame)
4) args[1] = %ecx (from user trap frame)
5) sys_close() is called, passing args[] which is register32_t [] where
   it expects register_t [] (aka uint64_t []).
6) A 64bit value for 'fd' would be collected from args[0]:args[1], but
   fortunately it is a 32bit arg followed (since we are little endian)
   by a 32bit pad.
7) The correct fd is closed!

If there were (supposed to be) 2 arguments, then the 2nd one would be
garbage.
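
To make the aliasing concrete, here is a rough userland sketch of the
steps above.  It is not the kernel code: native_reg_t, union arg_slot,
the two *_model structures and both helper functions are names made up
for illustration, and the slot layout is only my assumption of what a
native argument structure looks like on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t register32_t;   /* 32bit emulation register slot */
typedef uint64_t native_reg_t;   /* hypothetical stand-in for amd64 register_t */

/*
 * Hypothetical model of one native argument slot on a little-endian
 * machine: the 32bit datum occupies the low half of a 64bit slot.
 */
union arg_slot {
	native_reg_t pad;
	int32_t datum;
};

/* Hypothetical models of native argument structures (not the real ones). */
struct close_args_model { union arg_slot fd; };
struct two_args_model { union arg_slot a0; union arg_slot a1; };

static_assert(sizeof(struct close_args_model) == 8,
    "one native slot is 8 bytes");

/* Read the first argument the way a native 64bit handler would. */
static int32_t
first_arg_as_native(const register32_t *args32)
{
	struct close_args_model uap;

	memcpy(&uap, args32, sizeof(uap));	/* 8 bytes: args32[0..1] */
	return uap.fd.datum;
}

/* Read the second argument the way a native 64bit handler would. */
static int32_t
second_arg_as_native(const register32_t *args32)
{
	struct two_args_model uap;

	memcpy(&uap, args32, sizeof(uap));	/* 16 bytes: args32[0..3] */
	return uap.a1.datum;
}
```

Given an array of at least four register32_t values,
first_arg_as_native() returns args32[0], which is why close() still
gets the right fd.  second_arg_as_native() returns args32[2], the third
32bit slot, not the second argument the caller supplied.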

ktrace traces the syscall 'as seen' by the syscall entry code - which is
why it traces two arguments.
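
The arithmetic behind the "two arguments" can be sketched as below.
This assumes the displayed count is simply the recorded argument size
divided by the emulation's register width, which is how I read the
description above; traced_arg_count() is a hypothetical helper, not a
function from ktrace or kdump.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t register32_t;   /* 32bit emulation register slot */

/*
 * Hypothetical helper: number of arguments a tracer would print if it
 * divides the recorded argument size by the register width in use.
 */
static size_t
traced_arg_count(size_t argsize, size_t regsize)
{
	return argsize / regsize;
}
```

With the 8 byte 'struct sys_close_args' and a 4 byte register32_t this
gives 2, matching the close(3,3) line in the trace; a 4 byte argument
structure would give the expected 1.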

So the linux32 syscalls.master files need fixing to call the
netbsd32_sys_xxx() functions and never sys_xxx() for anything with
arguments.

	David

-- 
David Laight: david@l8s.co.uk