tech-userlevel archive


Re: alignement or compiler bug?

> Another possibility: *resid is, like max_write, signed, and is
> negative.  (If ps->ps_max_write is less than sizeof(*fwi), max_write
> could be negative, but your "bigger than max_write" makes it sound as
> though that's not it.)

Everything is size_t, which is unsigned AFAIK:

size_t max_write;
size_t *resid;
size_t data_len;

> I'm assuming data_len is of unsigned type.  Try printing it out in hex.
> Are the low bits zero?  If not, this increases the plausibility of the
> "corruption" theory, because of the clearing of the low bits if it's
> larger than PAGE_SIZE.

In a core dump left from a previous attempt, data_len is 0xbb5d7050, so
I would say it really is corrupted.
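
To make the low-bits argument concrete, here is a minimal sketch of the
check. It assumes 4 KiB pages (the usual i386 value) and uses helper
names of my own; none of this is taken from the perfuse sources.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on i386. */
#define SKETCH_PAGE_SIZE 4096u

/* Bits that a round-down-to-page-boundary step would have cleared. */
static uint32_t low_bits(uint32_t len)
{
    return len & (SKETCH_PAGE_SIZE - 1u);
}

/* Nonzero if len could be the result of such a rounding step. */
static int is_page_aligned(uint32_t len)
{
    return low_bits(len) == 0u;
}
```

For 0xbb5d7050, low_bits() gives 0x050, so the value cannot have come out
of a step that clears the low bits.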

> - Look at the assembly/machine code.  See if it looks broken.  (What
>    hardware is this on?  If it's one I know, I can have a look.)

This is i386. You can build it using pkgsrc/filesystems/perfused.
The mess happens in perfuse_node_write().

Seeing the bug live is a bit more complicated. I mount a glusterfs
volume (pkgsrc/filesystems/glusterfs) and do tar -xzvf src.tgz in it.
The bug pops up after about half an hour.

Here is the assembly leading to memcpy. The 0x28 is sizeof(*fwi), which
suggests that (fwi + 1) is computed correctly:

0xbbbe14dc <perfuse_node_write+460>:    mov    %eax,0x20(%esi)
0xbbbe14df <perfuse_node_write+463>:    lea    0x28(%esi),%edx
0xbbbe14e2 <perfuse_node_write+466>:    mov    0x10(%ebp),%eax
0xbbbe14e5 <perfuse_node_write+469>:    add    0xffffffe8(%ebp),%eax
0xbbbe14e8 <perfuse_node_write+472>:    push   %edi
0xbbbe14e9 <perfuse_node_write+473>:    push   %eax
0xbbbe14ea <perfuse_node_write+474>:    push   %edx
0xbbbe14eb <perfuse_node_write+475>:    call   0xbbbdfd90 <memcpy@plt>

> - Leave the "data" variable there, including the code you added to set
>    it, but still pass fwi+1 to the memcpy.

I tried passing data, it still crashed. It seems to be the test that
saves my day:
 if (data != ((char *)fwi) + sizeof(*fwi))
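
For reference, the two ways of computing the payload address should
always agree, which is why that test ought never to fire. A minimal
sketch follows; the struct is a hypothetical stand-in whose only purpose
is to be 0x28 bytes long like *fwi, and the helper names are mine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header; fields chosen to total 0x28 bytes. */
struct sketch_write_in {
    uint64_t fh;
    uint64_t offset;
    uint32_t size;
    uint32_t write_flags;
    uint64_t lock_owner;
    uint32_t flags;
    uint32_t padding;
};

/* Payload address via struct pointer arithmetic, i.e. (fwi + 1). */
static char *payload_by_struct(struct sketch_write_in *fwi)
{
    return (char *)(fwi + 1);
}

/* Payload address via byte arithmetic, as in the test above. */
static char *payload_by_bytes(struct sketch_write_in *fwi)
{
    return (char *)fwi + sizeof(*fwi);
}
```

Both helpers return the address 0x28 bytes past fwi, matching the
lea 0x28(%esi),%edx in the disassembly.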

> - If it doesn't break the semantics, make data have static storage
>    duration rather than automatic.

It would break semantics.

Emmanuel Dreyfus
