tech-kern archive


Re: zfs crash on amd64




On Nov,Saturday 7 2009, at 10:05 AM, David Laight wrote:

On Mon, Nov 02, 2009 at 09:10:44AM +0100, Adam Hamsik wrote:

I talked with joerg@ and he
suggested that it could be a stack overflow problem on amd64.
This problem can be seen on i386 on machines with >3 GB of memory.

NetBSD i386 won't use memory that gets mapped above 4GB - so I don't
suppose you meant i386, did you?

If the problem doesn't happen when the system has < 3GB memory
it is more likely to do with some code failing to handle physical
addresses correctly.

I'm using DIAGNOSTIC modules built with -O0 -g.

You really don't want to use -O0, it will cause the code to be
much, much larger, run much, much slower and use more stack.
If it hides any bugs, when you change to -O2 for production use
you'll have to debug the code again.

Otherwise gcc will inline functions, which makes it almost impossible to find out what is going on.


It should be trivial to look at the code for the active functions
to find the size of their stack frames. Easiest using objdump and
reading the asm.
You also need to check any called functions.
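
As a rough illustration of the objdump approach above, here is a minimal Python sketch that scans the text produced by `objdump -d` for the `sub $0x...,%rsp` instruction in each function and reports the largest such allocation. The helper name `frame_sizes` and the sample disassembly are hypothetical, made up for this example. It assumes AT&T syntax as printed by GNU objdump, and it ignores pushed registers, the return address, and any dynamic allocation, so treat the numbers as a lower bound rather than an exact frame size.

```python
import re

# Function header line in `objdump -d` output, e.g. "0000000000000010 <big_fn>:"
FUNC_RE = re.compile(r'^[0-9a-f]+ <([^>]+)>:\s*$')
# AT&T-syntax stack allocation, e.g. "sub    $0x150,%rsp" (or %esp on i386)
SUB_RE = re.compile(r'\bsub[ql]?\s+\$0x([0-9a-f]+),\s*%[re]sp\b')


def frame_sizes(disassembly):
    """Map each function name to the largest constant subtracted from the stack pointer."""
    sizes = {}
    current = None
    for line in disassembly.splitlines():
        header = FUNC_RE.match(line)
        if header:
            current = header.group(1)
            sizes[current] = 0
            continue
        alloc = SUB_RE.search(line)
        if alloc and current is not None:
            sizes[current] = max(sizes[current], int(alloc.group(1), 16))
    return sizes


# Hypothetical disassembly, shaped like GNU objdump output for amd64.
SAMPLE = """
0000000000000000 <small_fn>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   c9                      leaveq
   9:   c3                      retq

0000000000000010 <big_fn>:
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   48 81 ec 50 01 00 00    sub    $0x150,%rsp
  1b:   c9                      leaveq
  1c:   c3                      retq
"""

if __name__ == "__main__":
    # Print the biggest frames first.
    for name, size in sorted(frame_sizes(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {size} bytes")
```

In practice the input would be the saved output of `objdump -d` on the module in question; summing the sizes along the call chain seen in the backtrace gives an estimate of how much stack that path needs.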

Yes, I saw the results of something like this, and these functions were not there.

If you are exploding the stack by a small amount - then calling
printf might be the last straw!

I'd look at the entire data areas involved (in ddb) to see how
far back the corruption goes.  It may be that something
has got overwritten - if you can find the bounds of the
overwrite, and maybe recognise the contents, you stand half a chance.
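
To make "find the bounds of the overwrite" concrete, here is a small Python sketch that compares a region as it should look against the same region as observed and returns the first and last differing offsets. This assumes you have a known-good copy (or a predictable fill pattern) of the area and have copied the observed bytes out of a memory dump; neither is given in this thread, and the helper name `overwrite_bounds` is hypothetical.

```python
def overwrite_bounds(expected, observed):
    """Return (first, last) byte offsets where the buffers differ, or None if identical."""
    if len(expected) != len(observed):
        raise ValueError("buffers must cover the same region")
    diffs = [i for i, (a, b) in enumerate(zip(expected, observed)) if a != b]
    if not diffs:
        return None
    return diffs[0], diffs[-1]


if __name__ == "__main__":
    # Hypothetical example: a zero-filled 16-byte area with 4 bytes clobbered.
    expected = bytes(16)
    observed = bytearray(16)
    observed[5:9] = b"ZFS!"
    print(overwrite_bounds(expected, bytes(observed)))
```

The overwritten span itself (`observed[first:last + 1]`) is what is worth staring at: recognisable contents such as text, pointers, or a saved register pattern often point at the code that wrote them.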

Ten lines above the panic line the code worked just fine [1]; the panic happens at [2]. What I don't understand is why this code works on i386 and doesn't on amd64.

[1] http://nxr.aydogan.net/xref/src/external/cddl/osnet/dist/uts/common/fs/zfs/sha256.c#87
[2] http://nxr.aydogan.net/xref/src/external/cddl/osnet/dist/uts/common/fs/zfs/sha256.c#97
Regards

Adam.


