Port-alpha archive


Re: Self baked kernel panics



On Fri, 8 Feb 2008, Michael L. Hitch wrote:

On Fri, 8 Feb 2008, Ede Wolf wrote:

Still, here are the traces. I admit, I have no clue what I am doing, but may
it be helpful. If you need other data, just let me know.

 I sort of know what I'm doing, and it's a good start.

db> x/i 0xfffffc00004ea978
netbsd:isp_start+0x440: stq_u   zero,4(t0)

This is the faulting instruction; 0xfffffc00004ea978 is the PC at the time of the fault.

db> x/i 0xfffffc00004ea978,20
netbsd:isp_start+0x440: stq_u   zero,4(t0)
netbsd:isp_start+0x444: stq_u   zero,c(t0)

 This should help me figure out the corresponding location in the source
code, and provide some clue as to what went wrong. Gcc 4 scatters the code around much more than gcc 3 did, and it can be fun trying to figure out how the source code matches up with the instructions. A gdb version of the kernel with the source files and a kernel core dump file makes that much simpler (using the "list *address" command), but I'm used to doing it the hard way.

  And I think I may have found it.

That sequence of code is zeroing out a structure, and appears to be the MEMZERO((void *) reqp, QENTRY_LEN) about line 3231 in sys/dev/ic/isp.c (although it could also be about line 3214). What reqp points to is actually a local u_int8_t array, not a real structure.

  In the 4.0 source, it's declared like:
        struct ispsoftc *isp;
        u_int16_t nxti, optr, handle;
        u_int8_t local[QENTRY_LEN];
        ispreq_t *reqp, *qep;

   In current it's declared like:
        ispsoftc_t *isp;
        uint32_t nxti, optr, handle;
        uint8_t local[QENTRY_LEN];
        ispreq_t *reqp, *qep;

For current, the array is probably on a 4-byte boundary, being preceded by a pointer and three 32-bit integers. In 4.0, there are three 16-bit integers, and the array may easily be allocated on a non-4-byte boundary. Because the MEMZERO() macro is given a pointer to a structure, gcc is going to assume the pointer is aligned as the structure definition requires, and is free to use quad operations to zero it out.
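To make the alignment point concrete, here is a minimal sketch in plain C11. It is not the isp driver code: fake_req_t, req_buf_t, req_aligned() and req_from_bytes() are hypothetical names made up for illustration, and the QENTRY_LEN value is an assumption. It only shows how one might check whether a byte buffer satisfies the alignment the compiler assumes for a structure type, and one way (a union) to get a byte buffer that always does.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define QENTRY_LEN 64  /* assumed value, for illustration only */

/* Hypothetical stand-in for ispreq_t; its 32-bit member makes the
 * compiler assume at least 4-byte alignment for the whole struct. */
typedef struct {
    uint32_t handle;
    uint8_t  rest[QENTRY_LEN - sizeof(uint32_t)];
} fake_req_t;

/* Nonzero if p is aligned suitably for fake_req_t. */
static int req_aligned(const void *p)
{
    return ((uintptr_t)p % alignof(fake_req_t)) == 0;
}

/* One way to get a byte buffer that is always suitably aligned:
 * overlay it with the struct type in a union. */
typedef union {
    fake_req_t req;
    uint8_t    bytes[QENTRY_LEN];
} req_buf_t;

/* Convert a byte buffer to a request pointer, refusing misaligned input. */
static fake_req_t *req_from_bytes(uint8_t *bytes)
{
    assert(req_aligned(bytes));
    return (fake_req_t *)(void *)bytes;
}
```

A plain uint8_t array carries no such guarantee, so whether req_aligned() holds for it depends on where the compiler happens to place it on the stack, which fits the behaviour changing with the surrounding declarations.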

As for why it works with different compile options, my guess would be that the different options may cause the byte array to be allocated on a different byte boundary, or the compiler may use a different method to zero out the structure. In particular, -Os may replace that with a call to bzero() or memset().
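As a rough illustration of the memset() route, here is a small sketch, again not taken from isp.c. zero_entry_bytes() and all_zero() are hypothetical helpers and the QENTRY_LEN value is an assumption. Because the buffer is only ever handled through a byte pointer, the compiler has no structure type from which to infer a wider alignment, so the zeroing should be safe at any address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QENTRY_LEN 64  /* assumed value, for illustration only */

/* Zero one queue entry through a byte pointer. Nothing here tells the
 * compiler that the entry is aligned more strictly than a single byte. */
static void zero_entry_bytes(uint8_t *entry)
{
    assert(entry != NULL);
    memset(entry, 0, QENTRY_LEN);
}

/* Nonzero if the n bytes starting at p are all zero. */
static int all_zero(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling zero_entry_bytes() on a deliberately odd address, for example one byte into a larger array, should clear exactly QENTRY_LEN bytes and leave the neighbouring bytes alone.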

--
Michael L. Hitch                        mhitch%montana.edu@localhost
Computer Consultant
Information Technology Center
Montana State University        Bozeman, MT     USA


