Subject: Re: port-alpha/35448: memory management fault trap during heavy
To: None <gnats-bugs@NetBSD.org>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: netbsd-bugs
Date: 01/29/2007 11:09:01
On Mon, 22 Jan 2007, Michael L. Hitch wrote:

> fails, so it's a little hard to figure out where it came from.  I'm going
> to start groveling through the stack myself to see if I can dig out the
> parameters to the in4_cksum() call, and if I can follow the traceback
> manually.

   OK, I've dug out more information from the raw stack dump.  I located 
the address of the mbuf and found that it has the same bad address in 
mh_data:

(gdb) print (struct mbuf)*0xfffffc000ef7be18
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0,
     mh_data = 0xfffffe0108266000 <Address 0xfffffe0108266000 out of bounds>,
     mh_owner = 0x4e4f5a414d412d58, mh_len = 4096, mh_flags = 67108865,
     mh_paddr = 251117080, mh_type = 1}, M_dat = {MH = {MH_pkthdr = {
         rcvif = 0xfffffe000005a080, tags = {slh_first = 0x0}, len = 188,
         csum_flags = 0, csum_data = 0, segsz = 0}, MH_dat = {MH_ext = {
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
           ext_buf = 0xfffffe0108266000 <Address 0xfffffe0108266000 out of bounds>, ext_fr$
           ext_arg = 0xfffffe000c617cb8, ext_size = 4096,
           ext_type = 0xfffffc0000a62558, ext_nextref = 0xfffffc000ef7b118,
           ext_prevref = 0xfffffc000ef7a218, ext_un = {
             extun_paddr = 14733978372531027968, extun_pgs = {

   On a whim, I took a look at the data located at 0xfffffe0008266000 and 
found what looked like the expected data, and Aaron confirmed 
that the data was part of a mailbox file that was being synched.  So it 
looked like something had corrupted the address used by the mbuf.  I 
followed the stack back to nfs_writerpc, which can use the address of data 
being sent as the external data address for the mbuf.  I dug out the 
address of the uio and iovec structures used at that point and found:

(gdb) print (struct uio)*0xfffffe000c617e70
$8 = {uio_iov = 0xfffffe000c617e60, uio_iovcnt = 1, uio_offset = 102400,
   uio_resid = 18446744069414588416, uio_rw = UIO_WRITE,
   uio_vmspace = 0xfffffc0000abc018}
(gdb) print (struct iovec)*0xfffffe000c617e60
$9 = {iov_base = 0xfffffe0108267000, iov_len = 18446744069414588416}
(gdb) x/2gx 0xfffffe000c617e60
0xfffffe000c617e60:     0xfffffe0108267000      0xffffffff00001000

   The buffer address in iov_base is corrupt as well, and the iov_len 
field (along with uio_resid, which holds the same bad value) is clearly 
clobbered.
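
   In fact, every corrupted value is off by exactly 2^32 from a sensible 
one (the "intended" values below are my inference, not from the dump):

	iov_base  0xfffffe0108267000 - 2^32 = 0xfffffe0008267000
	iov_len   0xffffffff00001000 + 2^32 = 0x0000000000001000 (mod 2^64)
	mh_data   0xfffffe0108266000 - 2^32 = 0xfffffe0008266000

The pointer has gained 2^32 and the length has lost 2^32 -- in 
retrospect, a strong hint of a 32/64-bit mixup.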

   Following the stack back further, I got to nfs_doio() and found the 
address of the struct buf that was used to generate the uio/iovec data:

(gdb) print (struct buf)*0xfffffc00052b8dc0
$3 = {b_u = {u_actq = {tqe_next = 0xdeadbeef, tqe_prev = 0xfffffc00052b88b8},
     u_work = {wk_entry = {sqe_next = 0xdeadbeef}}}, b_interlock = {
     lock_data = 86745072}, b_flags = 85, b_error = 0, b_prio = 0,
   b_bufsize = 8192, b_bcount = 8192, b_resid = 8192, b_dev = 4294967295,
   b_un = {
     b_addr = 0xfffffe0008266000 "ntent-Transfer-Encoding:Message-ID;\n b=T2nY8PninSOLy9W$
   b_iodone = 0xfffffc00005bd600 <uvm_aio_biodone>,
   b_proc = 0xfffffc0000abc4a0, b_vp = 0xfffffc000bea53c0, b_dep = {
     lh_first = 0x0}, b_saveaddr = 0x0, b_fspriv = {
     bf_private = 0xfffffc00052b95a8, bf_dcookie = -4397959768664}, b_hash = {
     le_next = 0x16, le_prev = 0x0}, b_vnbufs = {le_next = 0x87654321,
     le_prev = 0x4}, b_freelist = {tqe_next = 0x0,
     tqe_prev = 0xfffffe0000263700}, b_lblkno = 0, b_freelistindex = 0}

   Lo and behold, it has the correct address of the data!  So somewhere 
between nfs_doio() and nfs_writerpc(), the iov_base and iov_len values 
get clobbered (in an apparently fairly consistent way).

   Since the bad address was easy to check for, I inserted a number of 
KASSERT() statements in nfs_doio(), nfs_doio_write(), and nfs_writerpc().
I was able to induce this failure on my own alpha at this point.  I found 
that the address was good at the entry of nfs_writerpc(), but had been 
corrupted at the start of the loop sending out the data.  This seemed odd,
since there didn't appear to be anything that would cause the type of 
corruption I was seeing.  While trying to figure out where some of the
local variables in nfs_writerpc() were located on the stack, I noticed 
there was a 'retry:' label before the output loop.  Finding where that 
label was used shed some light on things.  Certain conditions (which I'm 
not too clear on, since I don't understand NFS all that well) cause a 
resend of the entire data buffer; if that resend clobbered the data 
address and length, it would produce exactly what I was seeing.  Indeed, 
that was the case;
a few more KASSERT() statements showed that the UIO_ADVANCE() at line 1547 
of nfs_vnops.c was clobbering the iovec data.
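
   For illustration, the checks were of roughly this shape (this is a 
reconstruction, not the exact statements I inserted, and using MAXPHYS 
as the sanity bound is my choice here):

	/* Reconstructed example of the debugging checks added in
	 * nfs_doio(), nfs_doio_write(), and nfs_writerpc(); the exact
	 * expressions and the MAXPHYS bound are assumptions.  Both
	 * fire on the clobbered 0xffffffff00001000 values above. */
	KASSERT(uiop->uio_iov->iov_len <= MAXPHYS);
	KASSERT(uiop->uio_resid <= MAXPHYS);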

   Closer examination of what UIO_ADVANCE() was doing, and of the 
generated code, showed what the problem was.

   The alpha has 64-bit pointers, and the iov_len value is also 64 bits 
wide.  The variable 'backup' used to adjust the iovec data is an 
unsigned 32-bit value.  The changes for version 1.225 appear to have 
introduced a problem that only showed up on the alpha.  Prior to that, 
the unsigned value of 'backup' was being subtracted from iov_base, and 
added to iov_len.  In version 1.225, that was changed to use the macro 
UIO_ADVANCE(), passing a negated value of 'backup' to the macro.  The 
compiler thus negated the 32-bit unsigned value of 'backup' and 
zero-extended the result to 64 bits, which was then added to iov_base 
and subtracted from iov_len, resulting in the clobbered values.
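
   To make the mechanism concrete, here is a minimal userland sketch (my 
own reconstruction, for an LP64 machine; the starting values are 
hypothetical, chosen so that the results reproduce the clobbered iovec 
shown above):

	/* Sketch of the 32/64-bit mixup; not the kernel code.  Using
	 * uintptr_t for the pointer keeps the arithmetic well defined. */
	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* hypothetical iovec state just before the resend backs
		 * up: the whole remaining buffer has been sent */
		uintptr_t iov_base = 0xfffffe0008268000UL;
		uint64_t iov_len = 0;
		uint32_t backup = 4096;	/* hypothetical resend amount */

		/* What UIO_ADVANCE(uio, -backup) effectively does:
		 * -backup is computed in 32 bits as 0xfffff000 and then
		 * zero-extended, so 2^32 - 4096 is added to the base
		 * (and subtracted from the length) instead of 4096
		 * being subtracted (and added). */
		iov_base += -backup;
		iov_len -= -backup;

		printf("iov_base = 0x%lx\n", (unsigned long)iov_base);
		printf("iov_len  = 0x%lx\n", (unsigned long)iov_len);
		/* prints 0xfffffe0108267000 and 0xffffffff00001000 --
		 * the exact clobbered values in the dump above */
		return 0;
	}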

   Changing the UIO_ADVANCE() to a UIO_RETREAT(), which takes 'backup' 
directly, subtracts it from iov_base, and adds it to iov_len, gave me a 
kernel that did not crash when nfs_writerpc() resent the data.  I've 
also just verified that simply making 'backup' a signed 32-bit value 
also works with the UIO_ADVANCE() macro.
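
   For reference, the retreat looked roughly like this (a sketch, not 
the committed diff; the uio_resid handling is my assumption, based on 
the corrupted uio_resid seen in the dump):

	/* Sketch of UIO_RETREAT(): back up by an unsigned amount
	 * without ever negating a 32-bit value. */
	#define	UIO_RETREAT(uio, siz) \
		(void)((uio)->uio_resid += (siz), \
		(uio)->uio_iov->iov_base = \
		    (char *)(uio)->uio_iov->iov_base - (siz), \
		(uio)->uio_iov->iov_len += (siz))

	/* in nfs_writerpc(), instead of UIO_ADVANCE(uiop, -backup): */
	UIO_RETREAT(uiop, backup);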

---
Michael L. Hitch			mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA