Subject: Re: port-alpha/35448: memory management fault trap during heavy network I/O
To: None <port-alpha-maintainer@netbsd.org, gnats-admin@netbsd.org>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: netbsd-bugs
Date: 01/29/2007 18:10:02
The following reply was made to PR port-alpha/35448; it has been noted by GNATS.
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: port-alpha-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, agrier@poofygoof.com
Subject: Re: port-alpha/35448: memory management fault trap during heavy
network I/O
Date: Mon, 29 Jan 2007 11:09:01 -0700 (MST)
On Mon, 22 Jan 2007, Michael L. Hitch wrote:
> fails, so it's a little hard to figure out where it came from. I'm going
> to start groveling through the stack myself to see if I can dig out the
> parameters to the in4_cksum() call, and if I can follow the traceback
> manually.
OK, I've dug out more information from the raw stack dump. I located
the address of the mbuf and found that it has the same bad address in
mh_data:
(gdb) print (struct mbuf)*0xfffffc000ef7be18
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
$2 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0,
mh_data = 0xfffffe0108266000 <Address 0xfffffe0108266000 out of
bounds>,
mh_owner = 0x4e4f5a414d412d58, mh_len = 4096, mh_flags = 67108865,
mh_paddr = 251117080, mh_type = 1}, M_dat = {MH = {MH_pkthdr = {
rcvif = 0xfffffe000005a080, tags = {slh_first = 0x0}, len = 188,
csum_flags = 0, csum_data = 0, segsz = 0}, MH_dat = {MH_ext = {
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
can not access 0x8266000, invalid translation (invalid L2 PTE)
ext_buf = 0xfffffe0108266000 <Address 0xfffffe0108266000 out of
bounds>, ext_fr$
ext_arg = 0xfffffe000c617cb8, ext_size = 4096,
ext_type = 0xfffffc0000a62558, ext_nextref = 0xfffffc000ef7b118,
ext_prevref = 0xfffffc000ef7a218, ext_un = {
extun_paddr = 14733978372531027968, extun_pgs = {
On a whim, I took a look at the data located at 0xfffffe0008266000 and
found what looks like data that might be expected, and Aaron confirmed
that the data was part of a mailbox file that was being synched. So it
looked like something had corrupted the address used by the mbuf. I
followed the stack back to nfs_writerpc, which can use the address of data
being sent as the external data address for the mbuf. I dug out the
address of the uio and iovec structures used at that point and found:
(gdb) print (struct uio)*0xfffffe000c617e70
$8 = {uio_iov = 0xfffffe000c617e60, uio_iovcnt = 1, uio_offset = 102400,
uio_resid = 18446744069414588416, uio_rw = UIO_WRITE,
uio_vmspace = 0xfffffc0000abc018}
(gdb) print (struct iovec)*0xfffffe000c617e60
$9 = {iov_base = 0xfffffe0108267000, iov_len = 18446744069414588416}
(gdb) x/2gx 0xfffffe000c617e60
0xfffffe000c617e60: 0xfffffe0108267000 0xffffffff00001000
The buffer address in iov_base is corrupt as well. In addition, the
iov_len field appears corrupted.
Following the stack back further, I get to nfs_doio and get the address
of the struct buf that was used to generate the uio/iovec data:
(gdb) print (struct buf)*0xfffffc00052b8dc0
$3 = {b_u = {u_actq = {tqe_next = 0xdeadbeef, tqe_prev =
0xfffffc00052b88b8},
u_work = {wk_entry = {sqe_next = 0xdeadbeef}}}, b_interlock = {
lock_data = 86745072}, b_flags = 85, b_error = 0, b_prio = 0,
b_bufsize = 8192, b_bcount = 8192, b_resid = 8192, b_dev = 4294967295,
b_un = {
b_addr = 0xfffffe0008266000 "ntent-Transfer-Encoding:Message-ID;\n
b=T2nY8PninSOLy9W$
b_iodone = 0xfffffc00005bd600 <uvm_aio_biodone>,
b_proc = 0xfffffc0000abc4a0, b_vp = 0xfffffc000bea53c0, b_dep = {
lh_first = 0x0}, b_saveaddr = 0x0, b_fspriv = {
bf_private = 0xfffffc00052b95a8, bf_dcookie = -4397959768664}, b_hash
= {
le_next = 0x16, le_prev = 0x0}, b_vnbufs = {le_next = 0x87654321,
le_prev = 0x4}, b_freelist = {tqe_next = 0x0,
tqe_prev = 0xfffffe0000263700}, b_lblkno = 0, b_freelistindex = 0}
Lo and behold, it has the correct address of the data! So somewhere
between nfs_doio() and nfs_writerpc(), the iov_base and iov_len values
get clobbered (in an apparently fairly consistent way).
Since the bad address was easy to check for, I inserted a number of
KASSERT() statements in nfs_doio(), nfs_doio_write(), and nfs_writerpc().
I was able to induce this failure on my own alpha at this point. I found
that the address was good at the entry of nfs_writerpc(), but had been
corrupted at the start of the loop sending out the data. This seemed odd,
since there didn't appear to be anything that would cause the type of
corruption I was seeing. While trying to figure out where some of the
local variables in nfs_writerpc() were located on the stack, I noticed
there was a 'retry:' label before the output loop. Finding where that
label was used shed some light on things. Certain conditions (which I'm
not too clear on, since I don't understand NFS all that well) would cause
a resend of the entire data buffer, and if that resend clobbered the data
address and length, it would result in what I was seeing. Indeed, that was
the case;
a few more KASSERT() statements showed that the UIO_ADVANCE() at line 1547
of nfs_vnops.c was clobbering the iovec data.
Closer examination of what UIO_ADVANCE() was doing, and of the
generated code, showed what the problem was.
The alpha has 64-bit pointers, and the iov_len value is also 64 bits.
The variable 'backup' used to adjust the iovec data is an unsigned 32-bit
value. The changes for version 1.225 appear to have introduced a problem
that only shows up on the alpha. Prior to that, the unsigned value of
'backup' was being subtracted from iov_base and added to iov_len. In
version 1.225, that was changed to use the macro UIO_ADVANCE(), passing
a negated value of 'backup' to the macro. The compiler thus negated the
32-bit unsigned value of 'backup' and zero-extended the result to 64
bits, which was then added to iov_base and subtracted from iov_len,
resulting in the clobbered values.
Changing the UIO_ADVANCE() to a UIO_RETREAT() which passed 'backup'
directly, subtracting it from iov_base and adding it to iov_len, gave
me a kernel which did not crash when nfs_writerpc() resent the data.
I've also just verified that simply making 'backup' a signed 32-bit
value also works with the UIO_ADVANCE() macro.
---
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA