Subject: Re: kern/33986: UFS_DIRHASH causes rampant kernel memory corruption
From: David Brownlee
List: netbsd-bugs
Date: 07/16/2006 00:35:02
The following reply was made to PR kern/33986; it has been noted by GNATS.

From: David Brownlee
Subject: Re: kern/33986: UFS_DIRHASH causes rampant kernel memory corruption
Date: Sun, 16 Jul 2006 02:33:33 +0100 (BST)

 On Wed, 12 Jul 2006, wrote:
 > A typical symptom is a panic due to junk pointer dereference in UDP
 > input or (in a kernel with FAST_IPSEC) a bad attempt to zeroize a
 > nonexistent ESP or AH key due to corruption of a security association
 > data structure.  Another common panic is in the ipfilter rule matching
 > code.  Many of these code paths share the property of invocation via
 > the soft network "interrupt".  However, we have observed panics throughout
 > the kernel always due to uvm_fault (-> 0xe) on a kernel address.  Adjusting
 > other kernel options (e.g. removing FAST_IPSEC or substituting pf for ipf)
 > may make the problem occur less _often_, but it still occurs, and the
 > symptom is still the same: page fault in supervisor mode due to a corrupted
 > pointer in a kernel data structure, wherever in the kernel it may occur.
  	This _could_ be unrelated, but it doesn't feel that way.
  	I have very similar symptoms (panic in uvm_fault (-> 0xe)
  	on a kernel address) on two systems without UFS_DIRHASH.
  	Both tend to panic overnight, sometimes in 'find'. One
  	tends to be running large-scale rsyncs and the other
  	postgres backups and rsyncs. Both pass memtester without issue.
  	Both were running relatively heavily tuned 3_STABLE kernels,
  	the same kernels being in use on around fifteen other boxes,
  	of which about eight were close to identical hardware.
  	On one machine I switched to a recent -current GENERIC + PF
  	kernel and it still happened. In no case has UFS_DIRHASH been
  	in any kernel. One common factor could be PF.
  	I would go along with the theory that UFS_DIRHASH is
  	triggering some extant issue elsewhere in the kernel.
  	I seem to recall that all occurrences have been since 3.0,
  	so just for reference I'm going to switch the most
  	commonly failing box to the GENERIC 3.0 release kernel to
  	see if it still happens.
  		David/absolute       -- No hype required --