Subject: Re: kernel message: Data modified on freelist
To: None <current-users@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: current-users
Date: 02/28/2000 17:04:45

On Mon, Feb 28, 2000 at 10:55:51PM +0100, Markus Kilbinger wrote:
> >>>>> "Thor" == Thor Lancelot Simon <tls@rek.tjls.com> writes:
> 
>     >> > Can anybody explain to me the following kernel message I get
>     >> > on one of our NetBSD/i386 v1.4.2 machine:
>     >> >
>     >> >   Data modified on freelist: word 6 of object 0xc041f000 size
>     >> >   80 previous type temp (0xdead9eef != 0xdeadbeef)
>     >> >
>     >> > Something severe? Hardware/RAM problem?
>     >>
>     >> Probably a software bug.... I see a similar message if I bring
>     >> my firewall up and down a few times (ip filter...) on NetBSD
>     >> 1.4.1.
> 
>     Thor> Not in this case, I don't think.
> 
>     Thor> 0xdead9eef is only one bit (out of 32) away from 0xdeadbeef.
>     Thor> I think this one's a hardware error -- do you have ECC
>     Thor> memory?
> 
> hmm, the bios boot survey says: Both memory banks contain EDO ram. No
> ECC mentioned here. -> How to find out?

Well, one thing you can _try_ is to count the chips on each memory module.
If you have eight per side, or a multiple thereof, it's not ECC.  If you
have nine, it is.  If you have some other number, you can't tell for certain.

ECC EDO memory was rather uncommon.  You probably don't have ECC memory.

Single-bit errors _are_ fairly common in memories as large as most of
today's.  It's not surprising that you're experiencing them; this is
why you should use ECC memory!

(note that *this* *particular* single-bit error was detected by debugging
 code in the kernel, but it's quite likely that you've suffered others
 which were not.)
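
To make the "one bit away" observation concrete, here is a rough sketch
of that kind of check.  It is not the kernel's actual code, only a small
simulation: it assumes a freed object is filled with the 0xdeadbeef
pattern as 4-byte words and that the word index is zero-based.  The names
POISON, flipped_bits and check_freed_object are made up for illustration.

```python
POISON = 0xDEADBEEF  # fill pattern named in the kernel message


def flipped_bits(expected, observed):
    """Return the bit positions (0 = least significant) where two 32-bit words differ."""
    diff = (expected ^ observed) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


def check_freed_object(words):
    """Return (index, value) of the first word that no longer holds POISON, or None."""
    for index, value in enumerate(words):
        if value != POISON:
            return index, value
    return None


# Hypothetical 80-byte freed object: 20 words of 4 bytes each, all poisoned.
freed = [POISON] * (80 // 4)
freed[6] = 0xDEAD9EEF  # the value reported in the message

hit = check_freed_object(freed)
if hit is not None:
    index, value = hit
    bits = flipped_bits(POISON, value)
    print(f"word {index}: 0x{value:08x} != 0x{POISON:08x}, flipped bits: {bits}")
```

With the values from the message, exactly one bit position comes back,
which is what you'd expect from a flipped memory cell rather than from
software scribbling on freed memory (that would usually change more than
one bit of the word).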

-- 
Thor Lancelot Simon	                                      tls@rek.tjls.com
	"And where do all these highways go, now that we are free?"