port-arm32: Re: ffs_update error

Subject: Re: ffs_update error
To: None <emw4maba@rghx50.gp.fht-esslingen.de>
From: Mark Brinicombe <amb@physig4.ph.kcl.ac.uk>
List: port-arm32
Date: 08/02/1996 18:27:30
>What does this error message mean:
>
>ffs_update: bad indirect addr on entry (1): inode=3408 4194304/00400000
>ip=f1501200 adr=f15012b0
>inode vp: type VBLK, usecount 118, writecount 0, refcount 4, tag VT_UFS, ino
>3383, on dev 24, 0
>
>It first showed up with bsd-4444. It is reported during file accesses, in this
>case it came when I shut down the machine (while unmounting /usr).
>It happened with different inodes. The files to which these inodes correspond
>are all perfectly readable. fsck does not report any problems.
>I tried to locate the error message in the kernel sources, but with no luck.
>
>So what does the whole thing mean?
>Hope somebody can enlighten me,

OK, this means that a file on the hard disc was about to be trashed, but the
problem was caught.

There is an elusive bug in RiscBSD that was at one point thought to be linked
with the vnode bug (where /usr could end up in lost+found).
In fact there were two separate bugs.

1. One of the indirect block address fields in an inode gets trashed with the
value 0x00400000.

2. The kernel could hang waiting on vnodes.

The second one meant rebooting without syncing and thus forcing an fsck on
reboot. This fsck would find all the trashed inodes and remove them. If the
trashed inode happened to be inode 2 (the root directory) then the entire
filesystem would end up in lost+found.

The vnode bug was fixed. This means that most of the time the discs are
unmounted cleanly and thus not checked on the next boot, so trashed inodes
can remain undetected.

Now for the inodes... For the majority of files the indirect block address that
gets trashed is normally zero (it is only nonzero on really big files). This
means that the inode can be trashed while the file is still fine (the indirect
address will not be used as the file is too small).
The only way you know the inode is bad is either via fsck checking all the
inodes or if you get a bad block message when you remove the file or truncate
it. (You may have seen odd messages about bad block 4194304, which is
0x00400000 in decimal.)
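
To illustrate why most files never touch an indirect address, here is a rough
sketch that works out how many indirect levels a file of a given size needs.
It assumes the classic FFS inode layout of 12 direct and 3 indirect block
pointers with 4-byte block addresses; the function name indirect_levels_used
is made up for this example and is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define NDADDR 12  /* direct block pointers per inode (assumed, classic FFS) */
#define NIADDR 3   /* indirect block pointers per inode (assumed) */

/*
 * Hypothetical helper: return how many of the inode's indirect pointers
 * (0..NIADDR) a file of file_size bytes needs, or -1 if the file is too
 * large to be mapped at all.
 */
int indirect_levels_used(uint64_t file_size, uint32_t block_size)
{
    uint64_t nblocks, per_indirect, covered, span;
    int level;

    /* A block must hold at least one 4-byte block address. */
    assert(block_size >= sizeof(uint32_t));

    nblocks = (file_size + block_size - 1) / block_size;  /* round up */
    per_indirect = block_size / sizeof(uint32_t);  /* addresses per block */
    covered = NDADDR;  /* blocks reachable so far */
    span = 1;

    if (nblocks <= covered)
        return 0;  /* direct pointers are enough */

    for (level = 1; level <= NIADDR; level++) {
        span *= per_indirect;  /* blocks reachable via this level */
        covered += span;
        if (nblocks <= covered)
            return level;
    }
    return -1;
}
```

With an assumed 8192-byte block size, any file up to 96K (12 * 8192 bytes)
needs no indirect pointer at all, so a trashed indirect field in such an inode
is never followed during normal reads and writes.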

Now something in the kernel is occasionally writing 0x00400000 (always this
value) into one of the inode indirect addresses. So far I have not been able to
trace it. I tend to only hit the problem when there has been a lot of disc
activity (e.g. running /etc/daily).

The bsd-4444 kernel has a bit of code added to the ffs_update() function.
It checks the inode that ffs_update() wants to write to disc and prints a
warning if it finds 0x00400000 in one of the indirect block addresses.
If the inode is actually going to be written to disc the kernel will patch the
bad address to 0 prior to writing. (Note that ffs_update() does not always
write the inode; a "patched" message will be printed if the patch takes place.)
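
The check described above could look roughly like the following. This is only
a user-space sketch of the idea, not the actual RiscBSD code: the struct, the
function name check_indirect_addrs and the message wording are all invented
for illustration, and 3 indirect pointers per inode is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NIADDR   3            /* indirect pointers per inode (assumed) */
#define BAD_ADDR 0x00400000u  /* the value that keeps turning up */

/* Hypothetical stand-in for the part of the inode that matters here. */
struct sketch_inode {
    uint32_t number;      /* inode number, only used in messages */
    uint32_t ib[NIADDR];  /* indirect block addresses */
};

/*
 * Warn about every indirect address equal to BAD_ADDR. The address is
 * zeroed only when the inode is really about to be written (will_write
 * is nonzero). Returns the number of bad addresses found.
 */
int check_indirect_addrs(struct sketch_inode *ip, int will_write)
{
    int i, found = 0;

    assert(ip != NULL);

    for (i = 0; i < NIADDR; i++) {
        if (ip->ib[i] != BAD_ADDR)
            continue;
        found++;
        fprintf(stderr, "bad indirect addr (%d): inode=%lu %08lx\n",
                i, (unsigned long)ip->number, (unsigned long)ip->ib[i]);
        if (will_write) {
            ip->ib[i] = 0;  /* patch before the inode goes to disc */
            fprintf(stderr, "patched indirect addr (%d): inode=%lu\n",
                    i, (unsigned long)ip->number);
        }
    }
    return found;
}
```

Only the exact value 0x00400000 is touched, so any other indirect address,
valid or not, passes through unchanged.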

Note: 0x00400000 is not likely to be a valid address.

One could question how safe patching the inode is ...
If not patched then fsck will remove the inode when it gets the chance.
Only an indirect block address of exactly 0x00400000 will ever be patched.
0x00400000 won't be a valid address on IDE drives. (If you have a RAID then
maybe it could be valid.)
If the address was nonzero before the corruption then the inode is trashed
anyway, so zeroing the address will not matter (fsck would pick this up as
well).
If the address was zero before the corruption then the inode has been correctly
fixed.

Since adding this patch I have not lost a single file due to bad inodes.

There is a question as to whether this is a machine-dependent (MD) or
machine-independent (MI) bug. The vnode problem was in fact MI. A while ago I
believe people reported getting bad block errors from block 4194304 on the i386
port, so this trashing may be MI as well.

More info when I get it

Cheers,
					Mark

-- 
Mark Brinicombe				amb@physig.ph.kcl.ac.uk
Research Associate			http://www.ph.kcl.ac.uk/~amb/
Department of Physics			tel: 0171 873 2894
King's College London			fax: 0171 873 2716