Re: FS corruption because of bufio_cache pool depletion?
On Tue, Jan 26, 2010 at 03:32:23PM +0100, Manuel Bouyer wrote:
> Can you give more details on the corruption ?
> Was it only directory entries that were corrupted, or did you notice
> corruptions in the data block too ?
I was seeing corruption in data blocks too. That's what I meant when I
mentioned corrupt CVS/Root files. Fsck also complained about directories that
were corrupted right at the start of the data block. I don't think I saved
those error messages, but "." and ".." were corrupt or missing.
I have a netbsd-3/Xen 2 based server running on the same hardware, and we
have seen FS corruption in a particular domU on that system that seems to be
related to the file system running out of space. That's what the co-admin
running that domU tells me, anyway; I haven't seen the damage or the error
messages in that domU myself.
> > raid1: IO failed after 5 retries.
> > cgd1: error 5
> > xbd IO domain 1: error 5
>
> It seems raidframe doesn't do anything special for memory failure.
Greg tells me that raidframe does retry several times, and the above error
indicates that it retried 5 times.
Note that I got the above message exactly once, but the pool stats
indicated several hundred allocation failures.
I am contemplating collecting stack traces whenever getiobuf() can't get a buf
from the pool, and maybe checking that it always does get a buf when it is
called with waitok==true.
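Something like the following is roughly what I have in mind. It's only a
sketch: the wrapper name and the diagnostics are made up, and only the
getiobuf()/waitok interface is taken from the real code.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/buf.h>

/*
 * Hypothetical debugging wrapper around getiobuf(); callers under test
 * would use this in place of getiobuf() while chasing the depletion.
 */
static buf_t *
getiobuf_checked(struct vnode *vp, bool waitok)
{
	buf_t *bp;

	bp = getiobuf(vp, waitok);

	/*
	 * With waitok==true the allocation is expected to sleep rather
	 * than fail, so a NULL return here would itself be a bug.
	 */
	KASSERT(!waitok || bp != NULL);

	if (bp == NULL) {
		/*
		 * Record who is hitting the depleted pool.  A full ddb
		 * stack trace would be nicer; logging the return
		 * address is a cheap first pass.
		 */
		printf("getiobuf: allocation failed, caller %p\n",
		    __builtin_return_address(0));
	}
	return bp;
}

If the KASSERT ever fires, the waitok path itself is broken; otherwise the
printf should at least tell us which callers are being handed NULL and may
then be mishandling it.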
I wonder if the b_iodone issues you are investigating have an impact on this.
--chris