Current-Users archive


Re: current & V5 crashes with non-raw sata reads



Point taken.

One question arises:
my understanding of /dev/rxx vs /dev/xx is that the non-raw device does caching, and
accesses to different blocks are first looked up in the cache (which makes it slow for large contiguous accesses,
but avoids disk seeks for scattered accesses).
Is that true?  (I thought from sys 7 onwards it was like that - maybe I'm out of date here :-)
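
To make the question concrete, here is a toy model of the behaviour described above: every block read is looked up in a small cache first, and only a miss goes to the device. It is only a sketch; the class name, the LRU eviction policy, the 512-byte block size and the in-memory "device" are all assumptions for illustration and say nothing about how the NetBSD kernel actually implements its buffer cache.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy LRU block cache in front of a fake device (hypothetical, not NetBSD code)."""

    def __init__(self, device, block_size=512, capacity=4):
        self.device = device            # bytes object standing in for a disk
        self.block_size = block_size    # assumed block size
        self.capacity = capacity        # number of blocks the cache may hold
        self.blocks = OrderedDict()     # block number -> block contents
        self.device_reads = 0           # how often we had to go to the "disk"

    def read_block(self, n):
        # Cache hit: no device access, just refresh the LRU position.
        if n in self.blocks:
            self.blocks.move_to_end(n)
            return self.blocks[n]
        # Cache miss: read from the device and remember the block.
        self.device_reads += 1
        start = n * self.block_size
        data = self.device[start:start + self.block_size]
        self.blocks[n] = data
        # Evict the least recently used block when over capacity.
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return data
```

In this model, repeated reads of the same few blocks touch the device only once per block, while one long sequential pass gains nothing from the cache and pays the bookkeeping for every block.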

My means of testing are limited: I cannot afford to debug a production system which runs lots of services, with a few TB of storage
on a RAID, and have it coring before it boots.

If you have an idea of how to track down that issue, I would appreciate it. Debugging memory corruption
is something I know how to do in userland applications (valgrind/purify/coverity and efence, just to name a few),
but I wouldn't know how to do this in the kernel. I can gdb the kernel, but I doubt I would know where to stop
to see what dangling pointer, failed malloc or the like caused it.

I believe there is such a bug in the system, and my interest is to get it fixed so that I have a stable NetBSD!
(I actually like NetBSD.)

But it seems strange that, in your opinion, the fix for wrong behavior is to wait a year until the device might be gone.

And why is a dd that illegally causes the system to crash not a particularly good test of anything?
I think that is the quickest and clearest way to describe how to reproduce the issue! (Not to mention that fsck takes 15 minutes to crash the system and dd does it in under a second.)
I don't know what system calls fsck uses, and I was asking for exactly those from people who understand that architecture better than I do.
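
For illustration, the kind of access a dd-style reproduction performs can be sketched as a plain sequential read loop. This is a minimal sketch under assumptions: the function name, the block size and the block count are made up for the example, and it is not the exact dd invocation used in this thread. The path would be the block device (/dev/xx) for the failing case, or the raw device (/dev/rxx) for comparison.

```python
import os


def read_blocks(path, block_size=65536, count=16):
    """Read up to `count` blocks of `block_size` bytes; return total bytes read."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        for _ in range(count):
            chunk = os.read(fd, block_size)
            if not chunk:       # end of file or device reached
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Calling read_blocks() once on the non-raw path and once on the raw path gives a side-by-side comparison without involving fsck at all.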


cheers thilo



Thor Lancelot Simon wrote:
On Mon, Jun 08, 2009 at 10:49:33AM +1000, Thilo Jeremias wrote:
  
Thor,

Thanks for telling me I'm doing silly things (maybe I would like to
focus on the problem?).
    

The problem is what you decided to do.  I accurately described to you
how the bug you reported is most likely to be fixed -- I will be
surprised if within a year block devices for disks aren't removed
from NetBSD entirely.

If you're trying to debug fsck, or a kernel problem that appears
when you run fsck, debug system facilities that fsck actually uses!
Doing I/O through the metadata cache from userspace just forces
the kernel to do a lot of memory allocation and freelist
manipulation and isn't really a particularly good test of anything.

Thor
  

