Port-sparc64 archive


Re: Netra X1 hardware flakiness



    Date:        Sun, 9 May 2010 16:13:09 -0400
    From:        Chris Ross <cross+netbsd%distal.com@localhost>
    Message-ID:  <97F01E34-B12B-439D-9832-ED5F0D126473%distal.com@localhost>

  |    Looking at the crash below, clearly, disk is an issue and worthy of  
  | analysis.

What I see from your output is that the filesystem on the disk in question
is clearly very badly broken - I see no immediate evidence that the disk
itself has any issues.

Of course, something caused the filesystem to get messed up that way.
The "bad block" messages you're seeing are filesystem errors, not I/O
errors - if you had I/O errors you'd see hardware related messages
(bad crc, retry messages, or similar), and you reported none of those.
As for the errors during the panic dump, the "dma error" can be caused by
lots of things - a panic rarely happens at a "nice" time, and the
assumptions made by the panic code are sometimes not met by the rest of
the system at the time of the panic - so I wouldn't treat that one as a
symptom of anything.

What caused the filesystem to get so badly messed up could be almost
anything - and bad hardware of one kind or another is a possibility (the
disk, or the power supply, or the motherboard, or cables, or ... almost
anything).

It might also just have been operator error (using a filesystem without
first ensuring that fsck is happy with it, or overlapping filesystems, or ...).

You need to run fsck on that filesystem in single user mode (or at least
with the filesystem unmounted), and let it correct everything it finds
that is wrong.  It will delete (probably many) files.   After that, if you
still have problems with that disk, then start investigating the hardware.
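
As a rough sketch of that fsck step, something like the following could be
used from a single user shell.  The device name /dev/rwd0e and the helper
name repair_fs are placeholders I'm assuming here, not anything taken from
your report, so substitute the raw partition that actually holds the broken
filesystem.  By default it only prints the command; set RUN to the empty
string to really run it.

```shell
#!/bin/sh
# repair_fs is a hypothetical helper name, used for illustration only.
# With RUN unset, the command is echoed instead of executed (dry run).
repair_fs() {
    dev="$1"    # raw device of the unmounted filesystem, e.g. /dev/rwd0e (assumed)
    # -f: check even if the filesystem is marked clean
    # -y: answer "yes" to every repair question, so fsck fixes all it finds
    ${RUN-echo} fsck -f -y "$dev"
}

# Example (dry run):  repair_fs /dev/rwd0e
# Example (for real): RUN= repair_fs /dev/rwd0e
```

Using -y means fsck will not stop to ask before deleting damaged files,
which matches letting it correct everything, but leave it off if you want
to see each question first.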

You should probably also install sysutils/smartmontools from pkgsrc
(after fixing all filesystem problems) and see if it reports any problems
with your drive - a 40GB drive is going to be old by now, so it certainly
might be failing - but it should be new enough to support SMART.  That is,
if that package works on sparc systems - I've only ever used it on i386.
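
Once smartmontools is installed, a minimal check might look like the sketch
below.  The device name /dev/rwd0c and the helper name check_smart are my
assumptions (the whole-disk partition letter differs between ports), and I
haven't tried this on sparc64.  As written it only prints the commands; set
RUN to the empty string to run them.

```shell
#!/bin/sh
# check_smart is a hypothetical helper name, used for illustration only.
# With RUN unset, the commands are echoed instead of executed (dry run).
check_smart() {
    disk="$1"    # whole-disk raw device, e.g. /dev/rwd0c (assumed)
    # -H: overall health self-assessment reported by the drive
    ${RUN-echo} smartctl -H "$disk"
    # -a: all SMART information, including attributes and the error log
    ${RUN-echo} smartctl -a "$disk"
}

# Example (dry run):  check_smart /dev/rwd0c
# Example (for real): RUN= check_smart /dev/rwd0c
```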

kre


