Subject: Re: filesystem issues after rude powerdown
To: Ignatios Souvatzis <ignatios@cs.uni-bonn.de>
From: Brian <bmcewen@comcast.net>
List: netbsd-users
Date: 08/03/2004 07:50:59
On Tuesday, August 3, 2004, at 05:34 AM, Ignatios Souvatzis wrote:
>
> Then you should run something that listens to the UPS and shuts down the
> machine cleanly when the time is near. All a battery-powered UPS is
> supposed to do is to help you survive short power failures, give you
> time to start your diesel generators, or do a clean shutdown. If you
> insist on surviving longer outages on battery, you need ... bigger
> batteries.
>
What exists that would let me configure output to the serial port?
This is a headless, non-USB Cobalt Qube, so it would have to be something
that takes the USB output from the UPS as input and then issues the
shutdown commands on the serial console.
I'd read of issues (likely here) trying to get USB-capable NetBSD boxes
to trigger shutdown based on UPS output; I didn't think that was
working well even on systems with working USB.
>
> Uhm... mapping out bad blocks is a function of modern disks (IDE as well
> as SCSI). However, this might be configured off for your driver, or might
> only happen when you _write_ them, as the disk cannot know what to write
> into the remapped blocks when it can't read the original ones.
>
I might have to pull the (IDE) HD out, put it in a desktop, and
reformat the partition using appropriate tools that way. But as you
say, I would have expected bad blocks to get remapped automagically
using the reserved areas.
> Assuming (check that!) that the error message was from the driver, and
> refers to disk block numbers (as from the file system, and refers to
> filesystem sectors), you could try to
>
> umount /tmp (in single user mode, obviously)
>
> /sbin/sysctl kern.rawpartition
>
> if it is 3:
> dd bs=512 count=13 if=/dev/zero seek=756 of=/dev/rwd0d (on i386)
>
> if it is 2:
> dd bs=512 count=13 if=/dev/zero seek=756 of=/dev/rwd0c
>
> After that you'll have to "fsck -f" the affected file system.
>
> You do this at your own risk; read the manual pages until you
> understand
> what those commands do. Especially, as you didn't show the original
> error message, I have no idea whether it really referred to disk blocks
> or filesystem sectors (which would be relative to the partition
> boundary,
> and using units!)
>
At bootup, I get this (my system hung again, after bootup I captured
the console output this time):
Starting file system checks:
/dev/rwd0a: file system is clean; not checking
/dev/rwd0f: file system is clean; not checking
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data
transfers)
wd0g: error reading fsbn 736 of 736-847 (wd0 bn 4782688; cn 4744 tn 11
sn 43), retrying
bootup fails; I end up in single user mode, running fsck_ffs on
/dev/rwd0g with output:
** Phase 1 - Check Blocks and Sizes
wd0g: error reading fsbn 752 of 736-847 (wd0 bn 4782704; cn 4744 tn 11
sn 59), retrying
wd0: (uncorrectable data error)
wd0g: error reading fsbn 752 of 736-847 (wd0 bn 4782704; cn 4744 tn 11
sn 59), retrying
wd0: (uncorrectable data error)
wd0g: error reading fsbn 752 of 736-847 (wd0 bn 4782704; cn 4744 tn 11
sn 59), retrying
wd0: (uncorrectable data error)
wd0g: error reading fsbn 752 of 736-847 (wd0 bn 4782704; cn 4744 tn 11
sn 59), retrying
wd0: (uncorrectable data error)
wd0g: error reading fsbn 756 of 736-847 (wd0 bn 4782708; cn 4744 tn 12
sn 0), retrying
wd0: (uncorrectable data error)
wd0g: error reading fsbn 756 of 736-847 (wd0 bn 4782708; cn 4744 tn 12
sn 0)wd0: (uncorrectable data error)
CANNOT READ: BLK 736
CONTINUE? [yn] y
wd0g: error reading fsbn 756 (wd0 bn 4782708; cn 4744 tn 12 sn 0),
retrying
wd0: (uncorrectable data error)
[...]
wd0g: error reading fsbn 756 (wd0 bn 4782708; cn 4744 tn 12 sn 0)wd0:
(uncorrectable data error)
wd0g: error reading fsbn 757 (wd0 bn 4782709; cn 4744 tn 12 sn 1),
retrying
wd0g: error reading fsbn 768 (wd0 bn 4782720; cn 4744 tn 12 sn 12)wd0:
(uncorrectable data error)
[...]
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 756, 757, 758, 759, 760,
761, 762, 763, 764, 765, 766, 767, 768,
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1 files, 1 used, 496302 free (14 frags, 62036 blocks, 0.0%
fragmentation)
MARK FILE SYSTEM CLEAN? [yn] y
***** FILE SYSTEM MARKED CLEAN *****
***** FILE SYSTEM WAS MODIFIED *****
I tried the dd copy (the point being to force a write to the blocks, to
see if the IDE drive hardware would remap the bad stuff using reserved
areas, yes?) and the copy reportedly completed successfully, but the
areas remain bad during fsck. (I did overwrite the magic number for the
partition, but that's fixable. I guess "disk sector 756" equals "block
736", i.e. the first one of the /tmp partition.)
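Each error line in the log above gives the same sector two ways: "fsbn" (relative to wd0g) and "wd0 bn" (relative to the whole disk). A minimal sketch of the arithmetic follows, assuming both numbers are 512-byte sector counts and that rwd0d is the raw whole-disk device, as the fdisk output in this message suggests. It only prints read-only dd commands and does not run them.

```shell
#!/bin/sh
# Values copied from the log: "error reading fsbn 756 (wd0 bn 4782708; ...)"
fsbn=756           # sector number relative to the start of wd0g
disk_bn=4782708    # the same sector counted from the start of wd0

# The difference is where wd0g would begin on the disk.
part_start=$((disk_bn - fsbn))
echo "wd0g starts at disk sector $part_start"

# Cross-check against the first error line (fsbn 736, wd0 bn 4782688).
check=$((4782688 - 736))
echo "cross-check: $check"

# Print (do not run) a read-only test of the 13 reported sectors,
# addressed once through the partition and once through the whole disk.
# Assumption: 512-byte sectors and rwd0d as the raw disk device.
count=13
read_part="dd if=/dev/rwd0g of=/dev/null bs=512 skip=$fsbn count=$count"
read_disk="dd if=/dev/rwd0d of=/dev/null bs=512 skip=$disk_bn count=$count"
echo "$read_part"
echo "$read_disk"
```

If the numbers are read this way, an offset of 756 is only meaningful on /dev/rwd0g; on the whole-disk device the same sectors would sit at 4782708, so it may be worth checking which device the earlier dd actually addressed.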
What's the best way to reformat /tmp or rebuild the partition map for
this partition from within NetBSD? Just fdisk it, or is there a more
thorough way to format & test? I didn't notice any I built this
bootable image using the Cobalt netboot CD from the cobalt-support
area (so I didn't have to set up the partitions and prep them myself).
I could pull the drive, put it in a Win98 desktop, and reformat just
the /tmp partition, but I'm not sure I have any tools that know about
BSD filesystems, and I'd have to know how to relabel it properly (it's
not just a matter of getting fstab set up properly, is it?)
And now that I look at it, the partitioning is:
Qube: {4} df -k
Filesystem   1K-blocks     Used     Avail Capacity  Mounted on
/dev/wd0a     54842010  9624912  42474996    18%    /
/dev/wd0f      2064766    15876   1945650     0%    /var
/dev/wd0g       496303        1    471486     0%    /tmp
Qube: {1} fdisk
Disk: /dev/rwd0d
NetBSD disklabel disk geometry:
cylinders: 16383 heads: 16 sectors/track: 63 (1008 sectors/cylinder)
BIOS disk geometry:
cylinders: 16383 heads: 16 sectors/track: 63 (1008 sectors/cylinder)
Partition table:
0: sysid 131 (Linux native)
start 1, size 61488 (30 MB), flag 0x0
beg: cylinder 0, head 0, sector 2
end: cylinder 61, head 0, sector 1
1: sysid 130 (Linux swap or Prime or Solaris)
start 61488, size 525168 (256 MB), flag 0x0
beg: cylinder 61, head 0, sector 1
end: cylinder 581, head 15, sector 63
2: sysid 169 (NetBSD)
start 586656, size 116644752 (56955 MB), flag 0x0
beg: cylinder 582, head 0, sector 1
end: cylinder 588, head 15, sector 63
3: <UNUSED>.
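The geometry reported by fdisk (16 heads, 63 sectors/track) can be used to sanity-check the cn/tn/sn triples in the error log against the "wd0 bn" values. This is a rough sketch: chs_to_lba is a hypothetical helper, and it assumes the driver counts sn from 0, which is the convention under which the logged numbers agree.

```shell
#!/bin/sh
# Geometry taken from the fdisk output above.
heads=16
spt=63    # sectors per track

# Hypothetical helper: convert cylinder/track/sector to an absolute
# sector number, assuming sn is 0-based.
chs_to_lba() {
    echo $(( ($1 * heads + $2) * spt + $3 ))
}

# Two triples copied from the log.
lba_a=$(chs_to_lba 4744 11 43)   # logged as wd0 bn 4782688
lba_b=$(chs_to_lba 4744 12 0)    # logged as wd0 bn 4782708
echo "cn 4744 tn 11 sn 43 -> $lba_a"
echo "cn 4744 tn 12 sn 0  -> $lba_b"
```

Both results match the logged block numbers, which suggests the "wd0 bn" values are plain sector offsets from the start of the disk.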
Thanks for the help! At this time, I have a machine that boots and runs, but
eventually hangs after a couple of days, and fails to reboot until I run
fsck_ffs manually, always with the same issues in /tmp.
Brian