NetBSD-Users archive


Re: swap space in file on inconsistent file system



On 14 June 2018 at 16:46, Steve Blinkhorn <steve%prd.co.uk@localhost> wrote:
> You wrote:
>>
>> On 7 June 2018 at 14:03, Steve Blinkhorn <steve%prd.co.uk@localhost> wrote:
>> > I have a remote server (about to be replaced, but still in service and
>> > needs to stay that way until a replacement is fully commissioned) that
>> > has just developed a single bad sector.  The result has been that
>> > automatic backups using rsync have failed, and manual intervention is
>> > needed.
>> >
>> > There are also numerous sleeping processes that refuse to be killed,
>> > almost all in the 'tstile' state (this is i386 7.0).
>>>snip<<
>> > How should I proceed?
>>
>> First action might be to add a --exclude to the rsync (or move the
>> affected file to a different location on the filesystem excluded from
>> rsync).
>>
>> You could work out the affected block and dd zeros to it via the raw
>> device, but if the system is going away I'd probably not worry about
>> that.
>>
>> Other questions which might affect approach include:
>> - How long before the new system is deployed
>> - Do you know if the system would reboot cleanly
>> - Is the root filesystem clean
>>
>> David
>
>
> The root filesystem is clean, but /var is not.   I'm arranging a new
> colo provider for the replacement servers after shockingly bad service
> from Easynet/Interoute (now GTT) - they emailed me today to say they
> have no record of our having colo space with them, but that they are
> "progressing internally" our request to replace our servers with new
> ones, two and a half *months* since we had to remove one after it
> failed.

That's... not terribly fast service :)

> I am calculating the risks associated with a reboot, and contemplating
> editing /etc/fstab so that /var  and /opt (where the bad sector is)
> are not fsck'd at reboot.  If it drops down to single-user mode I have
> no way of recovering the situation (no remote console), so for the
> time being I'm nursing the system along - and to be fair to it, it is
> running normally from a user's point of view.

I remember having a pair of colo x86 servers when serial ports were a
thing and having two cables between com0<->com1 for remote console :)

I would be tempted to mark /opt as noauto as well as fsck pass 0 - if
the system reboots you will need to log in manually to mount/fsck it,
but then you can deal with any fallout.
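
As a rough sketch of that fstab change, something like the following
could be tried on a scratch copy first. The device names (wd0a, wd0e,
wd0f) and the existing options are made up for illustration, so check
them against the real /etc/fstab before changing anything:

```shell
# Sketch only: works on a scratch copy, never on /etc/fstab itself.
# The device names and options below are hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/fstab" <<'EOF'
/dev/wd0a / ffs rw 1 1
/dev/wd0e /var ffs rw 1 2
/dev/wd0f /opt ffs rw 1 2
EOF

# For the /opt line only: append noauto to the mount options (field 4)
# and set the fsck pass number (field 6) to 0 so it is skipped at boot.
awk '$2 == "/opt" { $4 = $4 ",noauto"; $6 = 0 } { print }' \
    "$tmp/fstab" > "$tmp/fstab.new"

cat "$tmp/fstab.new"
```

After a reboot, /opt would then stay unmounted until someone logs in
and runs fsck and mount on it by hand.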

For /var one option (if you have space) would be to copy everything
you need from /var to a new /var2 on root, then comment out the /var
mount in fstab, and 'mv var var-old; mv var2 var' so you could safely
reboot, but it may be safer to leave well alone.
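
The sequence can be rehearsed in a scratch directory before going near
the live system. This is only a dry run under assumptions: $root
stands in for /, the sample file is invented, and it does not model a
mounted /var with files held open by running daemons:

```shell
# Dry run of the /var swap inside a scratch tree (not on / itself).
root=$(mktemp -d)                 # hypothetical stand-in for /
mkdir -p "$root/var/log"
echo "sample" > "$root/var/log/messages"

# 1. Copy what is needed into a new directory on the root filesystem,
#    preserving ownership, modes and timestamps where possible.
mkdir "$root/var2"
cp -Rp "$root/var/." "$root/var2/"

# 2. On the real system, the /var line in /etc/fstab would be
#    commented out at this point.

# 3. Swap the directories.
(cd "$root" && mv var var-old && mv var2 var)

ls "$root"
```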

One useful tool to keep to hand is a USB key with a standard install
that runs dhcpcd and sshd (and optionally openvpn back to a known
server), so as long as the BIOS is set to boot USB first and you can
get someone to plug it in you always have a remote accessible fallback
boot option.
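
For the USB install, the relevant part of /etc/rc.conf would look
roughly like this. The variable names are as I understand rc.conf(5)
on NetBSD, so verify them against the release you install; the
openvpn side depends on pkgsrc and its own config, so it is left out:

```shell
# /etc/rc.conf fragment for the rescue USB install (sketch).
rc_configured=YES
dhcpcd=YES      # obtain an address via DHCP at boot
sshd=YES        # start the SSH daemon so the box is reachable
# openvpn (from pkgsrc) would need its rc.d script and a config file;
# not shown here.
```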

David

