Current-Users archive


Re: building netbsd-9 2 'sync' processes stuck in 'tstile'



    Date:        Fri, 7 May 2021 20:34:55 -0500 (CDT)
    From:        "John D. Baker" <jdbaker%consolidated.net@localhost>
    Message-ID:  <Pine.NEB.4.64.2105072017320.1246%spike.technoskunk.fur@localhost>

  | I expected as much.  It will be interesting to see what happens.  I've
  | had systems with stuck processes get stuck in the shutdown sequence
  | requiring a hard reset (or forced power-cycle) to recover.

It might go either way.  But when you do shutdown/reboot, give it time
(measured in multiple minutes perhaps, no need to wait hours) for it to
possibly abandon attempts to sync fully (ie: don't expect it to shut down
as quickly as normal, but only wait a reasonable amount of time, not forever).

  | I just ran a full forced 'fsck -yf' on it just prior to these events.
  | That was prompted by CVS failing to clean up a directory.

That seems like an unusual response.  Using fsck to fix things (I assume
on an unmounted filesystem, otherwise it is definitely wrong) isn't typically
needed.  It is required after the system has crashed, possibly leaving
unsaved updates which need to be repaired (or at least made consistent).
But as long as the system is still running, nothing is lost, and the
filesystems should all be fine (if not, there are far more serious
problems - booting after an unclean shutdown without having done a fsck
can get you into that kind of situation).

cvs is just another userspace program; it does nothing that any other program
cannot also do (it just tends to do a lot of filesystem operations).
While it certainly may leave behind a mess, fsck isn't the way to fix it.
Rather, "rm -rf" of the messy part (or perhaps all of it), followed by a
new cvs update (or cvs checkout if the whole thing was discarded), should
fix things (save any locally modified files first).

  | I get those
  | from time to time after the near-catastrophic events that prompted
  | kern/55115.  I used to get them frequently.  Now they are less common.
  | The carnage might still have caught the build this time.

First, that PR is apparently fixed now, right?  It is still awaiting feedback
from you to confirm that.

If the disk controller is still not working properly, then almost anything
is possible.  If it is working, then provided everything looks clean to fsck,
there should be nothing which would trigger a kernel locking problem.  Those
tend to be caused more by internal race conditions (sometimes by little-used
error paths forgetting to release a semaphore).
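As a rough illustration of that last kind of bug, here is a minimal userspace sketch.  It uses POSIX threads rather than the kernel's own locking primitives, and all function and variable names are hypothetical (not taken from NetBSD).  A rarely taken error path returns with the lock still held, so every later caller blocks on it forever.  In the kernel, the analogous waiters would show up sleeping on the lock (for example in 'tstile').

```c
/*
 * Userspace analogue only (pthreads), NOT NetBSD kernel code.
 * All names (table_lock, update_buggy, ...) are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_entries;

/* Buggy: the little-used error path returns with the lock still held. */
int update_buggy(bool fail)
{
    pthread_mutex_lock(&table_lock);
    if (fail)
        return -1;              /* BUG: table_lock is never released */
    table_entries++;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Fixed: every path, including the error path, leaves via one unlock. */
int update_fixed(bool fail)
{
    int error = 0;

    pthread_mutex_lock(&table_lock);
    if (fail) {
        error = -1;
        goto out;
    }
    table_entries++;
out:
    pthread_mutex_unlock(&table_lock);
    return error;
}

/* Returns true if nobody currently holds table_lock (probe via trylock). */
bool lock_is_free(void)
{
    int rc = pthread_mutex_trylock(&table_lock);

    if (rc == EBUSY)
        return false;           /* someone (possibly a leaked path) holds it */
    assert(rc == 0);
    rc = pthread_mutex_unlock(&table_lock);
    assert(rc == 0);
    return true;
}
```

After update_buggy(true) has run, any other thread calling pthread_mutex_lock(&table_lock) would block indefinitely.  This is much like later processes piling up behind a leaked kernel lock.  The single-exit "goto out" style is one common way to keep every path releasing what it acquired.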

kre


