Current-Users archive


Re: continuous crashing 5.99.39



2010/10/5 Jens Rehsack <rehsack%googlemail.com@localhost>:
> 2010/10/5 Juergen Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>:
>> On Tue, Oct 05, 2010 at 05:47:42PM +0200, Jens Rehsack wrote:
>>> 2010/10/5 Juergen Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>:
>>> > On Tue, Oct 05, 2010 at 04:04:02PM +0200, Jens Rehsack wrote:
>>> >> 2010/10/5 Juergen Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>:
>>> >> > On Tue, Oct 05, 2010 at 03:17:22PM +0200, Jens Rehsack wrote:
>>> >> >> 2010/10/5 Jens Rehsack <rehsack%googlemail.com@localhost>:
>>> >> >> > 2010/10/5 Juergen Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>:
>>> >> >> >> On Tue, Oct 05, 2010 at 12:27:44PM +0200, Jens Rehsack wrote:
>>> >> >> >>> 2010/10/5 Juergen Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>:
>>> >> >> >>> > On Tue, Oct 05, 2010 at 10:27:05AM +0200, Jens Rehsack wrote:
>>> >> >> >>> >> Hi,
>>> >> >> >>> >>
>>> >> >> >>> >> every night when I load my iPod on my NetBSD laptop, the
>>> >> >> >>> >> machine has rebooted by the next morning. This has now happened
>>> >> >> >>> >> 4 times, so by now I consider it a pattern.
>>> >> >> >>> >> But there are no cores available in the morning.
>>> >> >> >>> >>
>>> >> >> >>> >> But today, when I was about to start X, it dumped (see gdb.txt).
>>> >> >> >>> >>
>>> >> >> >>> >> I tried to "vi gdb.txt" before sending, and it crashed again
>>> >> >> >>> >> (but it doesn't crash when doing gdb, mount, fsck, cp, ...).
>>> >> >> >>> >> The backtraces of the dumps are attached in vi1.txt and vi2.txt.
>>> >> >> >>> >>
>>> >> >> >>> >> Is there anything I can try? Currently I assume the last iPod
>>> >> >> >>> >> crash corrupted something and a rebuild/reinstall of the base
>>> >> >> >>> >> system will hopefully solve it.
>>> >> >> >>> > [snip]
>>> >> >> >>> >> #1  0xffffffff8044403d in panic (
>>> >> >> >>> >>     fmt=0xffffffff805c3ca0 "ffs_valloc: dup alloc")
>>> >> >> >>> >>     at /usr/src/sys/kern/subr_prf.c:302
>>> >> >> >>> >
>>> >> >> >>> > At least one of your file systems is corrupt.  Any errors from
>>> >> >> >>> > fsck while booting?
>>> >> >> >>>
>>> >> >> >>> Well, I checked /var/log/messages and /var/run/dmesg.log;
>>> >> >> >>> nothing in there. Then I rebooted (shutdown -r now) and booted
>>> >> >> >>> into single user mode, doing an fsck -y (reported all ffs
>>> >> >> >>> filesystems are clean) and an fsck -y /dev/rwd1e (my ext2 shared
>>> >> >> >>> disk for data exchange between NetBSD and Linux and Win32).
>>> >> >> >>> This volume was not cleanly unmounted (as usual after a crash),
>>> >> >> >>> but had no errors.
>>> >> >> >>>
>>> >> >> >>> After all filesystems were marked clean, I continued the boot
>>> >> >> >>> process and tried again to vi one of the above text files. Same
>>> >> >> >>> panic, same backtrace.
>>> >> >> >>>
>>> >> >> >>> Jens
>>> >> >> >>
>>> >> >> >> Please add -f to fsck (like fsck -y -f ...) to force a check on
>>> >> >> >> file systems currently marked clean.
>>> >> >> >
>>> >> >> > All three ran without errors. But a kernel rebuild triggered the
>>> >> >> > panic again :(
>>> >> >>
>>> >> >> In single-user mode (when the dump device wasn't set), I end up in
>>> >> >> ddb when compiling sources. Until I go home (around 21:00), I can
>>> >> >> use ddb to find out more, if there is something I can do.
>>> >> >>
>>> >> >> At home, I'm switching the disk to Ubuntu to continue my current work.
>>> >> >>
>>> >> >> /Jens
>>> >> >
>>> >> > Before the system panics with "ffs_valloc: dup alloc" it should print
>>> >> >
>>> >> >        dmode %x mode %x dgen %x gen %x
>>> >> >        size %llx blocks %llx
>>> >> >        ino %llu ipref %llu
>>> >> >
>>> >> > What do you get here? How do you mount (sync, async, log)?
>>> >>
>>> >> dmode 8180 mode 8180 dgen 4e gen 4e
>>> >> size 2b3 blocks 4
>>> >> ino 3735240 ipref 3734656
>>> >>
>>> >> mount is default:
>>> >> /dev/wd0a             /       ffs     rw      1 1
>>> >> /dev/wd0e             /usr    ffs     rw      1 2
>>> >> /dev/wd0f             /var    ffs     rw      1 2
>>> >
>>> > Looks ok, besides the fact that ffs_hashalloc() should not return an
>>> > allocated inode.  Sorry, I have no more ideas.
>>>
>>> What options do I have to get out of this? Format the disks and do a
>>> complete reinstall? Could it be a hard drive or memory issue?
>>
>> If you need a system to work on you should clear the disk and install
>> the stable release 5.0.2.
>
> I don't mind having an unstable system, so that I can report errors. But I
> can't fix most of them on my own, even if I'm willing to learn. Sure, I could
> use 5.0.2 - but that would be one less tester for -CURRENT :)

I played a bit with the machine yesterday at the hotel:

In "normal" multi-user mode (/tmp, /var/tmp and /usr/obj mounted rw as tmpfs):
1) vi works again (after it complained that it couldn't create
/var/tmp/vi.recover/...)
2) cvs up (in /usr/src) works fine and updated one file (I can no longer
say which one it was ...)
3) ./build.sh -U tools crashed instantly at the line "How does the
compiler creates executables" (near top, 3rd or 4th test).

In single user mode, / mounted rw, /usr and /var mounted ro, /tmp,
/var/tmp and /usr/obj mounted rw as tmpfs:
1) ./build.sh -U tools kernel=BERT works fine
2) ./build.sh -U distribution fails, because it wants to touch files
below /usr/src (/etc/mk.conf defines
   MAKEOBJPREFIX etc., as described in /usr/src/BUILDING)
3) mount -u /usr
4) ./build.sh -U distribution is still running (I can tell more this
evening or tomorrow)

Summarizing so far:
- it seems to be related to the /usr and /var file systems (IIRC, both
are ffs2 while / is ffs1)
- it doesn't seem to happen on every file creation/modification (I
haven't tried whether I can touch a new file without crashing the
machine)
- it doesn't happen when compiling the toolchain contained in
distribution, but it does happen when compiling the /usr/src build
toolchain.

If no new ideas arrive, I'll next try to back up /usr and /var,
delete and recreate both file systems, and restore (on the weekend).

/Jens
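
A side note on the values printed before the panic (quoted above): the
following is a minimal Python sketch, not part of the original exchange,
that decodes them. It assumes that mode, size and blocks are hexadecimal
(per the %x/%llx format strings) and that ino and ipref are decimal
(%llu). The variable names simply mirror the printed labels.

```python
import stat

# Values copied from the output quoted above.
# Assumption: %x / %llx fields are hexadecimal, %llu fields are decimal.
dmode = 0x8180
mode = 0x8180
size = 0x2b3
blocks = 0x4
ino = 3735240
ipref = 3734656

# File type and permission bits in "ls -l" notation.
mode_string = stat.filemode(mode)
is_regular = stat.S_ISREG(mode)

# Distance between the inode that was handed out and the preferred inode.
ino_offset = ino - ipref

print("mode   :", mode_string, "(regular file)" if is_regular else "")
print("dmode == mode:", dmode == mode)
print("size   :", size, "bytes")
print("blocks :", blocks)
print("ino - ipref:", ino_offset)
```

Under these assumptions the mode decodes to a regular file with 0600
permissions and the size to 691 bytes. The sketch only restates the
printed values; it says nothing about why the inode was handed out twice.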

